Our approach effectively eliminates high communication and partitioning overheads, and couples it with a new pipelined push-pull parallelism based execution strategy for fast model training. Amy Tai, VMware Research; Igor Smolyar, Technion Israel Institute of Technology; Michael Wei, VMware Research; Dan Tsafrir, Technion Israel Institute of Technology and VMware Research. Papers accompanied by nondisclosure agreement forms will not be considered. See the Preview Session page for an overview of the topics covered in the program. We focus on NVMe storage devices and show that it is natural to express these semantics in the kernel and the application and only requires a modest two-bit change to the device interface. SC is being increasingly adopted by industry for a variety of applications. PET then automatically corrects results to restore full equivalence. However, memory allocation decisions also impact overall application performance via data placement, offering opportunities to improve fleetwide productivity by completing more units of application work using fewer hardware resources. We describe Fluffy, a multi-transaction differential fuzzer for finding consensus bugs in Ethereum. Based on the observation that real-world workloads always feature skewed access patterns, Nap introduces a NUMA-aware layer (NAL) on the top of existing concurrent PM indexes, and steers accesses to hot items to this layer. We present case studies and end-to-end applications that show how Storm lets developers specify diverse policies while centralizing the trusted code to under 1% of the application, and statically enforces security with modest type annotation overhead, and no run-time cost. Radia Perlman is a Fellow at Dell Technologies. In this talk, I'll speculate on how we came to this unfortunate state of affairs, and what might be done to fix it. We particularly encourage contributions containing highly original ideas, new approaches, and/or groundbreaking results. In experiments with real DL jobs and with trace-driven simulations, Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers, even when they are provided with ideal resource and training configurations for every job. OSDI 2021 papers summary. Author Response Period The key insight guiding our design is computation separation. GoJournals goal is to bring the advantages of journaling for code to specs and proofs. Advisor: You have a past or present association as thesis advisor or advisee. We conclude with a discussion of additional techniques for improving the allocator development process and potential optimization strategies for future memory allocators. We demonstrate that KEVIN reduces the amount of I/O traffic between the host and the device, and remains particularly robust as the system ages and the data become fragmented. Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, and Roxana Geambasu, Columbia University; Mathias Lcuyer, Microsoft Research. Our evaluation shows that, compared to existing participant selection mechanisms, Oort improves time-to-accuracy performance by 1.2X-14.1X and final model accuracy by 1.3%-9.8%, while efficiently enforcing developer-specified model testing criteria at the scale of millions of clients. Authors are also encouraged to contact the program co-chairs, osdi21chairs@usenix.org, if needed to relate their OSDI submissions to relevant submissions of their own that are simultaneously under review or awaiting publication at other venues. As a result, data characteristics and device capabilities vary widely across clients. We demonstrate the above using design, implementation and evaluation of blk-switch, a new Linux kernel storage stack architecture. The file system performance of the proposed ZNS+ storage system was 1.33--2.91 times better than that of the normal ZNS-based storage system. Foreshadow was chosen as an IEEE Micro Top Pick. The OSDI Symposium emphasizes innovative research as well as quantified or insightful experiences in systems design and implementation. Submission of a response is optional. Most existing schedulers expect users to specify the number of resources for each job, often leading to inefficient resource use. To enable FL developers to interpret their results in model testing, Oort enforces their requirements on the distribution of participant data while improving the duration of federated testing by cherry-picking clients. To help more profitably utilize sanitizers, we introduce SanRazor, a practical tool aiming to effectively detect and remove redundant sanitizer checks. Professor Veloso has been recognized with a multiple honors, including being a Fellow of the ACM, IEEE, AAAS, and AAAI. Sponsored by USENIX in cooperation with ACM SIGOPS. For general conference information, see https://www.usenix.org/conference/osdi22. HotCRP.com signin Sign in using your HotCRP.com account. blk-switch uses this insight to adapt techniques from the computer networking literature (e.g., multiple egress queues, prioritized processing of individual requests, load balancing, and switch scheduling) to the Linux kernel storage stack. Message from the Program Co-Chairs. We first introduce two new hardware primitives: 1) Guarded Page Table (GPT), which protects page table pages to support page-level secure memory isolation; 2) Mountable Merkle Tree (MMT), which supports scalable integrity protection for secure memory. Our evaluation on the SPEC benchmarks shows that SanRazor can reduce the overhead of sanitizers significantly, from 73.8% to 28.062.0% for AddressSanitizer, and from 160.1% to 36.6124.4% for UndefinedBehaviorSanitizer (depending on the applied reduction scheme). One important reason for the high cost is, as we observe in this paper, that many sanitizer checks are redundant the same safety property is repeatedly checked leading to unnecessarily wasted computing resources. Using selective profiling, we build DMon, a system that can automatically locate data locality problems in production, identify access patterns that hurt locality, and repair such patterns using targeted optimizations. Cores can safely and concurrently read from their local kernel replica, eliminating remote NUMA accesses. However, your OSDI submission must use an anonymized name for your project or system that differs from any used in such contexts. The NVMe zoned namespace (ZNS) is emerging as a new storage interface, where the logical address space is divided into fixed-sized zones, and each zone must be written sequentially for flash-memory-friendly access. Proceedings Cover | The program co-chairs will use this information at their discretion to preserve the anonymity of the review process without jeopardizing the outcome of the current OSDI submission. Four months after we reported the bugs to Geth developers, one of the bugs was triggered on the mainnet, and caused nodes using a stale version of Geth to hard fork the Ethereum blockchain. This fast path contains programmable hardware support for low latency transport and congestion control as well as hardware support for efficient load balancing of RPCs to cores. The key insight in blk-switch is that Linux's multi-queue storage design, along with multi-queue network and storage hardware, makes the storage stack conceptually similar to a network switch. Last year, 70% of accepted OSDI papers participated in the . PLDI seeks outstanding research that extends and/or applies programming-language concepts to advance the field of computing. 64 papers accepted out of 341 submitted. Writing a correct operating system kernel is notoriously hard. The 15th USENIX Symposium on Operating Systems Design and Implementation seeks to present innovative, exciting research in computer systems. Forgot your password? The novel aspect of the nanoPU is the design of a fast path between the network and applications---bypassing the cache and memory hierarchy, and placing arriving messages directly into the CPU register file. In this paper, we propose Oort to improve the performance of federated training and testing with guided participant selection. AI enables principled representation of knowledge, complex strategy optimization, learning from data, and support to human decision making. Pages should be numbered, and figures and tables should be legible in black and white, without requiring magnification. Manuela M. Veloso is the Head of J.P. Morgan AI Research, which pursues fundamental research in areas of core relevance to financial services, including data mining and cryptography, machine learning, explainability, and human-AI interaction. As increasingly more sensitive data is being collected to gain valuable insights, the need to natively integrate privacy controls in data analytics frameworks is growing in importance. Password Fan Lai, Xiangfeng Zhu, Harsha V. Madhyastha, and Mosharaf Chowdhury, University of Michigan. Extensive experiments show that GNNAdvisor outperforms the state-of-the-art GNN computing frameworks, such as Deep Graph Library (3.02 faster on average) and NeuGraph (up to 4.10 faster), on mainstream GNN architectures across various datasets. PC members are not required to read supplementary material when reviewing the paper, so each paper should stand alone without it. Using this property, MAGE calculates the memory access pattern ahead of time and uses it to produce a memory management plan. KEVIN combines a fast, lightweight, and POSIX compliant file system with a key-value storage device that performs in-storage indexing. Pollux promotes fairness among DL jobs competing for resources based on a more meaningful measure of useful job progress, and reveals a new opportunity for reducing DL cost in cloud environments. A graph embedding is a fixed length vector representation for each node (and/or edge-type) in a graph and has emerged as the de-facto approach to apply modern machine learning on graphs. PET discovers and applies program transformations that improve computation efficiency but only maintain partial functional equivalence. USENIX discourages program co-chairs from submitting papers to the conferences they organize, although they are allowed to do so. Session Chairs: Ryan Huang, Johns Hopkins University, and Manos Kapritsos, University of Michigan, Jianan Yao, Runzhou Tao, Ronghui Gu, Jason Nieh, Suman Jana, and Gabriel Ryan, Columbia University. Session Chairs: Gennady Pekhimenko, University of Toronto / Vector Institute, and Shivaram Venkataraman, University of WisconsinMadison, Aurick Qiao, Petuum, Inc. and Carnegie Mellon University; Sang Keun Choe and Suhas Jayaram Subramanya, Carnegie Mellon University; Willie Neiswanger, Petuum, Inc. and Carnegie Mellon University; Qirong Ho, Petuum, Inc.; Hao Zhang, Petuum, Inc. and UC Berkeley; Gregory R. Ganger, Carnegie Mellon University; Eric P. Xing, MBZUAI, Petuum, Inc., and Carnegie Mellon University. The wire-to-wire RPC response time through the nanoPU is just 69ns, an order of magnitude quicker than the best-of-breed, low latency, commercial NICs. For example, talks may be shorter than in prior years, or some parts of the conference may be multi-tracked. She developed the technology for making network routing self-stabilizing, largely self-managing, and scalable. Weak Links in Authentication Chains: A Large-scale Analysis of Email Sender Spoofing Attacks Marius is open-sourced at www.marius-project.org. However, the existing one-size-fits-all GNN implementations are insufficient to catch up with the evolving GNN architectures, the ever-increasing graph size, and the diverse node embedding dimensionality. Petuum Awarded OSDI 2021 Best Paper for Goodput-Optimized Deep Learning Research Petuum CASL research and engineering team's Pollux technical paper on adaptive scheduling for optimized. Despite having the same end goals as traditional ML, FL executions differ significantly in scale, spanning thousands to millions of participating devices. Swapnil Gandhi and Anand Padmanabha Iyer, Microsoft Research. The papers will be available online to everyone beginning on the first day of the conference, July 14, 2021. Taking place in Carlsbad, CA from 11-13 July, OSDI is a highly selective flagship conference in computer science, especially on the topic of computer systems. This paper presents Dorylus: a distributed system for training GNNs. We have made Fluffy publicly available at https://github.com/snuspl/fluffy to contribute to the security of Ethereum. This yielded 6% fewer TLB miss stalls, and 26% reduction in memory wasted due to fragmentation. Sep 2021 - Present 1 year 7 months. Graph Neural Networks (GNNs) have gained significant attention in the recent past, and become one of the fastest growing subareas in deep learning. This motivates the need for a new approach to data privacy that can provide strong assurance and control to users. USENIX new Date().getFullYear()>document.write(new Date().getFullYear()); Grants for Black Computer Science Students Application, Propose an interesting, compelling solution, Demonstrate the practicality and benefits of the solution, Clearly describe the paper's contributions, Clearly articulate the advances beyond previous work. Authors of each accepted paper must ensure that at least one author registers for the conference, and that their paper is presented in-person at the conference. will work with the steering committee to ensure that the symposium program will accommodate presentations for all accepted papers. Authors must make a good faith effort to anonymize their submissions, and they should not identify themselves or their institutions either explicitly or by implication (e.g., through the references or acknowledgments). Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Jiang Zhang, University of Southern California; Shuai Wang, HKUST; Manuel Rigger, Pinjia He, and Zhendong Su, ETH Zurich. Perennial 2.0 makes this possible by introducing several techniques to formalize GoJournals specification and to manage the complexity in the proof of GoJournals implementation. Fluffy found two new consensus bugs in the most popular Geth Ethereum client which were exploitable on the live Ethereum mainnet. Prepublication versions of the accepted papers from the summer submission deadline are available below. Notification of conditional accept/reject for revisions: 3 March 2022. The 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) will take place as a virtual event on July 14-16, 2021. These scripts often make pages slow to load, partly due to a fundamental inefficiency in how browsers process JavaScript content: browsers make it easy for web developers to reason about page state by serially executing all scripts on any frame in a page, but as a result, fail to leverage the multiple CPU cores that are readily available even on low-end phones. Novel system designs, thorough empirical work, well-motivated theoretical results, and new application areas are all . The main contribution of this paper is GoJournal, a verified, concurrent journaling system that provides atomicity for storage applications, together with Perennial 2.0, a framework for formally specifying and verifying concurrent crash-safe systems. In this paper, we show how to address this inefficiency without requiring pages to be rewritten or browsers to be modified. The experimental results show that Penglai can support 1,000s enclave instances running concurrently and scale up to 512GB secure memory with both encryption and integrity protection. Our further evaluation on 38 CVEs from 10 commonly-used programs shows that SanRazor reduced checks suffice to detect at least 33 out of the 38 CVEs. Indeed, it is a prime target for powerful adversaries such as nation states. If in doubt about whether your submission to OSDI 2021 and your upcoming submission to SOSP are the same paper or not, please contact the PC chairs by email. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. Even the little publishable OS work that is not based on Linux still assumes the same simplistic hardware model (essentially a multiprocessor VAX) that bears little resemblance to modern reality. Hence, kernel developers are constantly refining synchronization within OS kernels to improve scalability at the risk of introducing subtle bugs. Secure Computation (SC) is a family of cryptographic primitives for computing on encrypted data in single-party and multi-party settings. First, Fluffy mutates and executes multi-transaction test cases to find consensus bugs which cannot be found using existing fuzzers for Ethereum. A hardware-accelerated thread scheduler makes sub-nanosecond decisions, leading to high CPU utilization and low tail response time for RPCs. In addition, CLP outperforms Elasticsearch and Splunk Enterprise's log ingestion performance by over 13x, and we show CLP scales to petabytes of logs.