SOSP 2009, Day One

I'm in Big Sky, Montana for SOSP 2009 -- widely considered the premier conference on systems. The location is stunning, although it's bitterly cold here; I wasn't expecting this kind of weather for another couple of months. This seems to be the biggest SOSP ever, with more than 500 people registered; the last SOSP had a mere 471 attendees. I am not sure what this means. The number of paper submissions has not been going up dramatically, so it's hard to tell whether this represents increasing interest in systems as a field. This year there are a number of co-located workshops, and perhaps, with the weak economy, people are concentrating their conference travel on fewer high-impact events.

Best paper awards went to three papers: FAWN, RouteBricks, and seL4. These papers all represented a substantial amount of work and have a remarkable number of authors; they are not the kind of papers that were dashed off in two weeks by one or two grad students!

Below are some of the highlights from Day One.

Barbara Liskov - ACM Turing Award Lecture, "The Power of Abstraction"

Barbara used her Turing Lecture as an opportunity to survey the development of Abstract Data Types, CLU, and type hierarchy. It is remarkable how much impact these ideas have had on modern programming languages and practice; Java, C#, Python, and many other languages are direct descendants of this work. She spent a considerable amount of time talking about the context of programming practice before ADTs were developed, and the issues that computer scientists were concerned about (program structure, control flow, and global state). She surveyed a lot of classic papers -- Dijkstra, Wirth, Parnas, and others -- that led to the development of ADTs.

After talking about the mechanisms in CLU, Barbara spent some time reflecting on the current state of programming practice. In her view, modern languages (like Java and C#) are good, but the state of programming is "lousy" -- many untrained programmers are developing Internet and browser-based systems rife with things like global state. She wrapped up the lecture by pointing the way to some future opportunities in large multicore systems and in reasoning about the semantics of programs running across the Internet.

FAWN - A Fast Array of Wimpy Nodes

The premise of this paper is that it is possible to reduce the power consumption of data centers by an order of magnitude by leveraging much more power-efficient hardware. The idea is to focus on queries per joule as the metric, rather than simply peak performance. They propose a cluster of inexpensive, power-efficient nodes (based on the AMD Geode platform with a small amount of memory and Compact Flash rather than hard disks) and develop a key-value store application. The system has to address the limited memory on the nodes, the need to minimize random writes to the flash (using a log-structured data store), and the need to add or remove nodes from the cluster (using consistent hashing for lookups and migrating data in background tasks). Among other things, the paper shows that the FAWN prototype can sustain around 350 queries per second per watt, compared to around 50 queries/sec/W for conventional hardware. They also do a nice analysis of total cost of ownership and talk about the regime (in terms of load and data set size) where FAWN makes the most sense.
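To make the consistent-hashing piece concrete, here is a minimal sketch of a hash ring of the kind used for FAWN's lookups. This is a generic illustration, not FAWN's actual code; the node names and the virtual-node count are made up:

```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Minimal consistent-hash ring: each key maps to the first node
    clockwise from the key's hash, so adding or removing a node only
    moves the keys that fall in that node's arc of the ring."""

    def __init__(self, nodes=(), vnodes=8):
        self.vnodes = vnodes   # virtual nodes per physical node smooth the load
        self.ring = []         # sorted list of (hash, node) points on the ring
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def lookup(self, key):
        if not self.ring:
            raise KeyError("empty ring")
        h = self._hash(key)
        # First ring point with hash >= h, wrapping around at the end.
        idx = bisect_right(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["wimpy-0", "wimpy-1", "wimpy-2"])
owner = ring.lookup("user:42")
```

The property FAWN relies on is that when a node joins or leaves, only the keys in its arc migrate (in the background), while all other lookups keep returning the same node.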

The Multikernel: A New OS Architecture for Scalable Multicore Systems

This paper is about revamping the design of OS kernels to account for massively multicore systems with heterogeneous hardware (CPUs, GPUs, programmable NICs, and FPGAs). The key idea is to shift away from a single kernel with shared memory and locks on shared data structures, and instead run a separate kernel on each core, using message passing as the only coordination primitive between cores. This approach allows one to design interprocessor communication protocols that leverage the topology of the underlying interconnect, which can vary substantially between platforms. In the multikernel design, all shared state is treated as replicated across cores, and mechanisms for explicit replica management (typical in distributed systems) become necessary. The authors have implemented a prototype of this design, called Barrelfish, and demonstrated its scalability as the number of cores is increased. This is a really nice piece of systems work and a classic example of "from scratch" OS design that leverages new hardware trends.
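To give a flavor of the message-passing model, here is a toy sketch in which threads stand in for per-core kernels: each "core" holds its own replica of shared state, and the only way that replica changes is by consuming messages from its inbox. This is my own illustration of the idea, not Barrelfish code, and the message format is invented:

```python
import queue
import threading

class CoreKernel(threading.Thread):
    """Toy per-core 'kernel': its replica of shared state is updated
    only by processing messages from its own inbox -- no locks on the
    state itself, mirroring the multikernel's no-shared-memory rule."""

    def __init__(self, core_id, inboxes):
        super().__init__(daemon=True)
        self.core_id = core_id
        self.inboxes = inboxes   # one message queue per core
        self.replica = {}        # this core's replica of the shared state

    def send_all(self, msg):
        # Broadcast an update to every core (including ourselves);
        # on real hardware this routing would exploit the interconnect
        # topology rather than a flat broadcast.
        for q in self.inboxes:
            q.put(msg)

    def run(self):
        while True:
            msg = self.inboxes[self.core_id].get()
            if msg == "stop":
                break
            op, key, value = msg
            if op == "set":
                self.replica[key] = value  # apply the update locally

inboxes = [queue.Queue() for _ in range(4)]
cores = [CoreKernel(i, inboxes) for i in range(4)]
for c in cores:
    c.start()
cores[0].send_all(("set", "mapping", 0xdead))
for q in inboxes:
    q.put("stop")
for c in cores:
    c.join()
# All four replicas have now converged without any shared-memory locking.
```

The distributed-systems flavor is visible even in this toy: agreement across replicas is achieved by an explicit protocol over messages, not by grabbing a lock.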

Both the Multikernel and FAWN papers were presented in an earlier form at HotOS 2009, which I blogged about earlier.

Debugging in the (Very) Large: Ten Years of Implementation and Experience

This paper from Microsoft describes Windows Error Reporting (WER), the largest client-server system in the world by number of installs. WER receives 100 million error reports a day, spanning 17 million different programs running on roughly a billion installed clients. It has been in operation for 10 years, has led to thousands of bugs being fixed, and has fundamentally changed how Microsoft approaches software development. Galen Hunt from MSR gave a great talk on this system.

The system automatically collects "minidumps" from clients when errors occur, which are then classified into buckets by the back-end servers using a variety of statistical analyses. The bucketing strategy takes into account the program name, version, and the symbol in which the program crashed. Doing statistical analysis on so many error reports allows them to home in on the bugs that affect the most users and concentrate bug-fixing efforts accordingly. Galen talked about some of the successes of this system over the years, including identifying 5-year-old heisenbugs in the Windows kernel, detecting widespread malware attacks, and even finding bugs in hardware (such as a broken USB controller that did not implement DMA correctly). This is a great paper on a large, real-world system.
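The bucketing step can be sketched in a few lines: group reports by (program, version, crash symbol) and count, so the most widespread bugs surface first. The field names and sample reports here are invented, and real WER uses far more sophisticated heuristics than this:

```python
from collections import Counter

def bucket_id(report):
    """Coarse bucket key: the program, its version, and the symbol
    (function) nearest the crashing address -- the three fields the
    talk mentioned as the basis for bucketing."""
    return (report["program"], report["version"], report["symbol"])

def top_buckets(reports, n=3):
    """Rank buckets by report count, so fixing the top bucket helps
    the largest number of affected users."""
    counts = Counter(bucket_id(r) for r in reports)
    return counts.most_common(n)

reports = [
    {"program": "word.exe",  "version": "12.0", "symbol": "CopyRange"},
    {"program": "word.exe",  "version": "12.0", "symbol": "CopyRange"},
    {"program": "excel.exe", "version": "12.0", "symbol": "Recalc"},
]
ranked = top_buckets(reports)
```

At WER's scale the interesting part is everything this sketch omits: refining buckets with server-side minidump analysis, and splitting or merging buckets when one bucket turns out to contain several distinct bugs.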

I'll blog about the BOFs, banquet, and award ceremony tomorrow.


  1. How different did the FAWN and Multikernel papers need to be to appear in both conferences? I'm not criticizing, just curious, as this is an issue in my field for workshop-to-conference papers (why I no longer submit ANYTHING to a workshop) and I'm curious about conference-to-conference.

  2. Well, the HotOS papers were technically "position papers" without a lot of technical detail or evaluation, but it's true that there is a lot of overlap between them. SIGOPS' policy towards HotOS is that it is generally acceptable for a HotOS paper to represent an early form of a later SOSP/OSDI/etc. submission. Indeed, by the time HotOS was happening, the SOSP papers would have already been written and submitted.

    I happen to be the program chair for HotOS in 2011 and one thing I plan to look at is this issue!

  3. This is a great report for those unable to attend.
    I plan on visiting regularly the next couple of days.

    Thank you and please keep up the work!

