Skip to main content

SOSP 2009, Day Two

It's Day Two here in Big Sky and we have a few fresh inches of snow on the ground. Apparently Mike Freedman hit an elk while driving here from Salt Lake City on Sunday (he is OK, the elk is likely not).

Last night at the banquet and awards ceremony, Eric Brewer won the Mark Weiser Award -- well deserved! Eric represents exactly the kind of person this award was designed for, and just before they made the announcement, I was thinking, "Why hasn't Eric won this yet?" Three papers won the SIGOPS Hall of Fame Award, given to the paper that is at least 10 years old that has had substantial influence on the community. I don't recall the papers (they are not posted on the above page yet) but David Cheriton, Butler Lampson, and Hank Levy were all coauthors of the awarded papers.

A few general comments on the SOSP program, and some trends. The hot topics this year continue to be in the realm of debugging and preventing or managing faults. Roughly half of the papers fall into this broad category. There are essentially no papers on distributed systems, at least not in the large scale. (The Microsoft paper on WER is more about debugging than the distributed systems aspect.) Peer to peer, DHTs, etc. are so 2003. Finally, I am quite surprised that there are no papers on virtual machines or cloud computing. This is one area that seems to be getting a lot of attention lately but it's not represented at SOSP this year. Finally, only one BFT paper (gasp!).

Some highlights from Day Two.

Better I/O Through Byte-Addressable, Persistent Memory

This paper from MSR is about rethinking the design of filesystems around the concept of a (hypothetical) byte-addressible, persistent memory, which the authors call BPRAM. Phase Change Memory is now in large-scale production and could provide this capability with reasonable cost within the next few years. BPRAM is expected to be much more durable than flash and much faster (with access times on the order of nanoseconds). The paper proposes BPFS, a new filesystem design that takes advantage of these properties. The key contribution is an atomicity and durability technique called short-circuit shadow paging, which is similar to shadow paging but has a number of optimizations since BPRAM allows one to partially update blocks in place (e.g., one can update just one pointer in an indirect block without rewriting the whole block). The paper also proposes hardware support for atomicity and durability: a new kind of barrier for caches, and adding capacitors to the DIMM so that pending writes complete even if power is lost. What I liked about this paper is that it was looking out beyond existing commercially-available hardware and considering how BPRAM would impact storage system design.

Work-in-Progress Talks

The WIPs are always a lot of fun. Some highlights:

John Ousterhout from Stanford gave an overview on a new project, called RAMCloud, which proposes to reduce the latency for storage in a datacenter by orders of magnitude by developing storage servers with fast RPC (5-10 usec) and that hold all of the data in RAM, rather than on disk.

My own student Geoffrey Challen gave a talk on Integrated Distributed Energy Awareness in sensor networks. The idea is to provide a distributed service allowing sensor nodes to learn of resource availability on other nodes and make informed, coordinated resource allocation decisions. This is the first step towards a distributed OS for sensor nets, something my group is currently working on.

Benjamin Reed from Yahoo! Research gave a talk on "BFT for the skeptics." He made the case that industry is very good at handling crash failures, and some non-crash failures using simple techniques, like checksums. Essentially, it's not clear that BFT solves a real problem that industry has today and argued that unless BFT solutions are easy to configure and solve a real problem, industry would be slow to roll them out.

Eran Tromer from MIT talked about the problem of architectural side-channels allowing information to leak between VMs running on the same physical machine in a cloud. He proposed a tool, called DynamoREA, that rewrites x86 binaries to eliminate these side-channel attacks; for example, by adding noise or delays.

seL4: Formal Verification of an OS Kernel

This paper won one of the best paper awards. The team from NICTA and UNSW formally verified the L4 microkernel, around 8,700 lines of C code, using a 200kloc handwritten specification and machine-checked proof. This is a tremendous amount of work and is the largest verification effort on a complex piece of systems code that I am aware of. They formally proved that there are ZERO bugs in the L4 microkernel: no null pointer dereferences, no memory leaks, no buffer overflows, no infinite loops, and so forth. To make the problem tractable they had to make a number of assumptions - for example, that the compiler and hardware are correct. This seems reasonable to me although we all know that in reality there are bugs in these components as well. Of course, going through this process also uncovered bugs in the specification itself. Very nice work.

Surviving Sensor Network Software Faults

Phil Levis from Stanford gave this talk on Neutron, a new version of TinyOS that adds support for recovery units within the OS, which can be independently rebooted and reinitialized following a memory fault. The idea is to statically determine self-contained groupings of TinyOS components that can be rebooted without impacting each other. The TinyOS kernel itself comprises its own unit which can be rebooted, allowing application code to resume when the reboot is complete. Recovery units can also mark data as "precious" allowing its state to persist across reboots. The authors show that this can be accomplished with low overhead and it reduces the impact of node reboots on overall network availability.

I actually have my doubts about this approach; I think it will make it harder, not easier, for developers to reason about the effect of memory bugs on their code. The advantage of a hard reboot is that it returns the entire node to a known-good state. Having deployed sensor nets with strange bugs impacting data quality, I don't know whether this technique would actually make my life any easier. Damn cool idea though.

Tomorrow I'll blog about the business meeting (tonight) and whatever talks I make it to tomorrow; I have to skip out on the last session to catch my flight home.


  1. No BFT papers? Surely Upright qualifies, even though it doesn't have BFT in its title.

  2. Yup, need to wait for day 3 for BFTs, and it's not in the title, but nevertheless there

  3. Duh, you're right. Fixed the post.

  4. Here are two of the three Hall of Fame papers:
    1. Experience with Processes and Monitors in Mesa
    2. Leases: an efficient fault-tolerant mechanism for distributed file cache consistency


Post a Comment

Popular posts from this blog

Why I'm leaving Harvard

The word is out that I have decided to resign my tenured faculty job at Harvard to remain at Google. Obviously this will be a big change in my career, and one that I have spent a tremendous amount of time mulling over the last few months.

Rather than let rumors spread about the reasons for my move, I think I should be pretty direct in explaining my thinking here.

I should say first of all that I'm not leaving because of any problems with Harvard. On the contrary, I love Harvard, and will miss it a lot. The computer science faculty are absolutely top-notch, and the students are the best a professor could ever hope to work with. It is a fantastic environment, very supportive, and full of great people. They were crazy enough to give me tenure, and I feel no small pang of guilt for leaving now. I joined Harvard because it offered the opportunity to make a big impact on a great department at an important school, and I have no regrets about my decision to go there eight years ago. But m…

Rewriting a large production system in Go

My team at Google is wrapping up an effort to rewrite a large production system (almost) entirely in Go. I say "almost" because one component of the system -- a library for transcoding between image formats -- works perfectly well in C++, so we decided to leave it as-is. But the rest of the system is 100% Go, not just wrappers to existing modules in C++ or another language. It's been a fun experience and I thought I'd share some lessons learned.

Why rewrite?

The first question we must answer is why we considered a rewrite in the first place. When we started this project, we adopted an existing C++ based system, which had been developed over the course of a couple of years by two of our sister teams at Google. It's a good system and does its job remarkably well. However, it has been used in several different projects with vastly different goals, leading to a nontrivial accretion of cruft. Over time, it became apparent that for us to continue to innovate rapidly wo…

Running a software team at Google

I'm often asked what my job is like at Google since I left academia. I guess going from tenured professor to software engineer sounds like a big step down. Job titles aside, I'm much happier and more productive in my new role than I was in the 8 years at Harvard, though there are actually a lot of similarities between being a professor and running a software team.

I lead a team at Google's Seattle office which is responsible for a range of projects in the mobile web performance area (for more background on my team's work see my earlier blog post on the topic). One of our projects is the recently-announced data compression proxy support in Chrome Mobile. We also work on the PageSpeed suite of technologies, specifically focusing on mobile web optimization, as well as a bunch of other cool stuff that I can't talk about just yet.

My official job title is just "software engineer," which is the most common (and coveted) role at Google. (I say "coveted&quo…