Tuesday, October 13, 2009

SOSP 2009, Day Two

It's Day Two here in Big Sky and we have a few fresh inches of snow on the ground. Apparently Mike Freedman hit an elk while driving here from Salt Lake City on Sunday (he is OK, the elk is likely not).

Last night at the banquet and awards ceremony, Eric Brewer won the Mark Weiser Award -- well deserved! Eric represents exactly the kind of person this award was designed for, and just before they made the announcement, I was thinking, "Why hasn't Eric won this yet?" Three papers won the SIGOPS Hall of Fame Award, given to the paper that is at least 10 years old that has had substantial influence on the community. I don't recall the papers (they are not posted on the above page yet) but David Cheriton, Butler Lampson, and Hank Levy were all coauthors of the awarded papers.

A few general comments on the SOSP program, and some trends. The hot topics this year continue to be in the realm of debugging and preventing or managing faults. Roughly half of the papers fall into this broad category. There are essentially no papers on distributed systems, at least not in the large scale. (The Microsoft paper on WER is more about debugging than the distributed systems aspect.) Peer to peer, DHTs, etc. are so 2003. Finally, I am quite surprised that there are no papers on virtual machines or cloud computing. This is one area that seems to be getting a lot of attention lately but it's not represented at SOSP this year. Finally, only one BFT paper (gasp!).

Some highlights from Day Two.

Better I/O Through Byte-Addressable, Persistent Memory

This paper from MSR is about rethinking the design of filesystems around the concept of a (hypothetical) byte-addressible, persistent memory, which the authors call BPRAM. Phase Change Memory is now in large-scale production and could provide this capability with reasonable cost within the next few years. BPRAM is expected to be much more durable than flash and much faster (with access times on the order of nanoseconds). The paper proposes BPFS, a new filesystem design that takes advantage of these properties. The key contribution is an atomicity and durability technique called short-circuit shadow paging, which is similar to shadow paging but has a number of optimizations since BPRAM allows one to partially update blocks in place (e.g., one can update just one pointer in an indirect block without rewriting the whole block). The paper also proposes hardware support for atomicity and durability: a new kind of barrier for caches, and adding capacitors to the DIMM so that pending writes complete even if power is lost. What I liked about this paper is that it was looking out beyond existing commercially-available hardware and considering how BPRAM would impact storage system design.

Work-in-Progress Talks

The WIPs are always a lot of fun. Some highlights:

John Ousterhout from Stanford gave an overview on a new project, called RAMCloud, which proposes to reduce the latency for storage in a datacenter by orders of magnitude by developing storage servers with fast RPC (5-10 usec) and that hold all of the data in RAM, rather than on disk.

My own student Geoffrey Challen gave a talk on Integrated Distributed Energy Awareness in sensor networks. The idea is to provide a distributed service allowing sensor nodes to learn of resource availability on other nodes and make informed, coordinated resource allocation decisions. This is the first step towards a distributed OS for sensor nets, something my group is currently working on.

Benjamin Reed from Yahoo! Research gave a talk on "BFT for the skeptics." He made the case that industry is very good at handling crash failures, and some non-crash failures using simple techniques, like checksums. Essentially, it's not clear that BFT solves a real problem that industry has today and argued that unless BFT solutions are easy to configure and solve a real problem, industry would be slow to roll them out.

Eran Tromer from MIT talked about the problem of architectural side-channels allowing information to leak between VMs running on the same physical machine in a cloud. He proposed a tool, called DynamoREA, that rewrites x86 binaries to eliminate these side-channel attacks; for example, by adding noise or delays.

seL4: Formal Verification of an OS Kernel

This paper won one of the best paper awards. The team from NICTA and UNSW formally verified the L4 microkernel, around 8,700 lines of C code, using a 200kloc handwritten specification and machine-checked proof. This is a tremendous amount of work and is the largest verification effort on a complex piece of systems code that I am aware of. They formally proved that there are ZERO bugs in the L4 microkernel: no null pointer dereferences, no memory leaks, no buffer overflows, no infinite loops, and so forth. To make the problem tractable they had to make a number of assumptions - for example, that the compiler and hardware are correct. This seems reasonable to me although we all know that in reality there are bugs in these components as well. Of course, going through this process also uncovered bugs in the specification itself. Very nice work.

Surviving Sensor Network Software Faults

Phil Levis from Stanford gave this talk on Neutron, a new version of TinyOS that adds support for recovery units within the OS, which can be independently rebooted and reinitialized following a memory fault. The idea is to statically determine self-contained groupings of TinyOS components that can be rebooted without impacting each other. The TinyOS kernel itself comprises its own unit which can be rebooted, allowing application code to resume when the reboot is complete. Recovery units can also mark data as "precious" allowing its state to persist across reboots. The authors show that this can be accomplished with low overhead and it reduces the impact of node reboots on overall network availability.

I actually have my doubts about this approach; I think it will make it harder, not easier, for developers to reason about the effect of memory bugs on their code. The advantage of a hard reboot is that it returns the entire node to a known-good state. Having deployed sensor nets with strange bugs impacting data quality, I don't know whether this technique would actually make my life any easier. Damn cool idea though.

Tomorrow I'll blog about the business meeting (tonight) and whatever talks I make it to tomorrow; I have to skip out on the last session to catch my flight home.


  1. No BFT papers? Surely Upright qualifies, even though it doesn't have BFT in its title.

  2. Yup, need to wait for day 3 for BFTs, and it's not in the title, but nevertheless there

  3. Duh, you're right. Fixed the post.

  4. Here are two of the three Hall of Fame papers:
    1. Experience with Processes and Monitors in Mesa
    2. Leases: an efficient fault-tolerant mechanism for distributed file cache consistency


Startup Life: Three Months In

I've posted a story to Medium on what it's been like to work at a startup, after years at Google. Check it out here.