Wednesday, May 20, 2009

HotOS 2009, Day Two

Some highlights from Day Two of HotOS 2009...

Michael Kozuch from Intel Research Pittsburgh described an approach to load-balancing computation within a datacenter that involves migrating the running operating system (and the applications running on top of it) from one physical machine to another. One approach is to shut down the OS and reboot it on the new hardware, but Michael is going further by looking at migrating a running OS instance and its device driver state -- even across nodes with different physical hardware. Ballsy.

Don Porter from UT Austin made the claim that operating systems should expose a transactional interface, allowing applications to describe a set of system calls as occurring within a transaction. Although there is a lot of related work in this area, Don's point is that the interface should be very simple and general enough to capture essentially any set of system calls within a transaction (rather than being limited to filesystem calls, for example).
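To make the idea of wrapping arbitrary system calls in a transaction a bit more concrete, here is a rough user-level sketch. It is not Don's interface: `ToyTransaction`, `run`, `create_file`, and `demo` are names invented for illustration, and the approach (an undo log replayed in reverse on failure) is only a stand-in. A real OS-level transaction would also provide isolation and crash safety, which this does not.

```python
import os
import tempfile


class ToyTransaction:
    """Hypothetical user-level stand-in for a transactional syscall interface.

    Each operation is paired with an undo action. If the `with` block raises,
    the undo actions run in reverse order. No isolation, not crash-safe.
    """

    def __init__(self):
        self._undo_log = []

    def run(self, do, undo):
        result = do()
        self._undo_log.append(undo)  # record how to reverse this step
        return result

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Abort: reverse every completed step, newest first.
            for undo in reversed(self._undo_log):
                undo()
        self._undo_log.clear()
        return False  # let the exception propagate to the caller


def create_file(path):
    # O_EXCL makes creation fail if the file already exists.
    os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY))


def demo(fail):
    root = tempfile.mkdtemp()
    subdir = os.path.join(root, "d")
    path = os.path.join(subdir, "f")
    try:
        with ToyTransaction() as tx:
            # Two different kinds of system call inside one transaction.
            tx.run(lambda: os.mkdir(subdir), lambda: os.rmdir(subdir))
            tx.run(lambda: create_file(path), lambda: os.remove(path))
            if fail:
                raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    return os.path.exists(subdir), os.path.exists(path)
```

Calling `demo(fail=True)` leaves neither the directory nor the file behind, while `demo(fail=False)` leaves both. The point of the sketch is only that the caller brackets a group of unrelated calls and gets all-or-nothing behavior.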

Andrew Baumann from ETH Zurich gave perhaps the best and most exciting talk of the workshop (so far) on "Your computer is already a distributed system. Why isn't your OS?" He pointed out that multicore systems already have a wide range of access latencies across processors and caches. Rather than relying on shared memory for communication, why not use asynchronous messaging between cores for everything? The proposed approach is called a multikernel and they are working on a prototype called Barrelfish. One nice aspect of this work is that they are doing a clean-slate design and throwing out support for legacy applications. Right now, the work is very much focused on performance; I'd like to see them look at the reliability and robustness issues that arise when running multiple OS kernels on your machine. (They do make a good argument that it is much easier to reason about a message-passing system than a shared memory system.)
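The contrast between shared memory and message passing can be illustrated with a small sketch. This is not Barrelfish code; it uses Python threads and `queue.Queue` as stand-ins for cores and inter-core channels, and `core` and `run` are hypothetical names. One "core" owns a counter privately, and every other thread changes it only by sending messages, so no lock around the counter is needed.

```python
import queue
import threading


def core(inbox, replies):
    """A 'core' that owns its state and changes it only in response to messages."""
    counter = 0  # private state, never touched by any other thread
    while True:
        msg = inbox.get()
        if msg == "stop":
            replies.put(counter)
            return
        counter += msg


def run(num_senders=4, per_sender=1000):
    inbox, replies = queue.Queue(), queue.Queue()
    owner = threading.Thread(target=core, args=(inbox, replies))
    owner.start()

    def sender():
        for _ in range(per_sender):
            inbox.put(1)  # asynchronous: returns without waiting for the owner

    senders = [threading.Thread(target=sender) for _ in range(num_senders)]
    for t in senders:
        t.start()
    for t in senders:
        t.join()

    inbox.put("stop")  # sent only after every increment has been enqueued
    total = replies.get()
    owner.join()
    return total
```

With the defaults, `run()` returns 4000. Because all updates are serialized through the owner's inbox, the result does not depend on how the sender threads interleave, which is the sense in which message passing is easier to reason about.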

Jeffrey Mogul from HP Labs made the case that we should be using a combination of flash and DRAM (which he calls FLAM) instead of only DRAM for main memory. The idea is to exploit the high density and low price of flash (compared to DRAM) to optimize a memory system -- he is not even concerned with the nonvolatile aspect of flash. Pages would be migrated between DRAM and flash; I'm not sure why this is so different from having less DRAM and using an SSD as your swap device. One thing you have to worry about is the high latency for flash access and the fact that flash wears out over time.
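A toy model helps show what migrating pages between DRAM and flash involves and why wear matters. The sketch below is a generic two-tier scheme with LRU eviction, not the design from the talk; `TwoTierMemory` and its counters are hypothetical names. Pages evicted from a small DRAM tier are written to flash, and touching a page in flash pulls it back into DRAM.

```python
from collections import OrderedDict


class TwoTierMemory:
    """Toy DRAM + flash page store with LRU eviction (illustrative only)."""

    def __init__(self, dram_pages):
        self.dram_pages = dram_pages
        self.dram = OrderedDict()  # page -> None, least recently used first
        self.flash = set()
        self.flash_reads = 0   # each one pays the flash access latency
        self.flash_writes = 0  # each one contributes to flash wear

    def access(self, page):
        if page in self.dram:
            self.dram.move_to_end(page)  # mark as most recently used
            return "dram"

        source = "new"
        if page in self.flash:
            self.flash.remove(page)
            self.flash_reads += 1
            source = "flash"

        if len(self.dram) >= self.dram_pages:
            # DRAM is full: migrate the least recently used page to flash.
            victim, _ = self.dram.popitem(last=False)
            self.flash.add(victim)
            self.flash_writes += 1

        self.dram[page] = None
        return source
```

For example, with `mem = TwoTierMemory(dram_pages=2)`, accessing pages 1, 2, 3, 1, 2 in order ends with `mem.flash_reads == 2` and `mem.flash_writes == 3`. Even this tiny trace shows how a working set slightly larger than DRAM turns into a steady stream of flash writes.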

This year we held a (sober) "Big Ideas" session in addition to the traditional (non-sober) "Outrageous Opinions" session. Some Big Ideas:
  • Michael Scott argued that we need to rethink how we teach concurrency to undergraduates, using top-down rather than bottom-up examples.
  • John Wilkes and Kim Keeton proposed that "Quality of Information" is at least as important as -- if not more important than -- "Quality of Service" in big systems, and that we need explicit metrics to capture the information quality impact of optimizations in a system.
  • Geoffrey Werner Challen opened up a wide-ranging discussion on the environmental impact of computing technology.
  • Armando Fox argued that e-mail is dead as a communication medium due to the huge volume of spam. He claimed that social networks are far more effective since you cannot even contact someone whom you are not already connected with. Some folks not in the Facebook Generation bristled at this idea, of course. I don't agree that existing social networks are right for this -- for example, most of them do not allow you to maintain separate groups of contacts (such as "friends", "family", or "colleagues").
At the end of the day the beers came out and we had some silly presentations on topics as diverse as the broken conference reviewing system (Dan Wallach), the need for systems to simply predict the future and do that (Steve Hand), and the need for better venues for publishing longer works than just 14-page conference papers (Michael Scott). I made the case that systems conferences should be more like the Ancient Greek Συμπόσιον (Symposium), which was essentially a bawdy drinking party. There was a lengthy discussion on ways to improve the conference reviewing process, whether reviews should be made public, and the role of blogs and online forums.


  1. Yes, reviews should be public and, preferably, less than 140 characters in length. :-)

    Btw: in the name of the quiet readers of your blog I would like to thank you for these posts from HotOS. Keep them coming! :P

  2. Given that I love meta-discussions, what did Dan claim was broken about the conference reviewing system?

  3. Dan was referring to Ken Birman and Fred Schneider's recent CACM piece on program committee overload in systems (as well as blogs that both he and I posted on similar topics). Basically, Dan said that the sheer number of reviews that PC members are asked to do is hurting quality and is in turn making it harder for good work to get published. His idea was to have a centralized archive of unpublished manuscripts as a "release valve" but I disagree that this will fix the problem -- you still need to get your work published in a proper venue for it to really "count."

  4. I just deleted a comment that was evidently spam that was tagged off of keywords in this post. Don't post spam in the comments on my blog.

