
HotOS 2009, Day Two

Some highlights from Day Two of HotOS 2009...

Michael Kozuch from Intel Research Pittsburgh described an approach to load-balancing computation within a datacenter that involves migrating the running operating system (and the applications running on top of it) from one physical machine to another. One approach is to shut down the OS and reboot it on the new hardware, but Michael is going further by looking at migrating a running OS instance and its device driver state -- even across nodes with different physical hardware. Ballsy.

Don Porter from UT Austin made the claim that operating systems should expose a transactional interface, allowing applications to describe a set of system calls as occurring within a transaction. Although there is a lot of related work in this area, Don's point is that the interface should be very simple and general enough to capture essentially any set of system calls within a transaction (rather than being limited to filesystem calls, for example).
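To make the idea concrete, here is a toy sketch (in Python, purely illustrative -- the class and method names are my invention, not Don's actual kernel API) of the all-or-nothing semantics such an interface would give a group of system calls: either every operation takes effect, or none do.

```python
# Hypothetical sketch of transactional system-call semantics.
# A real design would expose kernel-level begin/commit/abort; here we
# just simulate all-or-nothing behavior in user space with an undo log.

class SyscallTransaction:
    def __init__(self, state):
        self.state = state          # stand-in for kernel-visible state
        self.journal = []           # undo log: (key, old_value) pairs

    def __enter__(self):
        return self

    def write(self, key, value):
        # Record the old value before mutating, so we can roll back.
        self.journal.append((key, self.state.get(key)))
        self.state[key] = value

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Abort: undo every operation in reverse order.
            for key, old in reversed(self.journal):
                if old is None:
                    self.state.pop(key, None)
                else:
                    self.state[key] = old
        return False  # re-raise any exception to the caller

state = {"a": 1}
try:
    with SyscallTransaction(state) as tx:
        tx.write("a", 2)
        tx.write("b", 3)
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass
# The aborted transaction left no trace: state is back to {"a": 1}.
```

The point of a simple, general interface is exactly this: the application just brackets an arbitrary group of calls, and the kernel worries about isolation and rollback.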

Andrew Baumann from ETH Zurich gave perhaps the best and most exciting talk of the workshop (so far) on "Your computer is already a distributed system. Why isn't your OS?" He pointed out that multicore systems already have a wide range of access latencies across processors and caches. Rather than relying on shared memory for communication, why not use asynchronous messaging between cores for everything? The proposed approach is called a multikernel and they are working on a prototype called Barrelfish. One nice aspect of this work is that they are doing a clean-slate design and throwing out support for legacy applications. Right now, the work is very much focused on performance; I'd like to see them look at the reliability and robustness issues that arise when running multiple OS kernels on your machine. (They do make a good argument that it is much easier to reason about a message-passing system than a shared memory system.)
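The shared-nothing structure they advocate is easy to sketch. The following toy example (Python threads standing in for cores; Barrelfish itself is of course not written this way) shows the multikernel discipline: each "core" owns its state privately and everything else goes through an asynchronous message channel.

```python
# Toy illustration of the multikernel idea: per-core "kernels" share
# nothing and coordinate purely by exchanging messages. Each "core" is
# a thread with private state; a queue stands in for the inter-core
# message channel.

import threading
import queue

class CoreKernel(threading.Thread):
    def __init__(self, inbox):
        super().__init__()
        self.inbox = inbox          # the only way in: a message channel
        self.local_state = {}       # private state, never shared directly

    def run(self):
        while True:
            msg = self.inbox.get()
            if msg is None:                      # shutdown request
                return
            op, key, value, reply = msg
            if op == "set":
                self.local_state[key] = value
                reply.put(("ok", None))
            elif op == "get":
                reply.put(("ok", self.local_state.get(key)))

inbox = queue.Queue()
core = CoreKernel(inbox)
core.start()

reply = queue.Queue()
inbox.put(("set", "x", 42, reply))
ack = reply.get()                   # ('ok', None)
inbox.put(("get", "x", None, reply))
result = reply.get()                # ('ok', 42)
inbox.put(None)                     # tell the core kernel to exit
core.join()
```

This is also why their argument about reasoning holds water: every interaction between "cores" is an explicit, serializable message, rather than an implicit read or write to memory that another core might be touching concurrently.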

Jeffrey Mogul from HP Labs made the case that we should be using a combination of flash and DRAM (which he calls FLAM) instead of only DRAM for main memory. The idea is to exploit the high density and low price of flash (compared to DRAM) to optimize a memory system -- he is not even concerned with the nonvolatile aspect of flash. The system would migrate pages between DRAM and flash; I'm not sure how this differs from having less DRAM and using an SSD as your swap device. Two things you have to worry about are the high latency of flash access and the fact that flash wears out over time.
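A back-of-the-envelope sketch of the tiering idea (the LRU policy, tier sizes, and write counter here are my illustrative assumptions, not Mogul's actual design): keep recently touched pages in a small DRAM tier and demote the coldest page to a larger, slower flash tier when DRAM fills up.

```python
# Toy two-tier DRAM+flash main memory. Hot pages live in DRAM; the
# least-recently-used page is demoted to flash when DRAM overflows.
# Flash writes are counted because wear-out makes them precious.

from collections import OrderedDict

class TieredMemory:
    def __init__(self, dram_pages):
        self.dram = OrderedDict()   # page -> data, ordered by recency
        self.flash = {}             # larger, slower, wear-limited tier
        self.dram_pages = dram_pages
        self.flash_writes = 0

    def access(self, page, data=None):
        if page in self.dram:
            self.dram.move_to_end(page)          # mark recently used
        else:
            if page in self.flash:               # promote from flash
                data = self.flash.pop(page)
            self.dram[page] = data
            if len(self.dram) > self.dram_pages: # demote coldest page
                cold, cold_data = self.dram.popitem(last=False)
                self.flash[cold] = cold_data
                self.flash_writes += 1
        if data is not None:
            self.dram[page] = data
        return self.dram[page]

mem = TieredMemory(dram_pages=2)
mem.access("p1", "A")
mem.access("p2", "B")
mem.access("p3", "C")   # DRAM full: p1 is coldest, demoted to flash
```

Structurally this is indeed close to paging to a fast swap device, which is why I'd want to see numbers showing where FLAM's hardware-level integration actually wins.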

This year we held a (sober) "Big Ideas" session in addition to the traditional (non-sober) "Outrageous Opinions" session. Some Big Ideas:
  • Michael Scott argued that we need to rethink how we teach concurrency to undergraduates, using top-down rather than bottom-up examples.
  • John Wilkes and Kim Keeton proposed that "Quality of Information" is at least as important as -- if not more important than -- "Quality of Service" in big systems, and that we need explicit metrics to capture the information-quality impact of optimizations in a system.
  • Geoffrey Werner Challen opened up a wide-ranging discussion on the environmental impact of computing technology.
  • Armando Fox argued that e-mail is dead as a communication medium due to the huge volume of spam. He claimed that social networks are far more effective, since you cannot even contact someone you are not already connected with. Some folks not in the Facebook Generation bristled at this idea, of course. I don't agree that existing social networks are right for this -- for example, most of them do not allow you to maintain separate groups of contacts (such as "friends", "family", or "colleagues").
At the end of the day the beers came out and we had some silly presentations on topics as diverse as the broken conference reviewing system (Dan Wallach), the need for systems to simply predict the future and act on it (Steve Hand), and the need for better venues for publishing longer works than just 14-page conference papers (Michael Scott). I made the case that systems conferences should be more like the Ancient Greek Συμπόσιον (Symposium), which was essentially a bawdy drinking party. There was a lengthy discussion on ways to improve the conference reviewing process, whether reviews should be made public, and the role of blogs and online forums.


  1. Yes, reviews should be public and, preferably, less than 140 characters in length. :-)

    Btw: in the name of the quiet readers of your blog, I would like to thank you for these posts from HotOS. Keep them coming! :P

  2. Given that I love meta-discussions, what did Dan claim was broken about the conference reviewing system?

  3. Dan was referring to Ken Birman and Fred Schneider's recent CACM piece on program committee overload in systems (as well as blogs that both he and I posted on similar topics). Basically, Dan said that the sheer number of reviews that PC members are asked to do is hurting quality and is in turn making it harder for good work to get published. His idea was to have a centralized archive of unpublished manuscripts as a "release valve," but I disagree that this will fix the problem -- you still need to get your work published in a proper venue for it to really "count."

  4. I just deleted a comment that was evidently spam that was tagged off of keywords in this post. Don't post spam in the comments on my blog.

