Monday, May 30, 2011

Reflections on Fast, User-Level Networking

A couple of weeks ago at HotOS, one of the most controversial papers (from Stanford) was entitled "It's Time for Low Latency." The basic premise of the paper is that clusters are stuck using expensive, high-latency network interfaces (generally TCP/IP over some flavor of Ethernet), but it should now be possible to achieve sub-10-microsecond round-trip-times for RPCs. Of course, a tremendous amount of research looked at low-latency, high-bandwidth cluster networking in the mid-1990's, including Active Messages, the Virtual Interface Architecture, and U-Net (which I was involved with as an undergrad at Cornell). A bunch of commercial products were available in this space, including Myrinet (still the best, IMHO) and InfiniBand.

Not much of this work has really taken off in commercial datacenters. John Ousterhout and Steve Rumble argue that this is because the commercial need for low latency networking hasn't been there until now. Indeed, when we were working on this in the 90's, the applications we envisioned were primarily numerical and scientific computing: big matrix multiplies, that kind of thing.

When Inktomi and Google started demonstrating Web search as the "killer app" for clusters, they managed to get away with relatively high-latency, but cheap, Ethernet-based solutions. For these applications, the cluster interconnect was not the bottleneck. Rumble's paper argues that emerging cloud applications are motivating the need for fast intermachine RPC. I'm not entirely convinced of this, but John and Steve and I had a few good conversations about this at HotOS and I've been reflecting on the lessons learned from the "fast interconnect" heyday of the 90's...

Microbenchmarks are evil: There is a risk in focusing on microbenchmarks when working on cluster networking. The standard "ping-pong" latency measurement and bulk transfer throughput measurements rarely reflect the kind of traffic patterns seen in real workloads. Getting something to work on two unloaded machines connected back-to-back says little about whether it will work at a large scale with a complex traffic mix and unexpected load. You often find that real world performance comes nowhere near the ideal two-machine case. For that matter, even "macrobenchmarks" like the infamous NOW-Sort can be misleading, especially when measurements are taken under ideal conditions. Obtaining robust performance under uncertain conditions seems a lot more important than optimizing for the best case that you will never see in real life.
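
For readers who have not seen one, here is a minimal sketch of the "ping-pong" measurement in Python, run over loopback TCP. The function names (echo_server, ping_pong, summarize) are made up for illustration, and this is plain sockets, not U-Net or Active Messages. It shows exactly the kind of number the paragraph above warns about: one unloaded client, one unloaded server, nothing else on the wire.

```python
import socket
import statistics
import threading
import time


def echo_server(listener):
    """Accept one connection and echo every message back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def ping_pong(rounds=1000, payload=b"x"):
    """Return a list of round-trip times in microseconds over loopback TCP."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    samples = []
    with socket.create_connection(listener.getsockname()) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            client.sendall(payload)
            # A stream socket may return the echo in pieces, so read it all.
            received = 0
            while received < len(payload):
                chunk = client.recv(len(payload) - received)
                if not chunk:
                    raise ConnectionError("echo server closed the connection")
                received += len(chunk)
            samples.append((time.perf_counter() - start) * 1e6)

    server.join(timeout=1)
    listener.close()
    return samples


def summarize(samples):
    """Return (median, 99th percentile) of the samples."""
    return statistics.median(samples), statistics.quantiles(samples, n=100)[98]


if __name__ == "__main__":
    median_us, p99_us = summarize(ping_pong())
    print(f"median RTT: {median_us:.1f} us, p99 RTT: {p99_us:.1f} us")
```

Even in this toy setup the median and the 99th percentile usually differ, and that gap tends to widen once you add more machines, competing traffic, and load. The tail is typically the more honest number to report.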

Usability matters: I'm convinced that one of the reasons that U-Net, Active Messages, and VIA failed to take off is that they were notoriously hard to program to. Some systems, like Fast Sockets, layered a conventional sockets API on top, but often suffered large performance losses as a result, in part because the interface couldn't be tailored for specific traffic patterns. And even "sockets-like" layers often did not work exactly like sockets, being different enough that you couldn't just recompile your application to use them. Common examples are not being entirely threadsafe, or not working with mechanisms such as select() and poll(). When you are running a large software stack that depends on sockets, it is not easy to rip out the guts and replace them with something that is not fully backwards compatible.
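
To illustrate the select() point, here is a small Python sketch of the pattern that a lot of sockets-based server code relies on: hand a set of connections to select() and service whichever ones are readable. The helper names (serve_ready, demo) are hypothetical. The sketch uses ordinary OS sockets. The assumption it illustrates is that a "sockets-like" handle from a user-level networking library may not be a real file descriptor the kernel knows about, in which case a loop like this cannot wait on it.

```python
import select
import socket


def serve_ready(socks, timeout=1.0):
    """Service each socket that select() reports as readable.

    Reads one pending message per ready socket and replies with it uppercased.
    Returns the list of sockets that were serviced.
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    handled = []
    for s in readable:
        data = s.recv(4096)
        if data:
            s.sendall(data.upper())
            handled.append(s)
    return handled


def demo():
    """Two connections, only one with data pending; select() picks it out."""
    a_client, a_server = socket.socketpair()
    b_client, b_server = socket.socketpair()
    try:
        a_client.sendall(b"ping")  # connection B stays idle
        handled = serve_ready([a_server, b_server])
        reply = a_client.recv(4096)
        return len(handled), reply
    finally:
        for s in (a_client, a_server, b_client, b_server):
            s.close()


if __name__ == "__main__":
    print(demo())
```

Every line of serve_ready assumes that the objects in socks behave like kernel sockets: select() can block on them, recv() and sendall() have stream semantics, and they can be mixed freely with other descriptors. A replacement layer has to honor all of that before existing code will run unchanged.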

Commodity beats fast: If history has proven anything, it's that there's only so much that systems designers are willing to pay -- in terms of complexity or cost -- for performance. The vast majority of real-world systems are based on some flavor of the UNIX process model, BSD filesystem, relational database, and TCP/IP over Ethernet. These technologies are all commodity and can be found in many (mostly compatible) variants, both commercial and open source; few companies are willing to invest time and money to tailor their design for some funky single-vendor user-level networking solution that might disappear one day.

Wednesday, May 11, 2011

Conference report: HotOS 2011 in Napa

This week, I served as program chair for the Thirteenth Workshop on Hot Topics in Operating Systems, or HotOS 2011, which took place at the Westin Verasa in Napa, California. HotOS is a unique workshop and one of my favorite venues -- it is the place for systems researchers to put forth their most forward-thinking ideas. Unlike most conferences, HotOS takes 5-page position papers, and it's expected that the submission really represents a position, not a mature piece of technical work condensed into the shorter format.

When it's done right, HotOS is full of great, big sky papers and lots of heated discussions that give the community a chance to think about what's next. In some years, HotOS has been more like an "SOSP preview," with 5-page versions of papers that are likely to appear in a major conference a few months after the workshop. We tried to avoid that this year, and for the most part I think we were successful -- very few papers in this year's HotOS were mature enough to have been considered for SOSP (although that remains to be seen).

I've already blogged about the highly contentious cloud computing panel at HotOS. Here's the rest of the trip report.

[Photo: Timothy Roscoe holding court at HotOS.]

This year I tried to tinker with the conventional conference format in which speakers give 25 minute talks with 5 minutes of questions afterwards. For HotOS, this seems excessive, especially since the papers are so short. Instead, we limited speakers to 10 minutes. There was some pushback on this, but overall I think it was extremely successful: I didn't feel that anyone was rushed, speakers did a great job of staying within the time limits, and by the time a talk started to get boring, it was over.

The other side is that we wanted to have room for longer discussions and debates, which often can't happen in the 5 minutes between talks. Too often you hear "let's take that offline," which is code language for "I don't want to get into that in front of the audience." This is a cop-out. At HotOS, after every couple of paper sessions we had a 30-to-45 minute "open mic" session where anybody could ask questions or just rant and rave, which gave plenty of time for more in-depth discussions and debate. At first I was worried that we wouldn't be able to fill up the time, but remarkably there were often plenty of people lined up to take the mic, and lots of great back-and-forth.

A few highlights from this year's HotOS... all of the papers are available online, although they might be limited to attendees only for a while.

Jeff Mogul from HP kicked off the workshop with a talk about reconnecting OS and architecture research. He argued that the systems community is in a rut by demanding that new systems run on commodity hardware, and the architecture community is in a rut by essentially pushing the OS out of the way. He made some great points about the opportunity for OS designs to leverage new hardware features and for the systems community not to be afraid to do so.

To prove this point, Katelin Bailey from UW gave a great talk about how OS designs could leverage fast, cheap NVRAM. The basic idea is to get rid of the gap between memory and disk-based storage altogether, which opens up a wide range of new research directions, like processes which never "die." I find this work very exciting and look forward to following their progress.

Mike Walfish from UT Austin gave a very entertaining talk about "Repair from a Chair." The idea is to allow PC users to have their machines repaired by remote techs by pushing the full software image of their machine into the cloud, where the tech could fix it in a way that the end user can still verify exactly what changes were made to their system. The talk included a nice case study drawn from interviews with Geek Squad and Genius Bar techs -- really cool. My only beef with this idea is that the problem is largely moot when you run applications in the cloud and simply repair the service, rather than the end-user's machine.

Dave Ackley from UNM gave the wackiest, most out-there talk of the conference on "Pursue Robust Indefinite Scalability." I am still not sure exactly what it is about, but the idea seems to be to build modular computers based on a cellular automaton model that can be connected together at arbitrary scales. This is why we have workshops like HotOS -- it would be really hard to get this kind of work into more conventional systems venues. Best quote from the paper: "pledge allegiance to the light cone."

Steve Rumble from Stanford talked about "It's Time for Low Latency," arguing that the time has come to build RPC systems that can achieve 10 microsecond RTTs. Back in the mid-1990s, a bunch of other people and I spent a lot of time working on this problem, and we called 10 usec the "Culler Constant," since that was the (seemingly unattainable) goal that David Culler set forth for messaging in the Berkeley NOW cluster project. Steve's argument was that the application pull for this -- cloud computing -- is finally here, so maybe it's time to revisit this problem in light of modern architectures. I would love to see someone dust off the old work on U-Net and Active Messages and see what kind of performance we can achieve today, and whether there is a role for this kind of approach in modern cluster designs.

Geoff Challen from Univ. Buffalo and Mark Hempstead from Drexel gave the most entertaining talk of the workshop on "The Case for Power-Agile Computing." The idea of the talk was that mobile devices should incorporate multiple hardware components with different power/performance characteristics to support a wide range of applications. As you can see below, Geoff was dressed as a genie and had to say "shazam" a lot.

[Photo: This might be the first open-shirted presentation ever at HotOS. Let us hope it was the last.]

Moises Goldszmidt from MSR gave a really energetic talk on the need for better approaches for modeling and predicting the performance of complex systems. He proposed to use intervention at various points within the system to explore its state space and uncover dependencies. To me, this sounds a lot like the classic system identification problem from control theory, and I would love to see this kind of rigorous engineering approach applied to computer systems performance management.

The traditional Wild and Crazy Ideas session did not disappoint. Margo Seltzer argued that all of the studies assuming users keep cell phones in their pocket (or somewhere on their person) failed to account for the fact that most women keep them in a bag or elsewhere. Good point: I have lost count of how many papers assume that people carry their phones on them at all times. Sam King from UIUC talked about building an app store for household robots, in which the killer app really is a killer app. Dave Andersen from CMU made some kind of extended analogy between systems researchers and an airliner getting ready to crash into a brick wall. (It made more sense with wine.)

We gave away four amazing prizes: Google ChromeOS Laptops! Dave Ackley won the "most outrageous opinion" prize for his wild-eyed thoughts on computer architecture. Vijay Vasudevan from CMU won the best poster award for a poster entitled "Why a Vector Operating System is a Terrible Idea", directly contradicting his own paper in the workshop. Chris Rossbach from MSR and Mike Walfish from UT Austin won the two best talk awards for excellent delivery and great technical content.

Finally, I'd like to thank the program committee and all of the folks at USENIX for helping to make this a great workshop.

Tuesday, May 10, 2011

How can academics do research on cloud computing?

This week I'm in Napa for HotOS 2011 -- the premier workshop on operating systems. HotOS is in its 24th year -- it started as the Workshop on Workstation Operating Systems in 1987. More on HotOS in a forthcoming blog post, but for now I wanted to comment on a very lively discussion that took place during the panel session yesterday.

The panel consisted of Mendel Rosenblum from Stanford (and VMware, of course); Rebecca Isaacs from Microsoft Research; John Wilkes from Google; and Ion Stoica from Berkeley. The charge to the panel was to discuss the gap between academic research in cloud computing and the realities faced by industry. This came about in part because a bunch of cloud papers were submitted to HotOS from academic research groups. In some cases, the PC felt that the papers were trying to solve the wrong problems, or making incorrect assumptions about the state of cloud computing in the real world. We thought it would be interesting to hear from both academic and industry representatives about whether and how academic researchers can hope to do work on the cloud, given that there's no way for a university to build something at the scale and complexity of a real-world cloud platform. The concern is that academics will be relegated to working on little problems at the periphery, or come up with toy solutions.

The big challenge, as I see it, is how to enable academics to do interesting and relevant work on the cloud when it's nearly impossible to build up the infrastructure in a university setting. John Wilkes made the point that he never wanted to see another paper submission showing a 10% performance improvement in Hadoop, and he's right -- this is not the right problem for academics to be working on. Not because a 10% improvement is not useful, or because Hadoop is a bad platform, but because those kinds of problems are already being solved by industry. In my opinion, the best role for academia is to open up new areas and look well beyond where industry is working. But this is often at odds with the desire for academics to work on "industry relevant" problems, as well as to get funding from industry. Too often I think academics fall into the trap of working on things that might as well be done at a company.

Much of the debate at HotOS centered around the industry vs. academic divide and a fair bit of it was targeted at my previous blog posts on this topic. Timothy Roscoe argued that academia's role was to shed light on complex problems and gain understanding, not just to engineer solutions. I agree with this. Sometimes at Google, I feel that we are in such a rush to implement that we don't take the time to understand the problems deeply enough: build something that works and move onto the next problem. Of course, you have to move fast in industry. The pace is very different than academia, where a PhD student needs to spend multiple years focused on a single problem to get a dissertation written about it.

We're not there yet, but there are some efforts to open up cloud infrastructure to academic research. OpenCirrus is a testbed supported by HP, Intel, and Yahoo! with more than 10,000 cores that academics can use for systems research. Microsoft has opened up its Azure cloud platform for academic research. Only one person at HotOS raised their hand when asked if anyone was using this -- this is really unfortunate. (My theory is that academics have an allergic reaction to programming in C# and Visual Studio, which is too bad, since this is a really great platform if you can get over the toolchain.) Google is offering a billion core hours through its Exacycle program, and Amazon has a research grant program as well.

Providing infrastructure is only one part of the solution. Knowing what problems to work on is the other. Many people at HotOS bemoaned the fact that companies like Google are so secretive about what they're doing, and it's hard to learn what the "real" challenges are from the outside. My answer to this is to spend time at Google as a visiting scientist, and send your students to do internships. Even though it might not result in a publication, I can guarantee you will learn a tremendous amount about what the hard problems are in cloud computing and where the great opportunities are for academic work. (Hell, my mind was blown after my first couple of days at Google. It's like taking the red pill.)

A few things that jump to mind as ripe areas for academic research on the cloud:
  • Understanding and predicting performance at scale, with uncertain workloads and frequent node failures.
  • Managing workloads across multiple datacenters with widely varying capacity, occasional outages, and constrained inter-datacenter network links.
  • Building failure recovery mechanisms that are robust to massive correlated outages. (This is what brought down Amazon's EC2 a few weeks ago.)
  • Debugging large-scale cloud applications: tools to collect, visualize, and inspect the state of jobs running across many thousands of cores.
  • Managing dependencies in a large codebase that relies upon a wide range of distributed services like Chubby and GFS.
  • Handling both large-scale upgrades to computing capacity as well as large-scale outages seamlessly, without having to completely shut down your service and everything it depends on.

Tuesday, May 3, 2011

What I'm working on at Google: Making the mobile web fast

A bunch of people have asked me what I work on at Google these days. When I joined Google last July in the Cambridge office, I worked with the team that runs Google’s content delivery network, which is responsible for caching a vast amount of (mostly video) content at many sites around the world. It is a fantastic project with some great people. My own work focused on building tools to measure and evaluate wide-area network performance and detect performance problems. This was a great “starter project,” and I got to build and deploy some pretty large systems that now run on Google’s worldwide fleet.

Now that I’m in Seattle, I am heading my own team with the charter to make the mobile web fast. By “mobile web”, I mean the entire web as accessed from all mobile devices, not just Google services and not just Android. While Android is a big focus of our work, we care a lot about improving performance for all mobile devices. This project is an outgrowth of Google’s broader “make the web faster” initiative, which has (to date) largely been focused on the desktop web. The parent project has been involved with the Chrome browser, the SPDY protocol (a replacement for HTTP), the WebP image format, numerous network stack enhancements, and more. I see mobile as the next big frontier for the web and there are some huge opportunities to make an impact in this space.

This is a hugely exciting project since it touches on many different platforms and cuts across layers. Everyone knows that mobile web usage is taking off: the growth of the mobile web is much, much faster than the growth of the desktop web was back during the dot-com boom. At the same time, the mobile web is painfully slow for most sites and most users. And of course, not everyone has the benefit of using a bleeding-edge 4G phone with LTE network speeds of 25 Mbps, like I do (my current phone is an HTC Thunderbolt, and it’s awesome). For the vast majority of Internet users, a mobile phone will be the only device they ever interact with.

So what are we working on? At a high level we are planning to tackle problems in three broad areas: the mobile devices themselves; the services they connect to; and the networks that connect them. On the device side, we are looking at a wide range of OS, network stack, and browser enhancements to improve performance. On the service side, we are looking at better ways to architect websites for mobile clients, providing tools to help web developers maximize their performance, and automatic optimizations that can be performed by a site or in a proxy service. Finally, at the network layer we are looking at the (sometimes painful) interactions between different layers of the protocol stack and identifying ways to streamline them. There is a huge amount of work to do and I am really fortunate to work with some amazing people, like Steve Souders and Arvind Jain, on this effort.

Our goal is to share solutions with the broader web development community -- we hope to release most of our code as open source, as many other Google projects have done. I also plan to maintain strong ties with the academic research community, since I feel that there is a great opportunity to open up new avenues for mobile systems research, as well as leverage the great work that universities are doing in this space.

I think Google is in a unique position to solve this problem. And, of course, we are hiring! Drop me a line if you are interested in joining our team.
