Sunday, December 26, 2010

Day in the Life of a Googler

I was thinking recently about how different my workdays are now that I'm at Google, compared to my faculty job at Harvard. The biggest difference is that I now spend 90% or more of my time writing code, whereas at Harvard I was lucky to get half an hour a week to do any programming. I also spend a lot less time at Google procrastinating and reading a zillion stupid websites -- mostly because I'm enjoying the work a lot more.

Here's a short rundown of my typical day at Google:
6:30am - Wake up, get son up, shower, breakfast, take dog to the park.
8:30am - Leave for work (I take the subway most days).
9:00am - Arrive at work. Type passwords into half a dozen different windows to get my work environment back to a sane state. Check email. Check on status of my several jobs running in various datacenters. Page in work from day before.
9:30am-10:15am - Work on code to add requested feature to the system I'm working on. Debug it until it's working, write a unit test or two. Fire off code changelist for review. Grab third free Diet Coke of the day.
10:15-11:00 - Switch git branches to another project. Take a look at code review comments from a colleague. Go through the code and address the comments. Build new version, re-run tests, re-run lint on the code to make sure it's working and looks pretty. Submit revised changelist and responses to comments.
11:00-11:30 - Switch git branches again. Rebuild code to be safe, then fire off a three-hour MapReduce job to crunch log data to analyze network latencies.
11:30 - 12:00 - Quick videoconference meeting with team members in Mountain View.
12:00-12:35 - Lunch of free yummy food in cafeteria. Regale coworkers with stories of Apple IIgs hacking when I was in middle school.
12:35-2:00 - Back at desk. Check email. Check status of MapReduce job - about halfway done. Respond to last set of comments from code review done in the morning and submit the code. Merge and clean up the git branch. Take a look at task list to decide what to work on next.
2:00-3:00 - Project meeting with teams in Cambridge, Mountain View, and elsewhere by videoconference. This is my only hour-long meeting of the whole week. It is mildly amusing and I mostly spend the time doing some light hacking on my laptop and hitting reload on the MapReduce status page to see if it's done yet. Check Buzz and post a snarky comment or two.
3:00-4:00 - Red Bull infusion to keep energy going for the rest of the day.  MapReduce is finally done. Generate graphs of the resulting data and stare at them for a while. Think about why the results are different than expected and write next version of code to generate another set of statistics. Try to get the code to the point where I can fire off another MapReduce before leaving for the day.
4:00-5:00 - Whiskey Thursday! Round up a group of colleagues to drink scotch and play Guitar Hero. (I have a nice collection of scotch under my desk. Somehow I have been designated as the guardian of the alcohol supply, which suits me fine.)
5:00 - Pack up laptop and head home.
5:30-8:00 - Dinner and family time until son goes to bed.
8:00 until bedtime - More hacking, if there's stuff I want to get done tonight, or make a few nice cocktails if not.
Contrast this to my typical work day at Harvard:
6:30am - Wake up, get son up, shower, breakfast, take dog to the park.
8:30am - Leave for work (a 20-minute walk from home to the office, and I bring the dog with me).
9:00am - Arrive at office. Check email. Groan at the amount of work I have to do before the onslaught of meetings in the afternoon.
9:15am - Start working on outline for a grant proposal. About three minutes later, decide I don't know what I want to write about so spend next 45 minutes reading Engadget, Hacker News, and Facebook instead.
10:00am - Try to snap out of the Web-induced stupor and make headway on a pile of recommendation letters that I have to write. Fortunately these are easy, and many of them are cut-and-paste jobs from recommendation letters I have written for other people before.
11:00am - Check calendar, realize I have only an hour left to get any real work done. Respond to some emails that have been sitting in my inbox for weeks. Email my assistant to set up three more meetings for the following week.
11:30am - Try to make some token headway on the grant proposal by drafting up a budget and sending off three emails to various support staff to get the paperwork going. Make up a title and a total budget for the proposal that sound reasonable. Still undecided on what the project should be about.
12:00pm - Take dog out for a 20-minute walk around campus. Sometimes spend longer if we run into other dogs to play with.
12:30pm - Run over to Law School cafeteria to grab overpriced and not-very-appetizing lunch, which I eat sullen and alone in my office, while reading Engadget and Hacker News.
1:00pm - First meeting of the day with random person visiting from a random company in Taiwan who will never give me any money but wants me to spend half an hour explaining my research projects to them in extraordinary detail.
1:30pm - Second meeting of the day with second-semester senior who has suddenly decided after four aimless years in college that he wants to do a PhD at Berkeley or MIT. Explain that this will not be possible given zero research track record, but somehow end up promising to write a recommendation letter anyway. Mentally note which other recommendation letters I will cut and paste from later.
2:00pm - Realize that I have to give lecture in half an hour. Pull up lecture notes from last year. Change "2009" to "2010" on the title slide. Skim over them and remember that this lecture was a total disaster but that I don't have time to fix it now. 
2:30pm - 4:00pm - Give lecture on cache algorithms to 70 or so somewhat perplexed and bored undergrads. Try to make the lecture more exciting using extensive PowerPoint animations and wild gesticulations with the laser pointer. Answer a bunch of questions that remind me why the lecture was a disaster last year and vow to fix it before delivering again next year.
4:00-4:10pm - Hide in office with door closed trying to calm down after adrenaline rush of lecturing. Gulp large amounts of Diet Coke to re-energize and re-hydrate.
4:10-4:20pm - Check email. Check Engadget. Check Facebook.
4:30-5:00pm - Last meeting of the day with two grad students working on a paper due in less than a week. They have no outline and no results yet but are very optimistic that they will make it in time. Spend half an hour sketching ideas and possible graphs on the whiteboard while they scribble furiously in their notebooks. Make vague promises about reviewing a draft if I see one later in the week.
5:00pm - Walk home with my dog. This is the best part of my day.
5:30pm - Get home, immediately sit down to check enormous pile of email that accumulated while I was in lecture and meetings. Forward five new meeting requests to my assistant for scheduling next week.
5:45pm - 8:00pm - Family time, dinner.
8:00pm - Pretend to "work" by reading email and tinkering with PowerPoint slides for a talk I have to give the next week. Too exhausted to do anything useful, make a drink and read Engadget again. 
 

Tuesday, November 16, 2010

Guest Post: Why I'm staying at Harvard (by Michael Mitzenmacher)

[Michael Mitzenmacher is a professor of Computer Science and the Area Dean for Computer Science at Harvard. He is a dear friend and colleague and has been one of the role models for my own career. Michael wanted to respond to my earlier blog post on leaving Harvard with his own reasons for staying; I am only too happy to oblige. (I swear I did not ghost write this.) You can read more of Michael's own blog here, though he's not posting much these days. --MDW]

To begin, I'd like to say how sorry we are at Harvard that Matt's not returning.  Matt's been a great colleague, continually pushing to make CS at Harvard better.  His enthusiasm and tenaciousness have made us tangibly better in numerous ways.  I, personally, will miss him a lot.  Matt pushes hard for what he believes in, but in my experience he's always done so with open ears and an open mind.  We're losing a leader, and Google is lucky to have him.  I have no doubt he'll do great things for the company, and maybe even earn them another billion or two.

While Matt's decision has been a blow to CS at Harvard, I'm optimistic that our plan for growth will, eventually, make up for that loss.  My job as Area Dean is to try to make that happen as soon as possible.  I don't want to suggest that replacing Matt will be easy, but rest assured we'll be on the case.

I'd also like to say that I think I understand Matt's reasons for leaving.  I'm glad to have him write "I love Harvard, and will miss it a lot."  And how could I disagree with statements like "The computer science faculty are absolutely top-notch, and the students are the best a professor could ever hope to work with. It is a fantastic environment, very supportive, and full of great people."  But I know from previous talks with him that he hasn't always loved being a professor.  And that's what I'll try to write about the rest of the post.

I think there's a sense in academia that people get PhD's so that they can become professors.  Most graduate students have that point of view going in -- their experience with research professionals at that point is essentially entirely with faculty.  And most professors encourage students to have that goal.  Some of that, I think, is that most professors like their job (unsurprisingly), and some may not have other experiences to suggest to their students.  And some of it may be more calculated.  One measure of a faculty member's success is how many faculty offspring they've produced.

But being a faculty member is not for everyone.  As Matt has described in this blog, and I in the past have described in my blog, being a professor is probably not exactly what most people expect.  Besides teaching and research, your time gets taken up with administration, managing (graduate) students, fundraising, and service to your scientific community.  It's perhaps absurd to expect that everyone who starts out in a PhD program be interested in all these various aspects of the job.  And, fortunately, in computer science, there are still many other compelling options available.

As Matt says, at Google, "I get to hack all day."  That's just not true as a faculty member -- time for actual hacking is usually pretty small, and more of your time is spent managing others to hack for you.  (This is a complaint I've heard from many faculty members.)  I can understand why Google would be a very appealing place for someone who wants to write code.  I'm sure Matt will come to miss some of the other aspects of being a professor at some point, and I'd imagine Google will to some extent let him entertain some of those aspects.

One of the comments suggested money must be a motivation.  For some people who have to make this choice, maybe it is.  (See Matt's comments on the post below for his take on that.)  So what?  Again, it's good that in our field there are good options that pay well.  That's a big plus for our field, especially if we accept the fact that not everyone can be or wants to be a professor.  But as Matt says, professors at Harvard (and top 20 institutions in general) are doing just fine, and money probably isn't the main issue for those who choose a different path.


I suppose the question that's left is why I'm staying at Harvard -- that is, why I still like being a professor.  (And thank you to those of you who think the obvious answer is, "Who else would hire you?")  I enjoy the freedom of working on whatever I find interesting; being unrestricted in who I choose to talk to about research problems and ideas; having the opportunity to work with a whole variety of interesting and smart people, from undergraduates to graduate students to CS colleagues all over the globe to math and biology professors a few buildings down; the ample opportunity to do consulting work that both pays well and challenges me in different ways; the schedule that lets me walk my kids to school most every day and be home for dinner most every night; and the security that, as long as I keep enjoying it, I can keep doing this job for the next 30+ years.

The job is never boring.  On any given day, I might be teaching, planning a class, working with students, thinking, writing a paper, writing some code, reading, listening to a talk, planning or giving a talk, organizing an event, consulting in some form, or any other manner of things. In the old days, I wrote a blog.  These days, I'm administrating, making sure our classes work smoothly, our faculty are satisfied and enabled to do the great things they do, and we're able to continue to expand and get even better.  Once I wrote a book, and someday I hope to do that again.  Perhaps the biggest possible complaint is that there's always something to do, so you have to learn to manage your time, say no, and make good decisions about what to do every day.  As someone who hates being bored, this is generally a good feature of the job for me.

And Harvard, I find, is an especially great place to work.  We attract some of the most amazing students.  Our still small-ish CS faculty really works together well; we all know who each other are, we keep aware of what we're all doing research-wise, we collaborate frequently, and we compromise and reach consensus on key issues.  Outside of the CS faculty, there's all sorts of interesting people and opportunities on campus and nearby.  Boston is a great city (albeit too cold and snowy in the winter).

Other profs have made similar comments in Matt's post -- there's a lot to like about the job, and at the same time, it's not the best choice for everyone.  Of course I don't like everything about the job.  Getting funding is a painful exercise, having papers rejected is frustrating and unpleasant, and not every student is a wondrous joy to work with.  I sometimes struggle to put work away and enjoy the rest of my life -- not because of external pressure (especially post-tenure), but because lots of my work is engaging and fun.  Of course that's the point -- there's good and bad in all of it, and people's preferences are, naturally, vastly different.  I don't think anyone should read too much into Matt's going to Google about the global state of Computer Science, or Professordom, or Harvard, or Google.  One guy found a job he likes better than the one he had.  It happens all the time, even in academia.  It's happened before and will happen again.

But I'm happy with my job right now.  In fact, I'm pretty sure my worst day on the job this year was the day Matt told me he wasn't coming back.  We'll miss you, Matt, and best of luck in all your endeavors.



Monday, November 15, 2010

Why I'm leaving Harvard

The word is out that I have decided to resign my tenured faculty job at Harvard to remain at Google. Obviously this will be a big change in my career, and one that I have spent a tremendous amount of time mulling over the last few months.

Rather than let rumors spread about the reasons for my move, I think I should be pretty direct in explaining my thinking here.

I should say first of all that I'm not leaving because of any problems with Harvard. On the contrary, I love Harvard, and will miss it a lot. The computer science faculty are absolutely top-notch, and the students are the best a professor could ever hope to work with. It is a fantastic environment, very supportive, and full of great people. They were crazy enough to give me tenure, and I feel no small pang of guilt for leaving now. I joined Harvard because it offered the opportunity to make a big impact on a great department at an important school, and I have no regrets about my decision to go there eight years ago. But my own priorities in life have changed, and I feel that it's time to move on.

There is one simple reason that I'm leaving academia: I simply love the work I'm doing at Google. I get to hack all day, working on problems that are orders of magnitude larger and more interesting than I can work on at any university. That is really hard to beat, and is worth more to me than having "Prof." in front of my name, or a big office, or even permanent employment. In many ways, working at Google is realizing the dream I've had of building big systems my entire career.

As I've blogged about before, being a professor is not the job I thought it would be. There's a lot of overhead involved, and (at least for me) getting funding is a lot harder than it should be. Also, it's increasingly hard to do "big systems" work in an academic setting. Arguably the problems in industry are so much larger than what most academics can tackle. It would be nice if that would change, but you know the saying -- if you can't beat 'em, join 'em.

The cynical view is that as an academic systems researcher, the very best possible outcome for your research is that someone at Google or Microsoft or Facebook reads one of your papers, gets inspired by it, and implements something like it internally. Chances are they will have to change your idea drastically to get it to actually work, and you'll never hear about it. And of course the amount of overhead and red tape (grant proposals, teaching, committee work, etc.) you have to do apart from the interesting technical work severely limits your ability to actually get to that point. At Google, I have a much more direct route from idea to execution to impact. I can just sit down and write the code and deploy the system, on more machines than I will ever have access to at a university. I personally find this far more satisfying than the elaborate academic process.

Of course, academic research is incredibly important, and forms the basis for much of what happens in industry. The question for me is simply which side of the innovation pipeline I want to work on. Academics have a lot of freedom, but this comes at the cost of high overhead and a longer path from idea to application. I really admire the academics who have had major impact outside of the ivory tower, like David Patterson at Berkeley. I also admire the professors who flourish in an academic setting, writing books, giving talks, mentoring students, sitting on government advisory boards, all that. I never found most of those things very satisfying, and all of that extra work only takes away from time spent building systems, which is what I really want to be doing.

We'll be moving to Seattle in the spring, where Google has a sizable office. (Why Seattle and not California? Mainly because my wife has a great job lined up there, but also because Seattle's a lot more affordable, and we can live in the city without a long commute to work.) I'm really excited about the move and the new opportunities. At the same time I'm sad about leaving my colleagues and family at Harvard. I owe them so much for their support and encouragement over the years. Hopefully they can understand my reasons for leaving and that this is the hardest thing I've ever had to do.

Sunday, November 7, 2010

SenSys 2010 in Zurich

Photo from http://www.flickr.com/photos/aforero/542248140

I just got back from Zurich for SenSys 2010. I really enjoyed the conference this year and Jan Beutel did a fantastic job as general chair. The conference banquet was high up on the Uetliberg overlooking the city, and the conference site at ETH Zurich was fantastic. We also had record attendance -- in excess of 300 -- so all around it was a big success. I didn't make it to all of the talks but I'll briefly summarize some of my favorites here.

Sandy Pentland from the MIT Media Lab gave a great keynote on "Building a Nervous System for Humanity." He gave an overview of his work over the years using various sensors and signals to understand and predict people's behavior. For example, using various sensors in an automobile it is often possible to predict in advance whether someone is about to change lanes, based on subtle preparatory movements that they make while driving. His group has also used wearable sensors to gather data on conversational patterns and social interactivity within groups, and used this data to study practices that influence a business' productivity. This was an amazing keynote and probably the best we have ever had at SenSys -- very much in line with where a lot of work in the conference is headed.

The best paper award on Design and Evaluation of a Versatile and Efficient Receiver-Initiated Link Layer for Low-Power Wireless was presented by Prabal Dutta. This paper describes A-MAC, a new MAC layer based on receiver initiation of transmissions: receivers send probe signals that trigger transmissions by sending nodes with pending packets. The approach is based on a new mechanism called backcast, in which senders respond to a receiver probe with an ACK designed to constructively interfere with the ACKs transmitted simultaneously by other sending nodes. This allows the receiver probe mechanism to scale with node density. Because A-MAC does not rely on receivers performing idle listening, cross-channel interference (e.g., with 802.11) does not impact energy consumption nearly as much as it does with low-power listening (LPL).
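The backcast idea is easy to model in toy form. Here is a small Python sketch of my own (a simplification for intuition, not the actual A-MAC implementation; `Sender` and `receiver_probe` are made-up names) showing why simultaneous identical ACKs still give the receiver a usable signal:

```python
# Toy model of the backcast primitive: a receiver probes, and every
# sender with a pending packet replies with an *identical* ACK frame.
# Because the replies are identical, concurrent ACKs superimpose
# (constructive interference) instead of colliding.

class Sender:
    def __init__(self, name, has_pending):
        self.name = name
        self.has_pending = has_pending

    def on_probe(self):
        # Identical hardware ACK from every node with pending data.
        return "ACK" if self.has_pending else None

def receiver_probe(senders):
    replies = [s.on_probe() for s in senders]
    acks = [r for r in replies if r is not None]
    # The receiver can't tell how many ACKs were superimposed, only
    # whether at least one sender responded -- which is all it needs
    # to know to stay awake and receive. This is why the scheme
    # scales with node density.
    return len(acks) > 0

nodes = [Sender("a", True), Sender("b", True), Sender("c", False)]
print(receiver_probe(nodes))  # True: at least one node has a packet
```

The key property, which the toy captures, is that the receiver's decision ("did anyone answer my probe?") is independent of how many nodes answered.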

There were a bunch of talks this year on use of cell phones and other sensors for participatory sensing applications. One of my favorites was the paper on AutoWitness from Santosh Kumar's group at the University of Memphis. In this work, a small tag is embedded within a high-value item (like a TV set). If the item is taken from the home, accelerometer and gyro readings are used to determine its probable location. Using HMM-based map matching they showed that they can reconstruct the path taken by a burglar with fairly high accuracy.
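To give a flavor of how HMM-based map matching works, here is a minimal Viterbi decoder in Python. This is an illustrative toy of my own, not the AutoWitness implementation: the hidden states stand in for road segments, and the observations stand in for noisy motion estimates from the inertial sensors. All the names and probabilities below are hypothetical.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = probability of the most likely state sequence that
    # ends in state s after the first t+1 observations.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Two hypothetical road segments; the tag mostly continues on the
# same segment, so "main_st" best explains a straight-turn-straight
# trace despite the noisy "turn" observation in the middle.
states = ("main_st", "elm_st")
start_p = {"main_st": 0.6, "elm_st": 0.4}
trans_p = {"main_st": {"main_st": 0.8, "elm_st": 0.2},
           "elm_st": {"main_st": 0.2, "elm_st": 0.8}}
emit_p = {"main_st": {"straight": 0.7, "turn": 0.3},
          "elm_st": {"straight": 0.2, "turn": 0.8}}
print(viterbi(["straight", "turn", "straight"],
              states, start_p, trans_p, emit_p))
# -> ['main_st', 'main_st', 'main_st']
```

The real system constrains the transition probabilities with the actual road graph, which is what lets a dead-reckoned, drift-prone trace snap onto a plausible street-level path.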

Chenyang Lu from WUSTL presented a paper on Reliable Clinical Monitoring using Wireless Sensor Networks: Experience in a Step-down Hospital Unit. This paper presents one of the first studies to make use of low-power wireless sensors in a real hospital environment with real patients. My group spent about seven years working on this problem and we were often frustrated at our inability to get medical personnel to sign on for a full-scale study. Chenyang's group managed to monitor 46 patients in a hospital over 41 days (but only three patients at a time). Their paper showcases a lot of the challenges involved in medical monitoring using wireless sensors and is a must-read for anyone working in the area.

Finally, Steve Dawson-Haggerty from Berkeley presented his work on sMAP, a framework for tying together diverse sensor data for building monitoring. Steve's observation is that while different companies have worked on various protocols for standardizing building monitoring applications, most of these systems are highly proprietary, vertically-integrated nightmares of multiple entangled protocols. Steve took a "Web 2.0" approach to the problem and designed a simple REST-based API permitting a wide range of sensors to be queried through a Web interface. This is a really nice piece of work and demonstrates what is possible when a clean, open, human-centric design is preferred over a design-by-committee protocol spec with twenty companies involved.
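The appeal of the approach is easy to see in miniature. Here is a sketch of the REST-ish resource model in Python -- the paths and JSON payloads are hypothetical examples of my own, not sMAP's actual schema:

```python
# Each sensor is a resource addressable by a simple path, returning
# JSON over a plain GET -- nothing vertically integrated required.
import json

RESOURCES = {
    "/sensors/room-101/temperature": {"value": 21.5, "units": "C"},
    "/sensors/room-101/humidity": {"value": 40.2, "units": "%RH"},
}

def handle_get(path):
    """Return (status, body) for a GET on the given resource path."""
    if path == "/sensors":
        # Enumerating resources is just another GET.
        return 200, json.dumps(sorted(RESOURCES))
    if path in RESOURCES:
        return 200, json.dumps(RESOURCES[path])
    return 404, json.dumps({"error": "no such resource"})

status, body = handle_get("/sensors/room-101/temperature")
print(status, body)
```

A real deployment serves these paths over HTTP, but the point stands: any client that can issue a GET and parse JSON can talk to the sensors, with no entangled proprietary protocol stack in the way.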

Speaking of companies, one disappointing aspect of this year's conference is that there were very few industrial participants. None of the papers were from companies, and only a couple of the demos had any industrial affiliation. Part of the reason for this is that the conference organizers didn't approach many companies for support this year, since the budget was adequate to cover the meeting expenses, but this had the negative effect of there being essentially zero industrial presence. My guess is that the companies are going to the IEEE sensor nets conferences, but I am deeply concerned about what this means for the SenSys community. If companies aren't paying attention to this work, we run the risk of the wheels of innovation grinding to a halt.

There was one talk this year that was highly controversial -- Tian He's group from University of Minnesota presented a paper on an "energy distribution network" for sensor nets. The idea is to allow sensor nodes to push energy around, in this case, using wires connecting the nodes together. Unfortunately, the presenter did not justify this design choice at all and the only experiments involved very short (1 meter) cables between nodes. It seems to me that if you connect nodes together using wires, you can centralize the power supply and bypass the need for a low-power node design in the first place. The fact that the presenter didn't have any good arguments for this design suggests that the research group has not spent enough time talking to other people about their work, so they've built up a herd mentality that this actually makes sense. I don't think it does but would love to hear some good arguments to the contrary.

Apart from SenSys, I had the chance to (briefly) visit Timothy Roscoe at ETH Zurich as well as connect with some colleagues at the Google Zurich office. ETH Zurich is a very exciting place: lots happening, lots of faculty growth, tons of resources, good students and postdocs. I was very impressed. Even more impressive is Google's office in Zurich, which has the most over-the-top design of any of the Google offices I've visited so far (including Mountain View). The office is beautifully laid out and has a bunch of humorous design touches, including an indoor jungle and firepoles that connect the floors (with a helpful sign that reads, "don't carry your laptop while sliding down the pole.")

Wednesday, November 3, 2010

Conference talk pet peeves

I'm sitting here at SenSys 2010 in Zurich and listening to some pretty interesting -- and also some pretty dull -- talks on the latest research in sensor networks. Now seems like an appropriate time for a blog post I've been saving for a while -- some of the things that really annoy me when I'm listening to a talk. Of course, I'm sometimes guilty of these myself, and I'm not the best speaker either. But I guess I have license to gripe as a listener.

There are lots of tips out there on how to give a good talk. David Patterson's "How to give a bad talk" is a great summary of what NOT to do. Some of these things are fairly obvious, like not cramming too much text on one slide, but others I see happen again and again when I'm listening to talks at a conference.

The dreaded outline slide: Nearly every 25-minute talk in a systems conference has the same format. Why do speakers feel compelled to give the mandatory outline slide --
"First, I'll give the motivation and background for this work. Next, I'll describe the design of FooZappr, our syetem for efficient frobnotzing of asynchronous boondoggles. Next, I'll describe the implementation of FooZappr. Then, I will present evaluation, and finally, related work and conclusions..."
After having seen several hundred such talks I have this memorized by now, so I don't think it is a good use of time. An outline slide is sometimes a good idea for a longer talk, but it should have some content -- guideposts for the audience, or highlights of the major ideas. This is rarely needed for a short conference talk.

Reading the slides: The number one thing that drives me up the wall is when the speaker simply reads the text on the slide, or essentially says the same thing in slightly different words than what is printed on the bullets. This is lazy, and suggests that the talk hasn't been rehearsed at all. It's also the fastest way to bore the audience. Most members of the audience have two parallel reception channels: visual and auditory -- so I try to use both at once and provide (slightly) redundant information across the two channels in case of loss (e.g., tuning out the slide).

No sense of design: It can be physically painful to watch an entire talk crammed full of multiple fonts, clashing colors, inconsistent use of graphics, and that awful PowerPoint clip art (you know the ones: skinny stick figures scratching their heads). Modern presentation software, including PowerPoint, lets you design beautiful and visually compelling talks -- use it! If you insist on coming up with your own template, at least use the colors and fonts in a minimal and consistent way. I tend to use the Harvard crimson banners on my slides and the same color for highlight text. A grad student once complimented me on the beautiful font choice in my talk -- it was Helvetica. That said, don't spend too much time on this: if your slides look good but have terrible content, the polish is wasted.

No sense of humor: I've lost count of how many conference talks I've heard that are nothing more than dry recitations of the technical content of the paper. No attempt is made at humor anywhere in the talk - not a joke to warm up the audience, or at least a visual joke somewhere in the slides to wake people up a bit. A conference talk is entertainment (albeit an obscure kind of entertainment for an incredibly dorky audience) -- the speaker should at least make some effort to make the talk interesting and delightful. Most conference attendees spent hundreds of dollars (thousands if you include travel) for the privilege of listening to your talk, so you owe it to them to deliver it well. This is not to say that you should overload the talk with jokes, but breaking up the presentation with a bit of levity never hurt anyone.

Keep in mind that a conference talk is meant to be an advertisement for your paper. You do not have to cram every technical detail in there. What will the audience remember about your talk? I'll never forget Neil Spring's talk on ScriptRoute where he used a bunch of ridiculous custom Flash animations.

Of course, the talk delivery matters tremendously. If you're one of those dull, monotonic speakers or have a thick accent, you are probably not going to get a reputation as a good speaker. If you sound totally bored by your talk, the audience will be too. Some grad students are surprised that this matters so much and think it shouldn't -- but if you're planning on pursuing an academic career, you have to give a LOT of talks. So you should get good at it.

"Let's take that offline." This is a frequent response to a question that the speaker doesn't want to answer. I've heard speakers jump immediately to this rather than make any attempt whatsoever of answering. This has become far too socially acceptable at conferences and I think questioners (and session chairs) should push back. It is occasionally OK to take a discussion offline if it is going to be a lengthy discussion or there's clearly no agreement between the speaker and questioner, but I think speakers should be expected to answer questions posed after the talk.

Finally, with digital cameras there's an increasing trend of audience members taking photos of every talk slide (sometimes for every talk). Here at SenSys there's someone who is taking a video of every talk on his cell phone. I find this fairly obnoxious, especially when the photographer insists on using a flash and leaving on the camera's shutter "beep". If you want my slides, just ask me and I'll send you the PPT. I also think it's rude to take a video of a talk without asking the speaker's permission.

Tuesday, October 19, 2010

Computing at scale, or, how Google has warped my brain

A number of people at Google have stickers on their laptops that read "my other computer is a data center." Having been at Google for almost four months, I realize now that my whole concept of computing has radically changed since I started working here. I now take it for granted that I'll be able to run jobs on thousands of machines, with reliable job control and sophisticated distributed storage readily available.

Most of the code I'm writing is in Python, but makes heavy use of Google technologies such as MapReduce, BigTable, GFS, Sawzall, and a bunch of other things that I'm not at liberty to discuss in public. Within about a week of starting at Google, I had code running on thousands of machines all over the planet, with surprisingly little overhead.

As an academic, I have spent a lot of time thinking about and designing "large scale systems", though before coming to Google I rarely had a chance to actually work on them. At Berkeley, I worked on the 200-odd node NOW and Millennium clusters, which were great projects, but pale in comparison to the scale of the systems I use at Google every day.

A few lessons and takeaways from my experience so far...

The cloud is real. The idea that you need a physical machine close by to get any work done is completely out the window at this point. My only machine at Google is a Mac laptop (with a big honking monitor and wireless keyboard and trackpad when I am at my desk). I do all of my development work on a virtual Linux machine running in a datacenter somewhere -- I am not sure exactly where, not that it matters. I ssh into the virtual machine to do pretty much everything: edit code, fire off builds, run tests, etc. The systems I build are running in various datacenters and I rarely notice or care where they are physically located. Wide-area network latencies are low enough that this works fine for interactive use, even when I'm at home on my cable modem.

In contrast, back at Harvard, there are discussions going on about building up new resources for scientific computing, and talk of converting precious office and lab space on campus (where space is extremely scarce) into machine rooms. I find this idea fairly misdirected, given that we should be able to either leverage a third-party cloud infrastructure for most of this, or at least host the machines somewhere off-campus (where it would be cheaper to get space anyway). There is rarely a need for the users of the machines to be anywhere physically close to them anymore. Unless you really don't believe in remote management tools, the idea that we're going to displace student or faculty lab space to host machines that don't need to be on campus makes no sense to me.

The tools are surprisingly good. It is amazing how easy it is to run large parallel jobs on massive datasets when you have a simple interface like MapReduce at your disposal. Forget about complex shared-memory or message passing architectures: that stuff doesn't scale, and is so incredibly brittle anyway (think about what happens to an MPI program if one core goes offline). The other Google technologies, like GFS and BigTable, make large-scale storage essentially a non-issue for the developer. Yes, there are tradeoffs: you don't get the same guarantees as a traditional database, but on the other hand you can get something up and running in a matter of hours, rather than weeks.
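Google's internal MapReduce API isn't something I can show here, but the programming model itself is simple enough to sketch in plain Python. This is a toy, single-process stand-in (the function names and the in-memory "shuffle" are purely illustrative): you write a mapper and a reducer, and the framework worries about distribution, grouping, and fault tolerance.

```python
from collections import defaultdict

def mapper(line):
    # Emit a (word, 1) pair for every word in one line of input.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Combine all the counts emitted for a given word.
    yield word, sum(counts)

def run_mapreduce(lines, mapper, reducer):
    # Toy stand-in for the distributed shuffle/sort phase:
    # group all mapper outputs by key, then reduce each group.
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    results = {}
    for key, values in groups.items():
        for out_key, out_value in reducer(key, values):
            results[out_key] = out_value
    return results

counts = run_mapreduce(["the cat sat", "the dog ran"], mapper, reducer)
# counts["the"] == 2
```

The appeal is that the mapper and reducer above are essentially all you write; scaling from two lines of input to terabytes of logs is the framework's problem, not yours.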

Log first, ask questions later. It should come as no surprise that debugging a large parallel job running on thousands of remote processors is not easy. So, printf() is your friend. Log everything your program does, and when something seems to go wrong, scour the logs to figure it out. Disk is cheap, so it's better to log everything up front and sort it out later. There's little hope of doing real interactive debugging in this kind of environment, and most developers don't get shell access to the machines they are running on anyway. For the same reason, I am now a huge believer in unit tests -- before launching that job all over the planet, it's really nice to see all of the test lights go green.
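The log-everything-plus-unit-tests habit looks something like this minimal Python sketch (the function and logger names are made up for illustration): every step of the computation gets logged, and a small test suite guards the logic before the job ever leaves your desk.

```python
import logging
import unittest

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("latency_job")

def percentile(samples, p):
    """Return the p-th percentile of a list of latency samples (ms)."""
    # Log inputs and outputs: when this runs on a few thousand
    # machines, the logs are the only debugger you get.
    log.info("computing p%d over %d samples", p, len(samples))
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Simple nearest-rank percentile, clamped to the last index.
    idx = min(len(ordered) - 1, int(len(ordered) * p / 100))
    value = ordered[idx]
    log.info("p%d = %s ms", p, value)
    return value

class PercentileTest(unittest.TestCase):
    # Green lights here before firing the job off to the datacenters.
    def test_median(self):
        self.assertEqual(percentile([10, 20, 30, 40], 50), 30)

    def test_empty_rejected(self):
        with self.assertRaises(ValueError):
            percentile([], 99)
```

Nothing fancy, but the discipline is the point: if every function logs its inputs and results, a misbehaving run three hours into a MapReduce leaves a trail you can actually follow.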

Sunday, October 10, 2010

In Defense of Mark Zuckerberg

I finally got to see The Social Network, the new movie about the founding of Facebook. The movie is set during my first year teaching at Harvard, and in fact there is a scene where I'm shown teaching the Operating Systems course (in a commanding performance by Brian Palermo -- my next choice was Brad Pitt, but I'm thrilled that Brian was available for the role). The scene even shows my actual lecture notes on virtual memory. Of course, the content of the scene is completely fictional -- Mark Zuckerberg never stormed out of my class (and I wouldn't have humiliated him for it if he had) -- although the bored, glazed-over look of the students in the scene was pretty much accurate.

It's a great movie, and very entertaining, but there are two big misconceptions that I'd like to clear up. The first is that the movie inaccurately portrays Harvard as a place full of snobby, rich kids who wear ties and carry around an inflated sense of entitlement. Of course, my view (from the perspective of a Computer Science faculty member) might be somewhat skewed, but I've never seen this in my seven years of teaching here. Harvard students come from pretty diverse backgrounds and are creative, funny, and outgoing. I've had students from all corners of the world and walks of life in my classes, and I learn more from them than they'll ever learn from me -- the best part of my job is getting to know them. I've only seen one student here wearing a tweed jacket with elbow patches, and I'm pretty sure he was being ironic.

The second big problem with the movie is its portrayal of Mark Zuckerberg. He comes across in the film as an enormous asshole, tortured by the breakup with his girlfriend and by his inability to get into the Harvard Final Clubs. This is an unfair characterization and not at all the Mark Zuckerberg that I know. The movie did a good job of capturing how Mark speaks (and especially how he dresses), but he's nowhere near the back-stabbing, ladder-climbing jerk he's made out to be in the film. He's actually an incredibly nice guy, super smart, and needless to say very technically capable. If anything, I think Mark was swept up by forces that were bigger and more powerful than anyone could have expected when the Facebook was first launched. No doubt he made some mistakes along the way, but it's too bad that the movie vilifies him so. (Honestly, when I first heard there was a movie coming out about Facebook with Mark Zuckerberg as the main character, I couldn't believe it -- the quiet, goofy, somewhat awkward Mark that I know hardly sounded like a winning formula for a big-budget Hollywood film.)

The take-away from the movie is clear: nerds win. Ideas are cheap and don't mean squat if you don't know how to execute on them. To have an impact you need both the vision and the technical chops, as well as the tenacity to make something real. Mark was able to do all of those things, and I think he deserves every bit of success that comes his way. As I've blogged about before, I once tried to talk Mark out of starting Facebook -- and good thing he never listened to me. The world would be a very different (and a lot less fun, in my opinion) place if he had.

Startup Life: Three Months In

I've posted a story to Medium on what it's been like to work at a startup, after years at Google. Check it out here.