Sunday, August 18, 2013

Rewriting a large production system in Go

My team at Google is wrapping up an effort to rewrite a large production system (almost) entirely in Go. I say "almost" because one component of the system -- a library for transcoding between image formats -- works perfectly well in C++, so we decided to leave it as-is. But the rest of the system is 100% Go, not just wrappers to existing modules in C++ or another language. It's been a fun experience and I thought I'd share some lessons learned.

Plus, the Go language has a cute mascot ... awwww!
Why rewrite?

The first question we must answer is why we considered a rewrite in the first place. When we started this project, we adopted an existing C++ based system, which had been developed over the course of a couple of years by two of our sister teams at Google. It's a good system and does its job remarkably well. However, it has been used in several different projects with vastly different goals, leading to a nontrivial accretion of cruft. Over time, it became apparent that for us to continue to innovate rapidly would be extremely challenging on this large, shared codebase. This is not a ding to the original developers -- it is just a fact that when certain design decisions become ossified, it becomes more difficult to rethink them, especially when multiple teams are sharing the code.

Before doing the rewrite, we realized we needed only a small subset of the functionality of the original system -- perhaps 20% (or less) of what the other projects were doing with it. We were also looking at making some radical changes to its core logic, and wanted to experiment with new features in a way that would not impact the velocity of our team or the others using the code. Finally, the cognitive burden associated with making changes to any large, shared codebase is unbearable -- almost any change required touching lots of code that the developer did not fully understand, and updating test cases with unclear consequences for the other users of the code.

So, we decided to fork off and do a from-scratch rewrite. The bet we made was that taking a productivity hit during the initial rewrite would pay off in spades when we were able to add more features over time. It has also given us an opportunity to rethink some of the core design decisions of our system, which has been extremely valuable for improving our own understanding of its workings.

Why Go?

I'll admit that at first I was highly skeptical of using Go. This production system sits directly on the serving path between users and their content, so it has to be fast. It also has to handle a large query volume, so CPU and memory efficiency are key. Go's reliance on garbage collection gave me pause (pun intended ... har har har), given how much pain Java developers go through to manage their memory footprint. Also, I was not sure how well Go would be supported for the kind of development we wanted to do inside of Google. Our system has lots of dependencies, and the last thing I wanted was to have to reinvent lots of libraries in Go that we already had in C++. Finally, there was also simply the fear of the unknown.

My whole attitude changed when Michael Piatek (one of the star engineers in the group) sent me an initial cut at the core system rewrite in Go, the result of less than a week's work. Unlike the original C++ based system, I could actually read the code, even though I didn't know Go (yet). The #1 benefit we get from Go is the lightweight concurrency provided by goroutines. Instead of a messy chain of dozens of asynchronous callbacks spread over tens of source files, the core logic of the system fits in a couple hundred lines of code, all in the same file. You just read it from top to bottom, and it makes sense.
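
To give a flavor of what I mean, here's a minimal sketch (invented for this post, not code from our system) of the straight-line fan-out that goroutines buy you; fetchOne and the URLs are made up:

package main

import (
    "fmt"
    "net/http"
    "sync"
)

// fetchOne stands in for one step that would otherwise be an async callback.
func fetchOne(url string, out chan<- string, wg *sync.WaitGroup) {
    defer wg.Done()
    resp, err := http.Get(url)
    if err != nil {
        out <- fmt.Sprintf("%s: error: %v", url, err)
        return
    }
    resp.Body.Close()
    out <- fmt.Sprintf("%s: %s", url, resp.Status)
}

func main() {
    urls := []string{"http://example.com/", "http://example.org/"}
    out := make(chan string, len(urls))
    var wg sync.WaitGroup
    for _, u := range urls {
        wg.Add(1)
        go fetchOne(u, out, &wg) // one goroutine per request; no callbacks
    }
    wg.Wait() // straight-line code: wait for everything, then read results
    close(out)
    for line := range out {
        fmt.Println(line)
    }
}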

Michael also made the observation that Go is a language designed for writing Web-based services. Its standard libraries provide all of the machinery you need for serving HTTP, processing URLs, dealing with sockets, doing crypto, processing dates and timestamps, and doing compression. Unlike, say, Python, Go is a compiled language and therefore very fast. Go's package system makes for beautiful decomposition of code across packages, with clear, explicit dependencies between them. Its incremental compilation approach makes builds lightning fast. Automatic memory management means you never have to worry about freeing memory (although the usual caveats with a GC-based language apply).
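
Here's the sort of thing I mean -- a complete toy server using nothing but the standard library (the handler, path, and port are made up):

package main

import (
    "compress/gzip"
    "fmt"
    "log"
    "net/http"
    "time"
)

// hello uses only the standard library: HTTP serving, URL query parsing,
// timestamps, and gzip compression, with no third-party dependencies.
func hello(w http.ResponseWriter, r *http.Request) {
    name := r.URL.Query().Get("name")
    if name == "" {
        name = "world"
    }
    w.Header().Set("Content-Encoding", "gzip")
    gz := gzip.NewWriter(w)
    defer gz.Close()
    fmt.Fprintf(gz, "hello, %s; the time is %s\n", name, time.Now().Format(time.RFC3339))
}

func main() {
    http.HandleFunc("/hello", hello)
    log.Fatal(http.ListenAndServe(":8080", nil))
}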

Being terse

Syntactically, Go is very succinct. Indeed, the Go style guidelines encourage you to write code as tersely as possible. At first this drove me up the wall, since I was used to using long descriptive variable names and spreading expressions over as many lines as possible. But now I appreciate the terse coding approach, as it makes reading and understanding the code later much, much easier.

Personally, I really like coding in Go. I can get to the point without having to write a bunch of boilerplate just to make the compiler happy. Unlike C++, I don't have to split the logic of my code across header files and .cc files. Unlike Java, I don't have to spell out anything that the compiler can infer, including the types of variables. Go feels a lot like coding in a lean scripting language, like Python, but you get type safety for free.

Our Go-based rewrite is 121 Go source files totaling about 21K lines of code (including comments). Compare that to the original system, which was 1400 C++ source files with 460K lines of code. (Remember, the new system implements only a small subset of the original system's functionality -- but even so, the reduction in code size is far out of proportion to the reduction in functionality.)

What about ramp-up time?

Learning Go is easy coming from a C-like language background. There are no real surprises in the language; it pretty much makes sense. The standard libraries are very well documented, and there are plenty of online tutorials. None of the engineers on the team have taken very long at all to come up to speed in the language; heck, even one of our interns picked it up in a couple of days.

Overall, the rewrite has taken about 5 months and is already running in production. We have also implemented 3 or 4 major new features that would have taken much longer to implement in the original C++ based system, for the reasons described above. I estimate that our team's productivity has been improved by at least a factor of ten by moving to the new codebase, and by using Go.

Why not Go?

There are a few things about Go that I'm not super happy about, and that tend to bite me from time to time.

First, you need to "know" whether the variable you are dealing with is an interface or a struct. Structs can implement interfaces, of course, so in general you tend to treat the two as the same thing. But when you're dealing with a struct, you might be passing by reference, in which case the type is *myStruct, or you might be passing by value, in which case the type is just myStruct. If, on the other hand, the thing you're dealing with is "just" an interface, you never have a pointer to it -- an interface already behaves like a pointer in some sense. So when you're reading code that passes things around without a *, you have to remember that the value might still effectively "be a pointer" if it's an interface rather than a struct.
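
A contrived example (none of this is from our codebase) of the three flavors you end up juggling:

package main

import "fmt"

type Shape interface {
    Area() float64
}

type Rect struct {
    W, H float64
}

func (r Rect) Area() float64 { return r.W * r.H }

func byValue(r Rect) float64      { return r.Area() } // copies the struct
func byPointer(r *Rect) float64   { return r.Area() } // explicit pointer
func byInterface(s Shape) float64 { return s.Area() } // no *, but may refer to the same underlying value

func main() {
    r := Rect{W: 3, H: 4}
    fmt.Println(byValue(r), byPointer(&r), byInterface(r)) // 12 12 12
}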

Go's type inference makes for lean code, but requires you to dig a little to figure out what the type of a given variable is if it's not explicit. So given code like:
foo, bar := someFunc(baz) 
You'd really like to know what foo and bar actually are, in case you want to add some new code to operate on them. If I could get out of the 1970s and use an editor other than vi, maybe I would get some help from an IDE in this regard, but I staunchly refuse to edit code with any tool that requires using a mouse.

Finally, Go's liberal use of interfaces allows a struct to implement an interface "by accident". You never have to explicitly declare that a given struct implements a particular interface, although it's good coding style to mention this in the comments. The problem with this is that it can be difficult to tell when you are reading a given segment of code whether the developer intended for their struct to implement the interface that they appear to be projecting onto it. Also, if you want to refactor an interface, you have to go find all of its (undeclared) implementations more or less by hand.
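
For instance -- a toy, not real code -- this type satisfies fmt.Stringer without ever saying so:

package main

import "fmt"

type Temperature float64

// By defining String(), Temperature now satisfies fmt.Stringer. Nowhere do we
// declare that it does, and a reader has no direct way to tell whether that
// was intentional.
func (t Temperature) String() string {
    return fmt.Sprintf("%.1f degrees", float64(t))
}

func describe(s fmt.Stringer) string { return s.String() }

func main() {
    fmt.Println(describe(Temperature(21.5))) // "21.5 degrees"
}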

Most of all I find coding in Go really, really fun. This is a bad thing, since we all know that "real" programming is supposed to be a grueling, painful exercise of fighting with the compiler and tools. So programming in Go is making me soft. One day I'll find myself in the octagon with a bunch of sweaty, muscular C++ programmers bare-knuckling it out to the death, and I just know they're going to mop the floor with me. That's OK -- until then I'll just keep on cuddling my stuffed gopher and running gofmt to auto-indent my code.

ObDisclaimer: Everything in this post is my personal opinion and does not represent the view of my employer.

Thursday, July 11, 2013

Does the academic process slow innovation?

I've been wondering recently whether the extended, baroque process of doing research in an academic setting (by which I mean either a university or an "academic style" research lab in industry) is doing more harm than good when it comes to the pace of innovation.

From http://academicnegativity.tumblr.com/
Prior to moving to industry, I spent my whole career as an academic. It took me a while to get used to how fast things happen in industry. My team, which is part of Chrome, does a new major release every six weeks. This is head-spinningly fast compared to academic projects. Important decisions are made on the order of days, not months. Projects are started up and executed an order of magnitude faster than it would take a similarly-sized academic research group to get up to speed.

This is not just about having plenty of funding (although that is part of it). It is also about what happens when you abandon the trappings of the academic process, for which the timelines are glacial:
  • A three month wait (typically) to get a decision on a conference submission, during which time you are not allowed to submit similar work elsewhere.
  • A six month wait on hearing back on a grant proposal submission.
  • A year or more wait for a journal publication, with a similar restriction on parallel submissions.
  • Five plus years to get a PhD.
  • Possibly one or two years as a postdoc.
  • Six to eight years to get tenure.
  • A lifetime of scarring as the result of the above. (Okay, I'm kidding. Sort of.)
This is not a problem unique to computer science of course. In the medical field, the average age at which a PI receives their first NIH R01 grant is 44 years. Think about that for a minute. That's 23-some-odd years after graduation before an investigator is considered an "independent" contributor to the research field. Is this good for innovation?

Overhead

Part of the problem is that the academic process is full of overheads. Take a typical conference program committee for example. Let's say the committee has 15 members, each of whom has 30 papers to review (this is pretty average, for good conferences at least). Each paper takes at least an hour to review (often more) - that's the equivalent of at least 4 work days (that is, assuming academics work only 8 hours a day ... ha ha!). Add on two more full days (minimum) for the program committee meeting and travel, and you're averaging about a full week of work for each PC member. Multiply by 15 -- double it for the two program co-chairs -- and you're talking about around 870 person-hours combined effort to decide on the 25 or so papers that will appear in the conference. That's 34 person-hours of overhead per paper. This doesn't count any of the overheads associated with actually organizing the conference -- making the budget, choosing the hotel, raising funds, setting up the website, publishing the proceedings, organizing the meals and poster sessions, renting the projectors ... you get my point.

The question is, does all of this time and effort produce (a) better science or (b) lead to greater understanding or impact? I want to posit that the answer is no. This process was developed decades ago in a pre-digital era where we had no other way to disseminate research results. (Hell, it's gotten much easier to run a program committee now that submissions are done via the web -- it used to be you had to print out 20 copies of your paper and mail them to the program chair who would mail out large packets to each of the committee members.)

But still, we cling to this process because it's the only way we know how to get PhD students hired as professors and get junior faculty tenured -- any attempt to buck the trend would no doubt jeopardize the career of some young academic. It's sad.

How did we get here?

Why do we have these processes in the first place? The main reason is competition for scarce resources. Put simply, there are too many academics, not enough funding, and not enough paper-slots in good conference venues. Much has been said about the sad state of public funding for science research. Too many academics competing for the same pool of money means longer processes for proposal reviews and more time re-submitting proposals when they get rejected.

As far as the limitation on conferences goes, you can't create more conferences out of thin air, because people wouldn't have time to sit on the program committees and travel to all of them (ironic, isn't it?). Whenever someone proposes a new conference venue there are groans of "but how will we schedule it around SOSP and OSDI and NSDI and SIGCOMM?!?" - so forget about that. Actually, I think the best model would be to adopt the practice of some research communities and have one big mongo conference every year that everybody goes to (ideally in Mexico) and have USENIX run it so the scientists can focus on doing science and leave the conference organization to the experts. But I digress.

The industrial research labs don't have the same kind of funding problem, but they still compete for paper-slots. And I believe this inherently slows everything down because you can't do new research when you have to keep backtracking to get that paper you spent so many precious hours on finally published after the third round of rejections with "a strong accept, two weak accepts, and a weak reject" reviews. It sucks.

Innovative != Publishable

My inspiration for writing this post came from the amazing pace at which innovation is happening in industry these days. The most high-profile of these are crazy "moon shot" projects like SpaceX, 23andMe, and Google's high-altitude balloons to deliver Internet access to entire cities. But there are countless other, not-as-sexy innovations happening every day at companies big and small, just focused on changing the world, rather than writing papers about it.

I want to claim that even with all of their resources, had these projects gone down the conventional academic route -- writing papers and the like -- they would have never happened. No doubt if a university had done the equivalent of, say, Google Glass and submitted a MobiSys paper on it, it would have been rejected as "not novel enough" since Thad Starner has been wearing a computer on his head for 20 years. And high-altitude Internet balloons? What's new about that? It's just a different form of WiFi, essentially. Nothing new there.

We still need to publish research, though, which is important for driving innovation. But we should shift to an open, online publication model -- like arXiv -- where everything is "accepted" and papers are reviewed and scored informally after the fact. Work can get published much more rapidly and good work won't be stuck in the endless resubmission cycle. Scientists can stop wasting so much time and energy on program committees and conference organization. (We should still have one big conference every year so people still get to meet and drink and bounce ideas around.) This model is also much more amenable to publications from industry, where researchers currently have little incentive to run the conference submission gauntlet, unless publishing papers is part of their job description. And academics can still use citation counts or "paper ratings" as the measure by which hiring and promotion decisions are made.

Wednesday, May 15, 2013

What I wish systems researchers would work on

I just got back from HotOS 2013 and, frankly, it was a little depressing. Mind you, the conference was really well-organized; there were lots of great people, an amazing venue, and fine work by the program committee and chair... but I could not help being left with the feeling that the operating systems community is somewhat stuck in a rut.

It did not help that the first session was about how to make network and disk I/O faster, a topic that has been a recurring theme for as long as "systems" has existed as a field. HotOS is supposed to represent the "hot topics" in the area, but when we're still arguing about problems that are 25 years old, it starts to feel not-so-hot.

Of the 27 papers presented at the workshop, only about 2 or 3 would qualify as bold, unconventional, or truly novel research directions. The rest were basically extended abstracts of conference submissions that are either already in preparation or will be submitted in the next year or so. This is a perennial problem for HotOS, and when I chaired it in 2011 we had the same problem. So I can't fault the program committee on this one -- they have to work with the submissions they get, and often the "best" and most polished submissions represent the most mature (and hence less speculative) work. (Still, this year there was no equivalent to Dave Ackley's paper in 2011 which challenged us to "pledge allegiance to the light cone.")

This got me thinking about what research areas I wish the systems research community would spend more time on. I wrote a similar blog post after attending HotMobile 2013, so it's only fair that I would subject the systems community to the same treatment. A few ideas...

Obligatory disclaimer: Everything in this post is my personal opinion and does not represent the view of my employer.

An escape from configuration hell: A lot of research effort is focused on better techniques for finding and mitigating software bugs. In my experience at Google, the vast majority of production failures arise not from bugs in the software, but from bugs in the (often vast and incredibly complex) configuration settings that control the software. A canonical example is when someone bungles an edit to a config file, which gets rolled out to the fleet and causes jobs to start behaving in new and undesirable ways. The software is working exactly as intended, but the bad configuration is leading it to do the wrong thing.

This is a really hard problem. A typical Google-scale system involves many interacting jobs running very different software packages, each with their own mechanisms for runtime configuration: whether they be command-line flags, some kind of special-purpose configuration file (often in a totally custom ASCII format of some kind), or a fancy dynamically updated key-value store. The configurations often operate at very different levels of abstraction -- everything from deciding where to route network packets, to Thai and Slovak translations of UI strings seen by users. "Bad configurations" are not just obvious things like syntax errors; they also include unexpected interactions between software components when a new (perfectly valid) configuration is used.

There are of course tools for testing configurations, catching problems and rapidly rolling back bad changes, etc. but a tremendous amount of developer and operational energy goes into fixing problems arising due to bad configurations. This seems like a ripe area for research.
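
To be concrete about the kind of checking I mean (this is a made-up sketch, not how our systems actually do it), here's the sort of structural validation you can bolt onto a single job's config; the Config fields and limits are invented:

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// A hypothetical per-job config; the fields and limits are invented.
type Config struct {
    NumWorkers  int    `json:"num_workers"`
    BackendAddr string `json:"backend_addr"`
    TimeoutMs   int    `json:"timeout_ms"`
}

// validate catches configs that parse fine but are semantically nonsense.
func (c *Config) validate() error {
    if c.NumWorkers <= 0 || c.NumWorkers > 1024 {
        return fmt.Errorf("num_workers out of range: %d", c.NumWorkers)
    }
    if c.BackendAddr == "" {
        return fmt.Errorf("backend_addr must be set")
    }
    if c.TimeoutMs < 10 {
        return fmt.Errorf("timeout_ms suspiciously low: %d", c.TimeoutMs)
    }
    return nil
}

func main() {
    f, err := os.Open("job.json")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer f.Close()
    var c Config
    if err := json.NewDecoder(f).Decode(&c); err != nil {
        fmt.Fprintln(os.Stderr, "parse error:", err)
        os.Exit(1)
    }
    if err := c.validate(); err != nil {
        fmt.Fprintln(os.Stderr, "bad config:", err)
        os.Exit(1)
    }
    fmt.Println("config OK")
}

A check like this catches typos and out-of-range values before rollout, but it says nothing about how a perfectly valid config interacts with every other system that reads it -- which is exactly where the painful failures come from.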

Understanding interactions in a large, production system: The common definition of a "distributed system" assumes that the interactions between the individual components of the system are fairly well-defined, and dictated largely by whatever messaging protocol is used (cf., two phase commit, Paxos, etc.)  In reality, the modes of interaction are vastly more complex and subtle than simply reasoning about state transitions and messages, in the abstract way that distributed systems researchers tend to cast things.

Let me give a concrete example. Recently we encountered a problem where a bunch of jobs in one datacenter started crashing due to running out of file descriptors. Since this roughly coincided with a push of a new software version, we assumed that there must have been some leak in the new code, so we rolled back to the old version -- but the crash kept happening. We couldn't just take down the crashing jobs and let the traffic flow to another datacenter, since we were worried that the increased load would trigger the same bug elsewhere, leading to a cascading failure. The engineer on call spent many, many hours trying different things and trying to isolate the problem, without success. Eventually we learned that another team had changed the configuration of their system which was leading to many more socket connections being made to our system, which put the jobs over the default file descriptor limit (which had never been triggered before). The "bug" here was not a software bug, or even a bad configuration: it was the unexpected interaction between two very different (and independently-maintained) software systems leading to a new mode of resource exhaustion.
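
In hindsight, even a dumb per-process metric would have pointed at the culprit hours earlier. Here's a minimal, Linux-only sketch (not our actual monitoring) of the kind of signal I mean -- it compares open descriptors against the soft limit:

package main

import (
    "fmt"
    "os"
    "syscall"
)

// fdUsage reports how many file descriptors this process has open versus its
// soft limit (Linux-specific: it counts the entries in /proc/self/fd).
func fdUsage() (open int, limit uint64, err error) {
    var rl syscall.Rlimit
    if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
        return 0, 0, err
    }
    fds, err := os.ReadDir("/proc/self/fd")
    if err != nil {
        return 0, 0, err
    }
    return len(fds), rl.Cur, nil
}

func main() {
    open, limit, err := fdUsage()
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Printf("%d of %d file descriptors in use\n", open, limit)
    if float64(open) > 0.8*float64(limit) {
        fmt.Println("warning: nearing the descriptor limit")
    }
}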

Somehow there needs to be a way to perform offline analysis and testing of large, complex systems so that we can catch these kinds of problems before they crop up in production. Of course we have extensive testing infrastructure, but the "hard" problems always come up when running in a real production environment, with real traffic and real resource constraints. Even integration tests and canarying are a joke compared to how complex production-scale systems are. I wish I had a way to take a complete snapshot of a production system and run it in an isolated environment -- at scale! -- to determine the impact of a proposed change. Doing so on real hardware would be cost-prohibitive (even at Google), so how do you do this in a virtual or simulated setting?

I'll admit that these are not easy problems for academics to work on: unless you have access to a real production system, you're unlikely to encounter them in an academic setting, and replicating them at realistic scale is difficult. Doing internships at companies is a great way to get exposure to this kind of thing.

Pushing the envelope on new computing platforms: I also wish the systems community would come back to working on novel and unconventional computing platforms. The work on sensor networks in the 2000's really challenged our assumptions about the capabilities and constraints of a computer system, and forced us down some interesting paths in terms of OS, language, and network protocol design. In doing these kinds of explorations, we learn a lot about how "conventional" OS concepts map (or don't map) onto the new platform, and the new techniques can often find a home in a more traditional setting: witness how the ideas from Click have influenced all kinds of systems unrelated to its original goals.

I think it is inevitable that in our lifetimes we will have a wearable computing platform that is "truly embedded": either with a neural interface, or with something almost as good (e.g. seamless speech input and visual output in a light and almost-invisible form factor). I wore my Google Glass to HotOS, which stirred up a lot of discussions around privacy issues, what the "killer apps" are, what abstractions the OS should support, and so forth. I would call Google Glass an early example of the kind of wearable platform that may well replace smartphones, tablets, and laptops as the personal computing interface of choice in the future. If that is true, then now is the time for the academic systems community to start working out how we're going to support such a platform. There are vast issues around privacy, energy management, data storage, application design, algorithms for vision and speech recognition, and much more that come up in this setting.

These are all juicy and perfectly valid research problems for the systems community -- if only it is bold enough to work on them.

Sunday, April 21, 2013

The other side of "academic freedom"

My various blog posts about moving from academia to industry have prompted a number of conversations with PhD students who are considering academic careers. The most oft-cited reason for wanting a faculty job is "academic freedom," which is typically described as "being able to work on anything you want." This is a nice theory, but I think it's important to understand the realities, especially for pre-tenure, junior faculty.

I don't believe that most professors (even tenured ones) can genuinely work on "anything they want." In practice, as a professor you are constrained by at least four things:
  • What you can get funding to do;
  • What you can publish (good) papers about;
  • What you can get students to help you with;
  • What you can do better than anyone else in the field.
These are important limitations to consider, and I want to take them one by one.

Funding doesn't come easy. When I was a PhD student at Berkeley, I was fortunate to be a student of David Culler's, who had what seemed like an endless supply of funding from big DARPA and NSF grants, among others. When I went to start my faculty career, he (and many others) told me I would have "no problem" getting plenty of funding. This turned out not to be true. Shortly after I started my faculty job, DARPA all but shut down their programs in computer science, and NSF grants became heavily constrained (and much more competitive). Being a freshly-minted faculty member meant I was essentially a nobody, and NSF review panels don't take pity on nobodies -- apart from special programs like the CAREER award, you're competing with the most senior, established people in your field for every grant. To make matters worse, I didn't have a lot of senior colleagues in my area at Harvard to write proposals with, so I mostly had to go it alone.

Now, I will readily admit that I suck at writing grants, although according to my colleagues my hit rate for funding was about on par with other profs in my area. However, there were several projects that I simply could not do because I couldn't get funding for them. I tried for four years to get an NSF grant for our work on monitoring volcanoes with sensor networks -- which was arguably the thing I was most famous for as a professor. I failed. As a result we never did the large-scale, 100-node, multi-month study that we had hoped to do. It was a huge disappointment and taught me a valuable lesson that you can't work on something that you can't get funding for.

Who decides which problems are sexy (and therefore publishable)? I'll tell you: it's the 30-some-odd people who serve on the program committees of the top conferences in your area year after year. It is very rare for a faculty member to buck the trend of which topics are "hot" in their area, since they would run a significant risk of not being able to publish in the top venues. This can be absolutely disastrous for junior faculty who need a strong publication record to get tenure. I know of several faculty who were denied tenure specifically because they chose to work on problems outside of the mainstream, and were not able to publish enough top papers as a result. So, sure, they could work on "anything they wanted," but that ended up getting them fired.

Now, there are some folks (David Culler being one of them) who are able to essentially start new fields and get the community to go along with them. I argue that most professors are not able to do this, even tenured ones. Most people have to go where the funding and the publication venues are.

What can you get students to work on? I don't mean this in a grad-students-won't-write-unit-tests kind of way (although that is also true). What I mean is: how likely is it that you will find grad students in your field who have the requisite skills to undertake a particular research agenda? In my case, I would have killed for some students who really knew how to design circuit boards. Or students who had some deep understanding of compiler optimization -- but still wanted to work on (and publish) in the area of operating systems. A bunch of times I felt that the problems I could tackle were circumscribed by my students' (and my own) technical skills. This has nothing to do with the "quality" of the students; it's just the fact that PhD students (by definition) have to be hyper-specialized. This means that grad students in a given area tend to have a fairly narrow set of skills, which can be a limitation at times.

Can you differentiate your research? The final (and arguably most important) aspect of being successful as a faculty member is being able to solve new problems better than anyone else in your area. It is not usually enough to simply do a better job solving the same problem as someone else -- you need to have a new idea, a new spin, a new approach -- or work on a different problem. Hot areas tend to get overcrowded, making it difficult for individual faculty to differentiate themselves. For a while it felt like everyone was working on peer-to-peer networking. A bunch of "me too" research projects started up, most of which were forgettable. Being one of those "me too" researchers in a crowded area would be a very bad idea for a pre-tenure faculty member.

Do things get better after tenure? I didn't stick around long enough to find out, so I don't know. I definitely know some tenured faculty who are coasting and care a lot less about where and how much they publish, or who tend to dabble rather than take a more focused research agenda post-tenure. Certainly you cannot get fired if you are not publishing or bringing in the research dollars anymore, but to me this sounds like an unsatisfying career. Others -- like David Culler -- are able to embark on ambitious, paradigm-shifting projects (like NOW and TinyOS) without much regard to which way the winds are blowing. I think most tenured faculty would agree that they are subject to the same sets of pressures to work on fundable, publishable research as pre-tenure faculty, if they care about having impact.

Okay, but how much freedom do you have in industry? This is worth a separate post on its own, which I will write sometime soon. The short version is that it depends a lot on the kind of job you have and what kind of company you work for. My team at Google has a pretty broad mandate which gives us a fair bit of freedom. But unlike academia, we aren't limited by funding (apart from headcount, which is substantial); technical skills (we can hire people with the skills we need); or the somewhat unpredictable whims of a research community or NSF panel. So, yes, there are limitations, but I think they are no more severe, and a lot more rational, than what you often experience as an academic.



Monday, April 8, 2013

Running a software team at Google

I'm often asked what my job is like at Google since I left academia. I guess going from tenured professor to software engineer sounds like a big step down. Job titles aside, I'm much happier and more productive in my new role than I was in my 8 years at Harvard, though there are actually a lot of similarities between being a professor and running a software team.

LIKE A BOSS.
I lead a team at Google's Seattle office which is responsible for a range of projects in the mobile web performance area (for more background on my team's work see my earlier blog post on the topic). One of our projects is the recently-announced data compression proxy support in Chrome Mobile. We also work on the PageSpeed suite of technologies, specifically focusing on mobile web optimization, as well as a bunch of other cool stuff that I can't talk about just yet.

My official job title is just "software engineer," which is the most common (and coveted) role at Google. (I say "coveted" because engineers make most of the important decisions.) Unofficially, I'm what we call a "Tech Lead Manager," which means I am responsible both for the technical direction of the team and for the people management stuff. (Some people use the alternate term "Über Tech Lead," but this has one too many umlauts for me.) A TLM is not a very common role at Google: most teams have separate people doing the TL and M jobs. I do both in part because my team is based in Seattle, and it doesn't make sense to have it report to a "regular" manager who would likely be in Mountain View. Besides, I'm really happy to do both jobs and enjoy the variety.

There are four main aspects to my job: (1) Defining the technical agenda for the team and making sure we're successful; (2) Writing code of my own; (3) Acting as the main liaison between our team and other groups at Google; and (4) Doing the "people management" for the team in terms of hiring, performance reviews, promotion, and so forth.

Academics will immediately recognize the parallels with being a professor. In an academic research group, the professor defines the technical scope of the group as well as mentors and guides the graduate students. The big difference here is that I don't consider the folks on my team to be my "apprentices" as a professor would with graduate students. Indeed, most people on my team are much better software engineers than I am, and I lean on them heavily to do the really hard work of building solid, reliable software. My job is to shield the engineers on my team from distractions, and support them so they can be successful.

There are of course many differences with academic life. Unlike a professor, I don't have to constantly beg for funding to keep the projects going. I have very few distractions in terms of committees, travel, writing recommendation letters, or pointless meetings. Of course, I also don't have to teach. (I loved teaching, but the amount of work it requires to do well is gargantuan.) Most importantly, my team's success is no longer defined through an arbitrary and often broken peer review process, which applies to pretty much everything that matters in the academic world. This is the best part. If we can execute well and deliver products that have impact, we win. It no longer comes down to making three grumpy program committee members happy with the font spacing in your paper submissions. But I digress.

I do spend about 50% of my time writing code. I really need to have a few solid hours each day hacking in order to stay sane. Since I don't have as many coding cycles as other people on my team (and I service more interrupts), I tend to take on the more mundane tasks, such as writing MapReduce code to analyze service logs and generate reports on performance. I actually like this kind of work, as it means dealing with a huge amount of data and slicing and dicing it in various interesting ways. I also don't need to show off my heroic coding skills in order to get promoted at this point, so I let the folks who are better hackers implement the sexy new features.

I do exert a lot of influence over the direction that our team's software takes, in terms of overall design and architecture. Largely this is because I have more experience thinking about systems design than some of the folks on my team, although it does mean that I need to defer to the people writing the actual code when there are hairy details with which I am unfamiliar. A big part of my job is setting priorities and making the call when we are forced to choose between several unappealing options to solve a particular problem. (It also means I am the one who takes the heat if I make the wrong decision.)

I reckon that the people management aspects of my job are pretty standard in industry: I do the periodic performance reviews for my direct reports, participate in compensation planning, work on hiring new people to the team (both internally and externally), and advocate for my team members when they go up for promotion. Of course I meet with each of my direct reports on a regular basis and help them with setting priorities, clearing obstacles, and career development.

The most varied part of my job is acting as the representative for our team and working with other teams at Google to make amazing things happen. My team is part of the larger Chrome project, but we have connections with many other teams from all over the world doing work across Google's technology stack. I am also frequently called into meetings to figure out how to coordinate my team's work with other things going on around the company. So it never gets boring. Fortunately we are pretty efficient at meetings (half an hour suffices for almost everything), and even with all of this, my meeting load is about half of what it was as an academic. (Besides, these meetings are almost always productive, compared to academic meetings, of which only about 10% have any tangible outcome.)

Despite the heavy load and lots of irons in the fire, my work at Google is largely a 9-to-5 job. I rarely work evenings and weekends, unless there's something I'm really itching to do, and the volume of email I get drops to near-zero outside of working hours. (Although I am on our team's pager rotation and recently spent a few hours in the middle of the night fixing a production bug.) This is a huge relief from the constant pressure to work, work, work that is endemic to life as a professor. I also feel that I get much more done now, in less time, due to fewer distractions and being able to maintain a clear focus. The way I see it is this: If I'm being asked to do more than I can get done in a sane work week, we need to hire more people. Fortunately that is rarely a problem.

Disclaimer: Everything in this post is my personal opinion and does not represent the view of my employer.

Thursday, March 21, 2013

Looking back on 1 million pageviews

This blog just hit one million pageviews:

Seems like a pretty cool milestone to me. I never imagined I'd get so much traffic.

Just for fun, here are the top five most popular posts on this blog so far:

Why I'm Leaving Harvard (99263 pageviews), in which I announce my departure from Harvard to Google. I guess this post became a kind of touchstone for a bunch of people considering an academic career, or those who also made the decision to leave academia. I'm often asked whether I still think I made the right decision after nearly 3 years at Google. The answer is a resounding yes: I'm extremely happy and my team is doing amazing things - some of which you can read about here.

So, you want to go to grad school? (43314 pageviews), in which I try to give an honest assessment of why someone should (or should not) do a PhD in Computer Science. The main thing I try to dispel is this myth that you should "take a year off" and work in industry before going to grad school. Way too many students tell me that they plan to do this, and I think it is a really bad idea if you are serious about doing a PhD.

Day in the life of a Googler (33885 pageviews), which was intended as a tongue-in-cheek look at the difference between a day at Google and a day as a professor. Somehow this got taken seriously by people, and someone sent me a link to a Chinese translation that was getting a lot of hits and comments (in Chinese). My guess is that the intended humor was lost in translation.

How I almost killed Facebook (28367 pageviews), an early post about the time I tried to talk Mark Zuckerberg out of dropping out of Harvard to do a startup. Thankfully he did not listen to me.

Programming != Computer Science (25794 pageviews), a little rant against grad students who seem to mix up writing software with doing research.

Of course, not all of my posts have been widely read. Going back over them, it looks like the ones with the smallest number of hits focus on specific research topics, like my trip report for SenSys 2009 (115 pageviews!) and an announcement for postdoc openings in my group (a whopping 68 pageviews). I guess I should stick to blogging about Mark Zuckerberg instead.



Tuesday, March 19, 2013

Moving my life to the cloud

http://www.flickr.com/photos/clspeace/2250488434/
I'm in the process of moving my (computing) life entirely to the cloud -- no more laptop: just a phone, a tablet (which I use rarely), and a Chromebook Pixel. My three-year-old MacBook Pro is about to croak, and it seems like now is the time to migrate everything to the cloud, so I can free myself from having to maintain a bunch of files, music, photos, applications, and backups locally. I'd really like to be in a place where I could throw my laptop out of a moving vehicle and not care a bit about what happens to my data. Still, there are some challenges ahead.

The Chromebook Pixel itself is a sweet piece of kit. The keyboard and trackpad are nearly as good as my Mac, and the screen resolution is simply unreal: you CANNOT see the pixels (ironic choice of product name; as if the next version of a Mac would be the "MacBook Virus"). It boots in 10 seconds. Hell, the other day I did a complete OS upgrade (switching from the beta to the dev channel), which took no more than 10 seconds -- including the reboot. The Pixel comes with 1 TB (!) of Google Drive storage, so at this point there's no excuse for not storing all my stuff in the cloud -- this is more space than any laptop I've ever owned.

But you only get to use Chrome!?!? Working at Google, I spend about 70% of my time in Chrome already, so the environment is pretty much exactly what I need. The other 30% of my time is spent ssh'ing into a Linux machine to do software development. The Secure Shell Chrome extension provides a full-on terminal emulator within the browser. I pretty much only use the shell and vim when doing development, so this setup is fine for me.

Since I left academia, I don't have much need for writing papers in LaTeX and doing fancy PowerPoint slides anymore. If those were still major uses of my time, I'd have to find another solution. Google Docs works perfectly well for the kind of writing and presentations I do these days; in fact, the sharing capabilities turn out to be more important than fancy formatting.

What about working offline!?!?! Who the hell ever works offline anymore? I certainly don't. Even on airplanes, the majority of the time I have WiFi. I generally can't get any work done without an Internet connection, so optimizing for "offline" use seems silly to me. If I'm really offline, I'll read a book.

Music? Google Play Music and the Amazon Cloud Player work great. I have a huge music library (some 1,200 albums) which I keep in both places.

Movies and TV shows? It's true that iTunes has the best selection, but what's available on Google Play and Amazon Instant Video is pretty good. I mostly watch movies and TV on my actual TV (crazy, I know) but for "on the road" I think streaming content will work well enough. There's no real offline video playback on the Chromebook as far as I know; for that I can use my Android tablet though. Netflix apparently works fine on the Chromebook, although I unsubscribed from Netflix when they started screwing people over on their pricing.

Of course, it's not all roses. A few pain points, so far:

Migrating my photo library to the cloud was more painful than I had hoped. I have around 70 GB of pictures and videos taken over the years, and wanted to get it onto Google Drive so I'd have direct access to it from the Chromebook. This involved installing the Google Drive Mac app which allowed me to copy everything over, although the upload took a day or so, and it wasn't clear at first if everything was syncing correctly. (I also had to make sure not to sync the photo library on my other machines which had the Drive app installed.)

Managing photos in the cloud still kind of sucks. I'm not happy with any of the cloud-based photo library management solutions that I've found. I have a Flickr Pro account which I use for sharing select pictures with family and friends, but I don't feel comfortable uploading all of my photos to Flickr. I could use Google+, but it's more focused on sharing than on managing a large library. I am not sure what is going on with Picasa these days. Dropbox is another option, which I use for general files, but its photo management is pretty rudimentary as well. For now I'm going to make do with the bare-bones photo support in Google Drive and think about a better way to manage this. What's cool is that I already take all of my photos on my phone, which automatically syncs them to both Google Drive and Dropbox, so there's never a need to physically plug the phone into anything.

Editing plain text files is -- surprisingly -- kind of hard. About the only use I have for plain text files (apart from coding) anymore is writing paper reviews -- I read a PDF in one window and fill in the plain-ASCII review form for HotCRP in the other. There are a couple of Chrome extensions with bare-bones text editors, but they're a far cry from a full-fledged editor. I am experimenting with Neutron Drive, which is a pretty cool editor/IDE Chrome extension that uses Google Drive as its backend. Maybe I'll have to change my habits and just fill in my reviews in HotCRP directly (see above about not being able to get any work done offline).

Where to keep my really private stuff? By which I mean porn, of course. Or tax returns. Or anything I don't want (or can't) store in any of the cloud services. This article from VentureBeat does a good job at summarizing the policies of the popular cloud storage providers, but the upshot is that all of them have some mechanism to either take down objectionable content or report it to law enforcement.

What I'd really like is to set up a "private cloud", perhaps running a server at home which I could then access (securely) over the web. There are several solutions for private encrypted cloud storage out there (like Arq and Duplicati), but most of them require some form of specialized client (which won't work on ChromeOS any time soon). I guess I could run a WebDAV server or something on a local box or even a machine in the cloud which I could access through the browser. Still, I'm not sure what to do about this yet. It seems insane to me that it's 2013 and we still don't know how to get file sync right.
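
A bare-bones version of this is easy enough to sketch in Go: serve one directory over HTTPS with basic auth from a box at home, reachable from any browser. This is just a sketch -- the paths, username, and password are placeholders, and you'd have to provision your own certificate:

package main

import (
    "crypto/subtle"
    "log"
    "net/http"
)

// auth wraps a handler with HTTP basic authentication. The credentials below
// are placeholders for illustration only.
func auth(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        user, pass, ok := r.BasicAuth()
        if !ok ||
            subtle.ConstantTimeCompare([]byte(user), []byte("me")) != 1 ||
            subtle.ConstantTimeCompare([]byte(pass), []byte("sekrit")) != 1 {
            w.Header().Set("WWW-Authenticate", `Basic realm="private"`)
            http.Error(w, "unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

func main() {
    files := http.FileServer(http.Dir("/home/me/private"))
    http.Handle("/", auth(files))
    log.Fatal(http.ListenAndServeTLS(":8443", "cert.pem", "key.pem", nil))
}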

Disclaimer: Everything in this post is my personal opinion and does not represent the view of my employer.

Wednesday, February 27, 2013

Grad students: Learn how to give a talk

I've been to roughly a hundred academic conferences and listened to thousands of talks, mostly by grad students. Over the years, the quality of the speaking and presentations has not gotten any better -- if anything, it's gotten worse. The typical grad student talk is horribly bad, and it's surprising how little effort goes into developing presentation and speaking skills, especially given how important those skills are for academics.

Grad students need to learn how to give good, clear, compelling presentations. Especially those who think they want to be professors one day.

It is difficult to overstate how important presentation skills are for academics. This is about much more than "being a good teacher" (which is a nice trait to have, but not actually that important for an academic's career in the long run). There is a huge division between the professors who are influential leaders, and those who are also-rans. In almost all cases that I can think of, the professors who are very successful are also good speakers, and good communicators overall. They can give good, clear, funny talks. They can engage in meaningful conversations at a technical level and at a personal level. They have a strong command of English and can use the language effectively to communicate complex ideas. So I claim that there is a strong correlation between good communication skills and overall research impact.

In some sense, a professor's job is to communicate the research ideas being done in their group. Although grad students often give the conference talks, professors give countless other talks at other universities, companies, workshops, and elsewhere. The professors write the grant proposals, and often the papers (or good chunks of them) as well. Once you're a professor, it matters a lot less how good of a hacker you are -- your job is to be the PR rep.

So it's surprising that grad students generally receive no formal training in presentation skills. A typical grad student might get three or four opportunities to give conference talks during their Ph.D., but this is hardly enough practice to hone their skills. Acting as a TA or giving "practice talks" isn't much help either. I honestly don't know how to fix this problem, short of running a course specifically on giving good presentations, which sounds like a drag -- but might be necessary.

The language barrier is a big part of the problem. Students who do not have English as their first language are almost invariably worse at giving talks than those who are native speakers, and students from Asia tend to be worse than those from Europe. (In academic Computer Science, English is the only language that matters.) But it's more than just command of the language -- it's about being expressive, funny, charismatic. The grad student who stands frozen in place and reads off their slides might speak English perfectly well, but that doesn't make them a good speaker.

It's also true that grad students are often "sized up" at conferences based on their speaking skills. If you can give a good talk at a conference, you'll get the attention of the professors who will be looking at your faculty job application later. Likewise, if your talk sucks, it's going to leave a bad impression (or, at best, you'll be forgettable).

So, please, grad students: If you're serious about pursuing an academic career, hone your presentation skills. This stuff matters more than you know.

Sunday, January 27, 2013

My mobile systems research wish list

Working on mobile systems at Google gives me some insight into what the hard open problems are in this space. Sometimes I am asked by academic researchers what I think these problems are and what they should be working on. I've got a growing list of projects I'd really like to see the academic community try to tackle. This is not to say that Google isn't working on some of these things, but academics have fewer constraints and might be able to come up with some radically new ideas.

Disclaimer: Everything in this post is my personal opinion and does not represent the view of my employer, or anyone else. In particular, sending a grant proposal to Google on any of the following topics will by no means guarantee it will be funded!

First, a few words on what I think academics shouldn't be working on. I help review proposals for Google's Faculty Research Awards program, and (in my opinion) we get too many proposals for things that Google can already do (or is already doing) -- such as energy measurements on mobile phones, tweaks to Android or the Dalvik VM to improve performance or energy efficiency, or building a new mobile app to support some specific domain science goal (such as a medical or environmental study). These aren't very good research proposal topics -- they aren't far-reaching enough, and aren't going to yield a dramatic change five to ten years down the line.

I also see too many academics doing goofy things that make no sense. A common example these days is dusting off the whole peer-to-peer networking area from the late 1990s and trying to apply it in some way to smartphones. Most of these papers start off with the flawed premise that using P2P would help reduce congestion in the cellular network. A similar flawed argument is made for some of the "cloud offload" proposals that I have seen recently. What this fails to take into account is where cellular bandwidth is going: About half is video streaming, and the other half things like Web browsing and photo sharing. None of the proposed applications for smartphone P2P and cloud offload are going to make a dent in this traffic.

So I think it would help academics to understand what the real -- rather than imagined -- problems are in mobile systems. Some of the things on my own wish list are below.

Understanding the interaction between mobile apps and the cellular network. It's well known that cellular networks weren't designed for things like TCP/IP, Web browsing, and YouTube video streaming. And of course most mobile apps have no understanding of how cellular networks operate. I feel that there is a lot of low-hanging fruit in the space of understanding these interactions and tuning protocols and apps to perform better on cellular networks. Ever noticed how a video playback might stall a few seconds in when streaming over 3G? Or that occasionally surfing to a new web page might take a maddening few extra seconds for no apparent reason? Well, there's a lot of complexity there and the dynamics are not well understood.

3G and 4G networks have very different properties from wired networks, or even WiFi, in terms of latency, the impact of packet loss, energy consumption, and overheads for transitioning between different radio states. Transport-layer loss is actually rare in cellular networks, since there are many layers of redundancy and HARQ that attempt to mask loss in lower layers of the network stack. This of course throws TCP's congestion control algorithms for a loop since it typically relies on packet loss to signal congestion. Likewise, the channel bandwidth can vary dramatically over short time windows. (By the way, any study that tries to understand this using simple benchmarks such as bulk downloads is going to get it wrong -- bulk downloads don't look anything like real-world mobile traffic, even video streaming, which is paced above the TCP level.)

The lifetime of a cellular network connection is also fairly complex. Negotiating a dedicated cellular channel can take several seconds, and there are many variables that affect how the cell network decides which state the device should be in (and yes, it's usually up to the network). These parameters are often chosen to balance battery lifetime on the device; signaling overhead in the cell network; user-perceived delays; and overall network capacity. You can't expect to fix this just by hacking the device firmware.
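
If you want to see this effect for yourself, a crude probe goes a long way: issue a tiny HTTP fetch after increasingly long idle gaps and watch the latency jump when the radio has to be promoted back to a dedicated channel. A made-up sketch (run it on a device tethered to a cellular link; the URL is a placeholder):

package main

import (
    "fmt"
    "net/http"
    "time"
)

// After each idle gap, time a small fetch. On a cellular link, requests issued
// after a long idle period tend to pay an extra latency penalty while the
// radio is promoted back to a dedicated channel.
func main() {
    const url = "http://example.com/tiny"
    for _, idle := range []time.Duration{0, 2 * time.Second, 10 * time.Second, 30 * time.Second} {
        time.Sleep(idle)
        start := time.Now()
        resp, err := http.Get(url)
        if err != nil {
            fmt.Printf("idle %v: error: %v\n", idle, err)
            continue
        }
        resp.Body.Close()
        fmt.Printf("idle %v: fetch took %v\n", idle, time.Since(start))
    }
}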

To make things even more hairy, mobile carriers often use different network tuning parameters in different markets, based on what kind of equipment they have deployed and how much (and what kinds) of traffic they see there. So there is no one-size-fits-all solution; you can't just solve the problem for one network on one carrier and assume you're done.

Understanding the impact of mobile handoffs on application performance. This is an extension to the above, but I haven't seen much academic work in this space. Handoffs are a complex beast in cellular networks and nobody really understands what their impact is on what a user experiences, at least for TCP/IP-based apps. (Handoff mechanisms are often more concerned with not dropping voice calls.) Also, with the increased availability of both WiFi and cellular networks, there's a lot to be done to tune when and how handoffs across network types occur. I hate it when I'm trying to get driving directions when leaving my house, only to find that my phone is trying in vain to hang onto a weak WiFi connection that is about to go away. Lots of interesting problems there.

Why doesn't my phone last all day? This is a hot topic right now but I think the research community's approach tends to be to change the mobile app SDK, which feels like a non-starter to me. Unfortunately, the genie is out of the bottle with respect to mobile app development, in the sense that any proposal that suggests we should just get all of the apps to use a new API for saving energy is probably not going to fly. In the battle between providing more power and flexibility to app developers versus constraining the API to make apps more efficient, the developer wins every time. A lot of the problems with apps causing battery drainage are simply bugs -- but app developers are going to continue to have plenty of rope to hang themselves (or their users) with. There needs to be a more fundamental approach to solving the energy management issue in mobile. This can be solved at many layers -- the OS, the virtual machine, the compiler -- and understanding how apps interact with the network would go a long way towards fixing things.

Where is my data and who has access to it? Let's be frank: Many apps turn smartphones into tracking devices, by collecting lots of data on their users: location, network activity, and so forth. Some mobile researchers even (unethically) collect this data for their own research studies. Once this data is "in the cloud", who knows where it goes or who has access to it? Buggy and malicious apps can easily leak sensitive data, and currently there's no good way to keep tabs on what information is being collected, by whom, or for what purpose. There's been some great research on this (including the unfortunately-named TaintDroid) but I think there's lots more to be done here -- although we are sadly in an arms race with developers who are always finding new and better ways to track users.

What should a mobile web platform look like 10 years from now? I think that the research community fails to appreciate the degree of complexity and innovation that goes into building a really good, fast web browser. Unfortunately, the overlap between the research and web-dev communities is pretty small, and most computer scientists think that JavaScript is a joke. But make no mistake: The browser is basically an operating system in its own right, and is rapidly gaining features that will make it possible to do everything that native apps can do (and more). On the other hand, I find the web development community to be pretty short-sighted, and unlikely to come up with really compelling new architectures for the web itself. Hell, the biggest breakthroughs in the web community right now are a sane layout model for CSS and using sockets from JavaScript. In the mobile space, we are stuck in the stone ages in terms of exploiting the web's potential. So I think there is a lot the research community can offer here.

In ten years, mobile web users will outnumber desktop web users by an order of magnitude. So the web is going to be primarily a mobile platform, which suggests a bunch of new trends: ubiquitous geolocation; users carrying (and interacting with) several devices at a time; voice input replacing typing; using the camera and sensors as first-class input methods; enough compute power in your pocket to do things like real-time speech translation and machine learning to predict what you will do next. I think we take a too-narrow view of what "the web" is, and we still talk about silly things like "pages" and "links" when in reality the web is a full application development platform with some amazing features. We should be thinking now about how it will evolve over the next decade.

Tuesday, January 22, 2013

The ethics of mobile data collection

The mobile computing and networking research communities need to start paying closer attention to the data collection practices of researchers in our field. Now that it's easy to write mobile apps that collect data from real users, I'm going to argue that computer science publication venues should start requiring authors to document whether they have IRB approval for studies involving human subjects, and how the study participants were consented. This documentation requirement is standard in the medical and social science communities, and it makes sense for computer science conferences and journals to do the same. Otherwise I fear we run the risk of accepting papers that have collected data unethically, hence rewarding researchers for not adequately protecting the privacy of the study participants.

I am often asked to review papers in which the authors have deployed a mobile phone app that collects data about the app's users. In some cases, these apps are overtly used for data collection and the users of the app are told how this data will be collected and used. But I have read a number of papers in which data collection has been embedded into apps that have some other purpose -- such as games or photo sharing. The goal, of course, is to get a lot of people to install the app, which is great for getting lots of "real world" data for a research paper. In some cases, I have downloaded the app in question and installed it, only to discover that the app never informs the user that it is collecting sensitive data in the background.

The problem is, such practices are unethical (and possibly illegal) according to federal requirements for protecting the privacy of human subjects in a research study. Even if there is some fine print in the app disclosing the use of data for a research study, it's not clear to me that in all cases the researchers have actually gone through the federally mandated Institutional Review Board approval process to collect this data.

Unfortunately, not many computer scientists seem to be familiar with the IRB approval requirement for studies involving human subjects. Our field is pretty lax about this, but I think it's time we started taking human subjects approval more seriously.

It is now dead simple to develop mobile apps that collect all kinds of data about their users. On the Android platform, an app can collect data such as the device's GPS location; which other apps are running and how much network traffic they use; what type of wireless network the device is using; the device manufacturer, model, and OS version; which cellular carrier the device uses; the device's battery level; and the current cell tower ID. Similar provisions exist on iOS and other mobile operating systems. With rooted devices, it's possible to collect even more information, such as a complete network packet trace and complete information on which websites and apps have been used.
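
To give a sense of just how little code this takes, here is a sketch of the kind of collection an ordinary (unrooted) Android app can do. The class and field names below are mine, but every call is part of the standard Android SDK; the only gatekeeping is a handful of manifest permissions that most users click through without reading.

    import android.content.Context;
    import android.content.Intent;
    import android.content.IntentFilter;
    import android.location.Location;
    import android.location.LocationManager;
    import android.net.ConnectivityManager;
    import android.net.NetworkInfo;
    import android.os.BatteryManager;
    import android.os.Build;
    import android.telephony.TelephonyManager;

    // Requires ACCESS_FINE_LOCATION and ACCESS_NETWORK_STATE in the manifest.
    public class DeviceSnapshot {
        public static String collect(Context ctx) {
            StringBuilder sb = new StringBuilder();

            // Manufacturer, model, and OS version: no permission needed at all.
            sb.append("device=").append(Build.MANUFACTURER).append(' ')
              .append(Build.MODEL).append(" android=").append(Build.VERSION.RELEASE).append('\n');

            // Cellular carrier.
            TelephonyManager tm =
                (TelephonyManager) ctx.getSystemService(Context.TELEPHONY_SERVICE);
            sb.append("carrier=").append(tm.getNetworkOperatorName()).append('\n');

            // Active network type (WiFi vs. mobile).
            ConnectivityManager cm =
                (ConnectivityManager) ctx.getSystemService(Context.CONNECTIVITY_SERVICE);
            NetworkInfo net = cm.getActiveNetworkInfo();
            sb.append("network=").append(net != null ? net.getTypeName() : "none").append('\n');

            // Battery level, read from the sticky ACTION_BATTERY_CHANGED broadcast.
            Intent battery = ctx.registerReceiver(null,
                new IntentFilter(Intent.ACTION_BATTERY_CHANGED));
            if (battery != null) {
                int level = battery.getIntExtra(BatteryManager.EXTRA_LEVEL, -1);
                int scale = battery.getIntExtra(BatteryManager.EXTRA_SCALE, -1);
                if (level >= 0 && scale > 0) {
                    sb.append("battery=").append(100 * level / scale).append("%\n");
                }
            }

            // Last known GPS fix.
            LocationManager lm =
                (LocationManager) ctx.getSystemService(Context.LOCATION_SERVICE);
            Location loc = lm.getLastKnownLocation(LocationManager.GPS_PROVIDER);
            if (loc != null) {
                sb.append("location=").append(loc.getLatitude())
                  .append(',').append(loc.getLongitude());
            }
            return sb.toString();
        }
    }

Shipping that string off to a server somewhere is another few lines, and nothing in the platform tells the user it happened.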

Put together, this data can yield a rich picture of the usage patterns, mobility, and network performance experienced by a mobile user. It is very tempting for researchers to exploit this capability, and it's easy to get thousands of people to install your app by releasing it on Google Play or the Apple App Store. However, I have very little confidence that most researchers are adhering to legal and ethical guidelines for collecting such data -- I bet the typical scenario is that the data ends up being logged to an unsecured computer under some grad student's desk.

So, what is an IRB? In the US and many other countries, any institution that receives federal funding must ensure that research studies involving human subjects protect the rights and privacy of the participants in those studies. This is accomplished through Institutional Review Board review, which must occur before the study takes place. The purpose of the IRB is to ensure that the study meets certain guidelines for protecting the privacy of the study participants. The Stanford IRB Website has some good background about the purpose of IRB approval and what the process is like. The principles underpinning IRB review were set forth in the Declaration of Helsinki, which has been the basis for many countries' laws regarding protection of human subjects.

Failing to get IRB approval for a research study is serious business. In the medical and social science communities, failing to get IRB approval is tantamount to faking data or plagiarism. The Retraction Watch blog has a long list of cases in which published articles have been retracted due to lack of IRB approval. In those fields, this kind of forced retraction can destroy an academic's career.

Documenting IRB approval and informed consent for study participants is becoming standard practice in the medical and social science communities. For example, the submission guidelines to the Annals of Internal Medicine require an explicit statement from authors regarding IRB approval:
"The authors must confirm review of the study by the appropriate institutional review board or affirm that the protocol is consistent with the principles of the Declaration of Helsinki (see World Medical Association). If the authors did not obtain institutional review board approval before the start of the study, they should so state and explain the circumstances. If the study was exempt from review, the authors must state that such exemption complied with the policy of their local institutional review board. They should affirm that study participants gave their informed consent or state than an institutional review board approved conduct of the research without explicit consent from the participants. If patients are identifiable from illustrations, photographs, pedigrees, case reports, or other study data, the authors must submit the release form for each such individual (or copies of the figures with the appropriate release statement) giving permission for publication with the manuscript. Consult the Research section of the American College of Physicians Ethics Manual for further information."

And yet, in computer science, we tend not to take this process very seriously. I suspect most computer scientists have never heard of, or dealt with, their institution's IRB. I was surprised to see that CHI, the top conference in the area of human-computer interaction (in which user studies are commonplace), says nothing in its call for papers about requiring disclosure of IRB approval for human subjects studies -- perhaps the practice of obtaining IRB approval is already widespread in that community, though I doubt it.

Why do I think we should require authors to document IRB approval? For two reasons. First, to raise awareness of this issue and ensure that authors are aware of their obligations before they submit a paper to such venues. Second, to prevent paper reviewers from having to make a judgment call when a paper is unclear on whether and how a study protects its participants. The whole point of an IRB is to front-load the approval process before the research study even begins, well before a paper gets submitted. The nature of a research project may well change depending on the IRB's requirements for protecting user privacy.

To give an example of how this can be done properly, colleagues of mine at University of Michigan and University of Washington are developing a mobile app for collecting network performance data, called MobiPerf. The PIs have IRB approval for this study and the app clearly informs the users that the data will be collected for a research study when the app first starts; clicking "No thanks" immediately exits the app. Furthermore, there is a fairly detailed privacy statement and EULA on the app's website, explaining exactly what data is collected. It's true that going through these steps required more effort on the part of the researchers, but it's not just a good idea -- it's the law.

This is my personal blog. The views expressed here are mine alone and not those of my employer.



Thursday, January 3, 2013

How to get a faculty job, Part 3: Negotiating the offer

This is the third (actually fourth) part in this series on how to get a faculty job in Computer Science. Part 1 and Part 1b dealt with the application process, and Part 2 was about interviewing. In this post, I'll talk about what happens when you get a job offer and how to negotiate when you have multiple offers.

There is often a long and painful wait from the time you complete the interview until you hear back from the school about whether they will be making you an offer. This is generally because all (or most) of the candidates need to complete interviews before the final hiring decisions are made, and the actual offer needs to be approved by the department or school administration before the candidate can be given the good news. Depending on how early you interview, this wait can be on the order of a month or two. (Generally, candidates interview between February and April, and offers start getting made around April or May.) Sometimes a school won't contact you at all after the interview, and after a while you figure you're not getting an offer after all. Sometimes they contact you fairly quickly to deliver the coup de grâce, which is greatly appreciated since then you can at least stop holding out hope.

As I pointed out in the previous post on interviewing, it is a very good idea to keep in touch with schools you are really interested in and let them know where you are in the process, and especially if you have offers from other schools. Usually this can be done via informal email to your host when you interviewed. The last thing a department wants is for their top candidate to take a job elsewhere before they have a chance to make an offer. So let people know what's happening and try to find out how your top choices are doing in terms of making offers.

There are three kinds of offers: (1) Straight-up offers; (2) "Offers for offers", and (3) Second-choice offers. I'll explain each below.

Straight-up offers

The best possible outcome is that you get a call from your host or the hiring committee chair who says, "I'm happy to let you know that we're going to be making you an offer." At this stage, you probably will not get into any of the details about salary, research funding, and the like -- that comes later.

Most of the time, departments will offer to fly you out for a second visit, sometimes with your spouse or significant other, so you can spend time getting to know the department, university, and town. This is much more relaxed than the interview, and is a great way to get to know your potential future colleagues under less stressful conditions. A second visit can be very important for deciding where to kick off your career as a faculty member: you will learn many things that you might not have had time to get into when you interviewed. In particular, you are going to care much more about things like housing, schools for your kids, quality of life, and other factors that you didn't get a chance to judge during the interview. Definitely do a second visit if you are serious about a school.

Offers for offers

The dilemma faced by many departments is that they have several really good candidates but only one (or maybe two) open positions. If a department blindly makes an offer to its top candidate, but that person is not that serious about taking the job there, then their second- or third-choice candidates (who might be just as good!) might end up taking offers elsewhere while the first candidate sits on the offer in the hopes of using it as a point of negotiation with another school. Also keep in mind that schools generally cannot have multiple outstanding offers for a single position.

So, sometimes a department won't make an outright job offer, but will instead feel you out to see whether you're really serious about taking a job there, a so-called "offer for an offer". The idea is that the department can (and will!) make a formal offer, but only after determining that you really want it.

From a purely selfish perspective, it might seem that your best strategy is to amass as many offers as you can so you have the most leverage when negotiating salary and other aspects of the compensation. But this also puts the department in a real bind if you end up sitting on the offer without any real intention of taking it. I don't think pissing a bunch of people off (even at a place where you don't take a job) is a good strategy for anyone trying to jumpstart an academic career.

Some schools do ridiculous things like exploding offers, which expire after a set time, to avoid the situation where someone sits on an offer for too long. Given that schools are rarely well-synchronized in their recruiting schedules, this can be disastrous: Say you get an offer that explodes after two weeks, but you haven't finished interviewing yet and still haven't heard from most of the schools. The last thing you want is to be forced into accepting a job at a school because the offer was going to time out. By no means should you be forced to make a decision on taking a faculty job before you have had a chance to evaluate all of your options. Personally, I think schools that do this are being idiotic and should think seriously about what kind of people they are going to succeed in recruiting through such tactics.

I once heard a case of a hiring committee which couldn't make up its mind, so they called their top five candidates and said, "We have two offers available, the first two people who call us to claim the offer will get one, but it will explode in two weeks." I think this kind of strategy is a complete load of crap, and the hiring committee should be ashamed of itself for not being able to commit to their top one or two candidates and ride it through. But I digress.

Second-choice offers

It is often the case that you aren't the school's top choice, but you are their second (or third) choice for the position. Sometimes a school will tell you this outright: That they would love to make you an offer, assuming that their first-choice candidate declines them. This can sting, of course, and I question the wisdom of telling candidates this much information. Most people don't want to take a job somewhere where they feel as though they were the consolation prize. Sometimes, you find out through the grapevine that someone else already has an offer from that school, but later on you get a call with an offer of your own (and it just so happens that the other candidate recently accepted a job elsewhere). At some point you have to swallow your pride and appreciate that in a few months, nobody will remember (or care) that you weren't the first choice, and you got an awesome job at a good school, and that's all that matters. The point is that an offer's an offer, so don't worry too much if you weren't the department's original top choice.

From sitting on the faculty hiring committee at Harvard, I can vouch for how hard it can be for a school to narrow its choices to one or two people in a field of really good candidates. Often the choice of whom to make the first offer to is somewhat arbitrary, based on a general vibe about whether the person seems more or less inclined to accept the job. A department might have two or three candidates who are all more or less equal, but it has to make a first choice somehow.

What's in an offer?

In most cases, the initial job offer is verbal and you won't get a formal, written job offer until much later, based on extensive discussions with the dean or department chair about what you expect the offer letter to say. There are several components to most faculty job offers that should be (eventually) spelled out in writing:
  • The salary (of course). Usually salary is paid for 9 months of the academic year, with the expectation that you will pay the other 3 months out of a research grant. So if the offer is $100k for 9 months, that's really a 12-month salary of $133k.
  • Summer salary support. Since most junior faculty come in with no research grants, usually a department will offer to pay one or two summers' worth of your salary until you get grants of your own.
  • Teaching relief. At many schools, incoming junior faculty are given a semester of teaching relief which they can take at some point in the first couple of years. This gives you a little more free time to kick start your research and lessens the load of transitioning into the new job. My strong recommendation is to wait until your second or third term before taking teaching relief: Teaching a course (especially a graduate seminar) your first term on the job is a great way of recruiting students to your research group, and you're so screwed anyway the first semester as a new faculty member that teaching relief is hardly beneficial until you get your research group up to speed.
  • Graduate student support. Many schools will provide funding to support one or two grad students for a couple of years, to help seed your research group. Of course, you still have to identify and recruit the students (a topic for a future blog post). Keep in mind that grad students aren't cheap. In addition to their paltry salary, the student's tuition and fringe benefits need to be paid for. Typically a PhD student will cost around $75K a year all in, so supporting two students for two years is on the order of $300K -- a lot of money.
  • Research support. This can take many forms depending on the school, but generally this is money (in some form) to help you get your research going before you have grants of your own. The best form of this is an outright slush fund which you can use to pay for anything related to your research: computers, equipment, students, summer salary, travel, conference registrations, pizza parties for the team, you name it. At Harvard, my "startup package" was in the six figures, but this is unusual; I think that most schools do something in the $20K range, sometimes less. (If the school is offering to pay for students or summer salary separately, you have to factor this in as well.) In many cases, a department will separately offer you some amount of equipment (such as a fund to buy a computer and a laptop) in addition to, or in lieu of, a general slush fund. It depends very much on how the school manages its finances and chooses to account for things. Some schools without deep pockets may only offer you a hand-me-down workstation and a few hundred bucks to offset the cost of a laptop. It varies a lot.
  • Lab space. I don't know how common it is for a job offer to include an explicit provision for lab space (that is, not including your own office). In many departments, grad student space is a shared resource and there is not usually a need for dedicated labs for specific faculty. However, depending on the nature of your research, you might need specialized lab space -- for example, if you are developing a swarm of quad-copters you probably need some dedicated space for that.
  • Other perks. It is common for the department to pay for (or offset) your moving expenses, especially if you are moving from far away. An offer also might include things like temporary housing when you first move. Again, this varies a lot.
How to negotiate

Okay, so let's assume you're lucky enough to have a couple of faculty job offers in hand. What do you need to keep in mind?

First things first. Only negotiate with schools you are really serious about. It is a waste of everyone's time (and patience) if you feign excitement about a school just to get them to bump up your offer and use that as leverage against another school. People will know if you are bullshitting them. And keep in mind that even if you don't take a job somewhere, those people you run the risk of pissing off will continue to be important academic colleagues. One day they might be called upon to write tenure review letters for you. The point is you want to avoid making enemies.

Secondly, you can't compare industry and academic offers. At all. Compensation from industry is going to be much higher (especially over time) than any academic offer, once you factor in salary, bonuses, stock options, and the steeper year-over-year increases compared to a university job. So you can't expect to use an industry offer as leverage to negotiate higher compensation at a university.

At many universities, the salary is non-negotiable as it is based on a standard scale that (in most cases) can't be changed. You might be able to negotiate a small salary increase if another school is offering much more, but this seems unlikely to me. Keep in mind that the range of starting salaries for junior faculty across different schools (at least among top-ranked research institutions) is pretty tight, so there's not much wiggle room there anyway. You can ask but don't be surprised if you're told that the salary is fixed.

If you can, try to get your startup package to be all or mostly cash. By "cash" I mean funding that can be used to pay for anything: students, equipment, travel, whatever. If your startup is segmented into X dollars for students, Y dollars for equipment, and so forth, that can constrain you down the line, if, for example, you end up wanting to hire more students than you expected or don't need as much travel funding. Fungibility is good.

It's worth having a rough estimate of how much you need to get started before you start talking hard numbers. When I did my faculty job search, I had in mind a research agenda involving building out an experimental workstation cluster as well as some other equipment needs, travel to several conferences in my first couple of years, and support for two students. I made up a quick and dirty spreadsheet to estimate how much all of this would cost and used that as the starting point for talking about the size of the startup package. If you have no idea how much you expect to spend -- and what you might spend it on -- you will have a hard time making a convincing case that you need more than what's being offered.

If you have a two-body problem (which is probably deserving of its own blog post), find out what, if anything, the university can do to help your partner land a job in the area. You may be surprised. When I was on the job market, my wife was finishing up medical school and we were going to make a decision about where to go in large part based on whether she would be able to get a good residency position. Although nobody could guarantee my wife a residency slot, the schools that were recruiting me helped set up meetings with a bunch of people to learn more about the programs in each area so we got a good sense of what her options were like. It is also not uncommon for universities to facilitate positions for spouses and partners of faculty they are trying to recruit -- many things are possible.

If you have kids, you should by all means try to negotiate for a spot in the university's day care center. The waiting lists for day care can be years long, but special exceptions can often be made when a school is trying to recruit a new faculty member. This is not always possible but it's worth asking about.

Finally, don't be greedy. This is not about maximizing your compensation and startup package and pissing everyone off in the process. Your goal in negotiating the offer is not to squeeze every penny you can out of them -- instead, it's to reach a point where you feel confident that the compensation and startup package will allow you to be happy and successful in your new job.

So which offer should you take?

Although I'm sure it happens, I would hope that nobody would take a faculty job just because it paid the most or had the largest startup package. If your only goal in life is to maximize your compensation, trust me: You do not want to be a professor. There are many, many other factors that are more important than the size of the offer: the culture and quality of the department, the students, the physical location, the quality of life ... the list goes on and on. In steady state, you're going to be a (relatively) poor academic, struggling to get research grants just like everyone else. The initial salary and startup package can give you a boost, but it mostly comes out in the wash -- the absolute numbers won't matter much beyond the first year or so. So focus on finding the job that will make you happiest, not just the one that pays the most.

Startup Life: Three Months In

I've posted a story to Medium on what it's been like to work at a startup, after years at Google. Check it out here.