Tuesday, February 7, 2012

My love affair with code reviews

One of the most life-altering events in my move from academia to industry was the discovery of code reviews. This is pretty standard fare for developers in the "real world", but I have never heard of an academic research group using them, and had never done code reviews myself before joining Google.

In short: Code reviews are awesome. Everyone should use them. Heck, my dog should use them. You should too.

For those of you not in the academic research community, you have to understand that academics are terrible programmers. (I count myself among this group.) Academics write sloppy code, with no unit tests, no style guidelines, and no documentation. Code is slapped together by grad students, generally under pressure of a paper deadline, mainly to get some graphs to look pretty without regard for whether anyone is ever going to run the code ever again. Before I came to Google, that was what "programming" meant to me: kind of a necessary side effect of doing research, but the result was hardly anything I would be proud to show my mother. (Or my dog, for that matter.) Oh, sure, I released some open source code as an academic, but now I shudder to think of anyone at a place like Google or Microsoft or Facebook actually reading that code (please don't, I'm begging you).

Then I came to Google. Lesson #1: You don't check anything in until it has been reviewed by someone else. This took some getting used to. Even an innocent four-line change to some "throw away" Python script is subject to scrutiny. And of course, most of the people reviewing my code were young enough to be my students -- having considered myself to be an "expert programmer" (ha!), it is a humbling experience for a 23-year-old one year out of college to show you how to take your 40 lines of crap and turn them into one beautiful, tight function -- and how to generalize it and make it testable and document the damn thing for chrissakes.

So there's a bunch of reasons to love code reviews:

Maintain standards. This is pretty obvious but matters tremendously. The way I think of it, imagine you get hit by a truck one day, and 100 years from now somebody who has never heard of your code gets paged at 3 a.m. because something you wrote was suddenly raising exceptions. Not only does your code have to work, but it also needs to make sense. Code reviews force you to write code that fits together, that adheres to the style guide, that is testable.

Catch bugs before you check in. God, I can't count the number of times someone has pointed out an obvious (or extremely subtle) bug in my code during the code review process. Having another pair of eyes (or often several pairs of eyes) looking at your code is the best way to catch flaws early.

Learn from your peers. I have learned more programming techniques and tricks from doing code reviews than I ever did reading O'Reilly books or even other people's code. A couple of guys on my team are friggin' coding ninjas and suggest all kinds of ways of improving my clunky excuse for software. You learn better design patterns, better approaches for testing, better algorithms by getting direct feedback on your code from other developers.

Stay on top of what's going on. Doing code reviews for other people is the best way to understand what's happening in complex codebase. You get exposed to a lot of different code, different approaches for solving problems, and can chart the evolution of the software over time -- a very different experience than just reading the final product.

I think academic research groups would gain a lot by using code reviews, and of course the things that go with them: good coding practices, a consistent style guide, insistence on unit tests. I'll admit that code quality matters less in a research setting, but it is probably worth the investment to use some kind of process.


The thing to keep in mind is that there is a social aspect to code reviews as well. At Google, you need an LGTM from another developer before you're allowed to submit a patch. It also takes a lot of time to do a good code review, so it's standard practice to break large changes into smaller, more review-friendly pieces. And of course the expectation is you've done your due diligence by testing your code thoroughly before sending it for review.

Don't code reviews slow you down? Somewhat. But if you think of code development as a pipeline, with multiple code reviews in the flight at a time you can still sustain a high issue rate, even if each individual patch has higher latency. Generally developers all understand that being a hardass on you during the review process will come back to bite them some day -- and they understand the tradeoff between the need to move quickly and the need to do things right. I think code reviews can also serve to build stronger teams, since everyone is responsible for doing reviews and ensuring the quality of the shared codebase. So if done right, it's worth it.


Okay, Matt. I'm convinced. How can I too join the code review bandwagon? Glad you asked. The tool we use internally at Google was developed by none other than Guido van Rossum, who has graciously released a similar system called Rietveld as open source. Basically, you install Rietveld on AppEngine, and each developer uses a little Python script to upload patches for review. Reviews are done on the website, and when the review is complete, the developer can submit the patch. Rietveld doesn't care which source control system you use, or where the repository is located -- it just deals with patches. It's pretty slick and I've used it for a couple of projects with success.

Another popular approach is to use GitHub's "pull request" and commenting platform as a code review mechanism. Individual developers clone a master repository, and submit pull requests to the owner of that repository for inclusion. GitHub has a nice commenting system allowing for code reviews to be used with pull requests.

I was floored the other day when I met an engineer from a fairly well-known Internet site who said they didn't use code reviews internally -- and complained about how messy the code was and how poorly designed some pieces were. No kidding! Code reviews aren't the ultimate solution to a broken design process, but they are an incredibly useful tool.