Thursday, June 21, 2012

Google's Hybrid Approach to Research

This month's Communications of the ACM features an article on Google's Hybrid Approach to Research by Alfred Spector, Peter Norvig, and Slav Petrov. Since this is a topic I've blogged about here before, I thought I'd provide a quick pointer to the article:

http://cacm.acm.org/magazines/2012/7/151226-googles-hybrid-approach-to-research/fulltext

Overall I think the article does a nice job of summarizing Google's approach. The key takeaway is that Google doesn't separate its research and engineering activities: most "research" at Google happens during the day-to-day work of building products.

The benefit of this model is that it's easy to have real world impact, and the pace of innovation is fairly rapid, meaning research results get translated into products quickly. The possible downside is that you don't always get a chance to fork off long-term (multi-year) projects that will take a long time to translate into a product. However, there are exceptions to this rule -- things like Google Glass, for example -- and plenty of things I can't talk about publicly. It is true that Google tends not to do "pure academic" research just for the purpose of publishing papers. We could have a healthy debate about whether this is good or bad, but I'll leave that for the comments...

19 comments:

  1. The link to Google Glass appears to be broken. Visible only from inside Google+ with the proper permissions perhaps?

  2. @Shrutarshi No, he just has an extra "sandbox" in the URL.

  3. Whoops, thanks for pointing that out. This happens because when I use Google+ while logged into my corporate Google account, the URL has the extra "sandbox" thing in it. Does it work now?

  4. "Google tends not to do "pure academic" research just for the purpose of publishing papers"

    Is that also true of Research Scientists at Google?

    1. Yes, I think so. There may be exceptions, but it's my understanding that at Google, research done primarily for the purpose of publishing papers is somewhat unusual.

    2. I'm interpreting 'research for the purpose of publishing papers' not to mean publication for its own sake, but research which is not product-related. In this case, I would say that it's unusual, but not non-existent. It varies from area to area, and I think is more common in algorithms/theory. We definitely do some research that is mathematically/theoretically interesting, but may never make it into a product. I would say that theory people at Google tend to write 3 kinds of papers:
      1. Product-driven and likely to make an impact on a product.
      2. A mathematical formalization of a product-inspired problem, but with a primarily theoretical solution that is unlikely to make it into development. (For example, we might prove a worst-case bound, while knowing that our actual inputs are far from worst-case.)
      3. Problems without real applications (though people might claim some application as motivation in a paper introduction).

      Category 3 is a small (but non-zero) fraction of the papers we publish. Category 2 is a large fraction of what we do. It's arguable whether one can call these papers product-related, and it probably depends on the level of abstraction. The time we spend working on these proofs is really time spent on research for the purpose of publishing, because it's unlikely to make much of a product impact. I would say that like academics, we write these papers primarily for the purpose of making a contribution to the research community, perhaps through the introduction of new ideas and techniques.

  5. From my two summers as an intern working with Research Scientists in Jay Yagnik's computer vision team, there was more focus on doing new things in the visual domain using the existing infrastructure than on getting papers for the sake of adding a +1 to your CV. Papers were nice bonuses, but if getting a paper meant working on small datasets and using Matlab, then it was discouraged. This does not mean that nobody tried to get papers, but it was not the priority it appears to be when interning as a PhD student at MSR. A few students spent their summers porting over open-sourced, academic-quality Matlab code they had already written for a paper.

    The infrastructure at Google is so great and so massive, and there's so much to learn as an intern, that the internship is better thought of as a super-advanced graduate-level course on system building than as a summer of paper writing with new people. I may even have learned more from my two summers at Google than from all the classes I took at CMU as a PhD student. I always tell fellow PhD students: go to Google to learn how to build in the real world; the paper-writing world will not disintegrate when you come back in September. If you love good code, Google will love you back.

    1. Tomasz - that's a GREAT way of putting it. One of the tensions here is that many PhD students feel pressure to get a publication out of their internship. I think the experience you gain working on large data in the real world more than makes up for the fact that you might not get a paper out of it...

  6. Honestly there is a lot of self-serving aggrandizement in this "hybrid research" model. Having worked at two leading Internet companies in the last decade, I can definitely say there are several companies that do "hybrid research", except they are probably so unsophisticated that they call it "engineering".

    1. Anon - I won't disagree with that. In fact, here at Google, we just call it "Engineering" as well. This article was written to clear up a lot of confusion that we see about what Google's research model is, since we don't have a pure research lab like Microsoft, IBM, AT&T, etc. to point to.

    2. Sure, that's reasonable. I also agree that the statement "Google does what many leading edge tech companies are doing" is probably not as appealing, even if it is closer to the truth than touting a hybrid research model.

      To the extent that Google is doing something different (and I don't know that it is or isn't), I think the unique thing to highlight would not be the product focus but the interest in more far-reaching initiatives not necessarily tied directly to tactical product interests (like the driverless car). That is where there are likely to be deep and difficult problems to solve, as opposed to the (comparatively small) integration and domain-specific design improvements, combined with impressive at-scale implementations, that constitute "hybrid research" at Google. [Many companies have projects of complexity comparable to MapReduce, BigTable, GFS, etc. Please note that I am not knocking either the quality or the impressive effort involved in making these systems robust, but it seems grandiose to call this research. It is serious engineering, which by itself should be attractive to smart systems folks, I would think.]

      The problem, I suppose, is that efforts like the car occupy a minuscule fraction of effort and people compared to the bread and butter of the company - which is as it should be - and so to the extent that Google wants to maintain research cachet, it needs to spin this as part of "hybrid research".

      I wonder if FB's aura will fade with its troubled IPO, and we will see them having to do things like this to attract top systems folks :)

      [I have not worked either at Google or FB, nor at their competitors, so I don't really have an ax to grind in that regard.]

  7. I completely agree with the point that research ideas should come from daily engineering effort. That is why I don't quite believe the "start-up" type of research activity in academia could work.

    To give a simple example from wireless: you observe a high packet loss rate. Then what? That's a long-discussed issue in the wireless field, but academic research will focus on its own subfield, whether RF, MAC, or application setup.

    In daily engineering work, I observed that this could have a complex mix of causes:

    1. RF performance issues in packet capture, e.g. antenna problems.
    2. MAC-layer CSMA performance shortfalls, especially with periodic wake-up/sleep cycles (nondeterministic CSMA backoff vs. a deterministic wake-up cycle).
    3. OS-layer non-preemptive task execution delay mismatches the .
    4. Application-level misconfiguration of the TCP retransmission scheme.

    I never had the chance to get this much exposure to problem solving in academia. Although this is not a research project, even this long-discussed, supposedly solved problem turned out to involve much more than we had thought.

  8. It is not the style of research but the intent that makes the big difference. Do you want to do research to solve the next problem (big or small), or do you want to do research to get something published?

    Getting shit done could be trivial, ugly, simple, etc., but it can lead you to potentially high-impact research.

    Academic research does not like trivial, ugly, simple solutions. So you need to compromise your real talent to the most toxic claim and style of thinking in academia: "in order to solve/simplify ..., these assumptions are made."

    Most of the time the "assumptions" are the real problem; unfortunately, we don't realize it or don't want to admit it.

  9. Is there a typo?

    "However, there exceptions to this rule"...

    1. Maybe I'm wrong, but isn't it supposed to be:

      "there are exceptions to this rule"...?

      Sorry for being picky...

    2. Hah! I never even noticed it even after you pointed it out :-) Thanks - I'll fix it.

  10. I read the Google article and now I am confused. The article challenges two of my core beliefs about research, particularly research in CS.

    1) Can something be considered good research merely on the basis of good implementations / usable applications, even if it does not pose a fundamental research question? In short, when reading a research paper critically, will we no longer ask the question "Hmm, what is the big research question/idea the author is asking/answering here?"

    2) Research without apps is not meaningful or is of lesser quality. If so, I think a lot of disciplines inside CS will be in trouble. E.g., consider the case of MANET. There have been many outstanding ideas proposed in the MANET literature pertaining to specific sub-areas such as routing/content distribution, but if we take a closer look at realistic MANET applications, the number is very close to zero. Does that necessarily mean that the scholarship was not of outstanding quality?

    Sorting a gazillion data items in nanoseconds by caching or building massive-scale server clusters, but using some well-known variant of quicksort, seems more like an impressive feat of engineering to me, not research. IMHO.

    1. Anon - The article talks about *Google's* approach to research, not research in general. Research spans a broad spectrum from very practical to purely theoretical. It's not saying that this is how everyone should do research - just that this is the balance that Google has struck within its own organization. This would not be, for example, a very good model for a university to follow, since (a) universities don't have access to this much data, and (b) universities aren't building products (although see my other recent blog post for some thoughts on that).

      Re: "research" vs. "engineering" - MapReduce is just "engineering", right? Except that work - as crude and simplistic as it might sound to a non-systems person - has spawned a tremendous amount of good academic research into more efficient mechanisms for processing large volumes of data in parallel (look at all of the papers coming out about variants of Hadoop, etc.) MapReduce was not started as a "research" project with an end goal to publish a paper - but its impact on the research community has been tremendous. This is a good example of Google's approach.
