Tuesday, June 7, 2016

Death by peer review

I recently had the occasion to give some advice to a friend who was considering making the switch from industry to academia. One of my key pieces of advice was to keep in mind that success (or failure) in academia is largely based on peer review -- by program committees, proposal review panels, tenure committees. While peer review has many good things going for it, it can also be extremely, dishearteningly random. Being an academic means living your life from one peer-review decision to the next, and in many cases, those decisions are simply not the right ones. After a while, a string of semi-random decisions can be psychologically draining.

From http://www.michaeleisen.org/blog/?p=1778. Yes, I own the t-shirt.

The law of large numbers certainly applies here. Good work eventually gets published and funded, given enough iterations. Good researchers get their papers in, eventually. Peer review feedback can be incredibly helpful for refining a piece of work and improving it over time. But in the vast majority of cases, papers or proposals, whether accepted or rejected in the end, get a wide range of scores -- it is quite rare for even a good paper to get all "accept" reviews. Whether a paper gets accepted often depends on who the reviewers are, whether they've had enough coffee in the PC meeting, whether they are confident enough to stand up for the work, and so forth. Above a certain threshold, the objective merit of the work has little to do with the outcome.

This situation can get really bad. NSF proposal reviews are, historically, quite random, in part because NSF's conflict-of-interest policy prevents anyone who might actually be an expert in the area from reviewing your work (unless they forgot to submit their own proposal). I have submitted substantially the same NSF proposal multiple times and had vastly different scores. My best proposal never got funded; my worst proposal actually did.

To be clear, we also use peer review at Google, in particular for things like promotions. I've served on quite a few promotions committees, and I can tell you it can be just as random as, say, a conference program committee. Get four people into a room and they will have four different opinions about a given candidate. So I don't think this problem is specific to the academic peer review process.

But I want to contrast the peer-review process with the "industry process." At least at Google, I feel that hard work is generally rewarded if it has impact and leads to good products. My expectation is that the same is true at most companies. Rather than the success or failure of a project coming down to the dreaded Reviewer #3, it comes down to the team's ability to execute, targeting the right market, and attracting users.

Of course, many of these factors are just as far out of the control of an engineer on the team as the capriciousness of a program committee. However, I believe the industry process is far less arbitrary. Yes, projects can (and do) get canceled by higher-level management. I've personally canceled projects on my teams, for a range of reasons. But decisions to cancel a project are, in most cases, made after careful deliberation and by people who have a vested interest in the team. Even if the decision ends up being wrong, at least it's a decision that makes sense -- not the crapshoot you face every time you submit a paper or proposal to a random committee.

Do companies make bad decisions? Absolutely. Are decisions made that I personally disagree with? Of course. Do I have to work for a company that continually makes bad decisions that I don't agree with? Hell no. I'd much rather face my chances against a principled leadership organization that I trust and agree with than an ad hoc collection of anonymous reviewers.

While I have many thoughts on how the process of peer review could be improved, I don't argue that we should dispense with it entirely. I don't know of a better model for most things that academics need to be evaluated on. But aspiring academics should know how much of your success hinges on the purely stochastic nature of the process. Industry is still a game, but it's a different kind of game, and one that I think tends to be more rational.


  1. Matt, you wrote, "But aspiring academics should [know] how much of your success hinges on the purely stochastic nature of the process." Just curious, what is your definition of academic success (and failure)? And what is your definition of non-academic, professional success (and failure)? It might be useful to define these terms before debating about the extent to which "the process" affects the outcomes.

    1. Isn't it pretty obvious, at least for the academic case?

      I'm sure it is to the audience Matt intended for this particular post.

      In particular, pretty much everybody who wants an academic career wants academic freedom. Unfortunately the only way to have this on a long-term basis is to get tenure (*) and in order to have freedom to take on projects bigger than a single person can handle you need both that and research grants. Grants are made based on direct peer review and tenure decisions are (purportedly) based on publication record which is an aggregation of many peer review decisions. Hence the article.

      Again, everything in the preceding paragraph should be pretty obvious and isn't nearly as clever as that t-shirt, which I am now lusting after.

      (*) I'd argue that externally-fellowshipped grad students actually have more freedom than any professor does -- but only for a few years.

  2. NIPS experiment endorses everything you said.

  3. > Rather than the success or failure of a project coming down to the dreaded Reviewer #3, it comes down to the team's ability to execute, targeting the right market, and attracting users.

    This is broadly true in the industry, especially at the macro scale, as business survival depends on it.

    With that said, "Reviewer #3" does still make an appearance in companies (especially with siloed orgs); they're just called "Irritating VP of X" or "Obstructionist Manager #555".

    The military has a lovely term for these people: "Blue Falcons". The etymology of the term is left as an exercise for the reader.

  4. Matt --

    There are many things I agree with in this post. Peer review does appear to be a semi-random process, where the bias may be in the right direction (we both appear to think that it is), so over longer time scales the law of large numbers applies, but over shorter time scales the results can be frustrating. (And unfortunately for many graduate students or even young faculty, the short time scale is rather important.)

    Where I think I'd disagree is where you say: "However, I believe the industry process is far less arbitrary." And I'm afraid I don't think you really back up that statement with any actual evidence. (As a counterpoint, people at Microsoft Research Silicon Valley found Microsoft's decision to close their lab and fire almost all the people in it rather arbitrary.)

    You may be experiencing your own personal selection bias. Google appears to be (for those of us on the outside) a well-run company that has enjoyed a long period of success. But that may not be a permanent condition, and of course many companies have far less pleasant circumstances.

    So I'd happily agree that various aspects of peer review are frustrating, and certainly I've had reviews that I thought were not only misguided, but ignorant and wrongheaded. Industry is indeed a different kind of game, but I'm unconvinced that we can ascribe to it a higher rationality.

    1. As an expert on randomness, I have nothing but the highest respect for your opinion. Still, I think there's a vast and measurable difference between the amount of care that goes into writing a paper review and making corporate decisions like shutting down a research lab.

      While I can certainly imagine some companies treating major decisions as capriciously as the proverbial Reviewer #3 ("There's nothing novel about this work", "I don't understand why you did X instead of Y", etc.), I think you would agree that such a cavalier approach to corporate decision-making would not, in general, be a recipe for success.

      I hope you are not trying to equate Microsoft's decision to shut down its Silicon Valley lab with the kind of bullshit decisions I see being made on papers on *every single PC I have ever served on*. I'm sure that the people affected by that closure were not happy about it, but I am willing to bet good money that the decision was made far more deliberately, and with much greater care, than the sloppy, mostly shoot-from-the-hip approach taken by a large majority of program committee members.

      It is true that I'm fortunate to be at a stable company and working on a stable product team. If I were to join an early-stage startup, or even a new project team at Google, then my career would be subject to vastly different pressures in terms of raising funding, hiring good people, settling on the right product direction, and generally just being lucky. In such a case, I'd agree that the degree of entropy would be much higher than the relatively stable situation I find myself in now. I still claim that those sources of randomness are qualitatively different than the process of academic peer review, but perhaps we can agree that there would be just as much uncertainty.

  5. Good post.
    In a paper review, reviewers have a conflict of interest unless they never submit to that journal or conference.
    Everybody is racing for a limited number of spots: SIGCOMM can only accept 20 papers, a journal accepts 15-20, etc.
    Thus, it is a good strategy to reject the papers of others, and a good number of people do that, unfortunately.
    Another corruption is that, in many CS journals, some associate editors always publish their papers there, not in the other journals in the same area. How ethical is this?

    The definition of 'conflict-of-interest' is not working. A person cannot review a paper from his own university, but it is OK to publish in your own journal, or to cherry-pick lenient reviewers for your paper and harsh reviewers for opposing papers. That is a widely accepted practice in today's academia. How sick is this?

    In the company case, nobody has a conflict of interest if a project is approved, unless the company has a limited budget for projects. Probably, companies like Google are more flexible, since money is not that much of an issue.
    That is probably why companies, by their nature, are less arbitrary; it is not because they are good Samaritans.

    1. While you are right in principle, I am not sure how much this conflict of interest plays itself out in practice. It seems exceedingly unlikely that a PC member could really influence the outcome of other papers in a way that would favor their own (though that might not stop them from trying). Anyway, it's an interesting point, but I just don't have data on how problematic this is.

  6. In some ways, the fact that (at least on the surface) many professors delegate their peer reviews to their (usually more idealistic) grad students is a feature of the system, not a bug. This is interesting because it is generally viewed negatively, but in reality it adds a much-needed element of appreciation for newness to the process.

  7. Another factor is that for really novel work a lot of the mathematics has not been refined yet. It's much easier to get something published that is incremental and has been developed over decades with a few changes. Something completely new will, of course, not be perfect, which the reviewer can use as an excuse.

  8. One problem is that reviewers reject a submission by arguing that some of their favorite papers are not cited. This is so annoying.

  9. Industry is vast and has many examples of both more rational and less rational processes.

    Zooming in specifically on grant proposals, for example, that could be compared to getting your startup funded by a particular firm. That's just as random, if not more random, than NSF panel recommendations: the outcome of your case will similarly be influenced by whether the reviewers had coffee in the morning, by the order in which they see the proposals, by how combative they are feeling on any particular day, and by two dozen other factors over which you have no control whatsoever. Within an organization, pushing a new project through could be straightforward in some settings, but is extremely difficult in others. In a well-run organization with a healthy appetite for risk you probably know most of the rules and the approval process appears non-random to you. In many organizations, though, you will have next to no visibility into why your projects were picked up or not. Indeed, most organizations would default to not letting you propose risky projects requiring substantial funding commitments to begin with :).

    You make an interesting point that in industry the decisions are made by people who have a vested interest in the work. That's true, but, arguably, NSF's decisions are made by the representatives of the NSF, who hopefully have a vested interest in the success of their overall program, rather than by panelists who only advise the NSF. In industry, similarly, a decision-maker would have plenty of advisers and would depend on the quality of their advice -- no real procedural difference with the approach taken by the NSF. And there are real issues with the advice quality in industry settings as well. A major one: it is often hard, if not impossible, to find people who are both knowledgeable and unbiased. After all, the entire corporate strategy consulting industry exists to solve just this problem: provide senior corporate decision-makers with [most likely] diligent, [hopefully] unbiased, and [at least somewhat] knowledgeable advisers. In general, I do agree that major corporate decisions are treated with somewhat more diligence than NSF funding decisions. I am not sure that this diligence necessarily translates to better decisions. Take the example of the MSR SV lab closure. Microsoft most definitely spent considerable amounts of time and energy investigating whether to do it. Whether at the end the resulting closure was based on realistic projections, biased opinions, or a coin flip, we'll never know.

    As an aside, I think that current corporate decision-making processes are where networked systems folks could really make a difference right now. There are control and communication networks underlying all these issues that I have not seen anybody study with any kind of rigor (I may not be fully up to date on all the relevant literature though) -- and this is where there is tremendous impact to be had from improving things just a little tiny bit.

