Monday, September 21, 2009

The New vs. the Good

I have wondered for a while whether the computer science community should place more value on journal articles than on conference papers. Journal articles are not just longer versions of conference papers that take much more time to review -- they are meant to represent a capstone on a large piece of work, which is something we often overlook in our field.

Much systems research is driven by a rapid cycle of develop, evaluate, publish (and not always in that order). With a couple of major conference venues every year, and the need to build a strong publication record as a major determinant of one's (perceived) success in the field, there is a high incentive to push out new papers as quickly as possible, irrespective of how half-baked the content might be. Many conference papers do little more than scratch the surface of an idea -- it is hard to do more in only 14 pages. The expected longevity of a paper (even a good one at a top conference) is little more than a year, two tops. And most systems on which the papers are based never see the light of day, apart from perhaps a tarball slapped together and linked on a student's website.

It's a collective form of ADHD -- hack, publish, move on to the next thing. In some sense, it's more important to be the first person to publish in an area than to develop a system to the point where the major problems have actually been solved and the concept thoroughly vetted. Research fads come and go pretty quickly. (Remember distributed hash tables?) Once the first few papers have been published in an area, people start to get antsy looking for the next big idea.

In other scientific communities, there is a vastly different expectation of the maturity of a piece of work before it can be published, using journal articles as the primary means of dissemination. As much as we scorn journals, they do have the virtue of slowing things down -- requiring more in-depth presentation of the ideas, extensive reviews, and sometimes multiple revisions before the work can be published. (My wife, who is a psychiatrist, reports that several of her articles have been in the review and revision cycle for more than a year and a half. Computer scientists don't have this kind of patience.)

One can argue that the journal editorial cycle is too slow for a fast-moving field like CS. I think that's naive; other scientific disciplines -- molecular biology, particle physics -- are innovating at least as rapidly and manage to do so within the context of a journal-article framework. Those communities have the means for getting early results out there -- posters and oral presentations at conferences, online repositories like arXiv -- but there is a much clearer line drawn between the early work and the culmination of a major research effort. In the systems community, we have workshops like HotOS for floating new ideas, but it's not uncommon for a HotOS paper to turn into a major conference publication just a few months later. (One could argue that a project at that point of maturity should not be a candidate for a "hot topics" workshop. CS research seems to exhibit a high degree of entropy: work goes from "hot" to "cold" pretty quickly.)

I wonder what this rapid cycle does to the quality and depth of the work in our field, compared to that in other fields. I like to think that CS has shed the antiquated, lumbering trappings of other academic disciplines, but in our rush to keep the publication cycle going, what are we missing? Does our rapid-fire approach to research cause us to spend too much time on playing small-ball, rather than investing time into the hard problems that could take years to bear fruit? Does it make sense to place more value on the currency of journal articles in CS?


  1. I'm not a computer scientist, so I won't try to answer your question. I came here to point you to this post by Quantum Pontiff with slow musings on the virtues of slow science.

  2. Whether or not a community insists on journal publications is a bit of a red herring. The larger problem (particularly in the systems community) is a lack of obvious standards for what constitutes fundamental work. In natural science, the big questions (and progress towards answering them) are often a lot more apparent than whether or not a newly proposed system design is solving an important problem.

    The difficulty in identifying fundamental work is made worse by the fact that the community seems to have given up on eating its own dog food. The inarguable successes for computer science research (for example, the Internet, RAID, BSD) came out of that culture, which now seems to have been largely replaced by efforts to be the first to write about a hot topic, while leaving any substantive implementation to industry.

    My own opinion is that this hurts the community by giving students the perception that industry is driving the field completely. I've seen many talented undergraduates eschew grad school (or grad students eschew academic careers) to work for Google, Microsoft, Amazon, or a startup because they believe that a career in industry is now the best way to take on interesting problems and have real-world impact. (Whether or not this is correct is debatable, but regardless, the perception is widespread.)

    Would insisting on journal publications change the culture? Probably not. If, as you point out, being first is more highly rewarded than being complete, journal publications would seem to be just a waste of time from the perspective of a strategic publisher (i.e., anyone trying to get a job, tenure, or funding, i.e., pretty much everyone on the academic side of the community). Until the perceived value of completing projects end-to-end changes, the tendency towards ADHD publishing will continue. It's not surprising that researchers are responding to prevailing incentives. And, even if the majority individually agrees that they aren't ideal, changing an entrenched reward structure is tough.

    It's also not clear that this tendency is bad. The extreme alternative to ADHD publishing, insisting on complete solutions that require years of person-effort to implement, doesn't seem like the answer either. More broadly, many people argue that scientific disciplines are not supposed to be generative, and perhaps the seemingly imbalanced standards of computer science are the result of an unfortunate choice of discipline name 8)

  3. I agree with everything Mike said. I've had this discussion many times (both with him and others), and it always boils down to this: we need some evaluation metric for hiring & tenure, and since this evaluation is done by people who often don't understand the candidate's work, publication & citation count is used as a proxy for how important the candidate's work is. And how does one get a large publication/citation count? The shotgun approach, of course. This strategy works well because of the (relatively) short turnaround time of conferences. Journals would uproot this whole process.

    That said, there *are* other fields where journal articles are the norm. Mechanical engineering is one of them. But I think that situation only works when a professor has a legion of grad students working on large projects and cranking out publications (this is my roommate's situation). Researchers with singular or interdisciplinary interests (such as myself) would be left in the dust -- one person alone can't do all of the work involved in those huge projects, and a turnaround time of 1-1.5 years is detrimental to building a body of citable work.

    So, I'm not convinced journals are the right answer. We need a more fundamental change toward recognizing activities *other* than publications and citations as being contributions to the field. Academic research is all about risk-taking, so we should encourage more of it.

  4. Mike makes some good points. I am also concerned about the industry-driven nature of systems research in particular; are we working on problems with a long enough time horizon, or just things that can get productized in the next 12-18 months? As one indicator of this, Microsoft Research has started to dominate systems conferences: they have 7 out of 23 papers at SOSP 2009 -- nearly 1/3 of the conference. The quality of the research from MSR is excellent, but the balance of power is definitely shifting away from academic groups (which arguably have less firepower when it comes to putting together strong paper submissions).

    I don't agree with Justin's concern about the nature of journal-driven research. It is not universally true that journal-based fields utilize a proverbial army of grad students to crank out papers. The expectations are quite different: a PhD student in those fields aims for one, maybe two, top quality journal publications by the time they graduate. The quality and impact rating of the journal matter a lot more than the quantity of publications.

  5. I pretty much agree with what has been said here, but I just wanted to point out that the nature of the computer science field tends to mash together lots of different types of research in the same conferences.

    For example, some work gets described as clever hacks, some as incremental, and some as more fundamental in nature. I agree that all of it is important and should go somewhere; I am just not sure it should all go to the same place. imho this leads to problems, as people have different expectations about what belongs in a given venue.

    I think that we need to get better at identifying those who free-ride on the conference publishing system, just as we need to get better at identifying those who free-ride on the conference reviewing system (by writing flimsy, low-effort reviews, etc.). Going to journals might solve the second. I’m not sure it would solve the first, though.
