Wednesday, November 2, 2011

Software is not science

Very often I see conference paper submissions and PhD thesis proposals that center entirely on a piece of software that someone has built. The abstract often starts out something like this:

We have designed METAFOO, a sensor network simulator that accurately captures hardware level power consumption. METAFOO has a modular design that achieves high flexibility by allowing new component models to be plugged into the simulation. METAFOO also incorporates a Java-based GUI environment for visualizing simulation results, as well as plugins to MATLAB, R, and Gnuplot for analyzing simulation runs....


You get the idea.  More often than not, the paper reads like a technical description of the software, with a hairy block diagram with a bunch of boxes and arrows and a detailed narrative on each piece of the system, what language it's implemented in, how many lines of code, etc. The authors of such papers quite earnestly believe that this is going to make a good conference submission.

While this all might be very interesting to someone who plans to use the software or build on it, this is not the point of a scientific publication or a PhD dissertation. All too often, researchers -- especially those in systems -- seem to confuse the scientific question with the software artifact that they build to explore that question. They get hung up on the idea of building a beautiful piece of software, forgetting that the point was to do science.

When I see a paper submission like this, I will start reading it in the hopes that there is some deeper insight or spark of inspiration in the system design. Usually it's not there. The paper gets so wrapped up in describing the artifact that it forgets to establish the scientific contributions that were made in developing the software. These papers do not tend to get into major conferences, and they do not make a good foundation for a PhD dissertation.

In computer systems research, there are two kinds of software that people build. The first class comprises tools used to support other research. This includes things like testbeds, simulators, and so forth. This is often great, and invaluable, software, but not -- in and of itself -- the point of research. Countless researchers have used ns2, Emulab, PlanetLab, etc. to do their work, and without this investment the community can't move forward. But all too often, students seem to think that building a useful tool equates to doing research. It doesn't.

The second, and more important, kind of software is a working prototype to demonstrate an idea. However, the point of the work is the idea that it embodies, not the software itself. Great examples of this include things like Exokernel and Barrelfish. Those systems demonstrated a beautiful set of concepts (operating system extensibility and message-passing in multicore processors respectively), but nobody actually used those pieces of software for anything more than getting graphs for a paper, or maybe a cute demo at a conference.

There are rare exceptions of "research" software that took on a life beyond the prototype phase. TinyOS and Click are two good examples. But this is the exception, not the rule. Generally I would not advise grad students to spend a lot of energy on "marketing" their research prototype. Chances are nobody will use your code anyway, and time you spend turning a prototype into a real system is time better spent pushing the envelope and writing great papers. If your software doesn't happen to embody any radical new ideas, and instead you are spending your time adding a GUI or writing documentation, you're probably spending your time on the wrong thing.

So, how do you write a paper about a piece of software? Three recommendations:

  1. Put the scientific contributions first. Make the paper about the key contributions you are making to the field. Spell them out clearly, on the first page of the paper. Make sure they are really core scientific contributions, not something like "our first contribution is that we built METAFOO." A better example would be, "We demonstrate that by a careful decomposition of cycle-accurate simulation logic from power modeling, we can achieve far greater accuracy while scaling to large numbers of nodes." Your software will be the vehicle you use to prove this point.
  2. Decouple the new ideas from the software itself. Someone should be able to come along and take your great ideas and apply them in another software system or to a completely different problem. The key idea you are promoting should not be linked to whatever hairy code you had to write to show that the idea works in practice. Taking Click as an example, its modular design has been recycled in many, many other software systems (including my own PhD thesis).
  3. Think about who will care about this paper 20 years from now. If your paper is all about some minor feature that you're adding to some codebase, chances are nobody will. Try to bring out what is enduring about your work, and focus the paper on that.




15 comments:

  1. Thank you Matt! I've reviewed a bunch of papers recently and have been grumbling to myself a lot about this exact problem.

    Following up on your 2nd recommendation, "Decouple the new ideas from the software itself," I often run into a related phenomenon in file system papers. Sometimes the best way to scientifically explore a new idea is to implement it in an existing system. This makes it easy to do A/B comparisons that focus on the costs, benefits, and other trade-offs of your cool new idea. Too often I see papers whose authors have written entirely new file systems when a better approach would have been to extend an existing one. Somehow I doubt this phenomenon is restricted to people who build file systems.

    Of course, that's not to say that researchers should never build completely new systems. Some ideas (such as Exokernel, or Log-structured file systems) can only be effectively demonstrated and explored that way. But I wish more people would stop and think about building what is best for the science rather than what is best for their ego.

  2. So, if someone writes a program that can perfectly translate any English article into Chinese, is that not worth a PhD?

    Or, if someone handed you IBM's Watson for their PhD thesis, you would say NO?

    I guess, Artificial Intelligence is not science.

  3. Keith - you are absolutely right that embedding a new idea in an existing software system can be more convincing than starting from scratch. This post is not really about whether you are building a new software system or not - just that any software you write should be thought of as a manifestation of an idea, and the idea is primary - not the code.

    Anon - You are obviously trolling, but I'll respond anyway. The people who built Watson would agree that the science behind the work was not limited to the software artifact that they produced. Watson embodies many new ideas about knowledge representation, language interpretation, parallelizing queries, and so forth. The software system "Watson" embodies those ideas. In the case of Watson, the new ideas are so glaringly obvious that you probably do not need to disentangle the ideas from the software. But I have read many, many (rejected) conference submissions about a piece of software where the new ideas are *not* obvious, and the paper is written entirely about the mundane aspects of the code.

  4. Hi Matt,

    Your post touches a really good point, thanks.

    Although I totally agree with the main message that software is not science, I tend to think differently about this part:

    [...]
    Chances are nobody will use your code anyway, and time you spend turning a prototype into a real system is time better spent pushing the envelope and writing great papers.
    [...]

    I agree that a prototype is not a scientific contribution in itself, yet as a community we might benefit from creating incentives for the production and release of high quality prototype code, as opposed to assuming the mindset: "it's unlikely my code will be reused".

    Sharing high quality prototype code could help researchers to: 1) repeat experiments; and/or 2) extend the core research ideas. This is not to say that we should prioritize the prototype over trying to push the envelope.

    Nevertheless, writing and sharing good and reusable code should not only be considered a contribution to the scientific community; it should also be encouraged.

    For example, initiatives like SIGMOD experimental repeatability (http://www.sigmod2011.org/calls_papers_sigmod_research_repeatability.shtml) seem to provide good incentives for researchers who craft reusable prototypes.

  5. Good post, but it is a bit sad that this needs saying.

    In a similar vein but worse are "framework papers". We present a new framework to solve problem X. Blah blah blah. Where the "framework" is just a general description of how one might go about writing the software. This gets written by people who could not be bothered to even write the software to test out their ideas.

  6. Elizeu - The one place where I think it makes sense to go beyond a throw-away prototype is when you have to build up a lot of scaffolding to get to the hard and interesting problems. In the Berkeley NOW project, for example, unless we built a pretty solid and useful system, we would have never been able to explore some of the juicier research questions that depended on having that infrastructure in place. But 95% of the code that was written for NOW never got published, since it was "uninteresting" from a research perspective.

    I tend to worry that many students want to do as little work as possible to get a paper in. Usually that means starting with something like Linux and writing their own little thing on top, rather than investing the time and energy to build something more substantial and complex. But keep in mind that is still just scaffolding: it's the research, not the code, that counts....

  7. Great points. I do think that building real solutions forces us as researchers to fully evaluate our ideas. Running real systems in the real world (ideally at scale) will reveal things that would be difficult to discover in simulation or with (simple) prototypes. Thus, while it is of course important for authors to identify their research contributions, it is likewise important that the contributions are backed by reality. :-)

    Incidentally, exokernel ideas (not sure about the code) were commercialized by Doug Wyatt and others at Exotec/Vividon around 2000.

  8. Like Elizeu, I worry about repeatability, and about assigning credit / academic reward for the necessary software scut work behind an ambitious academic project. I'm curious how NOW in particular dealt with this (as a very good example of a project that took a lot of work to get to the point where the research could yield results).

  9. Jan - I think we don't do this very well in academia. On a project like NOW, the first couple of students who published the major papers got most of the credit. On any big project I'd argue there are unsung heroes who do a lot of the scut work that makes the system actually work but is not so easy to publish. Our CitySense project (http://www.citysense.net) at Harvard was a great example of this - 90% of the work was unpublishable grunt work to build a network of sensors around Cambridge - we only got 1 or 2 papers out of all that effort.

  10. This comment has been removed by the author.

  11. While I largely agree that software isn't necessarily research, prototypes alone aren't sufficient to advance the field, particularly in systems research.

    We learn a lot from real usage -- what matters, what doesn't, and how our intuition is wrong. Unix and the web are great examples of this -- in both cases, the key innovation was simplicity: keep the features that matter, discard the superfluous, and adapt to what real users actually do. Research tools are also essential: PlanetLab, Emulab, Click, Xorp, TinyOS, the Intel WISP, etc. have been used as building blocks by hundreds of research systems.

    Yet, prevailing incentives in research discourage the grunt work required to make prototypes usable. Paradoxically, this leads to more incrementalism, since researchers are forced to wait for industry to do the grunt work for them. Examples include the proliferation of MapReduce papers based on small tweaks to Hadoop, or mobile papers based on small tweaks to Android.

    In my view, our inability to develop a model for doing generative, engineering-heavy research is a fundamental problem, and I'm not sure how to solve it. On one hand, a lack of usable prototypes limits impact and progress. On the other, researchers can't spend time making prototypes usable, since the community (perhaps rightly!) emphasizes ideas, not implementations.

  12. I'd like to point out a sad but true fact: implementation is a perfect add-on for a paper, but right now it seems to be a must for papers submitted to top-tier pubs. Why? Because we don't have truly innovative work that can impress others, we choose to decorate our papers with more content. I have seen many papers with so-so ideas but lots of implementation results get accepted at top-tier pubs. A researcher who has spent 2-3 years in one field can imagine most of the implementation results just by reading the theoretical parts or the protocol design (I personally can tell whether a 10-30% performance increase is really amazing research or just some engineering tradeoff without looking at any explanation in the paper). Yes, it is impressive, and it takes a lot of time to finish. But that's not the paper you want to read several times, except for citation.

  13. It's hard to disagree with this, up until:

    "Great examples of this include things like Exokernel and Barrelfish. Those systems demonstrated a beautiful set of concepts (operating system extensibility and message-passing in multicore processors respectively)..."

    "Operating system extensibility" isn't a concept, it's a goal. And digging into the exokernel's claims is like turning over a rotten log in a forest: one finds a bunch of creepy crawly stuff. "Message-passing in multicore processors" isn't new to Barrelfish, and the numbers in the BF paper depend on shared memory. Those papers read better than they are, because they gesture at large concepts they cannot fully support. Just as some of the "I wrote a simulator" papers feel worse than they are, because they forget to gesture at a concept. (I still like exokernel & BF as papers.)

    There's a lot of mediocre research in CS systems, some of it done by me and others of us. This seems inevitable given the supply of students and various publication pressures. I am cranky about it too (and ashamed of my own contributions). What seems field-specific, though, is that a mediocre CS systems paper is 14pp long! Nature papers are like 4pp. The double-helix paper is 1 PAGE. (Rosalind Franklin's addition is 2pp.) Our text ocean trains us to read superficially.

  14. So this post seems mostly about the *presentation* of research in papers, i.e., present your intellectual contributions first, then talk about what you built. That's certainly prudent and hopefully not too controversial.

    The more nebulous question, and one which various commenters seem to be approaching, is what standard of validation should new systems "ideas" be held to? If we are in the business of computer systems research, then what does it mean to do research which does not involve production-quality computer systems?

    Many systems papers which are closer to the style Matt advocates for are written as follows:

    "Here is my big novel idea and here is a [research prototype/simulation] that demonstrates it works."

    I'm somewhat new to this game, but I've probably seen or participated in at least ten different "prototypes" for various projects at different institutions and they're often total shit. They are done in haste to get something tangible up before a deadline, and when they reveal problems with the original design, those problems are often washed over or ignored.

    This type of "prototype" does not seem to provide value to anyone. I'd rather read a clean intellectual proposal for a design with no attempt at quantitative evaluation, or the other extreme, something like production quality code where actual deployment issues have been addressed.

  15. Michael - You make a very good point that sometimes we don't learn what matters until we build a real, usable software system and gain experience with it in the "real" (or almost real) world.

    Eddie - I'm less cynical about those papers than you are. But I agree 100% that most of the best papers tend to do a great job at polishing a turd: the reality is often very different than the "elegant" designs written up in the papers.

    Patrick - I'd argue that those crappy prototypes, the bare minimum to get a paper published, are probably the right things for academics to be spending their time on. I heard about a research group at a university that invested a lot of time into a rigorous unit testing infrastructure for their code. This was probably not worth the effort.

    The deeper question (which you and Michael and Eddie all point to) is how realistic a prototype needs to be before you can make any claims about it. I think peer review does a reasonable job here. If I see a claim made about a piece of software that can only run toy applications or microbenchmarks, I'm pretty skeptical. This is not to say that I expect the prototype to have a full-featured window manager and USB driver support for every device under the sun. (Hell, my Mac doesn't even have that.)

    Good systems papers often chalk up the heroics that the authors engaged in to do something fantastic with their code. My favorite example of this is Nickolai Zeldovich rebooting his laptop - running HiStar - in the middle of his *job talk*, to demo the fast-recovery feature in the OS. Yes, it was a total stunt, but it did prove a point.
