Comments on Volatile and Decentralized: Software is not science

Michael - You make a very good point that sometime...

2011-11-07T22:01:58.157-08:00

Michael - You make a very good point that sometimes we don't learn what matters until we build a real, usable software system and gain experience with it in the "real" (or almost real) world.

Eddie - I'm less cynical about those papers than you are. But I agree 100% that most of the best papers tend to do a great job of polishing a turd: the reality is often very different from the "elegant" designs written up in the papers.

Patrick - I'd argue that those crappy prototypes, the bare minimum to get a paper published, are probably the right things for academics to be spending their time on. I heard about a research group at a university that invested a lot of time into a rigorous unit testing infrastructure for their code. This was probably not worth the effort.

The deeper question (which you and Michael and Eddie all point to) is how realistic a prototype needs to be before you can make any claims about it. I think peer review does a reasonable job here. If I see a claim made about a piece of software that can only run toy applications or microbenchmarks, I'm pretty skeptical. This is not to say that I expect the prototype to have a full-featured window manager and USB driver support for every device under the sun. (Hell, my Mac doesn't even have that.)

Good systems papers often chalk up the heroics that the authors engaged in to do something fantastic with their code. My favorite example of this is Nickolai Zeldovich rebooting his laptop - running HiStar - in the middle of his *job talk*, to demo the fast-recovery feature in the OS. Yes, it was a total stunt, but it did prove a point.

So this post seems mostly about the presentation...

2011-11-05T20:45:38.629-07:00

So this post seems mostly about the *presentation* of research in papers, i.e., present your intellectual contributions first, then talk about what you built. That's certainly prudent and hopefully not too controversial.

The more nebulous question, and one which various commenters seem to be approaching, is what standard of validation should new systems "ideas" be held to? If we are in the business of computer systems research, then what does it mean to do research which does not involve production-quality computer systems?

Many systems papers which are closer to the style Matt advocates for are written as follows:

"Here is my big novel idea and here is a [research prototype/simulation] that demonstrates it works."

I'm somewhat new to this game, but I've probably seen or participated in at least ten different "prototypes" for various projects at different institutions and they're often total shit. They are done in haste to get something tangible up before a deadline, and when they reveal problems with the original design, those problems are often glossed over or ignored.

This type of "prototype" does not seem to provide value to anyone. I'd rather read a clean intellectual proposal for a design with no attempt at quantitative evaluation, or the other extreme, something like production quality code where actual deployment issues have been addressed.

It's hard to disagree with this, up until: ...

2011-11-04T06:21:15.387-07:00

It's hard to disagree with this, up until:

"Great examples of this include things like Exokernel and Barrelfish. Those systems demonstrated a beautiful set of concepts (operating system extensibility and message-passing in multicore processors respectively)..."

"Operating system extensibility" isn't a concept, it's a goal. And digging into the exokernel's claims is like turning over a rotten log in a forest: one finds a bunch of creepy crawly stuff. "Message-passing in multicore processors" isn't new to Barrelfish, and the numbers in the BF paper depend on shared memory. Those papers read better than they are, because they gesture at large concepts they cannot fully support. Just as some of the "I wrote a simulator" papers feel worse than they are, because they forget to gesture at a concept. (I still like exokernel & BF as papers.)

There's a lot of mediocre research in CS systems, some of it done by me and others of us. This seems inevitable given the supply of students and various publication pressures. I am cranky about it too (and ashamed of my own contributions). What seems field-specific, though, is that a mediocre CS systems paper is 14pp long! Nature papers are like 4pp. The double-helix paper is 1 PAGE. (Rosalind Franklin's addition is 2pp.) Our text ocean trains us to read superficially.

I'd like to pop up a sad but true fact: Imple...

2011-11-03T22:49:02.063-07:00

I'd like to point out a sad but true fact: implementation is a nice add-on for a paper, but right now it seems to be a must for papers submitted to top-tier pubs. Why? Because we don't have truly innovative work that can impress others, we choose to decorate our papers with more content. I have seen many papers with so-so ideas but lots of implementation results get accepted in top-tier pubs. A researcher who has spent 2-3 years in one field can imagine most of the implementation results just by reading the theoretical parts or the protocol design scheme. (Personally, I can tell whether a 10-30% performance increase is really amazing research or just some engineering tradeoff without looking at any explanation in the paper.) Yes, it is impressive, and it takes a lot of time to finish. But that's not a paper you want to read several times, except to cite it.

While I largely agree that software isn't nece...

2011-11-03T12:11:55.504-07:00

While I largely agree that software isn't necessarily research, prototypes alone aren't sufficient to advance the field, particularly in systems research.

We learn a lot from real usage -- what matters, what doesn't, and how our intuition is wrong. Unix and the web are great examples of this -- in both cases, the key innovation was simplicity: keep the features that matter, discard the superfluous, and adapt to what real users actually do. Research tools are also essential: PlanetLab, Emulab, Click, Xorp, TinyOS, the Intel WISP, etc. have been used as building blocks by hundreds of research systems.

Yet, prevailing incentives in research discourage the grunt work required to make prototypes usable. Paradoxically, this leads to more incrementalism, since researchers are forced to wait for industry to do the grunt work for them. Examples include the proliferation of MapReduce papers based on small tweaks to Hadoop, or mobile papers based on small tweaks to Android.

In my view, our inability to develop a model for doing generative, engineering-heavy research is a fundamental problem, and I'm not sure how to solve it. On one hand, a lack of usable prototypes limits impact and progress. On the other, researchers can't spend time making prototypes usable, since the community (perhaps rightly!) emphasizes ideas, not implementations.

Jan - I think we don't do this very well in ac...

2011-11-03T08:42:30.512-07:00

Jan - I think we don't do this very well in academia. On a project like NOW, the first couple of students who published the major papers got most of the credit. On any big project I'd argue there are unsung heroes who do a lot of the scut work to make the system actually work, but whose contributions are not so easy to publish. Our CitySense project (http://www.citysense.net) at Harvard was a great example of this - 90% of the work was unpublishable grunt work to build a network of sensors around Cambridge - we only got 1 or 2 papers out of all that effort.

Like Elizeu, I worry about repeatability, and abou...

2011-11-03T08:32:39.282-07:00

Like Elizeu, I worry about repeatability, and about assigning credit / academic reward for the necessary software scut work behind an ambitious academic project. I'm curious how NOW in particular dealt with this (as a very good example of a project that took a lot of work to get to the point where the research could yield results).

Great points. I do think that building real soluti...

2011-11-03T07:13:47.104-07:00

Great points. I do think that building real solutions forces us as researchers to actually fully evaluate our ideas. Running real systems in the real world (ideally at scale) will reveal things that would be difficult to discover in simulation or with (simple) prototypes. Thus, while it is of course important for authors to identify their research contributions, it is likewise important that the contributions are backed by reality. :-)

Incidentally, exokernel ideas (not sure about the code) were commercialized by Doug Wyatt and others at Exotec/Vividon around 2000.

Elizeu - The one place where I think it makes sens...

2011-11-02T22:19:43.371-07:00

Elizeu - The one place where I think it makes sense to go beyond a throw-away prototype is when you have to build up a lot of scaffolding to get to the hard and interesting problems. In the Berkeley NOW project, for example, unless we built a pretty solid and useful system, we would have never been able to explore some of the juicier research questions that depended on having that infrastructure in place. But 95% of the code that was written for NOW never got published, since it was "uninteresting" from a research perspective.

I tend to worry that many students want to do as little work as possible to get a paper in. Usually that means starting with something like Linux and writing their own little thing on top, rather than investing the time and energy to build something more substantial and complex. But keep in mind that is still just scaffolding: it's the research, not the code, that counts....

Good post, but it is a bit sad that this needs say...

2011-11-02T19:28:17.519-07:00

Good post, but it is a bit sad that this needs saying.

In a similar vein but worse are "framework papers". We present a new framework to solve problem X. Blah blah blah. Where the "framework" is just a general description of how one might go about writing the software. This gets written by people who could not be bothered to even write the software to test out their ideas.

Hi Matt, Your post touches a really good point, t...

2011-11-02T19:08:29.884-07:00

Hi Matt,

Your post touches a really good point, thanks.

Although I totally agree with the main message that software is not science, I tend to think differently about this part:

[...]
Chances are nobody will use your code anyway, and time you spend turning a prototype into a real system is time better spent pushing the envelope and writing great papers.
[...]

I agree that a prototype is not a scientific contribution in itself, yet as a community we might benefit from creating incentives for the production and release of high-quality prototype code, as opposed to assuming the mindset: "it's unlikely my code will be reused".

Sharing high-quality prototype code could help researchers to 1) repeat experiments, and/or 2) extend the core research ideas. This is not to say that we should prioritize the prototype over trying to push the envelope.

Nevertheless, writing and sharing good, reusable code should not only be considered a contribution to the scientific community, but should also be encouraged.

For example, initiatives like SIGMOD experimental repeatability (http://www.sigmod2011.org/calls_papers_sigmod_research_repeatability.shtml) seem to provide good incentives for researchers who craft reusable prototypes.

Keith - you are absolutely right that embedding a ...

2011-11-02T14:30:57.885-07:00

Keith - you are absolutely right that embedding a new idea in an existing software system can be more convincing than starting from scratch. This post is not really about whether you are building a new software system or not - just that any software you write should be thought of as a manifestation of an idea, and the idea is primary - not the code.

Anon - You are obviously trolling, but I'll respond anyway. The people who built Watson would agree that the science behind the work was not limited to the software artifact that they produced. Watson embodies many new ideas about knowledge representation, language interpretation, parallelizing queries, and so forth. The software system "Watson" embodies those ideas. In the case of Watson, the new ideas are so glaringly obvious that you probably do not need to disentangle the ideas from the software. But I have read many, many (rejected) conference submissions about a piece of software where the new ideas are *not* obvious, and the paper is written entirely about the mundane aspects of the code.

So, if someone writes a program that can perfectly...

2011-11-02T14:19:59.693-07:00

So, if someone writes a program that can perfectly translate any English article into Chinese, is that not worth a PhD?

Or, if someone handed you IBM's Watson for their PhD thesis, you would say NO?

I guess Artificial Intelligence is not science.

Thank you Matt! I've reviewed a bunch of pape...

2011-11-02T13:06:51.165-07:00

Thank you Matt! I've reviewed a bunch of papers recently and have been grumbling to myself a lot about this exact problem.

Following up on your 2nd recommendation, "Decouple the new ideas from the software itself," I often run into a related phenomenon in file system papers. Sometimes the best way to scientifically explore a new idea is to implement it in an existing system. This makes it easy to do A/B comparisons that focus on the costs, benefits, and other trade-offs of your cool new idea. Too often I see papers whose authors have written entirely new file systems when a better approach would have been to extend an existing file system. Somehow I doubt this phenomenon is restricted to people who build file systems.

Of course, that's not to say that researchers should never build completely new systems. Some ideas (such as Exokernel, or Log-structured file systems) can only be effectively demonstrated and explored that way. But I wish more people would stop and think about building what is best for the science rather than what is best for their ego.