Saturday, February 28, 2009

The plight of the poor application paper

My research is unapologetically applications-driven: we've deployed sensor networks for monitoring volcanoes, for disaster response, and for measuring limb movements in patients with Parkinson's disease. One of the joys of working on sensor networks is that a lot of exciting research derives from close collaborations with domain experts, shedding light on challenges that we wouldn't otherwise be exposed to. It also keeps us in check and ensures we're working on real problems rather than artificial ones.

At the same time, it's a sad truth that "deployment" or "application" papers often face an uphill battle when it comes to getting published in major conferences. I've seen plenty of (good!) application-focused papers get dinged in program committees for, well, simply not being novel enough. Now, we could have a healthy argument about the inherent novelty of building a real system, getting it to work, deploying it in a challenging field setting, and reporting on the results. But it's true that these papers are pretty different from those about a new protocol, algorithm, or language. I've thought a bit about what makes it harder for apps papers to get into these venues and have come up with the following observations.

1) Getting something to work in the real world often involves simplifying it to the point where most of the "sexy" ideas are watered down.

It is very rare for a successful sensor network deployment to involve brand-new, never-before-published techniques; doing so would carry a tremendous amount of risk. Generally it's necessary to use fairly robust code that embodies well-worn ideas, at least for the underpinnings of the system design (MAC, routing, time sync, and so forth). As a result, the components of the system design might end up not being very novel. Also, many application papers combine several well-known techniques in interesting ways. Still, when a reviewer picks apart a paper piece by piece, it's hard to identify the individual contributions. The hope is that the whole is greater than the sum of the parts, but this is often difficult to convey.

There is a way to avoid this problem, and that is to write the paper about something other than the "mundane" aspects of the system design itself. For our OSDI paper on the volcano sensor network, we decided to focus on the validation of the network's operation during the deployment, not the individual pieces that made up the system. Although it took a lot of work to take the "well-tested" implementations of major components (such as MultihopLQI) and get them to work robustly in the field, we didn't think the paper could rest on that refinement of previously-published ideas. The Berkeley paper on monitoring redwoods took a similar approach by focusing on the data analysis.

2) Academic research tends to reward those who come up with an idea first, not those who get the idea to work.

There are lots of great ideas in the literature that have only been studied in simulation or in small-scale experiments. Almost no credit goes to those who manage to get an idea actually deployed and working under less controlled, real-world conditions. So even though it might take an incredible amount of sweat to take, say, a routing protocol and get it working on real hardware in a large-scale field deployment, you're unlikely to get much credit for doing so unless you ended up making substantial changes to the protocol or learned something new about its operation.

We learned this the hard way with our paper on adapting the ADMR multicast protocol to work on motes, which we needed for the CodeBlue medical monitoring platform. It turns out that taking an existing protocol (which had only been studied in ns-2, with a simplistic radio model and no consideration for the memory or bandwidth limitations of mote-class devices) and implementing it on real hardware didn't blow away the program committees the way we hoped it would. Eventually, we did publish this work (in the aptly-named REALMAN workshop). But the initial reviews contained things like "everybody knows that MANET protocols won't work on motes!" That was frustrating.
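As a rough, hypothetical illustration (the RAM budget and per-entry sizes below are my own assumptions, not figures from the ADMR or CodeBlue work), here is a back-of-the-envelope sketch in C of why this kind of porting is nontrivial: routing state that costs nothing in an ns-2 simulation becomes a hard budget on a mote-class device.

    /* Hypothetical illustration only (not from the ADMR/CodeBlue papers):
     * why simulation-friendly routing state doesn't fit on mote-class
     * hardware. All figures are assumptions. */
    #include <stdio.h>

    #define MOTE_RAM_BYTES   (4 * 1024) /* e.g., a MICAz-class mote has ~4 KB of RAM */
    #define RAM_FOR_ROUTING  1024       /* guess at what's left after OS, radio stack, app buffers */
    #define BYTES_PER_ENTRY  32         /* assumed per-group forwarding state: addresses, seq. numbers, timers */

    int main(void) {
        int max_entries = RAM_FOR_ROUTING / BYTES_PER_ENTRY;
        printf("Mote RAM: %d bytes; routing entries that fit in budget: %d\n",
               MOTE_RAM_BYTES, max_entries); /* prints 32 entries */
        /* An ns-2 model can happily keep per-source, per-group state for
         * hundreds of nodes; on the mote that state must be capped,
         * evicted, or aggregated: exactly the kind of unglamorous
         * adaptation work described above. */
        return 0;
    }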

3) Deployments carry a substantial risk that the system won't actually work, making it harder to convince a reviewer that the paper is worth accepting.

Maybe there should be a built-in handicap for real deployment papers. Whereas in the lab, you can just keep tweaking and rerunning experiments until you get the results you want, this isn't possible in the field. On the other hand, it's not clear that we can really hold deployment papers to a different standard; after all, what constitutes a "real" deployment? Is an installation of nodes around an academic office building good enough? (We've seen plenty of those. If the world ever wants to know the average temperature or light level of the offices in a CS department, we are ready!) Or does it have to be in some gritty, untethered locale, like a forest, or a glacier? Does use of machetes and/or pack animals to reach the deployment site count for anything?

Of course, it is possible to get a great paper out of a deployment that goes sideways. The best way is to write the paper as a kind of retrospective, explaining what went wrong, and why. These papers are often entertaining to read, and provide valuable lessons for those attempting future work along the same lines. Also, failures can often take your research into entirely new directions, which I've blogged about before. As an example, we ended up developing Lance specifically to address the data quality challenges that arose in our deployment at Reventador. We would have never stumbled across that problem had our original system worked as planned.

One thing I don't think we should do is sequester deployment and application papers in their own venues, for example, by having a workshop on sensor network applications. I understand the desire to get like-minded people together to share war stories, but I think it's essential that these kinds of papers be given equal billing with papers on more "fundamental" topics. In the best case, they can enrich an otherwise dry technical program, as well as inspire and inform future research. Besides, the folks who would go to such a workshop don't need to be convinced of the merits of application papers.

Personally, I'd like to see a bunch of real deployment papers submitted to SenSys 2009. Jie and I are thinking of ways to get the program committee to think outside the box when reviewing these papers, and any suggestions on how we should encourage a more open-minded perspective are most welcome.

12 comments:

  1. There's a fourth concern I have heard about building and deploying artifacts, which is their relative transience vis-a-vis ideas.

    One way to counter this handicap is for the community to promote application-driven research that either validates ideas in practice or points out significant drawbacks in making them work. For example, in fields such as experimental physics, it's even possible to obtain a Ph.D. for carefully repeating prior ideas or claims.

  2. "Maybe there should be a built-in handicap for real deployment papers."

    Another, perhaps less popular approach might be to require that all papers (in OSDI, for example) release the source code / test scripts used in the experiments described in the paper. This would shine a light on papers that were based upon unrealistic simulations or that don't deal with hard implementation details, and might tip the balance back toward application papers.

  3. "Another, perhaps less popular approach might be to require that all papers (in OSDI, for example) release the source code / test scripts used in the experiments described in the paper."

    I think SIGMOD made this stuff a mandatory requirement last year. I'm not sure if the results changed significantly.

  4. I'm not sure that releasing source code is going to help matters much. PC members are overwhelmed as it is, and I doubt that anyone would have time to take a serious look at the code/scripts/etc. when evaluating a paper. I guess it's a good idea in the sense that you could be "audited" at any time by a reviewer, but it also opens up the potential for abuse, where a PC member shoots down a paper due to a lack of understanding of the code (or not liking the coding style, or some other trivial issue).

  5. True, releasing code wouldn't help during the review process. However, it might help on a longer timescale - it would enable repeatability of experiments (something that the community talks about but doesn't do that much in practice), and might keep folks from submitting papers they can't back up with code by the time the camera ready is due. That in turn would (indirectly) help application papers.

    That said, it might prove too unpopular...

  6. How about having people explicitly mark, as part of the submission process, that their paper should be considered an application paper, and then allocating reviewers accordingly? For SenSys in particular, there should be plenty of PC members and subreviewers who have the background and experience to judge these papers. There are plenty of issues in figuring out what makes a good application paper, several of which you have raised. Most of these seem to come down to needing reviewers who have the appropriate "taste" in judging the paper.

    Having authors mark "this is an application paper, judge it by those criteria" at submission time would save time and match the paper to the right people. Of course you would not necessarily have every single PC member on the paper be an "applications person," just to keep things from becoming too inbred. Still, it would be a way to fairly evaluate such papers and give them a fighting chance without splitting off into a separate conference.

    I agree with the concern about splitting into a separate conference, by the way. I have seen cases where creating a new conference or workshop pays off with great new research (e.g., Privacy Enhancing Technologies, Usenix Electronic Voting Technologies), but in general I worry about fields becoming balkanized. It's hard enough as it is to keep up with all the work coming out in the main focus of an area.

  7. Hi Matt. I sympathize. I admit I kind of like the idea of "marking" a paper as an applications paper in some way, although one would hope that most people in the area would be able to read and judge such a paper appropriately.

    I've actually just put up a post I've been tinkering with for a few weeks on the plight of the poor theory paper for networking/systems conferences. Good timing. :)

  8. "I admit I kind of like the idea of "marking" a paper as an applications paper in some way"

    I think Mike's suggestion is good. For what it's worth, SIGCOMM 2009 has "focus" tags such as "system implementation" in addition to the traditional topic classifiers.

  9. Matt,

    Welcome to my world. I think that in today's system, research credit for real deployments comes not so much through traditional measures as through the respect of one's peers, as measured in your tenure letters. As we know, this measurement depends greatly on who is asked.

    Over the years I have squeezed out quite a few papers because it is hard to build something serious without learning *something* new, which you can then publish. It is a harder sell to solve an old problem in a new way, but sometimes that can be done, too.

    Norman

  10. I'd agree with Norman here. While building real systems isn't appropriately recognized in terms of publication count (indeed, the incentives point in the opposite direction), I think it is well acknowledged by one's peers at other points in one's career (hires, promotions, awards, etc.).

    On a personal note, I only really published one real paper on CoralCDN, even though it ended up taking portions of many years. On the other hand, I think its "realness" ultimately had a huge impact on my job search. The same could be said, with even stronger emphasis, of Larry's work on PlanetLab: remarkably few publications, but a huge impact on the entire systems and networking community, and a large reason we (hopefully) have this thing called GENI around.

  11. As another note, IMC would only consider papers for its "Best Paper Award" if the authors agreed to publish their datasets. I always thought that was a great incentive device, one I wish we could somehow extend to other systems and networking conferences.

  12. This problem is worse for technology for developing regions for several reasons:

    1) the best solutions are simple, and simple solutions do not do well with reviewers!

    2) much of the value is the *discovery* of the problem; most fields reward problem discovery, but CS is not one of them. Once a problem is well defined, the solution tends to be too obvious for publication.

    3) applications in developing regions tend to be harder to execute, which is a kind of tax on the rest of the paper. (Given k hours for a paper, the higher the tax, the less time for innovation.)

    As we develop the IT for developing regions community (and conferences), we are trying to create a culture that values applications. On the plus side, the CHI community does this, so it is possible.

    -Eric
