Tuesday, March 31, 2009

Programming Haiku

Today in my graduate course we discussed the Berkeley snlog system. This is a declarative programming language for sensor nets, based on Datalog, and derived from the authors' previous work on P2.

Mike Lyons, one of the Ph.D. students in the course, made an interesting observation: programming in these very concise, domain-specific languages is like "writing haiku." The idea behind these languages is to make programming sensor nets simpler by abstracting away details and making the code tighter and easier to grok. Mike observed that he'd "rather write 10 pages of C code than 10 lines of Datalog," which is a very good point -- and one that speaks to a number of projects that equate "fewer lines of code" with "ease of programming."
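To make Mike's point concrete, consider what route discovery looks like in a Datalog-style language. (This is a generic Datalog sketch, not actual snlog syntax; the link and path predicates are made up for illustration.) Two rules suffice to express multihop reachability:

    % Sketch in generic Datalog, NOT actual snlog syntax.
    % link(X, Y) is a hypothetical base relation: node X can hear node Y.
    path(X, Y) :- link(X, Y).              % one-hop neighbors are reachable
    path(X, Y) :- link(X, Z), path(Z, Y).  % extend a known path by one hop

An equivalent C implementation would need neighbor tables, message handlers, timers, and retransmission logic -- easily those 10 pages. But the flip side is exactly the haiku problem: every token in those two rules carries weight, and a small slip (say, writing link(Z, X) in the recursive rule) silently computes a different relation, with no compiler error to catch it.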

A number of projects, including several of my own (Flask, Regiment), have focused on simplifying programming for sensor nets. The idea is that building up complex distributed behaviors from low-level programs, usually implemented in C, is too painful for non-experts. The main challenge is grappling with the complex dynamics that arise when you have uncertain environmental conditions, message loss, node failure, and so forth. This is difficult in conventional distributed systems, and even more painful in the extremely resource-challenged domain of sensor nets. To this end, there has been a slew of sensor net language papers, ranging from simple node-based languages to more ambitious, network-wide programming environments. For those without the patience to run them all down, Luca Mottola's PhD thesis does a great job of surveying the field.

Evaluating a language paper, especially in a relatively new domain, is often challenging. In sensor nets, CPU performance is not a good metric -- often we care more about energy consumption, communication overheads, and robustness to failure. Evaluating "ease of programming" is much more difficult. The most common quantitative measure is lines of code, but this is potentially misleading. As Mike pointed out today, the more compact a language is, the more important it is that every line of code is absolutely right -- there's less wiggle room to explore alternative ways of implementing the same thing.

Peter Dinda's group at Northwestern has done an actual user study (gasp!) of a simple, BASIC-based language for sensor nets. While this is far better methodology, I worry that such an approach is biased too heavily towards absolute neophyte programmers, and tells us very little about how effective the language would be for writing real applications. After all, most domain scientists who want to employ sensor nets are already programmers (though often in languages like MATLAB). So what is good for a user study may not be good for actual users.

So maybe we need to get away from "ease of programming" as our key metric, and focus on what domain experts really need to leverage sensor nets. It's clear that people can learn a new language if it will help them get their work done. In the end, the language is probably a lot less important than the mental abstraction required to capture a given program.

Sunday, March 29, 2009

I heart Kindle.app

Amazon recently released a free Kindle e-book reader for the iPhone, and I love it. Normally I don't shill products, but I was pretty skeptical about this one and have been pleasantly surprised at how good it is. I've been reading Denis Johnson's Tree of Smoke on my iPhone over the last couple of weeks -- mostly at the gym but also during a few long flights. It's a long book -- over 600 pages -- and having it in my pocket at all times has made it much easier to read bits and pieces whenever I get a chance.

The app is dead simple to use: you flick left or right to turn pages, and it automatically remembers your place, so when you relaunch the app you are back where you left off. In the current version you have to buy e-books via the Web; the next time the app launches, it downloads the content to your phone. I guess this is not so great for spur-of-the-moment purchases while getting ready to board a flight, but my understanding is that a future version will let you buy content directly from the iPhone. As far as I can tell, it eats very little power -- not surprising, but nice to know in case you're worried that spending a few hours reading will drain the battery.

Amazon is definitely undercutting their own Kindle e-book reader by providing this app, and they claim that it was mainly intended as a stop-gap for Kindle owners who want to do a bit of reading in places like the supermarket check-out line. (Is anyone so impatient they really need to read a book while waiting to buy groceries?)

What's it like to read a book on the iPhone? Well, the app lets you pick the font size, which is nice. For reading on the elliptical machine I crank it up fairly large, which means flicking to the next page every 10-15 seconds. When I can hold the phone close and steady I use a smaller font. Actually, I tend to prefer the narrow column of text, as it helps me keep my place through saccadic movements -- not unlike some speed-reading techniques -- since I tend to get lost on a large page of text. As it turns out, I actually prefer reading on the iPhone to reading a paper book -- odd.

The best part is that more than 250,000 titles are available and I can use a device I carry in my pocket at all times -- no need for another gadget, charger, USB cable, what have you. I was sorely tempted to buy a Kindle before this app came out, but now there's really no need.

Tuesday, March 24, 2009

Visit to Utah

I had a great visit to the University of Utah this week, and gave a distinguished lecture on "A New Era of Resource Responsibility for Sensor Networks." I had never visited Utah before, and am pretty impressed with their CS department overall. The folks there seem to get along very well and have considerable strength in graphics, languages, and embedded systems in particular. Their Scientific Computing Institute could be a model for what we've been doing at Harvard with the Initiative in Innovative Computing.

Of course, Utah is famous for the Flux research group, led by the late, great Jay Lepreau. Jay was one of my role models, and I admired his approach to building real systems and getting others to use them (our MoteLab testbed was heavily inspired by Emulab). I'm sorry that I never got a chance to visit Utah while Jay was still with us.

One thing that struck me was that the group is built around full-time research staff, which has enabled them to build substantial research infrastructure (such as Emulab) and continue to expand and maintain it beyond the typical timeframe of a graduate student. (It's also true that research staff tend to think less in terms of papers+thesis and more in terms of writing useful code. It's sad that these things are not always compatible in the course of academic research.) It's not a model I've seen used much in other systems groups -- likely because it requires a lot of funding to make it sustainable. Then again, Jay was a powerhouse when it came to bringing in research funds.

It's great to see that the Flux group is still going strong. I'm sure Jay would be really glad to see it.

(By the way, HotOS just won't feel the same without Jay there. We need to appoint an honorary Lepreau Proxy. Any volunteers to ask snarky, meandering questions after each talk?)

Friday, March 20, 2009

Princeton SNS group blog

Apropos of my earlier post on blogging a research project, Mike Freedman at Princeton has started a group blog where they talk about some of their ongoing and past projects. For example, he's posting a series of articles on experiences with developing and deploying CoralCDN. Way cool!

Wednesday, March 11, 2009

Top Prof

The success of reality TV shows that feature creative professionals in competitive situations has convinced me that it is time for a reality show for CS professors. Let me propose TOP PROF -- a new Bravo series featuring 12 junior CS profs all competing for the ultimate prize -- tenure at a top department, say -- in which each week one professor is voted out of the department until only one is left. The judges would include recent Turing award winner Barbara Liskov, the irascible but highly respected Andy Tanenbaum, industry bigwig and firebrand Al Spector, and (for comedic relief and '80s throwback cachet) the voice of WOPR from WarGames.

Each week the contestants would have to face a different challenge that tests their ability to be the Top Prof. For example:
  • Review 25 conference paper submissions in 72 hours flat;
  • Prepare an undergraduate lecture on a topic you haven't seen since sophomore year;
  • Write a multimillion-dollar grant proposal with six co-PIs from three other universities in a week (and get all of the budget spreadsheets to work out!);
  • Juggle submitting 4 papers to the same conference, including one poorly written paper by a new grad student that requires a last-minute rewrite;
  • Write a dozen thoughtful recommendation letters for students you only saw briefly in your class three years ago;
  • Respond to hundreds of emails with no more than one-line responses; and
  • Hold a series of back-to-back half-hour meetings all day and manage to stay awake at the colloquium talk at 4pm.
I think this would do wonders to raise the stature of our field. Any volunteers?

Tuesday, March 3, 2009

Corporate sponsorship in Computer Science research

Luis von Ahn came up with a great idea to bolster research funding in an ailing economy -- corporate sponsorship of professors wearing logos while they teach. I wholly support this idea, especially since I currently wear t-shirts from companies like Microsoft, Amazon, and Intel for free. These companies really should be paying me for advertising their brands to my students, given my ridiculously high degree of influence over them.

The recent New York Times article about drug company ties to medical schools got me thinking about why there isn't a similar controversy with corporate sponsorship of computer scientists. After all, many of us get research funding from companies, and much of that funding comes with an explicit (or implicit) assumption that we will leverage that company's technology in our work. For example, Microsoft gave my group a research grant last year to link our CitySense system into their way-cool SensorMap platform. This was clearly a blatant attempt by Microsoft to get us to use their technology, but how could we resist? We need the funding, and the only ethically dubious aspect of the project was having to program in C#. No pain, no gain, right? (Notice how I used the expression "way cool" in this paragraph? How's that for subtle?)

Apart from research grants, some companies grease us up with really serious perks! For example, Microsoft flies hundreds of CS professors out to Redmond every year for an intense three-day "Faculty Summit," the capstone of which is usually a cruise on Lake Washington with an open bar. Believe me, there is nothing that says "high roller" more than being crammed on a boat with a few hundred geeky CS profs drinking cheap chardonnay after a long day hearing talks about .NET and Windows Media Services. The coolest part is we get these really spiffy fleece vests to take home with us, which would be great for keeping warm in the New England winters except they have no arms. Those dermatologists getting free golf vacations to St. Kitts have no idea what they're missing!

It's clear that this flagrant corporate brainwashing is starting to trickle down to the educational mission as well. Hundreds of universities have ditched standard, non-proprietary languages like C and are now teaching their intro CS courses in -- gasp! -- Java. Here at Harvard, we're letting this nutjob from Sun named Jim Waldo teach a distributed systems course, knowing full well that he uses the entire semester to indoctrinate our students in how to program Java RMI. Stanford is even offering a course on iPhone programming. Both MIT and Stanford named their CS buildings after Bill Gates, and Berkeley named a lounge after -- get this! -- Steve Wozniak. What's next? The Werner Vogels Library? The Sergey Brin and Larry Page Annual Easter-Egg Hunt?

Where do we draw the line with this nonsense?

So I think there is a real crisis here and it's clear the NY Times is just not paying attention. I for one applaud those brave Harvard medical students who have dared to stand up to the insidious support of the drug companies with the expectation that rejecting corporate sponsorship will endow them with a "pure" education. It's only a matter of time before a bunch of CS students wise up to what is going on and do likewise. They may even go so far as to start a Facebook group to protest. Now that would really be something.

Saturday, February 28, 2009

The plight of the poor application paper

My research is unapologetically applications-driven: we've deployed sensor networks for monitoring volcanoes, for disaster response, and for measuring limb movements in patients with Parkinson's Disease. One of the joys of working on sensor networks is that a lot of exciting research derives from close collaborations with domain experts, shedding light on challenges that we wouldn't otherwise be exposed to. It also keeps us in check and ensures we're working on real problems, rather than artificial ones.

At the same time, it's a sad truth that "deployment" or "application" papers often face an uphill battle when it comes to getting published in major conferences. I've seen plenty of (good!) application-focused papers get dinged in program committees for, well, simply not being novel enough. Now, we could have a healthy argument about the inherent novelty of building a real system, getting it to work, deploying it in a challenging field setting, and reporting on the results. But it's true that these papers are pretty different from those about a new protocol, algorithm, or language. I've thought a bit about what makes it harder for apps papers to get into these venues and have come up with the following observations.

1) Getting something to work in the real world often involves simplifying it to the point where most of the "sexy" ideas are watered down.

It is very rare for a successful sensor network deployment to involve brand-new, never-before-published techniques; doing so would carry a tremendous amount of risk. Generally it's necessary to use fairly robust code that embodies well-worn ideas, at least for the underpinnings of the system design (MAC, routing, time sync, and so forth). As a result, the components of the system design might end up not being very novel. Also, many application papers combine several "well known" techniques in interesting ways, but when a reviewer picks the paper apart piece by piece, it's hard to identify the individual contributions. The hope is that the whole is greater than the sum of the parts; but this is often difficult to convey.

One way to avoid this problem is to write the paper about something other than the "mundane" aspects of the system design itself. For our OSDI paper on the volcano sensor network, we decided to focus on the validation of the network's operation during the deployment, not the individual pieces that made up the system. Although it took a lot of work to take the "well-tested" implementations of major components (such as MultihopLQI) and get them to work robustly in the field, we didn't think the paper could rest on that refinement of previously published ideas. The Berkeley paper on monitoring redwoods took a similar approach by focusing on the data analysis.

2) Academic research tends to reward those who come up with an idea first, not those who get the idea to work.

There are lots of great ideas in the literature that have only been studied in simulation or small-scale experiments. Almost no credit goes to those who manage to get an idea actually deployed and working under less certain conditions. So even though it might take an incredible amount of sweat to get, say, a routing protocol working on real hardware in a large-scale field deployment, unless you ended up making substantial changes to the protocol, or learned something new about its operation, you're unlikely to get much credit for doing so.

We learned this the hard way with our paper on adapting the ADMR multicast protocol to run on motes, which we needed for the CodeBlue medical monitoring platform. It turns out that taking an existing protocol (which had only been studied in ns-2, with a simplistic radio model and no consideration for the memory or bandwidth limitations of mote-class devices) and implementing it on real hardware didn't blow away the program committees the way we hoped it would. Eventually, we did publish this work (in the aptly-named REALMAN workshop). But the initial reviews contained things like "everybody knows that MANET protocols won't work on motes!" That was frustrating.

3) Deployments carry a substantial risk that the system won't actually work, making it harder to convince a reviewer that the paper is worth accepting.

Maybe there should be a built-in handicap for real deployment papers. Whereas in the lab, you can just keep tweaking and rerunning experiments until you get the results you want, this isn't possible in the field. On the other hand, it's not clear that we can really hold deployment papers to a different standard; after all, what constitutes a "real" deployment? Is an installation of nodes around an academic office building good enough? (We've seen plenty of those. If the world ever wants to know the average temperature or light level of the offices in a CS department, we are ready!) Or does it have to be in some gritty, untethered locale, like a forest, or a glacier? Does use of machetes and/or pack animals to reach the deployment site count for anything?

Of course, it is possible to get a great paper out of a deployment that goes sideways. The best way is to write the paper as a kind of retrospective, explaining what went wrong, and why. These papers are often entertaining to read, and provide valuable lessons for those attempting future work along the same lines. Also, failures can often take your research into entirely new directions, which I've blogged about before. As an example, we ended up developing Lance specifically to address the data quality challenges that arose in our deployment at Reventador. We would have never stumbled across that problem had our original system worked as planned.

One thing I don't think we should do is sequester deployment and application papers in their own venues, for example, by having a workshop on sensor networks applications. I understand the desire to get like-minded people together to share war stories, but I think it's essential that these kinds of papers be given equal billing with papers on more "fundamental" topics. In the best case, they can enrich an otherwise dry technical program, as well as inspire and inform future research. Besides, the folks who would go to such a workshop don't need to be convinced of the merits of application papers.

Personally, I'd like to see a bunch of real deployment papers submitted to Sensys 2009. Jie and I are thinking of ways of getting the program committee to think outside the box when reviewing these papers, and any suggestions as to how we should encourage a more open-minded perspective are most welcome.
