Monday, June 1, 2009

Making collaborations work

I've been fortunate to have several productive collaborations with domain scientists in fields such as seismology, emergency medicine, rehabilitation medicine, and public health. One of the exciting things about sensor networks is that they open up avenues for this kind of cross-disciplinary work, but there are always challenges in getting new collaborations off the ground. I've been doing some thinking about some of the keys to successful links between CS and domain science.

Mutual respect for each other's field. I often hear CS people say that "those physicists" (for example) just need CS people to "make their code run fast." It's a conceit that only computer scientists know how to program -- the domain scientists I've worked with often build and maintain complex software systems. By the same token, it's really helpful when domain scientists "gets" what turns a computer scientist on -- not just implementing something to get the job done, but trying to do it in the right way, or more efficiently, generally, or elegantly than what you might try at first blush. If each side sees a caricature of the other it's harder to find common ground where you can work together for mutual benefit.

Research potential. Of course, for a collaboration to make sense there has to be publishable research on both the CS and the domain science side. Unfortunately, when first starting out it is rarely evident that this is the case, so some amount of blind faith is necessary to get the ball rolling. Hopefully, by the time you've solved the "easy" problems you've opened the door to some really exciting "hard" problems. I'm better now than I used to be at determining whether a given collaboration has long-term potential, but it's not always straightforward. As an example, when we first started working with clinicians at the Spaulding Rehabilitation Hospital, most of the work was straightforward hacking to stream data from a set of motes to a laptop base station. This has evolved into a long-term research effort involving innovations in sensor network resource management, signal processing, and data quality optimizations. This was not obvious when we first started the project.

Patience. Cross-domain collaborations take a lot of time to get established and yield results. This steep learning curve definitely slows down the research effort. Our first volcano sensor network deployment in 2004 was really about us learning how the seismologists do field work, and how they collect and analyze data. It wasn't for another year that we were able to go back to the field with a system that was remotely useful. Likewise, our colleagues in seismology haven't yet been able to directly use the data we have collected with our sensor networks, since it is not a direct replacement for how they do their work. All the same, we've been able to publish a number of papers together and have learned many things that have fed into our ongoing research and no small number of grant proposals.

Blind luck. It's also true that I've been extremely lucky to find some of the collaborators that I have. I got connected with seismology through a former student of Margo Seltzer's who happened to do some summer work with Jonathan Lees, a geophysicist at UNC who happens to work on volcanoes (and is a huge geek to boot). Some of our medical projects got started by people simply coming up to me after I gave a talk on my work. It helps to be open to opportunities, though it involves building many "bridges to nowhere." In the end I think it's been worth it.


  1. Observing other fields up close, what do you think CS systems folks do better? Flip side, what can we learn?

  2. I think we have a lot to learn from other fields about how we collect and analyze data. So much of CS systems research involves running experiments in controlled settings, and re-running them until you get the results you want. Doing science in the real world means dealing with messy, bad, or incomplete data that you can't sweep under the carpet.

    Domain scientists also have a certain discipline to managing their datasets that I think CS researchers lack - for example, careful timestamping and annotation. Many CS people invent their own approaches and often only after they've been burned by losing some important data (or losing track of where it came from).

    It's a good question what domain scientists could learn from us. I think we generally do a better job at automating tedious processes (like data analysis) and have a lot of tools for things like managing code repositories (SVN and the rest). Would love to hear what others think...