
Making collaborations work

I've been fortunate to have several productive collaborations with domain scientists in fields such as seismology, emergency medicine, rehabilitation medicine, and public health. One of the exciting things about sensor networks is that they open up avenues for this kind of cross-disciplinary work, but there are always challenges in getting new collaborations off the ground. I've been thinking about what makes the links between CS and domain science succeed.

Mutual respect for each other's field. I often hear CS people say that "those physicists" (for example) just need CS people to "make their code run fast." It's a conceit that only computer scientists know how to program: the domain scientists I've worked with often build and maintain complex software systems. By the same token, it's really helpful when domain scientists "get" what turns a computer scientist on: not just implementing something to get the job done, but trying to do it the right way, or more efficiently, more generally, or more elegantly than you might at first blush. If each side sees a caricature of the other, it's harder to find common ground where you can work together for mutual benefit.

Research potential. Of course, for a collaboration to make sense there has to be publishable research on both the CS and the domain science side. Unfortunately, when first starting out it is rarely evident that this is the case, so some amount of blind faith is necessary to get the ball rolling. Hopefully, by the time you've solved the "easy" problems you've opened the door to some really exciting "hard" problems. I'm better now than I used to be at determining whether a given collaboration has long-term potential, but it's not always straightforward. As an example, when we first started working with clinicians at the Spaulding Rehabilitation Hospital, most of the work was straightforward hacking to stream data from a set of motes to a laptop base station. This has evolved into a long-term research effort involving innovations in sensor network resource management, signal processing, and data quality optimizations. This was not obvious when we first started the project.

Patience. Cross-domain collaborations take a lot of time to get established and yield results. The steep learning curve on both sides definitely slows down the research effort. Our first volcano sensor network deployment in 2004 was really about us learning how the seismologists do field work, and how they collect and analyze data. It wasn't until a year later that we were able to go back to the field with a system that was remotely useful. Likewise, our colleagues in seismology haven't yet been able to directly use the data we have collected with our sensor networks, since it is not a direct replacement for how they do their work. All the same, we've been able to publish a number of papers together and have learned many things that have fed into our ongoing research and no small number of grant proposals.

Blind luck. It's also true that I've been extremely lucky to find some of the collaborators that I have. I got connected with seismology through a former student of Margo Seltzer's who happened to do some summer work with Jonathan Lees, a geophysicist at UNC who happens to work on volcanoes (and is a huge geek to boot). Some of our medical projects got started by people simply coming up to me after I gave a talk on my work. It helps to be open to opportunities, though being open also means building many "bridges to nowhere." In the end I think it's been worth it.


  1. Observing other fields up close, what do you think CS systems folks do better? Flip side, what can we learn?

  2. I think we have a lot to learn from other fields about how we collect and analyze data. So much of CS systems research involves running experiments in controlled settings, and re-running them until you get the results you want. Doing science in the real world means dealing with messy, bad, or incomplete data that you can't sweep under the carpet.

    Domain scientists also have a certain discipline in managing their datasets that I think CS researchers lack: for example, careful timestamping and annotation. Many CS people invent their own approaches, often only after they've been burned by losing some important data (or losing track of where it came from).

    It's a good question what domain scientists could learn from us. I think we generally do a better job at automating tedious processes (like data analysis) and have a lot of tools for things like managing code repositories (SVN and the rest). Would love to hear what others think...
