Monday, July 9, 2012

In Defense of the Scientific Paper

Since leaving academia, I still find the time to serve on scientific program committees (recently NSDI, MobiSys, and SOCC) and have plenty of opportunity to read both good and bad scientific papers in various states of preparation. And although I am not required to publish papers in my current job, I certainly hope to do so -- a lot of the work we are doing at Google is eminently publishable -- it's just a matter of finding the time to sit down and write the papers!

Although I've blogged about how the scientific publication process needs fixing, I still feel that the process of writing a scientific paper is a hugely rewarding experience. Arguably, the primary value of scientific papers isn't in reading them, but writing them. You learn so much in the process.

Writing a paper sharpens your mental focus like nothing else. Like Japanese bonsai art or building a ship in a bottle, paper writing forces you to obsess over every meticulous detail -- word choice, overall tone, readability of graphs -- and of course more mundane details like font size and line spacing. This microscopic attention to every aspect of your work brings out a wonderful, if somewhat exhausting, intellectual rapture. I have never thought more clearly about a piece of research than when I'm in the throes of putting together a paper against a deadline.

You start with nothing, a blank editor window and some LaTeX boilerplate, some half-baked ideas, a few axes to grind and a tremendous apprehension at how much your life is going to suck between now and the deadline. You throw in all of the raw ingredients, the rough ideas, the broken implementation, the confusing data, the missing citations. Over a period of days or weeks you grind it and refine it and throw it out and start over and eventually hone the paper to a razor-sharp, articulate, polished fourteen pages of scientific beauty, and then just hope like hell that you didn't screw up the margins or forget to cite some important piece of related work.

I used to think that writing a paper was something you did after the research was over, but now I realize you should sit down to write the paper as early as possible -- sometimes before even starting the "research work" itself. On a few occasions, it wasn't until I started writing a paper that I knew what the hell the research project was really about. Case in point: Our SenSys 2009 paper on the Mercury wearable sensor platform came out of a project that had been running for nearly two years without a clear set of goals or any real insight into what the interesting research problems were. We had built a prototype and had some stuff working, but we didn't know what was publishable about it, and most of the problems we had to solve seemed mundane.


In a last-ditch measure to revive the project, I got the students together and said, fuck it, let's write a SenSys paper on this. As we started piecing together the story that we wanted to tell in the paper, we realized that none of our work to that point tackled the most important problem: how to ensure that the sensors produced good, and useful, data when there was a hard limit on battery lifetime. With the deadline just weeks away, the students pulled together and reimplemented the system from scratch and cranked out a ton of new measurements. The process of writing the paper resulted in a flood of new ideas, many of which bled over into my other projects, ultimately resulting in a half dozen papers and three PhD theses. It was awesome.


And even if a paper does not get accepted, crystallizing the ideas through the process of putting together the submission can be really energizing. I never assumed any paper I wrote would actually get accepted, so submitting the paper was often the start of a new line of work, riding on that clarity of thought that would emerge post-deadline (and a much-needed break of course).

Thursday, June 21, 2012

Google's Hybrid Approach to Research

This month's Communications of the ACM features an article on Google's Hybrid Approach to Research by Alfred Spector, Peter Norvig, and Slav Petrov. Since this is a topic I've blogged about here before, I thought I'd provide a quick pointer to the article:

http://cacm.acm.org/magazines/2012/7/151226-googles-hybrid-approach-to-research/fulltext

Overall I think the article does a nice job of summarizing Google's approach. The key takeaway is that Google doesn't separate its research and engineering activities: most "research" at Google happens during the day-to-day work of building products.

The benefit of this model is that it's easy to have real world impact, and the pace of innovation is fairly rapid, meaning research results get translated into products quickly. The possible downside is that you don't always get a chance to fork off long-term (multi-year) projects that will take a long time to translate into a product. However, there are exceptions to this rule -- things like Google Glass, for example -- and plenty of things I can't talk about publicly. It is true that Google tends not to do "pure academic" research just for the purpose of publishing papers. We could have a healthy debate about whether this is good or bad, but I'll leave that for the comments...

Sunday, June 17, 2012

Startup University

The academic research process is incredibly inefficient when it comes to producing real products that shape the world. It can take decades for a good research idea to turn into a product - and of course most research never reaches this phase. However, I don't think it has to be that way: We could greatly accelerate the research-to-product pipeline if we could fix the academic value system and funding model.

Here's the problem: Some of the smartest people in the world have spent their entire careers building throwaway prototypes. I sure never built anything real until I moved to Google, after nearly ten years of college and grad school, and seven years as a faculty member. And by "real," I don't mean a prototype that we developed for a couple of years and then threw away as soon as the papers got published. In effect, I "wasted" millions of dollars in funding, and countless man-years of development effort by my students and lab staff -- apart from a bunch of papers, nothing of practical value came out of my entire academic research career. (Maybe I'm being a little hard on myself, but let's take this as a given for the sake of argument.) And I don't think my lack of real-world impact is at all unusual in a university setting.

What would the world be like if all of this hard work had actually translated into real, shipping products that people could use? How could we change the structure of academic research to close the gap between playing in the sandbox and making things real?

The plight of the academic is that there is often no direct way to translate ideas into reality -- you don't have the resources to do it at the university, and the academic process forces you to bounce between ideas every few years, rather than sticking it out to turn something into a product. In theory, academics are supposed to be patenting their ideas, and companies are supposed to come along and license the patents and turn them into real products. However, I am not aware of a single project from a computer science department that has ever been commercialized through this route. This approach is more commonplace in fields like biotech, but in computer science it is rarely done.

A far more common (and successful) approach is for academics to spin out their own startups. However, this involves a high degree of risk (potentially career-ending for pre-tenure faculty), and many universities do not structure their sabbatical and leave policies to make this easy to do. Most universities also make starting a company painfully difficult when it comes to questions of IP ownership, licensing, and forcing the academic's research to be dissociated from their commercial activities. As a result, you get a bunch of super smart academics who play it safe and stay within their tenured faculty jobs, subsisting on grants and rarely commercializing their work. This means that a lot of great ideas never get beyond the prototype phase.

What I'd like to see is a university with a startup incubator attached to it, taking all of the best ideas and turning them into companies, with a large chunk of the money from successful companies feeding back into the university to fund the next round of great ideas. This could be a perpetual motion machine to drive research. Some universities have experimented with an incubator model, but I'm not aware of any cases where this resulted in a string of successful startups that funded the next round of research projects at that university.

Typically, when a startup spins off, the university gets a tiny slice of the pie, and the venture capitalists -- who provide the much-needed funding -- reap most of the benefits. But why not close the air gap between the research lab and the startup? Allow the faculty to stay involved in their offspring companies while keeping their research day job? Leverage the tremendous resources of a university to streamline the commercialization process -- e.g., use of space, equipment, IT infrastructure, etc.? Allow students to work at the startups for course credit or work-study without having to quit school? Maintain a regular staff of "serial entrepreneurs" who help get new startups off the ground? Connect the course curriculum to the fledgling startups, rather than teaching based on artificial problems? One might joke that some universities, like Stanford, effectively already operate in this way, but this is the exception rather than the rule.

It seems to me that bringing together the university model with the startup incubator would be a great benefit both for spinning out products and doing better research.

Monday, March 12, 2012

Do you need a PhD?

Since I decamped from the academic world to industry, I am often asked (usually by first- or second-year graduate students) whether it's "worth it" to get a PhD in Computer Science if you're not planning a research career. After all, you certainly don't need a PhD to get a job at a place like Google (though it helps). Hell, many successful companies (Microsoft and Facebook among them) have been founded by people who never finished their undergraduate degrees, let alone a PhD. So why go through the grueling, painful, 5-to-10-year process of getting a PhD when you can just get a job straight out of college (degree or not) and get on with your life, making the big bucks and working on stuff that matters?

Doing a PhD is certainly not for everybody, and I do not recommend it for most people. However, I am really glad I got my PhD rather than just getting a job after finishing my Bachelor's. The number one reason is that I learned a hell of a lot doing the PhD, and most of the things I learned I would never have been exposed to in a typical software engineering job. The process of doing a PhD trains you to do research: to read research papers, to run experiments, to write papers, to give talks. It also teaches you how to figure out what problem needs to be solved. You gain a very sophisticated technical background doing the PhD, with your work subjected to the intense scrutiny of the academic peer-review process -- not to mention your thesis committee.

I think of the PhD a little like the Grand Tour, a tradition in the 17th and 18th centuries where youths would travel around Europe, getting a rich exposure to high society in France, Italy, and Germany, learning about art, architecture, language, literature, fencing, riding -- all of the essential liberal arts that a gentleman was expected to have experience with to be an influential member of society. Doing a PhD is similar: You get an intense exposure to every subfield of Computer Science, and have to become the world's leading expert in the area of your dissertation work. The top PhD programs set an incredibly high bar: a lot of coursework, teaching experience, qualifying exams, a thesis defense, and of course making a groundbreaking research contribution in your area. Having to go through this process gives you a tremendous amount of technical breadth and depth.

I do think that doing a PhD is useful for software engineers, especially those who are inclined to be technical leaders. There are many things you can only learn "on the job," but doing a PhD -- having to build your own compiler, or design a new operating system, or prove a complex distributed algorithm correct from scratch -- is going to give you a much deeper understanding of complex Computer Science topics than following coding examples on StackOverflow.

Some important stuff I learned doing a PhD:

How to read and critique research papers. As a grad student (and a prof) you have to read thousands of research papers, extract their main ideas, critique the methods and presentation, and synthesize their contributions with your own research. As a result you are exposed to a wide range of CS topics, approaches for solving problems, sophisticated algorithms, and system designs. This is not just about gaining the knowledge in those papers (which is pretty important), but also about becoming conversant in the scientific literature.

How to write papers and give talks. Being fluent in technical communication is a really important skill for engineers. I've noticed a big gap between the software engineers I've worked with who have PhDs and those who don't in this regard. PhD-trained folks tend to give clear, well-organized talks and know how to write up their work and visualize the results of experiments. As a result they can be much more influential.

How to run experiments and interpret the results. I can't overstate how important this is. A systems-oriented PhD requires that you run a zillion measurements and present the results in a way that is both bullet-proof to peer-review criticism (in order to publish) and visually compelling. Every aspect of your methodology will be critiqued (by your advisor, your co-authors, your paper reviewers) and you will quickly learn how to run the right experiments, and how to run them right.

How to figure out what problem to work on. This is probably the most important aspect of PhD training. Doing a PhD will force you to cast off from shore and explore the boundary of human knowledge. (Matt Might's cartoon is a great visualization of this.) I think that at least 80% of making a scientific contribution is figuring out what problem to tackle: a problem that is at once interesting, open, and going to have impact if you solve it. There are lots of open problems that the research community is not interested in (cf. writing an operating system kernel in Haskell). There are many interesting problems that have been solved over and over and over (cf. filesystem block layout optimization; wireless multihop routing). There's a real trick to picking good problems, and developing a taste for it is a key skill if you want to become a technical leader.

So I think it's worth having a PhD, especially if you want to work on the hardest and most interesting problems. This is true whether you want a career in academia, a research lab, or a more traditional engineering role. But as my PhD advisor was fond of saying, "doing a PhD costs you a house." (In terms of the lost salary during the PhD years - these days it's probably more like several houses.)


Tuesday, February 7, 2012

My love affair with code reviews

One of the most life-altering events in my move from academia to industry was the discovery of code reviews. This is pretty standard fare for developers in the "real world", but I have never heard of an academic research group using them, and had never done code reviews myself before joining Google.

In short: Code reviews are awesome. Everyone should use them. Heck, my dog should use them. You should too.

For those of you not in the academic research community, you have to understand that academics are terrible programmers. (I count myself among this group.) Academics write sloppy code, with no unit tests, no style guidelines, and no documentation. Code is slapped together by grad students, generally under pressure of a paper deadline, mainly to get some graphs to look pretty, without regard for whether anyone is ever going to run the code again. Before I came to Google, that was what "programming" meant to me: kind of a necessary side effect of doing research, but the result was hardly anything I would be proud to show my mother. (Or my dog, for that matter.) Oh, sure, I released some open source code as an academic, but now I shudder to think of anyone at a place like Google or Microsoft or Facebook actually reading that code (please don't, I'm begging you).

Then I came to Google. Lesson #1: You don't check anything in until it has been reviewed by someone else. This took some getting used to. Even an innocent four-line change to some "throw away" Python script is subject to scrutiny. And of course, most of the people reviewing my code were young enough to be my students -- having considered myself to be an "expert programmer" (ha!), it is a humbling experience to have a 23-year-old one year out of college show you how to take your 40 lines of crap and turn them into one beautiful, tight function -- and how to generalize it and make it testable and document the damn thing for chrissakes.

So there's a bunch of reasons to love code reviews:

Maintain standards. This is pretty obvious but matters tremendously. The way I think of it, imagine you get hit by a truck one day, and 100 years from now somebody who has never heard of your code gets paged at 3 a.m. because something you wrote was suddenly raising exceptions. Not only does your code have to work, but it also needs to make sense. Code reviews force you to write code that fits together, that adheres to the style guide, that is testable.

Catch bugs before you check in. God, I can't count the number of times someone has pointed out an obvious (or extremely subtle) bug in my code during the code review process. Having another pair of eyes (or often several pairs of eyes) looking at your code is the best way to catch flaws early.

Learn from your peers. I have learned more programming techniques and tricks from doing code reviews than I ever did reading O'Reilly books or even other people's code. A couple of guys on my team are friggin' coding ninjas and suggest all kinds of ways of improving my clunky excuse for software. You learn better design patterns, better approaches for testing, better algorithms by getting direct feedback on your code from other developers.

Stay on top of what's going on. Doing code reviews for other people is the best way to understand what's happening in a complex codebase. You get exposed to a lot of different code, different approaches for solving problems, and can chart the evolution of the software over time -- a very different experience than just reading the final product.

I think academic research groups would gain a lot by using code reviews, and of course the things that go with them: good coding practices, a consistent style guide, insistence on unit tests. I'll admit that code quality matters less in a research setting, but it is probably worth the investment to use some kind of process.


The thing to keep in mind is that there is a social aspect to code reviews as well. At Google, you need an LGTM from another developer before you're allowed to submit a patch. It also takes a lot of time to do a good code review, so it's standard practice to break large changes into smaller, more review-friendly pieces. And of course the expectation is you've done your due diligence by testing your code thoroughly before sending it for review.

Don't code reviews slow you down? Somewhat. But if you think of code development as a pipeline, with multiple code reviews in flight at a time, you can still sustain a high rate of submissions, even if each individual patch has higher latency. Generally developers all understand that being a hardass on you during the review process will come back to bite them some day -- and they understand the tradeoff between the need to move quickly and the need to do things right. I think code reviews can also serve to build stronger teams, since everyone is responsible for doing reviews and ensuring the quality of the shared codebase. So if done right, it's worth it.


Okay, Matt. I'm convinced. How can I, too, join the code review bandwagon? Glad you asked. The tool we use internally at Google was developed by none other than Guido van Rossum, who has graciously released a similar system called Rietveld as open source. Basically, you install Rietveld on App Engine, and each developer uses a little Python script to upload patches for review. Reviews are done on the website, and when the review is complete, the developer can submit the patch. Rietveld doesn't care which source control system you use, or where the repository is located -- it just deals with patches. It's pretty slick and I've used it for a couple of projects with success.
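To make that concrete, here is a rough sketch of the developer side of the workflow. The server URL, change description, reviewer address, and wrapper function are all made up for illustration, and the exact upload.py flags can vary between Rietveld versions, so treat this as a sketch rather than a recipe:

    # Hypothetical wrapper around Rietveld's upload.py script. The instance URL
    # and reviewer address below are placeholders, not a real code review server.
    import subprocess

    RIETVELD_SERVER = "https://my-codereviews.appspot.com"  # your Rietveld instance

    def send_for_review(description, reviewers):
        """Upload the current working-copy diff to Rietveld as a new review issue."""
        subprocess.check_call([
            "python", "upload.py",               # the upload script that ships with Rietveld
            "--server", RIETVELD_SERVER,         # which Rietveld instance to talk to
            "--message", description,            # short description of the change
            "--reviewers", ",".join(reviewers),  # who should look at it
            "--send_mail",                       # notify the reviewers by email
        ])

    send_for_review("Fix off-by-one in log rotation", ["alice@example.com"])

Once a reviewer marks the issue LGTM on the website, you submit the patch through whatever source control system you normally use; as noted above, Rietveld never touches the repository itself.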

Another popular approach is to use GitHub's "pull request" and commenting platform as a code review mechanism. Individual developers clone a master repository and submit pull requests to the owner of that repository for inclusion. GitHub has a nice commenting system that allows code reviews to be carried out directly on pull requests.
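For the curious, the same mechanism can be driven through GitHub's HTTP API. The repository, branch names, and access token below are made up for illustration; the point of the sketch is that a pull request is ultimately just a request to merge one branch into another, with a place to hang review comments:

    # Open a pull request on a hypothetical repository via the GitHub API.
    # Requires the third-party "requests" package and a personal access token.
    import requests

    GITHUB_TOKEN = "..."                  # placeholder for your access token
    REPO = "someuser/someproject"         # hypothetical owner/repository

    response = requests.post(
        "https://api.github.com/repos/%s/pulls" % REPO,
        headers={"Authorization": "token %s" % GITHUB_TOKEN},
        json={
            "title": "Refactor log rotation",      # shows up as the review title
            "head": "my-feature-branch",           # branch containing the changes
            "base": "master",                      # branch you want to merge into
            "body": "Please take a look at the new rotation logic.",
        },
    )
    response.raise_for_status()
    print("Review open at: " + response.json()["html_url"])

In practice most people just click the button on the website, of course; the pull request simply plays the same role as a Rietveld issue.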

I was floored the other day when I met an engineer from a fairly well-known Internet site who said they didn't use code reviews internally -- and complained about how messy the code was and how poorly designed some pieces were. No kidding! Code reviews aren't the ultimate solution to a broken design process, but they are an incredibly useful tool.

Monday, January 23, 2012

Making universities obsolete

Sebastian Thrun recently announced that he was leaving Stanford to found a free, online university called Udacity. This is based on his experiences teaching the famous intro to AI class, for free, to 160,000 students online.

Is this just Education for the Twitter Generation? Or truly a revolution in how we deliver higher education? Will this ultimately render universities obsolete?

I want to ponder the failings of the conventional higher education model for a minute and see where this leads us, and consider whether something like Udacity is really the solution.

Failure #1: Exclusivity.

In Sebastian's brilliant talk at DLD, he talks about being embarrassed that he was only able to teach a few tens of students at a time, and only to those who can afford $30,000 to attend Stanford. I estimate that I taught fewer than 500 students in total during my eight years on the faculty at Harvard. That's a pretty poor track record by any measure.

It gets worse. I know plenty of faculty who love to give tough courses, teaching really hard material at the beginning of the semester to "weed out" the weaker students, sometimes ending up with only 2 or 3 really committed and really good students in the class. This is so much more satisfying as a professor, since you don't need to worry about tutoring the weaker students, and the fewer students you have, the less work you have to do grading and so on. There is no penalty for doing this -- and rarely any incentive for teaching a larger, more popular course.

Exclusivity is necessary when you only have so much classroom space, or so many dorms, or so many dining halls, so you have to be selective about who enters the hallowed gates of the university. It's also a way of maintaining a brand: even schools, like Harvard, with a "distance education" component go to great lengths to differentiate the "true" Harvard education from a "distance learning certificate," lest they raise the ire of the Old Boys' Network by watering down what it means to get a Harvard degree (not unlike the reaction they got when they started admitting women, way way back in 1977).

Failure #2: Grades.


Can someone remind me why we still have grades? I like what Sebastian says (quoting Salman Khan) about learning to ride a bicycle: It's not as if you get a D learning to ride a bike, then you stop and move on to learning the unicycle. Shouldn't the goal of every course be to get every student to the point of making an A+?

Apparently not. The common argument is that we need grades in order to differentiate the "good" from the "bad" students. Presumably the idea is that if you can't get through a course in the 12-to-13-week semester then you deserve to fail, regardless of whatever is going on in your life and whether you could have learned everything over a longer time span, or with more help, or whatever. And the really smart students, the ones who nail it the first time and make A's in every class, need to float to the top so they get first dibs on good jobs or law school or medical school or whatever rewards they have been working all of their young lives to achieve. It would not be fair if everyone made an A+ -- how would the privileged and smart kids gain any advantage over the less privileged, less intelligent kids?

It seems to me that this is completely at odds with the idea of education.

Failure #3: Lectures.


As Sebastian says, universities have been using the lecture format for more than a thousand years. I used to tell students that they were required to come to my lectures, and I never provided my lectures on video, lest they skip class and watch them on YouTube from their dorms instead. Ostensibly this was to ensure that everybody in the class got the benefit of my dynamic and entertaining lecture style, which I worked so hard to perfect over the years (complete with a choreographed interpretive dance demonstrating the movement of the disk heads during a Log-Structured Filesystem cleaning operation). But mostly it was to boost my ego and get some gratification for working so hard on the lectures, by having the students physically there in class as an audience.

Implications


I'm not sure whether Udacity and Khan Academy and iTunes University are really the solution to these problems. Clearly they are not a replacement for the conventional university experience -- you can't go to a frat party, or join a Finals Club, or make out in the library stacks while getting your degree from Online U. (At least not yet.)

But I think there are two important things that online universities bring to the table: (1) Broadening access to higher education, and (2) Leveraging technology to explore new approaches to learning.

The real question is whether broadening access ends up reinforcing the educational caste system: if you're not smart or rich enough to go to a "real university," you become one of those poor, second-class students with a certificate from Online U. Would employers or graduate schools ever consider such a certificate, where everyone makes an A+, equivalent to an artium baccalaureus from the Ivy League school of your choice?

If not, is that because we truly believe that students are getting a better education sitting in a dusty classroom and having paid the proverbial $30,000 a year rather than doing the work online? This reminds me of my friends who have been through medical school, where the conventional wisdom is that doctors need to be trained using the classical methods (unbelievable amounts of rote memorization, soul-destroying clinical rotations and countless overnight shifts) because that's how it's been done for hundreds of years -- not because anybody thinks it yields better-trained doctors.

And I think universities have a long way to go towards embracing new technologies and new ways of teaching students. Sebastian makes a great point about the online AI class feeling more "intimate" to some students, in part because watching a video really does feel like a one-on-one experience: you're not sitting in a big lecture hall surrounded by a bunch of other students, you're at home, in your PJs, drinking a beer and watching the video on your own laptop. A lot of this also has to do with Sebastian's teaching style, which uses a series of short quizzes that are auto-graded by the system. It is not just a lecture. For this reason I think that replacing live courses with videotaped lectures is not going far enough (and may in fact be detrimental).

Another benefit of the video delivery model is that you can replay a lecture as many times as you like. Missed a point? Confused? Rewind and watch it again. What about questions? In large courses almost nobody asks questions, apart from the really smart students who should shut the hell up and not ask questions anyway. There are plenty of ways to deal with questions in an online course format -- just not live, during a time-limited lecture in which your question is likely going to annoy the rest of the class, who almost certainly get it already.

Risks


I'm going to close this little rant with a few caveats. It's fashionable to talk about "University 2.0" and How the Internet Changes Everything and disruptive technologies and all that. But a shallow, 18-minute video on the first 200 years of American History can't replace conventional coursework, deep reading, and essays. You can't tweet your way through college. Learning and teaching are hard work, and need to be taken seriously by both the student and educator.

Although expanding access to education is a great thing, it's simply not the case that everyone is smart enough to do well in any subject. For example, I'm terrible at math (which is why I'm a systems person, natch), and damn near failed to complete my CS theory course requirement at Berkeley as a result. Education should give everyone the opportunity to succeed, but the ultimate responsibility (and raw ability) comes down to the student.

Finally, it goes without saying that the most important experiences I ever had in college were outside of the classroom. I'm not just talking about staying up late and watching "Wayne's World" for the millionth time while drinking Zima, I'm talking about doing research, building things, learning from and being inspired by my fellow students. Making lectures obsolete is one thing; but I'm not sure there can ever be an online replacement for The College Experience writ large. Though 4Chan seems to be a pretty close approximation.

Monday, November 7, 2011

Research without walls

I recently signed the Research Without Walls pledge, which says that I will not do any peer review work for conferences, journals, or other scientific venues that do not make the results available for free via the Web. Like many scientists, I commit hundreds of hours a year to serving on program committees and reviewing journal papers, but the result of that (volunteer) work is essentially that the research results get locked behind a copyright license that is inconsistent with the way in which scientists actually disseminate their results -- for free, via the Web.

I believe that there is absolutely no reason for research results, especially those supported by public funding, not to be made open to the entire world. It's time for the computer science research community to move in this direction. Of course, this is going to mean a big change in the role of the professional societies, such as ACM and IEEE. It's time we made that change, as painful as it might be.


What is open access?

The issue of "open access research" often gets confused with questions such as where the papers are hosted, who owns the copyright, and whether authors are allowed to post their own papers on their website. In most cases, copyright in research publications is not held by the authors, but rather by the professional societies that organize a conference or run a journal. For example, ACM and IEEE typically require authors to assign copyright to them, although they might grant the author a license to post their own research papers on their website. However, allowing authors to post papers on the Web is not the same as open access. It is an extremely limited license: posting papers on the Web does not give other scientists or students the right to share or archive those papers, or for anyone to use them for any purpose other than downloading them for personal use. It is not unlike going to the library and borrowing a book; you still have to return it later, and you can't make copies for others.

With rare exception, every paper I have published is available for download on my website. In most cases, I have a license to do this; in others, I am probably in violation of copyright for doing so. The idea that I might get a cease-and-desist letter one day asking me to take down my own scientific papers bothers me to no end. I worked hard on those papers, and in most cases, spent hundreds of thousands of dollars of public funding to undertake the research that went into each of them.

For most of these publications, I even paid hundreds of dollars to the professional societies -- for membership fees and conference registrations for myself and my students -- to present the work at the associated conference. And yet, I don't own copyright in most of those works, and the main beneficiaries of all of this work are organizations like the ACM. It seems to me that these results should be open for everyone to benefit from, since, well, "we" (meaning, the taxpayers) paid for them.

ACM's Author-izer Service

Recently, the ACM announced a new service called the "Author-izer" (whoever came up with this name will be first against the wall when the revolution comes), that allows authors to generate free links to their publications hosted on the ACM Digital Library. This is not open access, either: this is actually a way for ACM to discourage the spread of "rogue posting" of PDF files and monetize access to the content down the road. For example, those free links will stop working when the website hosting them moves (e.g., when a student graduates). Essentially, ACM wants to control all access to "its" research library, and for good reason: it brings in a lot of revenue.

USENIX's open access policy


USENIX has a much more sane policy. Back in 2008, USENIX announced that all of their conference proceedings would be open access, and indeed you can download PDFs of all USENIX papers from the corresponding conference website (see, for example, http://www.usenix.org/events/hotcloud11/tech/ for the proceedings from HotCloud'11).

USENIX does not ask authors to assign copyright to them. Instead, for one year from the publication date, USENIX gets an exclusive license to publish the work (both in print and electronic form), with the usual license granted back to the author to post copies on their website. After the one-year exclusivity period, USENIX retains a non-exclusive license to distribute the work forever. This is a good policy, though in my opinion it does not go far enough: USENIX does not require authors to release their work under an open access license. USENIX is kind enough to post PDFs for free on the Web, but tomorrow, USENIX could reverse this decision and put all of those papers behind a paywall, or take them down entirely. (No, I don't think this is going to happen, but you never know.)


University open access initiatives


Another way to fight back is for your home institution to require that all of your work be made open. Harvard was one of the first major universities to do this. This ambitious effort, spearheaded by my colleague Stuart Shieber, required all Harvard affiliates to submit copies of their published work to the open-access Harvard DASH archive. While in theory this sounds great, there are several problems with it in practice. First, it requires individual scientists to do the legwork of securing the rights and submitting the work to the archive. This is a huge pain and most folks don't bother. Second, it requires that scientists attach a Harvard-supplied "rider" to the copyright license (e.g., from the ACM or IEEE) allowing Harvard to maintain an open-access copy in the DASH repository. Many, many publishers have pushed back on this. Harvard's response was to allow its affiliates to get an (automatic) waiver of the open-access requirement. Well, as soon as word got out that Harvard was granting these waivers, the publishers started refusing to accept the riders wholesale, claiming that the scientist could just request a waiver. So the publishers tend to win.

Creative Commons for research publications

The only way to ensure that research is de jure open access, rather than merely de facto, is by baking the open access requirement into the copyright license for the work. This is very much in the same spirit as the GPL is for software licensing. What I really want is for all research to be published under something like a Creative Commons Attribution 3.0 Unported license, allowing others to share, remix, and make commercial use of the work as long as attribution is given. This kind of license would prevent professional organizations from locking down research results, and give maximum flexibility for others to make use of the research, while retaining the conventional expectations of attribution. The "remix" clause might seem a little problematic, given that peer review expects original results, but the attribution requirement would not allow someone to submit work that is not their own and claim authorship. And there are many ways in which research can be legitimately remixed: incorporated into a talk, class notes, or collection, for example.

What happens to the publishers?


Traditional scientific publishers, like Elsevier, go out of business. I don't have a problem with that. One can make a strong argument that traditional scientific publishers have fairly limited value in today's world. It used to be that scientists needed publishers to disseminate their work; this has not been true for more than a decade.

Professional organizations, like ACM and IEEE, will need to radically change what they do if they want to stay alive. These organizations do many things besides running conferences and journals. Unfortunately, a substantial amount of their operating budget comes from controlling access to the scientific literature. Open access will drastically change that. Personally, I'd rather be a member of a leaner, more focused professional society that puts its resources into education and policymaking than one that supports a gazillion "Special Interest Groups" and journals that nobody reads.

Seems to me that USENIX strikes the right balance: They focus on running conferences. Yes, you pay through the nose to attend these events, though it's not any more expensive than a typical ACM or IEEE conference. I really do not buy the argument that an ACM-sponsored conference, even one like SOSP, is any better than one run by USENIX. Arguably USENIX does a far better job at running conferences, since they specialize in it. ACM shunts most of the load of conference organization onto inexperienced academics, with predictable results.


A final word

I can probably get away with signing the Research Without Walls pledge because I no longer rely on service on program committees to further my career. (Indeed, the pledge makes it easier for me to say no when asked to do these things.) Not surprisingly, most of the signatories of the pledge have been from industry. To tell an untenured professor that they should sign the pledge and, say, turn down a chance to serve on the program committee for SOSP, would be a mistake.  But this is not to say that academics can't promote open access in other ways: for example, by always putting PDFs on their website, or preferentially sending work to open access venues.

ObDisclaimer: This is my personal blog. The views expressed here are mine alone and not those of my employer.

Startup Life: Three Months In

I've posted a story to Medium on what it's been like to work at a startup, after years at Google. Check it out here.