Monday, November 7, 2011

Research without walls

I recently signed the Research Without Walls pledge, which says that I will not do any peer review work for conferences, journals, or other scientific venues that do not make the results available for free via the Web. Like many scientists, I commit hundreds of hours a year to serving on program committees and reviewing journal papers, but the result of that (volunteer) work is essentially that the research results get locked behind a copyright license that is inconsistent with the way in which scientists actually disseminate their results -- for free, via the Web.

I believe that there is absolutely no reason for research results, especially those supported by public funding, not to be made open to the entire world. It's time for the computer science research community to move in this direction. Of course, this is going to mean a big change in the role of the professional societies, such as ACM and IEEE. It's time we made that change, as painful as it might be.


What is open access?

The issue of "open access research" often gets confused with questions such as where the papers are hosted, who owns the copyright, and whether authors are allowed to post their own papers on their website. In most cases, copyright in research publications is not held by the authors, but rather by the professional societies that organize a conference or run a journal. For example, ACM and IEEE typically require authors to assign copyright to them, although they might grant the author a license to post their own research papers on their website. However, allowing authors to post papers on the Web is not the same as open access. It is an extremely limited license: posting papers on the Web does not give other scientists or students the right to share or archive those papers, or to use them for any purpose other than downloading them for personal use. It is not unlike going to the library and borrowing a book; you still have to return it later, and you can't make copies for others.

With rare exception, every paper I have published is available for download on my website. In most cases, I have a license to do this; in others, I am probably in violation of copyright for doing so. The idea that I might get a cease-and-desist letter one day asking me to take down my own scientific papers bothers me to no end. I worked hard on those papers, and in most cases, spent hundreds of thousands of dollars of public funding to undertake the research that went into each of them.

For most of these publications, I even paid hundreds of dollars to the professional societies -- for membership fees and conference registrations for myself and my students -- to present the work at the associated conference. And yet, I don't own the copyright in most of those works, and the main beneficiaries of all of this work are organizations like the ACM. It seems to me that these results should be open for everyone to benefit from, since, well, "we" (meaning, the taxpayers) paid for them.

ACM's Author-izer Service

Recently, the ACM announced a new service called the "Author-izer" (whoever came up with this name will be first against the wall when the revolution comes), that allows authors to generate free links to their publications hosted on the ACM Digital Library. This is not open access, either: this is actually a way for ACM to discourage the spread of "rogue posting" of PDF files and monetize access to the content down the road. For example, those free links will stop working when the website hosting them moves (e.g., when a student graduates). Essentially, ACM wants to control all access to "its" research library, and for good reason: it brings in a lot of revenue.

USENIX's open access policy


USENIX has a much more sane policy. Back in 2008, USENIX announced that all of their conference proceedings would be open access, and indeed you can download PDFs of all USENIX papers from the corresponding conference website (see, for example, http://www.usenix.org/events/hotcloud11/tech/ for the proceedings from HotCloud'11).

USENIX does not ask authors to assign copyright to them. Instead, for one year from the publication date, USENIX gets an exclusive license to publish the work (both in print and electronic form), with the usual license granted back to the author to post copies on their website. After the one-year exclusivity period, USENIX retains a non-exclusive license to distribute the work forever. This is a good policy, though in my opinion it does not go far enough: USENIX does not require authors to release their work under an open access license. USENIX is kind enough to post PDFs for free on the Web, but tomorrow, USENIX could reverse this decision and put all of those papers behind a paywall, or take them down entirely. (No, I don't think this is going to happen, but you never know.)


University open access initiatives


Another way to fight back is for your home institution to require that all of your work be made open. Harvard was one of the first major universities to do this. This ambitious effort, spearheaded by my colleague Stuart Shieber, required all Harvard affiliates to submit copies of their published work to the open-access Harvard DASH archive. While this sounds great in theory, there are several problems with it in practice. First, it requires individual scientists to do the legwork of securing the rights and submitting the work to the archive. This is a huge pain, and most folks don't bother. Second, it requires that scientists attach a Harvard-supplied "rider" to the copyright license (e.g., from the ACM or IEEE) allowing Harvard to maintain an open-access copy in the DASH repository. Many, many publishers have pushed back on this. Harvard's response was to allow its affiliates to get an (automatic) waiver of the open-access requirement. Well, as soon as word got out that Harvard was granting these waivers, the publishers started refusing to accept the riders wholesale, claiming that the scientist could just request a waiver. So the publishers tend to win.

Creative Commons for research publications

The only way to ensure that research is de jure open access, rather than merely de facto, is by baking the open access requirement into the copyright license for the work. This is very much in the same spirit as the GPL is for software licensing. What I really want is for all research to be published under something like a Creative Commons Attribution 3.0 Unported license, allowing others to share, remix, and make commercial use of the work as long as attribution is given. This kind of license would prevent professional organizations from locking down research results, and give maximum flexibility for others to make use of the research, while retaining the conventional expectations of attribution. The "remix" clause might seem a little problematic, given that peer review expects original results, but the attribution requirement would not allow someone to submit work that is not their own and claim authorship. And there are many ways in which research can be legitimately remixed: incorporated into a talk, class notes, or collection, for example.

What happens to the publishers?


Traditional scientific publishers, like Elsevier, go out of business. I don't have a problem with that. One can make a strong argument that traditional scientific publishers have fairly limited value in today's world. It used to be that scientists needed publishers to disseminate their work; this has not been true for more than a decade.

Professional organizations, like ACM and IEEE, will need to radically change what they do if they want to stay alive. These organizations do many things besides running conferences and journals. Unfortunately, a substantial amount of their operating budget comes from controlling access to the scientific literature. Open access will drastically change that. Personally, I'd rather be a member of a leaner professional society that focuses its resources on education and policymaking, rather than on supporting a gazillion "Special Interest Groups" and journals that nobody reads.

Seems to me that USENIX strikes the right balance: They focus on running conferences. Yes, you pay through the nose to attend these events, though it's not any more expensive than a typical ACM or IEEE conference. I really do not buy the argument that an ACM-sponsored conference, even one like SOSP, is any better than one run by USENIX. Arguably USENIX does a far better job at running conferences, since they specialize in it. ACM shunts most of the load of conference organization onto inexperienced academics, with predictable results.


A final word

I can probably get away with signing the Research Without Walls pledge because I no longer rely on service on program committees to further my career. (Indeed, the pledge makes it easier for me to say no when asked to do these things.) Not surprisingly, most of the signatories of the pledge have been from industry. To tell an untenured professor that they should sign the pledge and, say, turn down a chance to serve on the program committee for SOSP, would be a mistake.  But this is not to say that academics can't promote open access in other ways: for example, by always putting PDFs on their website, or preferentially sending work to open access venues.

ObDisclaimer: This is my personal blog. The views expressed here are mine alone and not those of my employer.

Friday, November 4, 2011

Highlights from SenSys 2011

ACM SenSys 2011 just wrapped up this week in Seattle. This is the premier conference in the area of wireless sensor networks, although lately the conference has embraced a bunch of other technologies, including sensing on smartphones and micro-air vehicles. It's an exciting conference and brings together a bunch of different areas.

Rather than a full trip report, I wanted to quickly write up two highlights of the conference: The keynote by Michel Maharbiz on cybernetic beetles (!), and an awesome talk by James Biagioni on using smartphone data to automatically determine bus routes and schedules.

Keynote by Mich Maharbiz - Cyborg beetles: building interfaces between the synthetic and the multicellular

Mich is a professor at Berkeley and works at the interface between biology and engineering. His latest project is adding a "remote control" circuit to a live insect -- a large beetle -- allowing one to control the flight of the insect. Basically, they stick electrodes into the beetle's brain and muscles, and a little microcontroller mounted on the back of the insect sends pulses to cause the insect to take off, land, and turn. A low-power radio on the microcontroller lets you control the flight using, literally, a Wii Mote.

Oh yes ... this is real.

There has been a lot of interest in the research community in building insect-scale flying robots -- the Harvard RoboBees project is just one example. Mich's work takes a different approach: let nature do the work of building the flyer, but augment it with remote control capabilities. These beetles are large enough that they can carry a 3 gram payload, can fly for kilometers at a time, and live up to 180 days.

Mich's group found that by sending simple electrical pulses to the brain and muscles, they could activate and deactivate the insect's flying mechanism, causing it to take off and land. Controlling turns is a bit more complicated, but by stimulating certain muscles behind the wings they can cause the beetle to turn left or right on command.

They have also started looking at how to tap into the beetle's sensory organs -- essentially implanting electrodes behind the eye and antennae -- so it is possible to take electrical recordings of the neural activity. And they are also looking at implanting a micro fuel cell that generates electricity from the insect's hemolymph -- essentially turning its own internal fuel source into a battery.

Mich and I were actually good friends while undergrads at Cornell together. Back then he was trying to build a six-legged, insect-inspired walking robot. I am not sure if it ever worked, but it's kind of amazing to run into him some 15 years later and see that he's still working on these totally out-there ideas.

EasyTracker: Automatic Transit Tracking, Mapping, and Arrival Time Prediction Using Smartphones
James Biagioni, Tomas Gerlich, Timothy Merrifield, and Jakob Eriksson (University of Illinois at Chicago)


James, a PhD student at UIC, gave a great talk on this project. (One of the best conference talks I have seen in a long time. I found out later that he won the best talk award - well deserved!) The idea is amazing: to use GPS data collected from buses to automatically determine both the route and the schedule of the bus system, and give users real-time indications of expected arrival times for each route. All the transit agency has to do is install a GPS-enabled cellphone in each bus (without even labeling which bus it is or which route it will be taking - routes change all the time anyway). The data is collected and processed centrally to automatically build the tracking system for that agency.

The system starts with unlabeled GPS traces and extracts the routes as well as the locations and times of stops. They use kernel density estimation with a Gaussian kernel function to “clean up” the raw traces and come up with clean route information, along with some clever statistical analysis to throw out bogus route data.

To do stop extraction, they use a point density estimate with thresholding for each GPS location, which results in clusters at points where buses tend to stop. This will produce a bunch of "fake" stops at traffic lights and stop signs - the authors decided to err on the side of too many stops rather than too few, so they consider this an acceptable tradeoff.
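To give a flavor of what that density-based stop extraction might look like, here is a minimal Python sketch. To be clear, this is my own guess at the approach, not the authors' code: the extract_stops helper, the grid resolution, and the threshold are all invented for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.ndimage import label

def extract_stops(lons, lats, grid_size=200, density_threshold=0.7):
    """Estimate a 2D point density over GPS fixes and keep the high-density cells."""
    points = np.vstack([lons, lats])              # shape (2, N)
    kde = gaussian_kde(points)                    # Gaussian kernel density estimate

    # Evaluate the density on a regular grid covering the traces.
    xs = np.linspace(lons.min(), lons.max(), grid_size)
    ys = np.linspace(lats.min(), lats.max(), grid_size)
    gx, gy = np.meshgrid(xs, ys)
    density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

    # Threshold relative to the peak density, then merge contiguous cells into
    # candidate stops (this also picks up traffic lights, as noted above).
    mask = density >= density_threshold * density.max()
    labeled, n_stops = label(mask)
    stops = []
    for k in range(1, n_stops + 1):
        rows, cols = np.where(labeled == k)
        stops.append((xs[cols].mean(), ys[rows].mean()))   # centroid as (lon, lat)
    return stops
```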

To extract the bus schedule, they look at the arrival times of buses on individual days and use k-means clustering to determine the “centroid time” of each stop. This works fine for the first stop on a route (which should be close to the true schedule). For downstream stops this data ends up being too noisy, so instead they compute the mean travel time to each downstream stop.
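Here is a toy Python sketch of the schedule-extraction idea, using made-up arrival times at the first stop; it assumes you already know the number of scheduled trips and ignores real-world messiness like midnight wraparound and missing days.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical arrival times (minutes after midnight) at the first stop, over many days.
arrivals = np.array([485, 487, 484, 545, 547, 544, 605, 608, 604], dtype=float)

n_departures = 3   # assumed number of scheduled trips per day
km = KMeans(n_clusters=n_departures, n_init=10, random_state=0)
km.fit(arrivals.reshape(-1, 1))

# Each cluster centroid is an estimate of one scheduled departure ("centroid time").
schedule = sorted(km.cluster_centers_.ravel())
print([f"{int(t) // 60:02d}:{int(t) % 60:02d}" for t in schedule])
# -> ['08:05', '09:05', '10:05'] for this made-up data
```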

Another challenge is labeling buses: you need to know which bus is coming down the road towards you. For this, they use a history of GPS traces from each bus, and build an HMM to determine which route the bus is currently serving. Since buses change routes all the time, even during the same day, this has to be tracked over time. Finally, for arrival time prediction, they use the previously-computed arrival times between stops to estimate when the bus is likely to arrive.
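The talk didn't go into the decoding details, so here is a toy sketch of how the HMM labeling might work, with candidate routes as hidden states, GPS fixes as observations, and a made-up emission model that favors the route closest to each fix. The distance scale and transition probabilities are invented for illustration.

```python
import numpy as np

def viterbi_route_labels(dist_to_route, stay_prob=0.95):
    """dist_to_route[t, r] = distance (meters) from GPS fix t to candidate route r."""
    T, R = dist_to_route.shape
    # Emission log-likelihood: the closer a fix is to a route, the more likely
    # the bus is currently serving it (the 50 m scale is arbitrary).
    log_emit = -dist_to_route / 50.0
    # Transition log-probabilities: strongly prefer staying on the same route.
    log_trans = np.full((R, R), np.log((1.0 - stay_prob) / max(R - 1, 1)))
    np.fill_diagonal(log_trans, np.log(stay_prob))

    score = np.zeros((T, R))
    back = np.zeros((T, R), dtype=int)
    score[0] = np.log(1.0 / R) + log_emit[0]        # uniform prior over routes
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_trans     # cand[i, j]: route i -> route j
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[t]

    # Backtrack the most likely route label for every GPS fix.
    labels = np.zeros(T, dtype=int)
    labels[-1] = score[-1].argmax()
    for t in range(T - 2, -1, -1):
        labels[t] = back[t + 1, labels[t + 1]]
    return labels
```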

I really liked this work and the nice combination of techniques used to take some noisy and complex sensor data and distill it into something useful.

Wednesday, November 2, 2011

Software is not science

Very often I see conference paper submissions and PhD thesis proposals that center entirely on a piece of software that someone has built. The abstract often starts out something like this:

We have designed METAFOO, a sensor network simulator that accurately captures hardware level power consumption. METAFOO has a modular design that achieves high flexibility by allowing new component models to be plugged into the simulation. METAFOO also incorporates a Java-based GUI environment for visualizing simulation results, as well as plugins to MATLAB, R, and Gnuplot for analyzing simulation runs....


You get the idea. More often than not, the paper reads like a technical description of the software, complete with a hairy block diagram full of boxes and arrows, and a detailed narrative on each piece of the system, what language it's implemented in, how many lines of code, etc. The authors of such papers quite earnestly believe that this is going to make a good conference submission.

While this all might be very interesting to someone who plans to use the software or build on it, this is not the point of a scientific publication or a PhD dissertation. All too often, researchers -- especially those in systems -- seem to confuse the scientific question with the software artifact that they build to explore that question. They get hung up on the idea of building a beautiful piece of software, forgetting that the point was to do science.

When I see a paper submission like this, I will start reading it in the hopes that there is some deeper insight or spark of inspiration in the system design. Usually it's not there. The paper gets so wrapped up in describing the artifact that it forgets to establish the scientific contributions that were made in developing the software. These papers do not tend to get into major conferences, and they do not make a good foundation for a PhD dissertation.

In computer systems research, there are two kinds of software that people build. The first class comprises tools used to support other research. This includes things like testbeds, simulators, and so forth. This is often great, and invaluable, software, but it is not -- in and of itself -- research. Countless researchers have used ns2, Emulab, PlanetLab, etc. to do their work, and without this investment the community can't move forward. But all too often, students seem to think that building a useful tool equates to doing research. It doesn't.

The second, and more important, kind of software is a working prototype to demonstrate an idea. However, the point of the work is the idea that it embodies, not the software itself. Great examples of this include things like Exokernel and Barrelfish. Those systems demonstrated a beautiful set of concepts (operating system extensibility and message-passing in multicore processors respectively), but nobody actually used those pieces of software for anything more than getting graphs for a paper, or maybe a cute demo at a conference.

There are rare exceptions of "research" software that took on a life beyond the prototype phase. TinyOS and Click are two good examples. But this is the exception, not the rule. Generally I would not advise grad students to spend a lot of energy on "marketing" their research prototype. Chances are nobody will use your code anyway, and the time you would spend turning a prototype into a real system is better spent pushing the envelope and writing great papers. If your software doesn't happen to embody any radical new ideas, and instead you are spending your time adding a GUI or writing documentation, you're probably spending your time on the wrong thing.

So, how do you write a paper about a piece of software? Three recommendations:

  1. Put the scientific contributions first. Make the paper about the key contributions you are making to the field. Spell them out clearly, on the first page of the paper. Make sure they are really core scientific contributions, not something like "our first contribution is that we built METAFOO." A better example would be, "We demonstrate that by a careful decomposition of cycle-accurate simulation logic from power modeling, we can achieve far greater accuracy while scaling to large numbers of nodes." Your software will be the vehicle you use to prove this point.
  2. Decouple the new ideas from the software itself. Someone should be able to come along and take your great ideas and apply them in another software system or to a completely different problem entirely. The key idea you are promoting should not be linked to whatever hairy code you had to write to show that the idea works in practice. Taking Click as an example, its modular design has been recycled in many, many other software systems (including my own PhD thesis).
  3. Think about who will care about this paper 20 years from now. If your paper is all about some minor feature that you're adding to some codebase, chances are nobody will. Try to bring out what is enduring about your work, and focus the paper on that.




Monday, September 26, 2011

Do we need to reboot the CS publications process?

My friend and colleague Dan Wallach has an interesting piece in this month's Communications of the ACM on Rebooting the CS Publication Process. This is a topic I've spent a lot of time thinking about (and ranting about) the last few years and thought I should weigh in. The TL;DR for Dan's proposal is something like arXiv for CS -- all papers (published or not) are sent to a centralized CSPub repository, where they can be commented on, cited, and reviewed. Submissions to conferences would simply be tagged as such in the CSPub archive, and "journals" would simply consist of tagged collections of papers.

I really like the idea of leveraging Web 2.0 technology to fix the (broken) publication process for CS papers. It seems insane to me that the CS community relies on 18th-century mechanisms for peer review that clearly do not scale, prevent good work from being seen by larger audiences, and create more work for program chairs, who have to deal with deadlines, run a reviewing system, and screen for plagiarized content.

Still, I'm concerned that Dan's proposal does not go far enough. Mostly his proposal addresses the distribution issue -- how papers are submitted and archived. It does not fix the problem of authors submitting incremental work. If anything, it could make the problem worse, since I could just spam CSPub with whatever random crap I was working on and hope that (by dint of my fame and amazing good looks) it would get voted up by the plebeian CSPub readership irrespective of its technical merit. (I call this the Digg syndrome.) In the CSPub model, there is nothing to distinguish, say, a first year PhD student's vote from that of a Turing Award winner, so making wild claims and writing goofy position papers is just as likely to get you attention as doing the hard and less glamorous work of real science.

Nor does Dan's proposal appear to reduce reviewing load for conference program committees. Being a cynic, it would seem that if submitting a paper to SOSP simply consisted of setting a flag on my (existing) CSPub paper entry, then you would see an immediate deluge of submissions to major conferences. Authors would no longer have to jump through hoops to submit their papers through an arcane reviewing system and run the gauntlet of cranky program chairs who love nothing more than rejecting papers due to trivial formatting violations. Imagine having your work judged on technical content, rather than font size! I am not sure our community is ready for this.

Then there is the matter of attaining critical mass. arXiv already hosts the Computing Research Repository, which has many of the features that Dan is calling for in his proposal. The missing piece is actual users. I have never visited the site, and don't know anyone -- at least in the systems community -- who uses it. (Proof: There are a grand total of six papers in the "operating systems" category on CoRR.) For better or worse, we poor systems researchers are programmed to get our publications from a small set of conferences. The best way to get CSPub to have wider adoption would be to encourage conferences to use it as their main reviewing and distribution mechanism, but I am dubious that ACM or USENIX would allow such a thing, as it takes a lot of control away from them.

The final question is that of anonymity. This is itself a hotly debated topic, but CSPub would seem to require authors to divulge authorship on submission, making it impossible to do double-blind reviewing. I tend to believe that blind reviewing is a good thing, especially for researchers at less-well-known institutions who can't lean on a big name like MIT or Stanford on the byline.

The fact is that we cling to our publication model because we perceive -- rightly or wrongly -- that there is value in the exclusivity of having a paper accepted by a conference. There is value for authors (being one of 20 papers or so in SOSP in a given year is a big deal, especially for grad students on the job market); value for readers (the papers in such a competitive conference have been hand-picked by the greatest minds in the field for your reading pleasure, saving you the trouble of slogging through all of the other crap that got submitted that year); and value for program committee members (you get to be one of the aforementioned greatest minds on the PC in a given year, and wear a fancy ribbon on your name badge when you are at the conference so everybody knows it).

Yes, it's more work for PC members, but not many people turn down an opportunity to be on the OSDI or SOSP program committee because of the workload, and there are certainly enough good people in the community who are willing to do the job. And nothing is stopping you from posting your preprint to arXiv today. But act fast -- yours could be the seventh systems paper up there!

Saturday, September 10, 2011

Programming != Computer Science

I recently read this very interesting article on ways to "level up" as a software developer. Reading this article brought home something that has been nagging me for a while since joining Google: that there is a huge skill and cultural gap between "developers" and "Computer Scientists." Jason's advice for leveling up in the aforementioned article is very practical: write code in assembly, write a mobile app, complete the exercises in SICP, that sort of thing. This is good advice, but certainly not all that I would want people on my team to spend their time on in order to become true technical leaders. Whether you can sling JavaScript all day or know the ins and outs of C++ templates often has little bearing on whether you're able to grasp the bigger, more abstract, less well-defined problems and make headway on them.

For that you need a very different set of skills, which is where I start to draw the line between a Computer Scientist and a developer. Personally, I consider myself a Computer Scientist first and a software engineer second. I am probably not the right guy to crank out thousands of lines of Java on a tight deadline, and I'll be damned if I fully grok C++'s inheritance rules. But this isn't what Google hired me to do (I hope!) and I lean heavily on some amazing programmers who do understand these things better than I do.

Note that I am not defining a Computer Scientist as someone with a PhD -- although it helps. Doing a PhD trains you to think critically, to study the literature, make effective use of experimental design, and to identify unsolved problems. By no means do you need a PhD to do these things (and not everyone with a PhD can do them, either).

A few observations on the difference between Computer Scientists and Programmers...

Think Big vs. Get 'er Done 

One thing that drove me a little nuts when I first started at Google was how quickly things move, and how often solutions are put into place that are necessary to move ahead, even if they aren't fully general or completely thought through. Coming from an academic background I am used to spending years pounding away at a single problem until you have a single, beautiful, general solution that can stand up to a tremendous amount of scrutiny (mostly in the peer review process). Not so in industry -- we gotta move fast, so often it's necessary to solve a problem well enough to get onto the next thing. Some of my colleagues at Google have no doubt been driven batty by my insistence on getting something "right" when they would rather just (and in fact need to) plow ahead.

Another aspect of this is that programmers are often satisfied with something that solves a concrete, well-defined problem and passes the unit tests. What they sometimes don't ask is "what can my approach not do?" They don't always do a thorough job at measurement and analysis: they test something, it seems to work on a few cases, they're terribly busy, so they go ahead and check it in and get onto the next thing. In academia we can spend months doing performance evaluation just to get some pretty graphs that show that a given technical approach works well in a broad range of cases.

Throwaway prototype vs. robust solution

On the other hand, one thing that Computer Scientists are not often good at is developing production-quality code. I know I am still working at it. The joke is that most academics write code so flimsy that it collapses into a pile of bits as soon as the paper deadline passes. Developing code that is truly robust, scales well, is easy to maintain, well-documented, well-tested, and uses all of the accepted best practices is not something academics are trained to do. I enjoy working with hardcore software engineers at Google who have no problem pointing out the totally obvious mistakes in my own code, or suggesting a cleaner, more elegant approach to some ass-backwards page of code I submitted for review. So there is a lot that Computer Scientists can learn about writing "real" software rather than prototypes.

My team at Google has a good mix of folks from both development and research backgrounds, and I think that's essential to striking the right balance between rapid, efficient software development and pushing the envelope of what is possible.

Thursday, August 4, 2011

Measuring the mobile web is hard

I believe strongly that you can't solve a problem until you can measure it. At Google, I've been charged with making the mobile web fast, so naturally, the first step is measuring mobile web performance across a wide range of devices, browsers, networks, and sites. As it turns out, the state of the art in mobile measurement is a complete mess. Different browsers report completely different timings for the same events. There is very little agreement on what metrics we should be optimizing for. Getting good timing out of a mobile device is harder than it should be, and there are many broken tools out there that report incorrect or even imaginary timings.

The desktop web optimization space is pretty complicated, of course, although there's a lot more experience in desktop than in mobile. It's also a lot easier to instrument a desktop web browser than a mobile phone running on a 3G network. Most mobile platforms are fairly closed and fail to expose basic performance metrics in a way that makes it easy for web developers to get at them. We currently resort to jailbreaking phones and running tcpdump and other debugging tools to uncover what is going on at the network and browser level. Clearly it would be better for everyone if this process were simpler.

When we talk about making the mobile web fast, what we are really trying to optimize for is some fuzzy notion of "information latency" from the device to the user. The concept of information latency will vary tremendously from site to site, and depend on what the user is trying to do. Someone trying to check a sports score or weather report only needs limited information from the page they are trying to visit. Someone making a restaurant reservation or buying an airline ticket will require a confirmation that the action was complete before they are satisfied. In most cases, users are going to care most about the "main content" of a page and not things like ads and auxiliary material.

If I were a UX person, I'd say we run a big user study and measure what human beings do while interacting with mobile web sites, using eye trackers, video recordings, instrumented phones -- the works. Unfortunately those techniques don't scale very well and we need something that can be automated.

It also doesn't help that there are (in my opinion) too many metrics out there, many of which have little to do with what matters to the user.

The HTTP Archive (HAR) format is used by a lot of (mostly desktop) measurement tools and is a fairly common interchange format. Steve Souders' httparchive.org site collects HAR files and has some nice tools for visualizing and aggregating them. The HAR spec defines two timing fields for a web page load: onLoad and onContentLoad. onLoad means the time when the "page is loaded (onLoad event fired)", but this has dubious value for capturing user-perceived latency. If you start digging around and trying to find out exactly what the JavaScript onLoad event actually means, you will be hard-pressed to find a definitive answer. The folklore is that onLoad is fired after all of the resources for a given page have been loaded, except that different browsers report this event at different times during the load and render cycle, and JavaScript and Flash can load additional resources after the onLoad event fires. So it's essentially an arbitrary, browser-specific measure of some point during the web page load cycle.

onContentLoad is defined in the HAR Spec as the time when the "Content of the page loaded ... Depeding [sic] on the browser, onContentLoad property represents DOMContentLoad [sic -- should be DOMContentLoaded] event or document.readyState == interactive." Roughly, this seems to correspond to the time when "just" the DOM for the page has been loaded. Normally you would expect this to happen before onLoad, but apparently in some sites and browsers it can happen after onLoad. So, it's hard to interpret what these two numbers actually mean.
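For what it's worth, pulling those two numbers out of a HAR file is easy; here is a small Python sketch. The filename is a placeholder, and per the HAR spec both timings are milliseconds relative to the start of the page load, with -1 meaning the value is not available.

```python
import json

# "example.har" is a placeholder for a HAR file produced by your measurement tool.
with open("example.har") as f:
    har = json.load(f)

for page in har["log"]["pages"]:
    timings = page["pageTimings"]
    print(page["id"],
          "onContentLoad:", timings.get("onContentLoad", -1), "ms,",
          "onLoad:", timings.get("onLoad", -1), "ms")
```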

The W3C Navigation Timing API goes a long way towards cleaning up this mess by exposing a bunch of events to JavaScript, including redirects, DNS lookups, load times, etc., and these times are fairly well-defined. While this API is supported by WebKit, many mobile browser platforms do not have it enabled, notably iOS (I hope this will be fixed in iOS 5; we will see). The HAR spec will need to be updated with these timings, and someone should carefully document how effectively different browser platforms implement this API in order for it to be really useful.
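As a rough illustration (and definitely not a recommended measurement methodology), here is a Python sketch that pulls a few Navigation Timing values out of a browser via Selenium. It assumes a browser and WebDriver that actually expose window.performance.timing, and the URL is a placeholder.

```python
from selenium import webdriver

driver = webdriver.Chrome()                 # assumes a local ChromeDriver install
driver.get("http://www.example.com/")       # placeholder URL

# Copy the fields we care about into a plain object so the driver can serialize it.
t = driver.execute_script("""
    var t = window.performance.timing;
    return {navigationStart: t.navigationStart,
            domainLookupStart: t.domainLookupStart,
            domainLookupEnd: t.domainLookupEnd,
            connectStart: t.connectStart,
            connectEnd: t.connectEnd,
            responseStart: t.responseStart,
            domContentLoadedEventEnd: t.domContentLoadedEventEnd,
            loadEventEnd: t.loadEventEnd};
""")

print("DNS lookup:        ", t["domainLookupEnd"] - t["domainLookupStart"], "ms")
print("TCP connect:       ", t["connectEnd"] - t["connectStart"], "ms")
print("Time to first byte:", t["responseStart"] - t["navigationStart"], "ms")
print("DOMContentLoaded:  ", t["domContentLoadedEventEnd"] - t["navigationStart"], "ms")
print("onLoad complete:   ", t["loadEventEnd"] - t["navigationStart"], "ms")

driver.quit()
```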

The W3C Resource Timing API provides an expanded set of events for capturing individual resource timings on a page, which is essential for deep analysis. However, this API is still in the early design stages and there seems to be a lot of ongoing debate about how much information can and should be exposed through JavaScript, e.g., for privacy reasons.

A couple of other metrics depend less on the browser and more on empirical measures, which I tend to prefer.

Time to first byte generally means the time until the browser receives the first byte of the HTTP payload. For WebPageTest, this includes redirects (so redirects are factored into time to first byte). Probably not that useful by itself, but perhaps in conjunction with other metrics. (And God bless Pat Meenan for carefully documenting the measures that WebPageTest reports -- you'd be surprised how often these things are hard to track down.)

WebPageTest also reports time to first paint, which is the first time anything non-white appears in the browser window. This could be as little as a single pixel or a background image, so it's probably not that useful as a metric.

My current favorite metric is the above-the-fold render time, which reports the time for the first screen ("above the fold") of a website to finish rendering. This requires screenshots and image analysis to measure, but it's browser-independent and user-centric, so I like it. It's harder to measure than you would think, because of animations, reflow events, and so forth; see this nice technical presentation for how it's done. Video capture from mobile devices is pretty hard. Solutions like DeviceAnywhere involve hacking into the phone hardware to bring out the video signal, though my preference is for a high-frame-rate video camera in a calibrated environment (which happens to scale well across multiple devices).
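Here is a rough Python sketch of the screenshot-comparison idea -- not how WebPageTest actually implements it -- that finds the first frame whose above-the-fold pixels match the final rendered state. Filenames and timestamps are placeholders, and it sidesteps the animation and reflow issues mentioned above.

```python
from PIL import Image, ImageChops

def above_the_fold_time(frame_files, timestamps_ms):
    """frame_files: screenshots of the first screen during the load, in time order."""
    frames = [Image.open(f).convert("RGB") for f in frame_files]
    final = frames[-1]   # assume the last frame shows the fully rendered viewport
    for frame, ts in zip(frames, timestamps_ms):
        # ImageChops.difference(...).getbbox() is None when two frames are pixel-identical.
        if ImageChops.difference(frame, final).getbbox() is None:
            return ts    # first moment the above-the-fold content reaches its final state
    return None
```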

One of my team's goals is to provide a robust set of tools and best practices for measuring mobile websites that we can all agree on. In a future post I'll talk some more about the measurements we are taking at Google and some of the tools we are developing.

Sunday, July 31, 2011

Making Seattle my home

I moved to Seattle about 4 months ago, after having lived in Boston for a little more than seven years. Now that I've settled in a bit, I thought this would be a good time to write up some of my thoughts on the city and lifestyle here.

The view from Kerry Park, which was about a 10-minute walk from my old house in Queen Anne, before I recently moved to Wallingford.
Upon leaving Boston, I could have moved pretty much anywhere. Most of the cities with a strong tech industry had good job opportunities for my wife as well, and of course Google has offices in most major cities in the US. So we had plenty of options. We both went to Berkeley for grad school and absolutely love the Bay Area, but we decided not to move back there for a bunch of reasons. The main one was that I would have been working in Mountain View and my wife would have been in SF, and that would have meant a hell of a commute for one of us. It was also not clear that we would have been able to afford a decent house in any Bay Area neighborhood that we would want to live in. Our preference would have been to live in the East Bay, but that would have made the commute problem even worse. With a two-year-old son, I'm not willing to go through an hour-long commute twice a day -- it's simply not worth it to me.

Seattle has a lot of what we were looking for. We live right in the middle of the city (in Wallingford) and for me it's a 10-minute bike commute (to the Google office in Fremont) along the shore of Lake Union, with views of downtown, the Space Needle, and Mount Rainier. It is a fantastic neighborhood with shops, bars, restaurants, playgrounds, and one of the best elementary schools in Seattle (John Stanford) just a few blocks away.

I realized at one point that I probably know more people in Seattle than in any other city -- including Boston. With the University of Washington, Microsoft, Amazon, and Google all here, I had a large pre-fab social network already in place. The tech industry is huge here and there seems to be a very active startup community.

The geography here is absolutely stunning. Anywhere you go in Seattle you are surrounded by water, trees, and snow-capped mountains. From our house we have a beautiful view of downtown Seattle and Lake Union, with seaplanes taking off and landing overhead. It is also a dense enough city that we can walk or bike to pretty much everything we would need; of course, a big part of this is because we live in Seattle proper, rather than in the Eastside communities of Kirkland, Bellevue, or Redmond, which tend to be more spread out.

This is totally the view from my house in Wallingford. Yes, I would like for that damn tree to not be in the way, but what can you do?

It is no surprise that Seattle is a far more relaxed and progressive place than Boston. A lot of this is, of course, the West Coast vs. East Coast distinction, and in a lot of ways Seattle exemplifies the West Coast aesthetic, much as Boston does the East. Way way more fixie bikes, tattoos, farmers markets, lesbians, hippies, and hippie lesbians with tattoos riding fixie bikes through farmers markets here in Seattle than anywhere in New England. In a lot of ways it's like San Francisco Lite -- a bit less edgy, more approachable, more gentrified, but still very forward-thinking. I feel very much like I belong here, whereas in Boston I always felt like a bit of an outsider.

So far I'm digging the restaurant and cocktail scene in Seattle, which is more adventurous and less stuffy than what you find in Boston (although Boston has some damn good food). I miss really good Chinese food (which is harder to find here than you would expect), and surprisingly Seattle doesn't have a ton of great Mexican food options, although I happen to live about a block from the best taco truck in town. Thai and sushi are excellent here, and there seem to be a lot more casual, foodie-type places all over town, doing crazy shit like Korean comfort food and ice cream sandwiches.

What am I not so crazy about? Well, I'm on the fence about the weather. The summer has (mostly) been beautiful - 75 degrees, sunny, no humidity at all. Mixed in have been some cooler rainy days that feel out of place for the season. The first couple of months we were here, in April and May, it was rainy and overcast pretty much every day. I take it this is typical for Seattle. The long term question is whether I will be more or less content with this pattern than Boston, which has a much wider temperature range, a couple of months of unbearably cold and snowy weather each year, and sweltering humid summers. It remains to be seen.

Second, everyone in Seattle appears to be white. This is not true, of course, but at least in the neighborhoods where I spend most of my time, there is a lot less racial and cultural diversity than in Boston. My understanding is that this is largely due to historical reasons -- minorities were shut out of many neighborhoods -- but the effects persist today. I will ponder this more deeply the next time I'm sitting at a sidewalk café with my dog while sipping an organic soy latte and checking Google+ on my MacBook Pro. It's the thing to do here, you know.


