Measuring the mobile web is hard

I believe strongly that you can't solve a problem until you can measure it. At Google, I've been charged with making the mobile web fast, so naturally, the first step is measuring mobile web performance across a wide range of devices, browsers, networks, and sites. As it turns out, the state of the art in mobile measurement is a complete mess. Different browsers report completely different timings for the same events. There is very little agreement on what metrics we should be optimizing for. Getting good timing out of a mobile device is harder than it should be, and there are many broken tools out there that report incorrect or even imaginary timings.

The desktop web optimization space is pretty complicated, of course, although there's a lot more experience in desktop than in mobile. It's also a lot easier to instrument a desktop web browser than a mobile phone running on a 3G network. Most mobile platforms are fairly closed and fail to expose basic performance metrics in a way that makes it easy for web developers to get at them. We currently resort to jailbreaking phones and running tcpdump and other debugging tools to uncover what is going on at the network and browser level. Clearly it would be better for everyone if this process were simpler.

When we talk about making the mobile web fast, what we are really trying to optimize for is some fuzzy notion of "information latency" from the device to the user. The concept of information latency will vary tremendously from site to site, and depend on what the user is trying to do. Someone trying to check a sports score or weather report only needs limited information from the page they are trying to visit. Someone making a restaurant reservation or buying an airline ticket will require a confirmation that the action was complete before they are satisfied. In most cases, users are going to care most about the "main content" of a page and not things like ads and auxiliary material.

If I were a UX person, I'd say we run a big user study and measure what human beings do while interacting with mobile web sites, using eye trackers, video recordings, instrumented phones -- the works. Unfortunately those techniques don't scale very well and we need something that can be automated.

It also doesn't help that there are (in my opinion) too many metrics out there, many of which have little to do with what matters to the user.

The HTTP Archive (HAR) format is used by a lot of (mostly desktop) measurement tools and is a fairly common interchange format. Steve Souders' httparchive.org site collects HAR files and has some nice tools for visualizing and aggregating them. The HAR spec defines two timing fields for a web page load: onLoad and onContentLoad. onLoad means the time when the "page is loaded (onLoad event fired)", but this has dubious value for capturing user-perceived latency. If you start digging around and trying to find out exactly what the JavaScript onLoad event actually means, you will be hard-pressed to find a definitive answer. The folklore is that onLoad is fired after all of the resources for a given page have been loaded, except that different browsers report this event at different times during the load and render cycle, and JavaScript and Flash can load additional resources after the onLoad event fires. So it's essentially an arbitrary, browser-specific measure of some point during the web page load cycle.
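
To make this concrete, here is a minimal sketch (TypeScript, with the types reduced to just the fields discussed here) of where those two numbers live in a parsed HAR file. The field names come from the HAR spec; everything else is illustrative.

```typescript
// Sketch: pull the page-level timings out of a parsed HAR file.
// Only the fields discussed above are modeled; the rest of the format
// (entries, creator, etc.) is omitted for brevity.
interface HarPageTimings {
  onContentLoad?: number; // ms from page start; may be -1 or absent
  onLoad?: number;        // ms from page start; may be -1 or absent
}

interface Har {
  log: {
    pages: { id: string; title: string; pageTimings: HarPageTimings }[];
  };
}

function reportPageTimings(har: Har): void {
  for (const page of har.log.pages) {
    console.log(
      `${page.title}: onContentLoad=${page.pageTimings.onContentLoad} ms, ` +
      `onLoad=${page.pageTimings.onLoad} ms`
    );
  }
}
```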

onContentLoad is defined in the HAR Spec as the time when the "Content of the page loaded ... Depeding [sic] on the browser, onContentLoad property represents DOMContentLoad [sic -- should be DOMContentLoaded] event or document.readyState == interactive." Roughly, this seems to correspond to the time when "just" the DOM for the page has been loaded. Normally you would expect this to happen before onLoad, but apparently in some sites and browsers it can happen after onLoad. So, it's hard to interpret what these two numbers actually mean.
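
For what it's worth, a page can watch both events itself with a few lines of JavaScript. The sketch below (written as TypeScript) just logs when each one fires relative to when the script started running; it says nothing about how a given browser decides when to fire them.

```typescript
// Sketch: observe DOMContentLoaded and load from within the page itself.
// DOMContentLoaded roughly corresponds to "the DOM has been parsed";
// load is the onLoad event discussed above, with all its caveats.
const t0 = Date.now();

document.addEventListener('DOMContentLoaded', () => {
  console.log(`DOMContentLoaded fired after ${Date.now() - t0} ms`);
});

window.addEventListener('load', () => {
  console.log(`load (onLoad) fired after ${Date.now() - t0} ms`);
});
```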

The W3C Navigation Timing API goes a long way towards cleaning up this mess by exposing a bunch of events to JavaScript, including redirects, DNS lookups, load times, and so on, and these times are fairly well-defined. While this API is supported by WebKit, many mobile browser platforms do not have it enabled, notably iOS (I hope this will be fixed in iOS 5; we will see). The HAR spec will need to be updated with these timings, and someone should carefully document how faithfully different browser platforms implement this API in order for it to be really useful.
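
Where the API is available, the raw numbers live on window.performance.timing. The sketch below pulls out a few intervals as one reasonable way to slice them; the field names are from the spec, the particular intervals are just my choice.

```typescript
// Sketch: a few intervals derived from the W3C Navigation Timing API.
// All fields on performance.timing are epoch timestamps in milliseconds.
// Run this after the load event has fired (e.g. from a load handler),
// otherwise the later fields will still be zero.
const t = window.performance.timing;

const dnsMs      = t.domainLookupEnd - t.domainLookupStart;
const connectMs  = t.connectEnd - t.connectStart;
const domReadyMs = t.domContentLoadedEventStart - t.navigationStart;
const onLoadMs   = t.loadEventStart - t.navigationStart;

console.log({ dnsMs, connectMs, domReadyMs, onLoadMs });
```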

The W3C Resource Timing API provides an expanded set of events for capturing individual resource timings on a page, which is essential for deep analysis. However, this API is still in the early design stages and there seems to be a lot of ongoing debate about how much information can and should be exposed through JavaScript, e.g., for privacy reasons.
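
As a rough sketch of the kind of data the API aims to expose, something like the following should eventually be possible, assuming the interface ends up roughly as drafted (per-resource entries retrievable via performance.getEntriesByType); the details may well change as the debate plays out.

```typescript
// Sketch: enumerate per-resource timings, assuming the draft Resource
// Timing interface is available in the browser being measured.
const resources =
  performance.getEntriesByType('resource') as PerformanceResourceTiming[];

for (const r of resources) {
  // duration covers the fetch of this one resource, start to response end.
  console.log(`${r.name}: started at ${r.startTime.toFixed(1)} ms, ` +
              `took ${r.duration.toFixed(1)} ms`);
}
```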

A couple of other metrics depend less on the browser and more on empirical measures, which I tend to prefer.

Time to first byte generally means the time until the browser receives the first byte of the HTTP response payload. For WebPageTest, this measurement includes any redirects, so redirect time is factored into time to first byte. Probably not that useful by itself, but perhaps in conjunction with other metrics. (And God bless Pat Meenan for carefully documenting the measures that WebPageTest reports -- you'd be surprised how often these things are hard to track down.)
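
On browsers with Navigation Timing, a redirect-inclusive time to first byte can be approximated directly in the page. This is a sketch of that calculation, not necessarily identical to how WebPageTest computes its number.

```typescript
// Sketch: redirect-inclusive time to first byte via Navigation Timing.
// navigationStart precedes any redirects, and responseStart marks the
// first byte of the final response, so redirect time is folded in.
const timing = window.performance.timing;
const ttfbMs = timing.responseStart - timing.navigationStart;
console.log(`time to first byte (including redirects): ${ttfbMs} ms`);
```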

WebPageTest also reports time to first paint, which is the first time anything non-white appears in the browser window. This could be as little as a single pixel or a background image, so it's probably not that useful as a metric.

My current favorite metric is the above-the-fold render time, which reports the time for the first screen ("above the fold") of a website to finish rendering. This requires screenshots and image analysis to measure, but it's browser-independent and user-centric, so I like it. It's harder to measure than you would think, because of animations, reflow events, and so forth; see this nice technical presentation for how it's done. Video capture from mobile devices is pretty hard. Solutions like DeviceAnywhere involve hacking into the phone hardware to bring out the video signal, though my preference is for a high-frame-rate video camera in a calibrated environment (which happens to scale well across multiple devices).
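
The basic idea is easy to sketch: capture the above-the-fold viewport at a steady frame rate, then find the last frame that still differs meaningfully from the final rendered state. The code below is a rough illustration only; loadFrame is a hypothetical helper returning raw RGBA pixels for one captured frame, the thresholds are made up, and it ignores the animation and reflow complications mentioned above.

```typescript
// Rough sketch of above-the-fold render time from a sequence of screenshots.
// loadFrame() is a hypothetical helper returning RGBA bytes for frame i;
// the frame interval and change threshold are illustrative values only.
declare function loadFrame(index: number): Uint8Array;

function fractionChanged(a: Uint8Array, b: Uint8Array): number {
  let changed = 0;
  for (let i = 0; i < a.length; i += 4) {
    // Compare pixels on the red channel only, as a crude proxy.
    if (Math.abs(a[i] - b[i]) > 8) changed++;
  }
  return changed / (a.length / 4);
}

function aboveTheFoldRenderTime(numFrames: number, frameIntervalMs: number): number {
  const final = loadFrame(numFrames - 1);
  // Walk backwards to find the last frame that still differs meaningfully
  // from the final state; the frame after it is when rendering settled.
  for (let i = numFrames - 2; i >= 0; i--) {
    if (fractionChanged(loadFrame(i), final) > 0.01) {
      return (i + 1) * frameIntervalMs;
    }
  }
  return 0; // the page was already in its final state from the first frame
}
```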

One of my team's goals is to provide a robust set of tools and best practices for measuring mobile websites that we can all agree on. In a future post I'll talk some more about the measurements we are taking at Google and some of the tools we are developing.

Comments

  1. Interesting post. I simply consume through a mobile device, so my relevant metric is WhenCanICloseThisWindowAndPlayAngryBirds, but the broader topic made me think of aggregation results from social choice theory. In a nutshell, they suggest that there may be no "best" metric, and, furthermore, that the selection of any metric as best will engender an incentive for a competitor to utilize a different one as the basis for optimization. Again, great post, and thanks for getting neurons firing!

  2. Nice closer and hook. Maybe you can invent something brilliant and simple with hooks into basic core OS commands. You can't manage what you can't measure as the old mainframers like to say...

  3. Matt -

    Well done. One of the best posts about mobile Web performance measurement. Bravo. As a marketer, I'm particularly sensitive to the last bit about "Above the Fold Render Time." It determines whether or not a user will abandon. Mobile is really different from desktop. Understanding is limited and the tools are just being put in place. Lots of exciting things to come. Thank you.

  4. Good stuff! You have seen some of the telemetry data for crash debugging (Debugging in the Very Large: Ten Years of Implementation and Experience at last SOSP) and perf anomalies (Practical performance models for complex, popular applications at Sigmetrics'10). I am curious to see what kind of telemetry ends up being used for the mobile web.

  5. I think every request to the web from a mobile device (like any other generic request even from a desktop) triggers a series of RPC calls to various services in order to fulfill that request. Gideon Mann in Google NY has been looking into a problem related to this, viz., latency measurement of a service based on the latencies of the services that it invokes.

    In general, if you were to scour the literature for performance prediction in SOA-based solutions, you should be able to get some good insights.
