Thursday, April 30, 2015

Flywheel: Google's Data Compression Proxy for the Mobile Web

Next week, we'll be presenting our work on the Chrome Data Compression proxy, codenamed Flywheel, at NSDI 2015. Here's a link to the full paper. Our wonderful intern and Berkeley PhD student Colin Scott will be giving the talk. (I'm happy to answer questions about the paper in the comments section below.)

It's safe to say that the paper would have never happened without Colin -- most of us are too busy building and running the service to spend the time it takes to write a really good paper. Colin's intern project was specifically to collect data and write a paper about the system (he also contributed some features and did some great experiments). It was a win-win situation since we got to offload most of the paper writing to Colin, and he managed to get a publication out of it!


Rather than summarize the paper, I thought I'd provide some backstory on Flywheel and how it came about. It's a useful story to understand how a product like this goes from conception to launch at a company like Google.

(That said, standard disclaimer applies: This is my personal blog, and the opinions expressed here are mine alone.)

Backstory: Making the mobile web fast

When I moved to Seattle in 2011, I was given the mission to start a team with a focus on improving mobile Web performance. I started out by hiring folks like Ben Greenstein and Michael Piatek to help figure out what we should do. We spent the first few months taking a very academic approach to the problem: Since we didn't understand mobile Web performance, of course we needed to measure it!

We built a measurement tool, called Velodrome, which allowed us to automate the process of collecting Web performance data on a fleet of phones and tablets -- launching the browser with a given URL, measuring a bunch of things, taking screenshots, and uploading the data to an AppEngine-based service that monitored the fleet and provided a simple REST API for clients to use. We built a ton of infrastructure for Velodrome and used it on countless experiments. Other teams at Google also started using Velodrome to run their own measurements and pretty soon we had a few tables full of phones and tablets churning away 24/7. (This turned out to be a lot harder than we expected -- just keeping them running continuously without having to manually reboot them every few hours was a big pain.)

At the same time we started working with the PageSpeed team, which had built the gold standard proxy for optimizing Web performance. PageSpeed was focused completely on desktop performance at the time, and we wanted to develop some mobile-specific optimizations and incorporate them. We did a bunch of prototyping work and explorations of various things that would help.

The downside to PageSpeed is that sites have to install it -- or opt into Google's PageSpeed Service. We wanted to do something that would reach more users, so we started exploring building a browser-based proxy that users, rather than sites, could turn on to get faster Web page load times. (Not long after this, Amazon announced their Silk browser for the Kindle Fire, which was very much the same idea. Scooped!)

Starting a new project

Hence we started the Flywheel project. Initially our goal was to combine PageSpeed's optimizations, the new SPDY protocol, and some clever server-side pre-rendering and prefetching to make Web pages load lightning fast, even on cellular connections. The first version of Flywheel, which we built over about a year and a half, was built on top of PageSpeed Service.

Early in the project, we learned of the (confidential at the time) effort to port Chrome to Android and iOS. The Chrome team was excited about the potential for Flywheel, and asked us to join their team to launch it as a feature in the new browser. The timing was perfect. However, the Chrome leadership was far more interested in a proxy that could compress Web pages, which is especially important for users in emerging markets, on expensive mobile data plans. Indeed, many of the original predictive optimizations we were using in Flywheel would have resulted in substantially greater data usage for the user (e.g., prefetching the next few pages you were expected to visit). It also turned out that compression is way easier than performance, so we decided to focus our efforts on squeezing out as many bytes as possible. (A common mantra at the time was "no bytes left behind".)

Rewriting in Go

As we got closer to launching, we were really starting to feel the pain of bolting Flywheel onto PageSpeed Service. Originally, we planned to leverage many of the complex optimizations used by PageSpeed, but as we focused more on compression, we found that PageSpeed was not well-suited to our needs, for a bunch of reasons. In early 2013, Michael Piatek convinced me that it was worth trying to rewrite the service, from scratch, in Go -- as a way of both doing a clean redesign from scratch but also leveraging Go's support for building Google-scale services. It was a big risk, but we agreed that if the rewrite wasn't bearing fruit in just a couple of months that we'd stop work on it and go back to PageSpeed.


Fortunately, Michael and the rest of the team executed at lightning speed and in just a few months we had substantially reimplemented Flywheel in Go, a story documented elsewhere on this blog. In November 2013 I submitted a CL to delete the thousands of lines of the PageSpeed-based Flywheel implementation, and we switched over entirely to the new, Go-based system in production.

PageSpeed Service in C++ was pushing 270 Kloc at the time. The Go-based rewrite was just 25 Kloc, 13Kloc of which were tests. The new system was much easier to maintain, faster to develop, and gave our team sole ownership of the codebase, rather than having to negotiate changes across the multiple teams sharing the PageSpeed code. The bet paid off. The team was much happier and more productive on the new codebase, and we managed to migrate seamlessly to the Go-based system well before the full public launch.

Launching

We announced support for Flywheel in the M28 beta release of Chrome at Google I/O in 2013, and finally launched the service to 100% of Chrome Mobile users in January 2014. Since then we've seen tremendous growth of the service. More than 10% of all Chrome Mobile users have Flywheel enabled, with percentages running much higher in countries (like Brazil and India) where mobile data costs are high. The service handles billions of requests a day from millions of users. Chrome adoption on mobile has skyrocketed over the last year, and is now the #1 mobile browser in many parts of the world. We also recently launched Flywheel for Chrome desktop and ChromeOS. Every day I check the dashboards and see traffic going up and to the right -- it's exciting.

We came up with the idea for Flywheel in late 2011, and launched in early 2014 -- about 2.5 years of development work from concept to launch. I have no idea if that's typical at Google or anywhere else. To be sure, we faced a couple of setbacks which delayed launch by six months or more -- mostly factors out of our control. We decided to hold off on the full public release until the Go rewrite was done, but there were other factors as well. Looking back, I'm not sure there's much we could have done to accelerate the development and launch process, although I'm sure it would have gone faster had we been doing it as a startup, rather than at Google. (By the same token, launching as part of Chrome is a huge opportunity that we would not have had anywhere else.)

What's next?

Now that Flywheel is maturing, we have a bunch of new projects getting started. We still invest a lot of energy into optimizing and maintaining the Flywheel service. Much of the work focuses on making the service more robust to all of the weird problems we face proxying the whole Web -- random website outages causing backlogs of requests at the proxy, all manner of non-standards-compliant sites and middleboxes, etc. (Buy me a beer and I'll tell you some stories...) We are branching out beyond Flywheel to build some exciting new features in Chrome and Android to improve the mobile experience, especially for users in emerging markets. It'll be a while until I can write a blog post about these projects of course :-)


2 comments:

  1. Hi Matt,

    Thank you for sharing the story of Flywheel's development. Say your favorite and I can give you pack of beers for more "stories" :) Well, I really liked the part where you planned to branch out from PageSpeed to Go based implementation and that decision became success. And the idea that proxy based Flywheel really really helps in developing countries where people have limited data access.

    I have couple questions:
    1. Are there any open source version for Velodrome available in web?

    2. When does the performance of Flywheel in terms of compression/latency becomes flat?

    3. What are the factors need to keep in mind when rewriting large code base for short term delivery? How to manage testing such big changes?

    Hope to watch the video of your project at NSDI 2015.

    Thank you.

    ReplyDelete
    Replies
    1. Thanks.

      Velodrome is not open source. We probably could open source it but it's a few years old and likely doesn't work anymore... We did open source another measurement tool (called Speedometer) which ended up forming the basis for Mobiperf - https://github.com/Mobiperf/MobiPerf. This tool runs in the background on Android phones and collects mobile network performance data.

      Not sure what you mean by "becomes flat" :-)

      I think the main thing that helped us with the Go rewrite was deciding to fail fast. Rather than giving ourselves an arbitrary amount of time to complete the rewrite, we decided to hit a major milestone (basic service up and running without a complete feature set) quickly, and use that to gauge whether it made sense to (a) keep working on it and (b) add more engineers to the project. The early results gave us confidence that things would go smoothly, which was great.

      Delete

Startup Life: Three Months In

I've posted a story to Medium on what it's been like to work at a startup, after years at Google. Check it out here.