In March 2011, I drafted an article explaining how the team responsible for Google Chrome ships software. Then I promptly forgot about it.
I stumbled across the draft a few days ago. Though it is now somewhat outdated (e.g., Chrome forked WebKit into Blink in 2013, and I don’t even work for Google anymore), I still think the underlying ideas hold up.
How Chromium Works
Hundreds of engineers work on the Chromium project. Together we commit about 800 changes to the codebase every single week. We also depend on many other large and active projects like V8, Skia, and WebKit.
We push a new stable release to hundreds of millions of users every six weeks, on a fixed schedule. And we maintain several other early-access channels which update even faster. The fastest, the canary channel, silently auto-updates nearly every single weekday.
How does all of this keep working? How are there still wheels on this bus? Why haven’t all the developers gone insane yet?
On the technology side, Chromium’s velocity is made possible by reliable, efficient, and silent auto-updates.
On the people side, there are dedicated, hard-working, and smart teams of QA, release managers, and infrastructure types, without whom the entire thing would fall apart in weeks. And there are the designers, product managers, writers, PR, lawyers, security teams, and everyone else who needs to work together smoothly for each stable release. I’m going to gloss over all that today in order to focus on engineering things, and in an attempt to keep this post from growing to Yegge-proportions.
What I’m going to talk about instead is Chromium’s development process, which was specifically designed to make rapid releases possible. It has some interesting features that I think could improve many projects, regardless of release schedule. It also comes with some challenges that I’ll point out at the end.
On many projects, it’s common to branch the source code to work on major new features. The idea is that temporary destabilization from the new code won’t affect other developers or users. Once the feature is complete, its branch is merged back into trunk, and there is usually a period of instability while integration issues are ironed out.
This wouldn’t work in Chrome because we release every day. We can’t tolerate huge chunks of new code suddenly showing up in trunk because it would have a high chance of taking down the canary or dev channels for an extended period. Also, the trunk of Chrome moves so fast that it isn’t practical for developers to be isolated on a branch for very long. By the time they merged, trunk would look so different that integration would be difficult and error-prone.
We do create maintenance branches before each of our beta releases, but these are really short-lived. Just six weeks at most, until the next beta release. And we never develop directly on these branches. Any late fixes that need to go into a release are first made on trunk, and then cherry-picked into the branch.
One happy side-effect of this is that there’s no B-team of developers stuck working on a maintenance branch. All developers are always working with the latest and greatest source.
We don’t create branches, but we still need some way to hide incomplete features from users. One natural way to do this would be with compile-time checks. The problem with those is they aren’t much different from just branching the code. You still effectively have two separate codebases that must eventually be merged. And since the new code isn’t compiled or tested by default, it’s very easy for developers to accidentally break it.
The Chromium project uses runtime checks instead. Every feature under development is compiled and tested under all configurations from the very beginning. We have command-line flags that we test for very early in startup. Everywhere else, the codebase is mostly ignorant of which features are enabled. This strategy means that new feature work is integrated as much as possible from the beginning. It’s at least compiled, and any changes to core code that need to be done are tested and exposed to users as normal. And we can easily write automated tests that exercise disabled features by temporarily overriding the command line.
When a feature gets close to completion we introduce an option in chrome://flags so that advanced users can start trying it out and giving us feedback. Finally, when we think the feature is ready to ship, we remove the command-line flag and enable it by default. By this time the code has been extensively tested with automation and used by many people. So the impact from turning it on is minimized.
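The pattern is simple enough to sketch. Here is a minimal illustration in Python (Chromium’s real implementation is in C++, and every name below is hypothetical): flags are parsed once, very early in startup, and the rest of the code only ever asks whether a feature is on.

```python
class FeatureFlags:
    """Parses --enable-<name> switches once, very early in startup."""

    def __init__(self, argv):
        prefix = "--enable-"
        self._enabled = {arg[len(prefix):] for arg in argv if arg.startswith(prefix)}

    def is_enabled(self, name):
        # The rest of the codebase only ever asks this question;
        # it stays ignorant of where the flag came from.
        return name in self._enabled

# Normally this would be built from sys.argv; a test can override the
# argument list directly, which is how disabled features get exercised.
flags = FeatureFlags(["--enable-new-tab-page"])
print(flags.is_enabled("new-tab-page"))   # True
print(flags.is_enabled("side-panel"))     # False
```

Because the check happens at runtime, the same binary with the same tests covers both states of the feature.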
Gigantic amounts of automated testing
In order to release every day, we need to have confidence that our codebase is always in a good state. This requires automated tests. Lots of automated tests. As of this writing Chrome has about 12k class-level unit tests, 2k automated integration tests, and a huge assortment of perf tests, bloat tests, thread safety tests, memory safety tests, and probably more that I can’t think of. And that’s just for Chrome. WebKit, V8, and our other dependencies are tested on their own. WebKit alone has about 27k tests that ensure web pages lay out and function correctly. Our general rule is that every change has to come with tests.
We run a public buildbot that constantly runs new changes to our code against our test suite on every configuration, and we enforce a “green tree” policy. If a change breaks a test, it is immediately reverted. The developer must fix the change and re-land it. We don’t leave broken changes in the tree because:
It makes it easy to accidentally land more broken changes because nobody notices the tree go from red to even redder
It slows down development because everyone has to work around whatever is broken
It encourages developers to make sloppy quick fixes to get the tests passing
It prevents us from releasing!
To help developers avoid breaking the tree, we have try bots, which are a way to test a change under all tests and configurations before landing it. The results are emailed to the developer. We also have a commit queue, which is a way to try a change and have it landed automatically if the try succeeds. I like to use this after a long night of hacking. I press the button, go to bed, and wake up — hopefully to my change having landed.
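The commit queue’s contract can be sketched in a few lines (a toy illustration with made-up names; the real system is a fleet of bots):

```python
def commit_queue(change, try_configs, land):
    """Run the change through every try configuration; land it only if all pass."""
    # Run all configurations, like the try bots do, rather than stopping early.
    results = [run(change) for run in try_configs]
    if all(results):
        land(change)
        return "landed"
    return "rejected"

# Toy usage: a change that passes one configuration but fails another is rejected.
result = commit_queue(
    "my-change",
    try_configs=[lambda change: True, lambda change: False],
    land=lambda change: None,
)
print(result)  # rejected
```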
With all this automated testing, we can get away with giving our dev channel very minimal manual testing, and the canaries get none at all.
Because we have pretty comprehensive test coverage, we can afford to be aggressive with refactoring. There are always a few major refactor efforts going on in Chrome. Right now the main ones are Carnitas and Aura.
At our scale and pace, it’s critical to keep our codebase clean and understandable. We even view it as more important than preventing regressions. Engineers throughout Chrome are empowered to make improvements anywhere in the system (though we may require a review by a module owner). If a refactor breaks something that wasn’t exposed by failing tests, our outlook is that the fault lies not with the engineer who did the refactor, but with the one whose feature had insufficient test coverage.
WebKit moves really fast too. And just as we can’t have feature branches land all at once, we can’t merge a month’s worth of WebKit changes in one go. It would destabilize the tree for days.
Instead, we try to keep Chrome compiling against a very recent version of WebKit. That version is almost always less than half a day old. There’s a file, called DEPS, in the root of Chrome that records the version of WebKit we currently compile against. When you check out or update the Chrome source code, a tool called gclient automatically pulls the WebKit revision indicated in this file.
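A stripped-down sketch of what such a dependency-pin file can look like (the real file is richer, and the repository URL and revision below are made up):

```python
# A dependency-pin file in the style of Chromium's DEPS.
# The URL and revision number here are illustrative, not real.
vars = {
    # Bumped several times a day by whoever is gardening WebKit.
    "webkit_revision": "102345",
}

deps = {
    # The checkout tool syncs this path to exactly the pinned revision.
    "src/third_party/WebKit":
        "https://webkit.example.org/webkit.git@" + vars["webkit_revision"],
}
```

Bumping one revision string is the entire integration step, which is what makes doing it several times a day cheap.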
Several times each day, an engineer updates this version number, investigates any new integration issues that come up, and assigns bugs to the relevant engineers. The result is that we only ever pull in small changes to WebKit at once, and the impact on our tree is usually minimal. We have also added bots to WebKit’s buildbot so that when WebKit engineers make a change that will end up breaking Chrome, it shows up to them immediately.
A big benefit of the DEPS system is that we can get changes out to the web platform very quickly. A feature that lands in WebKit will be available to Chrome users on the canary channel within a few days. This incentivizes us to make improvements upstream in WebKit, where they will help all WebKit clients, rather than locally where they only help Chrome. In fact, our general rule is that we don’t make any local changes to WebKit (or the other projects we rely on) at all.
Testing thoroughly is still an unsolved problem. In particular, flaky integration tests are a constant issue for us. Chrome is big, complex, asynchronous, multiprocess, and multithreaded. It’s easy for integration tests to have subtle timing issues that make them fail intermittently. On a project our size, a test that fails 1% of the time is guaranteed to fail multiple times per day.
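The arithmetic behind that claim is straightforward. Taking the ~800 changes a week from earlier, and assuming the full suite runs on each change spread over five weekdays (that per-day split is my assumption, not a figure from the project):

```python
# Why a 1%-flaky test fails "multiple times per day" at this scale.
flake_rate = 0.01
suite_runs_per_day = 800 / 5                 # ~160 suite runs per weekday
expected_failures = flake_rate * suite_runs_per_day
p_at_least_one = 1 - (1 - flake_rate) ** suite_runs_per_day

print(expected_failures)          # 1.6 expected spurious failures per day
print(round(p_at_least_one, 2))   # ~0.8 chance of at least one failure
```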
Once a test becomes flaky, the team quickly gets in the habit of ignoring it, and that makes it easy to miss other legitimate test failures in that area of code. Thus, we end up disabling flaky tests and losing coverage, making it easier to release major regressions to users.
Another challenge is that it’s hard to keep polish high at this velocity. I think it’s easier to focus a team on getting every detail right for a rare, big-bang-style release than to maintain that focus indefinitely. And since little details, like the spacing of buttons in a toolbar, are frequently hard to test, it’s easy for mistakes to creep in.
Finally, I think there’s a very real issue of stress. With all the code changing all the time, even if people try to focus just on their area, it’s easy for them to be affected by what’s going on elsewhere. It can easily feel like you can never rest if you want to keep your part of Chrome working.
We’re attacking this last problem by doing some major modularization of the codebase. The Carnitas task force is trying to draw cleaner, tighter interfaces between some of our major components. So far, it’s cleaning up a lot of code, but it’s a bit early to say how much it helps the bigger picture stress level.
So, how are the wheels still on the bus? In short: no branches, runtime switches, tons of automated testing, relentless refactoring, and staying very close to HEAD of our dependencies.
These techniques will be most helpful to large projects that have fast-moving upstream dependencies, but perhaps some of the ideas are applicable to smaller projects too.