An exchange I had today with another senior manager here at Zango gave me an opportunity to put words to my philosophy around changes to our production environment. It can be summarized as:
- Move fast enough to make mistakes.
- When you make a mistake, slow down enough to figure out what went wrong, and don't do it again.
Both parts of this philosophy are critical. If you don't have #1, you'll never get anything done. If you don't have #2, you'll keep screwing up the same way over and over again.
I've worked in environments where #1 wasn't the case, where change control processes were out of control, and crippled people's ability to get things done. At one company I worked at, before you could make a change to the environment, you had to do a formal write-up, then send it to a committee which met once a week, and only after that committee had approved it could you actually make the change. The net result was that nothing got done, and employees (not all of them in IT) got very frustrated. If you're AT&T, and any outage is immediately visible to millions of people, I admit that you do need to be this careful. But for most companies – and Zango is one of these – you'll end up getting more done in the long run if you're willing to take a few risks and make a few mistakes along the way.
I've also worked in environments where #2 wasn't the case, where we kept making the same mistakes over and over. But at Zango, any outage triggers our post-mortem process: we describe what happened, what the impact on revenue or installs was, and do our best to drill down to the root-cause of the problem. Then we document the steps we'll take to make sure it doesn't happen again, and follow up to make sure those steps were taken. Over time, this process cuts down on the number of needless mistakes more than any decision to "go slowly and carefully" could.
And a final unnecessary point: out of a somewhat macabre sense of humor, I insist on calling them "post-mortems," not "outage reports" or anything like that. Ever since I worked at a funeral home during college, the image of a body – or in this case, a project or a rollout -- lying open on a slab, with people gathered around and pointing, has been a rather potent one for me. You can't help but pay attention.