It's all about iteration speed

This week I read a tweet about big tech companies not relying on multiple pre-production environments. Many people were surprised by it, but to be honest, it's not that hard to imagine.

More and more teams are prioritizing iteration speed and quick feedback cycles to improve their products.

One proven way to do that is to get rid of pre-production environments in favour of better monitoring, alerting, and rollback tools.

Why do we have so many environments?

One of the main reasons I hear is to keep bugs out of production by catching them earlier in the funnel. That way, a bug doesn't end up disrupting your users' experience.

Another frequently heard reason is to ensure the stability of production: you don't want an unstable production environment. With a pre-production environment, you can be sure that after you promote a change, things will remain in good shape.

In the end, it's all about confidence: confidence that the changes you ship won't have undesired side effects that degrade your users' interactions with your app.

The cost of the process

But to achieve that confidence, you introduce new costs into your pipeline, costs that will also affect your users.

Besides the obvious costs, such as maintaining the infrastructure those multiple environments require, you'll also take a hit on iteration speed.

Every change will take longer to reach your users, which increases the time it takes for them to perceive value and give feedback, and for you to use that feedback to iterate on your product.

If you're following me, you can do the maths: this extra time adds up with every iteration, and you will probably end up iterating considerably slower.
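To make that arithmetic concrete, here is a small Python sketch. The durations are hypothetical numbers chosen only for illustration, and the model assumes iterations run one after another (build, release, gather feedback, repeat); plug in your own figures.

```python
# Hypothetical numbers for illustration only; replace them with your own.
WORKING_DAYS_PER_QUARTER = 60


def iterations_per_quarter(build_days: float, release_days: float, feedback_days: float) -> int:
    """Count the complete build -> release -> feedback cycles that fit in one quarter."""
    cycle_days = build_days + release_days + feedback_days
    return int(WORKING_DAYS_PER_QUARTER // cycle_days)


# Same build and feedback time; only the release path differs.
# release_days=3 is an assumed cost of promoting through extra environments.
direct = iterations_per_quarter(build_days=3, release_days=0.5, feedback_days=2)
staged = iterations_per_quarter(build_days=3, release_days=3, feedback_days=2)

print(f"straight to production: {direct} iterations per quarter")
print(f"through pre-production: {staged} iterations per quarter")
```

With these made-up numbers, an extra two and a half days per release turns 10 iterations a quarter into 7.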

Prioritizing iteration speed

Nowadays, iteration speed and fast feedback cycles are key to successful product development. A clear example is Figma, a vibrant product that iterated at a great pace while listening to its community.

One way to make massive gains in iteration speed is to deprecate most of your pre-production environments.

You don't need to sacrifice quality

When you think about disabling pre-production environments, you might feel it will come at the cost of quality. But that's not necessarily true.

You can increase your iteration speed and keep quality high with the right setup. The key parts are:

  • Having a good local environment, so the team can make changes there and catch the most obvious issues (a local environment doesn't necessarily mean their own machine; it can be an isolated server, for example).
  • Setting up CI/CD systems to catch obvious regressions. This must include keeping tests up to date at all times and reinforcing the code review culture across teams.

Taking risks to iterate faster

You have to accept that bugs will happen; what matters is how you react to them. Once you've accepted that, it becomes easier to increase your iteration speed.

A good on-call rotation and clear runbooks on how to tackle different problems, or how to debug and roll back specific systems, are important for increasing the team's confidence when merging changes.

You need to take calculated risks: know which kinds of bugs could slip into production, and have mitigation plans ready for them.

Smart rollout

Merging into the main branch doesn't have to mean releasing to all of your users. You must have some kind of feature-flagging system in place that lets you perform a smart rollout.

This way, you can merge into production but enable the change only for selected people in an internal release, and manage early access programs in a more agile way.
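As a rough illustration of releasing behind a flag with an internal allowlist, here is a minimal in-memory sketch in Python. `FeatureFlag`, the `new-checkout` flag, and the user IDs are all hypothetical; a real setup would typically read flags from a flag service or config store rather than hard-coding them.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """A minimal in-memory feature flag (hypothetical, not any vendor's API)."""

    name: str
    enabled_for_everyone: bool = False
    # Internal staff and early-access users who see the feature before everyone else.
    allowlist: set[str] = field(default_factory=set)

    def is_enabled(self, user_id: str) -> bool:
        return self.enabled_for_everyone or user_id in self.allowlist


new_checkout = FeatureFlag("new-checkout", allowlist={"alice@internal", "beta-customer-42"})


def checkout(user_id: str) -> str:
    # Both code paths are merged and deployed; the flag decides which one runs.
    if new_checkout.is_enabled(user_id):
        return "new checkout flow"
    return "old checkout flow"
```

Growing the early access program is then a matter of adding IDs to the allowlist, and the general release is flipping `enabled_for_everyone`, with no new deploy in either case.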

You can also be smart about how you enable new high-risk features, such as a major change in how you process certain data that might lead to a regression, by rolling them out gradually by region or by volume of users.
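One common way to roll out by region and user volume is a per-region percentage combined with a stable hash of the user ID, so each user consistently lands on the same side of the flag. The sketch below assumes that technique; the region names, percentages, and the `new-data-pipeline` flag name are hypothetical.

```python
import hashlib

# Hypothetical rollout plan: share of users (in percent) per region on the new pipeline.
ROLLOUT_PERCENT_BY_REGION = {"eu-west": 5, "us-east": 0}


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in [0, 100).

    Hashing (instead of picking at random per request) keeps a user on the
    same variant across requests. Salting with the flag name means different
    flags sample different users.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_pipeline(user_id: str, region: str) -> bool:
    # Regions missing from the plan default to 0%, i.e. the old behaviour.
    percent = ROLLOUT_PERCENT_BY_REGION.get(region, 0)
    return bucket(user_id, "new-data-pipeline") < percent
```

Widening the rollout means raising a region's percentage: users already in the lower buckets stay enabled, and new ones are added on top.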

Monitoring, alerting, and rollback

One of the most important requirements is being good at monitoring and alerting. This gives your team the extra confidence to roll things out.

If you are confident that your system detects anomalous numbers, alerts the right people, and lets them roll back with almost no friction, then you are in a great position to start releasing behind flags and ditching pre-production environments in favour of faster iteration.
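To show how detection, alerting, and a low-friction rollback can fit together, here is a hypothetical Python sketch: a sliding-window error-rate check that pages someone and switches the feature flag off when a threshold is crossed. The window size, the 5% threshold, the plain dict of flags, and the `alert` callback are all assumptions; in practice these pieces usually live in your monitoring stack and flag service.

```python
from collections import deque
from typing import Callable


class ErrorRateMonitor:
    """Sliding-window error-rate check (hypothetical sketch, not a real monitoring product)."""

    def __init__(self, window: int = 100, threshold: float = 0.05):
        self.samples: deque[bool] = deque(maxlen=window)  # True means the request failed
        self.threshold = threshold

    def record(self, failed: bool) -> None:
        self.samples.append(failed)

    def error_rate(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

    def breached(self) -> bool:
        # Only judge a full window, to avoid alerting on a handful of requests.
        return len(self.samples) == self.samples.maxlen and self.error_rate() > self.threshold


def check_and_rollback(
    monitor: ErrorRateMonitor,
    flags: dict[str, bool],
    flag_name: str,
    alert: Callable[[str], None],
) -> bool:
    """If the error rate is too high, switch the flag off and alert someone."""
    if monitor.breached() and flags.get(flag_name):
        flags[flag_name] = False  # the "rollback" is just turning the flag off
        alert(f"{flag_name} disabled: error rate {monitor.error_rate():.1%}")
        return True
    return False


# Usage: simulate 100 requests with a 10% failure rate.
flags = {"new-data-pipeline": True}
monitor = ErrorRateMonitor(window=100, threshold=0.05)
for i in range(100):
    monitor.record(failed=(i % 10 == 0))
check_and_rollback(monitor, flags, "new-data-pipeline", alert=print)
# flags["new-data-pipeline"] is now False
```

Because the new code path sits behind a flag, the automated rollback is a configuration change rather than a revert and redeploy, which is what keeps it close to frictionless.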

Final Thoughts

  • You don't need to sacrifice quality in order to iterate faster
  • Take the time to set up proper CI/CD to increase confidence when merging new changes
  • Releasing behind a flag should be the rule and not the exception
  • Accept that some bugs will slip into production, and take calculated risks