Brittany Ellich · notes · 5 min read
Availability
Overview
What Exactly is Availability?
In simple terms, availability refers to the percentage of time your application is up and running—ready for use whenever your users need it. Imagine this: you open an app or a webpage, and you expect it to load instantly without issues. When your app is available, that’s exactly what happens.
On the flip side, if your users encounter downtime or delays, your app’s availability is compromised. It’s about making sure your service is reliable and responsive, no matter when or where it’s accessed.
How Do You Measure Availability?
Measuring availability isn’t one-size-fits-all—it depends on your app’s goals, scale, and the needs of your users. Here are some of the most common ways to measure it:
- Uptime Percentage: The most straightforward way to measure availability is by tracking uptime. This simply refers to the amount of time your application is functioning properly. For example, if your service is operational for 99 hours out of every 100, your availability is 99%. The higher the percentage, the better the user experience.
- Service-Level Agreements (SLAs): For some applications, especially those that businesses rely on to run other critical software, SLAs become important. An SLA is a formal agreement that guarantees a certain level of availability. These contracts often set specific uptime goals, ensuring that the service provider is held accountable for any downtime. For instance, cloud services or hosting platforms typically provide SLAs to reassure clients that their systems will be up and running within defined parameters.
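The uptime calculation above is simple enough to sketch in a few lines. This is a minimal illustration, not a monitoring tool: the function name `uptime_percentage` and its parameters are made up for this example, and it assumes you already know how many hours the service was up during the window you are measuring.

```python
def uptime_percentage(uptime_hours: float, total_hours: float) -> float:
    """Return availability as a percentage of the observed time window.

    Hypothetical helper for illustration; both arguments are in hours.
    """
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= uptime_hours <= total_hours:
        raise ValueError("uptime_hours must be between 0 and total_hours")
    return 100 * uptime_hours / total_hours


# The example from the text: operational for 99 hours out of every 100.
print(uptime_percentage(99, 100))
```

In practice the inputs would likely come from a monitoring system rather than being typed in by hand, and an SLA would spell out which window (monthly, yearly) the percentage is measured over.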
The Different ‘9’s of Availability and What They Mean for Downtime
When we talk about high availability, you might hear terms like “99% availability” or “99.99% availability,” but what do these numbers actually represent in terms of downtime? The higher the number of 9’s, the less downtime your service can have. Here’s a breakdown of what the different “9’s” mean for total downtime in a given year:
- 99% Availability, also called “two nines”: With 99% availability, your application can experience 3.65 days (87.6 hours) of downtime per year. While this may sound acceptable for some less critical apps, it’s a lot of time lost for most businesses.
- 99.9% Availability, or “three nines”: At this level, your downtime is reduced to 8.76 hours annually. This is often the baseline for many consumer-facing apps and businesses that want a more stable experience for users.
- 99.99% Availability, or “four nines”: With 99.99% availability, you can only afford about 52.56 minutes of downtime per year. This is a common standard for critical systems, like financial apps or enterprise-level software, where even a small amount of downtime can cause significant disruption.
- 99.999% Availability, or “five nines”: At 99.999%, downtime is reduced to just 5.26 minutes per year. This level of availability is typically required for mission-critical systems in industries like healthcare, finance, and telecommunications.
- 99.9999% Availability, or “six nines”: For the most demanding applications, 99.9999% availability allows for just 31.5 seconds of downtime annually. Achieving this level of availability is rare and often only required for extremely high-stakes systems, like those in aerospace or military applications.
The more “nines” you have, the less downtime your users experience, but achieving higher availability comes with increasing complexity, cost, and technical demands. It’s crucial to balance your uptime goals with the resources and infrastructure you have in place.
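The downtime figures in the list come from multiplying the length of a year by the fraction of time the service is allowed to be down. Here is a small sketch of that arithmetic, assuming a 365-day year (which is what the numbers above use). The name `annual_downtime_seconds` is just for illustration.

```python
# Assumption: a 365-day year, matching the figures in the list above.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def annual_downtime_seconds(availability_percent: float) -> float:
    """Return the downtime allowed per year, in seconds, for a given target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return SECONDS_PER_YEAR * (100 - availability_percent) / 100


for target in (99.0, 99.9, 99.99, 99.999, 99.9999):
    seconds = annual_downtime_seconds(target)
    print(f"{target}% -> {seconds / 3600:.2f} hours, or {seconds / 60:.2f} minutes")
```

Each extra nine divides the allowed downtime by ten, which is a big part of why every step up gets so much harder to achieve.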
Why Does Availability Matter?
In today’s digital landscape, downtime isn’t just inconvenient—it can be costly. Whether you’re running an e-commerce site, a SaaS platform, or a customer service app, availability directly impacts user satisfaction, revenue, and brand reputation. A single minute of downtime can lead to missed sales opportunities, frustrated users, and a drop in trust.
I have a strong memory of working at Nike and making a bad change. That change resulted in customers being unable to buy shoes on the Nike.com website for around five minutes.
Five minutes doesn’t sound long, and luckily the outage was only that short because we had feature flags in place to help out. But I remember my tech lead standing behind me while we were waiting for the change. He was counting up the minutes of the outage and multiplying them by the thousands of dollars per minute we were likely losing as a result.
If your website makes money, outages can cause significant losses, either through the revenue lost directly or through the damage to your brand caused by customers’ perception of poor availability. That makes availability very important to optimize for!
CAP Theorem
CAP Theorem states that a distributed system can only guarantee 2 of the 3 following things at the same time:
- Consistency: All users will see the same, most recent data, no matter which node they read from
- Availability: All users will be able to access the data when they want to, because every request gets a response
- Partition Tolerance: The system keeps working despite the network failures (partitions) that are bound to happen in distributed systems
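To make the trade-off concrete, here is a toy sketch of a store with two replicas. Everything in it is hypothetical: `TwoNodeStore`, the `"CP"` and `"AP"` mode names, and the `partitioned` flag are simplifications invented for this example, not how any real database is implemented. During a partition, the consistency-first store refuses writes, while the availability-first store accepts them and lets the replicas disagree.

```python
class TwoNodeStore:
    """Toy store with two replicas, "a" and "b". Illustration only."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}  # one stored value per replica
        self.partitioned = False  # True means the replicas cannot talk

    def write(self, node: str, value: str) -> bool:
        """Write through one replica; return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: the write reaches both replicas.
            for name in self.values:
                self.values[name] = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write so replicas never disagree.
            return False
        # Availability first: accept locally; the other replica is now stale.
        self.values[node] = value
        return True

    def read(self, node: str):
        """Return whatever value this replica currently holds."""
        return self.values[node]


for mode in ("CP", "AP"):
    store = TwoNodeStore(mode)
    store.write("a", "v1")       # replicated to both nodes
    store.partitioned = True     # simulate a network failure
    accepted = store.write("a", "v2")
    print(mode, accepted, store.read("a"), store.read("b"))
```

Running this prints `CP False v1 v1` and then `AP True v2 v1`: the first store stayed consistent but turned a user away, and the second stayed available but now shows different data depending on which replica you ask. Real systems are far more nuanced (a consistency-first system may also refuse reads, for example), but the basic tension is the same.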