Microsoft Azure is designed to minimize downtime, and Microsoft states the maximum amount of downtime service users should expect in its Service Level Agreements (SLAs). It also provides services and guidance that help Azure users reduce downtime, and it offers better SLAs to users who take advantage of the redundancy features built into the platform.
Before we explore some of the Azure SLAs and what users can do to improve availability, it’s useful to understand what connectivity and availability numbers mean in real terms.
What Does ‘Four Nines’ Mean for Azure Uptime?
Availability is often expressed as a percentage of a period of time. A service might be available for 99 percent of the time in a given month; on Azure, it’s usually the uptime percentage for a billing month.
You may also hear uptime expressed in terms of “nines”: four nines means 99.99 percent, five nines means 99.999 percent, and so on. The SLA for Azure VMs guarantees three nines, or 99.9 percent, for a single instance. Not every Azure SLA is a round number of nines; some guarantee a figure in between, such as 99.95 percent connectivity.
It’s not easy to get an idea about how much downtime to expect when we’re talking about nines or percentages, so it’s useful to convert them into days, hours, and minutes. We’ve rounded these figures to the nearest minute.
|Uptime|Downtime Per Month|Downtime Per Year|
|99%|7h 18m|3d 15h|
As you can see, adding an extra 0.9 or 0.5 percent to the uptime makes a big difference to the total downtime. If you host business-critical apps on Azure, it could also make a big difference to your business’s revenue and productivity.
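The conversion from an uptime percentage to a downtime budget is simple arithmetic, and it can be sketched in a few lines of Python. This is a minimal sketch, not an official Azure calculation: it assumes a 365-day year and an average month of one twelfth of a year, and the function names are our own.

```python
# Assumptions: a 365-day year and an "average" month of 1/12 of a year.
MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12


def downtime_minutes(uptime_percent, period_minutes):
    """Maximum downtime, in minutes, that an uptime percentage allows."""
    return (100 - uptime_percent) / 100 * period_minutes


def format_duration(minutes):
    """Render minutes as days/hours/minutes, rounded to the nearest minute."""
    total = round(minutes)
    days, remainder = divmod(total, 24 * 60)
    hours, mins = divmod(remainder, 60)
    parts = []
    if days:
        parts.append(f"{days}d")
    if hours:
        parts.append(f"{hours}h")
    if mins or not parts:
        parts.append(f"{mins}m")
    return " ".join(parts)


if __name__ == "__main__":
    # Uptime figures mentioned in this article.
    for uptime in (99, 99.9, 99.95, 99.99, 99.995):
        per_month = format_duration(downtime_minutes(uptime, MINUTES_PER_MONTH))
        per_year = format_duration(downtime_minutes(uptime, MINUTES_PER_YEAR))
        print(f"{uptime}%: {per_month} per month, {per_year} per year")
```

Under these assumptions, 99 percent uptime works out to 7h 18m of downtime per month, and 99.99 percent to roughly four minutes per month.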
Exploring Azure SLAs
Except for free tiers, each Azure service has an SLA. It’s worth taking the time to understand the SLAs of the services you use, not least because the uptime guarantee changes depending on how you use the service, which makes the SLA a useful guide to optimizing the availability of your apps.
Azure VMs SLA
As we mentioned earlier, the SLA for a single-instance virtual machine guarantees 99.9 percent VM connectivity. Connectivity here means that the VM can exchange two-way network traffic with other IP addresses, including public IP addresses.
However, if two or more virtual machines are deployed in an Availability Set, the guaranteed connectivity rises to 99.95 percent for at least one instance. If VMs are deployed across two or more Availability Zones, guaranteed connectivity rises again to 99.99 percent. Compared with a single instance at 99.9 percent, that cuts the maximum downtime allowed under the SLA by a factor of ten; compared with an Availability Set, by a factor of five. If uptime is a primary concern, Availability Zones are the key to minimizing downtime and service disruption. As always, there is a tradeoff between cost and availability: it might not be cost-effective to deploy less important infrastructure across Availability Zones, whereas for applications that support the business’s main revenue stream, downtime of even an hour a month could be enormously expensive.
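To make the cost side of that tradeoff concrete, the sketch below compares the worst-case monthly downtime each VM deployment option allows under its SLA and puts a price on it. The SLA percentages come from the text above; the 730-hour month and the loss-per-hour figure are illustrative assumptions, and the function names are hypothetical. Note that an SLA is a guarantee threshold, not a prediction, so these are upper bounds within the SLA rather than expected outages.

```python
HOURS_PER_MONTH = 730  # assumption: 365 days * 24 hours / 12 months

# VM connectivity guarantees quoted in this article.
VM_SLAS = {
    "single instance": 99.9,
    "availability set": 99.95,
    "availability zones": 99.99,
}


def max_downtime_hours(sla_percent, period_hours=HOURS_PER_MONTH):
    """Longest downtime that still meets the SLA over the period."""
    return (100 - sla_percent) / 100 * period_hours


def worst_case_monthly_loss(sla_percent, loss_per_hour):
    """Revenue lost if downtime reaches the SLA limit for the month."""
    return max_downtime_hours(sla_percent) * loss_per_hour


if __name__ == "__main__":
    loss_per_hour = 10_000  # hypothetical revenue lost per hour of downtime
    for name, sla in VM_SLAS.items():
        minutes = max_downtime_hours(sla) * 60
        loss = worst_case_monthly_loss(sla, loss_per_hour)
        print(f"{name}: up to {minutes:.0f} min/month, worst case ${loss:,.0f}")
```

Comparing the difference in worst-case loss between two options with the extra monthly cost of the more redundant deployment gives a rough first test of whether the upgrade pays for itself.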
Azure SQL Database
Azure SQL Database is a managed relational database service. Its SLA offers a variety of availability guarantees, depending on whether the user takes advantage of optional redundancy features.
The DTU-based basic, standard, and premium SQL Database tiers offer an uptime SLA of 99.99 percent, which is an adequate uptime guarantee for most purposes. However, for businesses that require higher uptime guarantees, Azure provides Business Critical tiers, which leverage replication across availability zones to offer a higher uptime SLA of 99.995 percent. That is a maximum annual downtime of less than half an hour.
The Limitations of Service Level Agreements
Service Level Agreements are a useful guide, but they aren’t cast-iron guarantees of availability; it’s not unusual for unforeseen events to cause significant outages. Furthermore, an SLA’s availability guarantees don’t guarantee your app’s availability; they pertain to Azure infrastructure, not the code that runs on that infrastructure.
Consider a situation in which a web application hosted on a VM is hit by a traffic spike that consumes its available resources, reducing performance and causing the app to be unavailable for some users. From Microsoft’s perspective, it is fulfilling its SLA because the VM and its network connection are available and working as intended. You won’t be notified because, as far as Azure is concerned, nothing is wrong.
However, Azure does provide a wealth of metrics, logs, and monitoring services, including Azure Monitor, to help businesses keep track of the status of their infrastructure and applications. Azure Monitor lacks out-of-the-box notifications for the information businesses care about most, so they need to configure alerts and notifications relevant to their own applications and infrastructure, including for the kind of performance and availability issues described in the previous paragraph.
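The gap between infrastructure availability and application availability can be illustrated with a small, generic alert rule. The sketch below is not Azure Monitor's API; it is plain Python with hypothetical names (`ProbeResult`, `availability_percent`, `should_alert`), and it assumes you already collect health-probe results from your app. It counts a probe as good only if the request succeeded within a latency budget, so an overloaded app on a perfectly healthy VM still shows up as unavailable.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One health-check request against the application (hypothetical shape)."""
    ok: bool            # did the request succeed?
    latency_ms: float   # how long did it take?


def availability_percent(results, max_latency_ms):
    """Share of probes that succeeded within the latency budget."""
    if not results:
        raise ValueError("no probe results to evaluate")
    good = sum(1 for r in results if r.ok and r.latency_ms <= max_latency_ms)
    return 100 * good / len(results)


def should_alert(results, target_percent, max_latency_ms):
    """True when measured availability falls below the target."""
    return availability_percent(results, max_latency_ms) < target_percent


if __name__ == "__main__":
    # Illustrative window: eight fast successes, one slow success, one failure.
    window = (
        [ProbeResult(ok=True, latency_ms=120.0)] * 8
        + [ProbeResult(ok=True, latency_ms=4500.0)]
        + [ProbeResult(ok=False, latency_ms=30000.0)]
    )
    print(availability_percent(window, max_latency_ms=1000))
    print(should_alert(window, target_percent=99.9, max_latency_ms=1000))
```

The latency budget and the target percentage are choices for each application; the point is that the threshold reflects what your users experience, not what the VM's SLA measures.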
At VIAcode, we build end-to-end Azure monitoring and incident management solutions that give businesses the infrastructure visibility they need to optimize performance and availability. Led by former Microsoft employees, VIAcode’s Azure managed services team offers best-in-class Azure solutions for small and medium businesses.