How to build high availability (HA) for modern infrastructure

Apr 5, 2024 · 3 min read

Online businesses must ensure their services and products are available for operation and use as committed or agreed with their customers. To fulfill this commitment, businesses must define Key Performance Indicators (KPIs): measurable values that demonstrate how effectively a company is meeting its business commitments.

Modern infrastructure needs high availability to ensure it meets availability KPIs.

The choice lies in how much availability you need to sustain your business goals, which are measured by these KPIs:

Service Level Agreements (SLAs): The agreement you make with your clients or users on service uptime and downtime.
Service Level Objectives (SLOs): The objectives you must hit to meet that agreement.
Service Level Indicators (SLIs): The real numbers on your performance.
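
To make the relationship between an SLI and an SLO concrete, here is a minimal Python sketch. The request counts and the 99.99% objective are made-up illustration values and the function names are hypothetical. A real setup would take these numbers from a monitoring system.

```python
def availability_sli(successful_requests: int, total_requests: int) -> float:
    """SLI: the measured share of requests served successfully, in percent."""
    if total_requests == 0:
        return 100.0  # assumption: no traffic is treated as fully available
    return 100.0 * successful_requests / total_requests


def meets_slo(sli_percent: float, slo_percent: float) -> bool:
    """The SLO is met when the measured SLI is at or above the objective."""
    return sli_percent >= slo_percent


if __name__ == "__main__":
    # Illustration values only: 50 failed requests out of one million.
    sli = availability_sli(successful_requests=999_950, total_requests=1_000_000)
    print(f"SLI: {sli:.3f}%, meets 99.99% SLO: {meets_slo(sli, 99.99)}")
```

The SLA is then the externally promised level, usually set no stricter than the SLO so that there is some room before the agreement is breached.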

For example (ref: https://uptime.is/), an SLA level of 99.99% uptime/availability results in the following periods of allowed downtime/unavailability:

Daily: 8s
Weekly: 1m 0s
Monthly: 4m 22s
Quarterly: 13m 8s
Yearly: 52m 35s
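
These figures follow directly from the SLA percentage. The arithmetic can be sketched in Python as below. The period lengths are my assumptions (a 365.25-day year, with a month and a quarter taken as one twelfth and one quarter of it), chosen because they reproduce the numbers listed above when seconds are truncated. The function names are illustrative only.

```python
# Period lengths in seconds. These are assumptions:
# 365.25-day year, month = year / 12, quarter = year / 4.
YEAR_SECONDS = 365.25 * 24 * 3600
PERIOD_SECONDS = {
    "Daily": 24 * 3600,
    "Weekly": 7 * 24 * 3600,
    "Monthly": YEAR_SECONDS / 12,
    "Quarterly": YEAR_SECONDS / 4,
    "Yearly": YEAR_SECONDS,
}


def allowed_downtime_seconds(sla_percent: float) -> dict:
    """Return the downtime budget in seconds for each period."""
    unavailable_fraction = (100.0 - sla_percent) / 100.0
    return {
        period: seconds * unavailable_fraction
        for period, seconds in PERIOD_SECONDS.items()
    }


def format_duration(seconds: float) -> str:
    """Format as 'Xm Ys', truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


if __name__ == "__main__":
    for period, seconds in allowed_downtime_seconds(99.99).items():
        print(f"{period}: {format_duration(seconds)}")
```

Changing the argument to 99.9 or 99.999 shows how quickly the budget grows or shrinks with each extra "nine".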

To achieve the KPIs you need an H/A model where all services and data are actively or passively available across multiple locations.

High Availability (HA) Models:

There are various types of availability models, depending on how you want communication to happen between the various layers of infrastructure. A model can be either data driven or state driven.

It can be an active-active model, as shown in the image above, built for instance on multiple availability zones or on zonal/regional cloud services. Broadly, I classify the models into two categories:

Data Driven: The data and services are actively replicated or mirrored across multiple sites, so that if one site fails the services are still available. These can be active/active sites where both serve the load simultaneously, active and hot standby sites where one is primary while the other is an active mirror, or blue/green sites, mainly used for zero-downtime upgrades or deployments. The choice really depends on the availability you are committing to for your services: will they be always online (100% H/A), or can they have less than 100% availability?
State Driven: In this model the stateful services are kept on one site, mostly on-premise, and the stateless services are kept on a cloud. These H/A models are mostly seen in organisations where data privacy and localisation are requirements, such as banks, financial institutions and healthcare providers.

Irrespective of which H/A model suits you, one component that plays a key role in any availability configuration is the load balancer, used for both load balancing and load sharing.

Load Balancing & Load Sharing:

You need load balancing and sharing between these multisite applications or clusters so that if one goes down the load switches seamlessly to another without any downtime for customers.
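
As a rough illustration of that switch-over, here is a toy Python sketch of round-robin load sharing that skips sites marked unhealthy. The class name, method names and site names are hypothetical. A real load balancer would update health from periodic health checks rather than manual calls.

```python
from itertools import cycle


class SiteBalancer:
    """Round-robin across sites, skipping any site marked unhealthy."""

    def __init__(self, sites):
        self.sites = list(sites)
        self.healthy = set(self.sites)
        self._rotation = cycle(self.sites)

    def mark_down(self, site):
        # In practice this would be triggered by a failed health check.
        self.healthy.discard(site)

    def mark_up(self, site):
        if site in self.sites:
            self.healthy.add(site)

    def next_site(self):
        # Try each site at most once per call before giving up.
        for _ in range(len(self.sites)):
            site = next(self._rotation)
            if site in self.healthy:
                return site
        raise RuntimeError("no healthy site available")


if __name__ == "__main__":
    balancer = SiteBalancer(["site-a", "site-b"])
    print([balancer.next_site() for _ in range(4)])  # load is shared
    balancer.mark_down("site-a")
    print([balancer.next_site() for _ in range(4)])  # all traffic moves to site-b
```

While both sites are healthy the load is shared between them. When one is marked down, every request goes to the remaining site without the caller changing anything.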

There are two kinds of load balancers available in most clouds:

Network or TCP load balancers: These work at layer 4 (the transport layer), distributing TCP or UDP connections based on IP addresses and ports without inspecting the application payload.

Application or HTTP load balancers: These work at layer 7 (the application layer), routing requests based on application-level information such as the HTTP host, path or headers.

Based on your use cases you will need one or both types.

Availability Zones:

Every cloud provider hosts its physical infrastructure in multiple data centers within the same city or country. These data centers are referred to as availability zones (AZs), and the city or country they are in is referred to as a region.

Cloud providers let you stretch infrastructure across these AZs. For example, you can create a database cluster that has nodes spread across multiple AZs.

For an H/A active/active setup, multi-AZ is a must on cloud. Always use multi-AZ when configuring your network, system or storage services on cloud.
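
To show why spreading nodes matters, here is a small Python sketch that places cluster nodes across zones round-robin and checks whether the cluster keeps enough nodes when any single zone is lost. The node names, zone names and the quorum of 2 are illustrative assumptions, not values from a specific cloud.

```python
from collections import Counter


def spread_across_zones(node_names, zones):
    """Assign nodes to zones round-robin so they are spread as evenly as possible."""
    return {node: zones[i % len(zones)] for i, node in enumerate(node_names)}


def survives_zone_loss(placement, quorum):
    """True if losing any single zone still leaves at least `quorum` nodes."""
    nodes_per_zone = Counter(placement.values())
    total_nodes = len(placement)
    return all(total_nodes - count >= quorum for count in nodes_per_zone.values())


if __name__ == "__main__":
    # Hypothetical three-node database cluster across three zones.
    placement = spread_across_zones(["db-1", "db-2", "db-3"], ["az-a", "az-b", "az-c"])
    print(placement)
    print(survives_zone_loss(placement, quorum=2))
```

With one node per zone, any single zone outage still leaves two of three nodes. If two of the three nodes shared a zone, losing that zone would leave only one node and the check would fail.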

Summary:

Create an active-active infrastructure across two or more locations.
Replicate data across these sites in real time.
Load balancing & load sharing are essential for ensuring the availability levels.
Dry run your High Availability (HA) clusters at least once a year to ensure you are meeting the SLAs defined.

The aim of almost every business is to grow. That’s why it’s baffling to find so many infrastructure services not equipped to handle scaling up.

The 10-Factor Infrastructure is set up for this from the get-go, meaning that however big your business gets, there’ll be no need for any expensive, large-scale infrastructure changes later.

If you like this article, I am sure you will find the 10-Factor Infrastructure even more useful. It compiles all these tried and tested methodologies, design patterns & best practices into a complete framework for building secure, scalable and resilient modern infrastructure.

Don’t let your best-selling product suffer due to an unstable, vulnerable & mutable infrastructure.

Be fit to launch & scale on a compliance-ready cloud from Day 1 with 10factorinfra.

Thanks & Regards

Kamalika Majumder
