How to build a BCP/DR compliant modern infrastructure

Updated: Jul 30, 2024

Disaster Recovery For Modern Infrastructure


Disaster recovery is the process of recovering from unforeseen, unplanned events that drastically impact the business. Several factors influence a disaster recovery plan; the most critical are the physical distance and the latency between the DC (primary datacenter) and the DRC (disaster recovery site). The DC and DRC must not reside in the same disaster-prone zone.


For example, a figure commonly cited in the context of ISO 27001 is that the DC and DRC must be at least 40 km apart; confirm the exact requirement with your auditor or regulator. Some cloud availability zones might not meet this rule. If you have a DC/DRC requirement, be sure to validate the physical distance between your cloud provider's availability zones.


Likewise, for network-based data transmission, the latency between sender and receiver must be under 1 ms. Data replication or mirroring between sites is a must for faster recovery and lower data loss after a disaster, which is why the latency between DC and DRC must stay below 1 ms. This can be achieved with dedicated interlinks between both sites. Ensure that the availability zones on your cloud are connected via dedicated links. Do not rely on site-to-site VPNs, as they do not guarantee dedicated bandwidth or latency.
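As a rough sanity check on distance versus latency, the sketch below estimates the best-case round-trip time over fiber. It assumes light covers about 200 km per millisecond in optical fiber, treats the 1 ms budget as a round-trip figure, and uses a hypothetical `route_factor` to allow for cable paths being longer than the straight-line distance. None of these numbers come from a specific provider, and real links add switching and protocol overhead on top.

```python
# Rule of thumb (assumption): light in optical fiber covers roughly 200 km per ms.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float, route_factor: float = 1.5) -> float:
    """Best-case round-trip time in milliseconds between two sites.

    route_factor is a hypothetical allowance for the cable route being
    longer than the straight-line distance between the sites.
    """
    return 2 * distance_km * route_factor / FIBER_KM_PER_MS


def fits_latency_budget(distance_km: float, budget_ms: float = 1.0,
                        route_factor: float = 1.5) -> bool:
    """True if propagation delay alone stays within the latency budget."""
    return min_round_trip_ms(distance_km, route_factor) <= budget_ms


if __name__ == "__main__":
    for km in (40, 100):
        print(km, "km:", round(min_round_trip_ms(km), 2), "ms,",
              "fits" if fits_latency_budget(km) else "exceeds budget")
```

Under these assumptions, sites 40 km apart leave headroom within 1 ms, while sites around 100 km apart already exceed it on propagation delay alone, so distance and latency requirements have to be checked together.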

Here are some more key components that must be included in a robust DR plan:


Business Continuity Plan:

A business continuity plan (BCP) is a document that outlines how a business will continue operating during an unplanned disruption in service.

This plan must document the potential risk factors and the corresponding mitigation policies. It must be reviewed regularly to keep it in line with changes in the architecture.

BCP is a key element of the DR process, as it defines the availability level of the business and hence of the software infrastructure.


RTO/RPO:

  • RTO (Recovery Time Objective) - How fast can you recover

  • RPO (Recovery Point Objective) - How much data can you afford to lose


RTO and RPO must be documented and in line with your business availability and SLA requirements.

For example, an SLA of 99.99% allows a yearly downtime of about 52m 35s. Your RTO has to fit inside that budget, which means that in the event of a disaster you must be capable of recovering in well under 1 hr.
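The downtime figure for a given SLA can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes an average year of 365.25 days; the function names are illustrative only.

```python
# Assumption: an average year of 365.25 days (accounts for leap years).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def allowed_downtime_seconds(sla_percent: float) -> float:
    """Yearly downtime budget, in seconds, for an availability SLA in percent."""
    return SECONDS_PER_YEAR * (1 - sla_percent / 100)


def format_duration(seconds: float) -> str:
    """Render a duration as hours, minutes and whole seconds."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}h {minutes}m {secs}s"


if __name__ == "__main__":
    for sla in (99.9, 99.99):
        print(sla, "->", format_duration(allowed_downtime_seconds(sla)))
```

For 99.99% this gives 0h 52m 35s per year. Note that the budget covers all outages in the year, so a single recovery that takes the full RTO leaves little room for anything else.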

Likewise, an RPO of, say, 5 mins means you must be able to recover data up to 5 mins before the disaster occurred; in other words, at most 5 mins of data can be lost.

The good news is that you decide the RPO and RTO numbers. Commit only to what you can actually deliver; otherwise you may default on legal and regulatory terms if you cannot prove the numbers with actual results.


Backups:

As old-fashioned as it may sound, backups are still a compulsory requirement for many regulatory compliance regimes and, believe it or not, when things go badly wrong, backups become a life saver. Here are some important tips to keep in mind:


  • On-line and scheduled backups or off-site backups must be in place for critical systems and data: a weekly full backup, daily differentials and 2-hourly transaction backups, or better.

  • Backups must be encrypted if necessary. Look for low-cost encrypted archive storage if your cloud provider offers it.

  • If required, the backup policy must include specific provisions for transactional DBs and auth systems to ensure consistency at restore.

  • Image, file and DB backups must be in place where required.

  • These backups must be restored and tested regularly to prove they work.
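To make the full/differential/transaction schedule concrete, here is a minimal sketch of how a restore chain could be picked for a given disaster time and how much data would be lost. The `Backup` type, the function names and the sample schedule are all hypothetical; a real database engine has its own restore tooling and rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    kind: str            # "full", "diff" or "log" (transaction backup)
    taken_at: datetime


def restore_chain(backups, disaster_at):
    """Latest full backup, latest differential after it, then later log backups."""
    usable = sorted((b for b in backups if b.taken_at <= disaster_at),
                    key=lambda b: b.taken_at)
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup available before the disaster")
    full = fulls[-1]
    diffs = [b for b in usable if b.kind == "diff" and b.taken_at > full.taken_at]
    chain = [full] + diffs[-1:]          # at most one differential is needed
    base_time = chain[-1].taken_at
    chain += [b for b in usable if b.kind == "log" and b.taken_at > base_time]
    return chain


def data_loss_window(chain, disaster_at):
    """Time between the last restorable backup and the disaster."""
    return disaster_at - chain[-1].taken_at


# Sample schedule (assumed): full on Sunday, daily diffs, logs every 2 hours.
start = datetime(2024, 7, 28)
backups = [Backup("full", start)]
backups += [Backup("diff", start + timedelta(days=d)) for d in (1, 2)]
backups += [Backup("log", start + timedelta(hours=h)) for h in range(2, 72, 2)]

disaster = datetime(2024, 7, 30, 5, 30)
chain = restore_chain(backups, disaster)
print([b.kind for b in chain])            # ['full', 'diff', 'log', 'log']
print(data_loss_window(chain, disaster))  # 1:30:00
```

With 2-hourly transaction backups the worst-case data loss from backups alone approaches 2 hours, which is why a tighter RPO needs replication in addition to backups.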


BCP & DR Compliance:

Ensure your DR plan is in line with compliance and regulatory needs such as

  • Data residency and localisation.

  • Data Privacy & Confidentiality

  • Data Sharing Policies

Technically, a DR site can be anywhere as long as it provides the required connectivity and latency. For instance, GCP Singapore can be the DR site for GCP Jakarta, and technically there is nothing wrong with that. It becomes wrong when you are in an industry like banking, where OJK (the financial services regulator of Indonesia) regulation requires that data must not leave the country.

That is why you must abide by the law of the land. Have a well-defined and detailed NDA with cloud providers on data privacy and localisation.
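A residency rule like this can be encoded as a simple check in infrastructure validation. The sketch below is illustrative only: the lookup table and function name are hypothetical, and the table covers just the two GCP regions from the example (`asia-southeast1` for Singapore, `asia-southeast2` for Jakarta).

```python
# Hypothetical lookup table; extend it from your provider's region list.
REGION_COUNTRY = {
    "asia-southeast1": "SG",  # GCP Singapore
    "asia-southeast2": "ID",  # GCP Jakarta
}


def dr_region_allowed(primary_region: str, dr_region: str,
                      data_must_stay_in_country: bool) -> bool:
    """Check a DR region against a data residency requirement."""
    primary_country = REGION_COUNTRY[primary_region]
    dr_country = REGION_COUNTRY[dr_region]
    if data_must_stay_in_country:
        return primary_country == dr_country
    return True


if __name__ == "__main__":
    # Jakarta primary with Singapore DR, under an in-country data rule.
    print(dr_region_allowed("asia-southeast2", "asia-southeast1", True))
```

The Jakarta/Singapore pair is rejected when the in-country flag is set and accepted when it is not, mirroring the banking example.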


Disaster Recovery Drills with DC/DRC:


  • DR drills are mandatory at least once a year and must be conducted to test and prove the RTO/RPO numbers.

  • BCP and DR plans must be tested regularly on evenly distributed, fully independent sites, and the results need to be recorded and certified by auditors, especially for applications dealing with essential services like banking, healthcare etc.

  • This also builds confidence in in-house processes.


You get DR certified only when you prove the RTO and RPO you have defined. Let me explain:

If you have defined your RTO/RPO as 1 hr/10 mins, you must show that you can recover from any disaster within 1 hr and that you can recover data up to 10 mins before the time of the disaster.
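A drill result can be scored against those targets in a few lines. This is a minimal sketch with hypothetical names and made-up sample timestamps; what counts as "service restored" and "last recovered data" has to be agreed with your auditors.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DrillResult:
    disaster_at: datetime             # when the simulated disaster started
    service_restored_at: datetime     # when the DR site was serving traffic
    last_recovered_data_at: datetime  # timestamp of the newest recovered data


def evaluate_drill(result: DrillResult, rto: timedelta, rpo: timedelta) -> dict:
    """Compare measured recovery time and data loss with the RTO/RPO targets."""
    recovery_time = result.service_restored_at - result.disaster_at
    data_loss = result.disaster_at - result.last_recovered_data_at
    return {
        "recovery_time": recovery_time,
        "data_loss": data_loss,
        "rto_met": recovery_time <= rto,
        "rpo_met": data_loss <= rpo,
    }


# Sample drill (made-up timestamps) scored against RTO/RPO of 1 hr/10 mins.
drill = DrillResult(
    disaster_at=datetime(2024, 7, 30, 10, 0),
    service_restored_at=datetime(2024, 7, 30, 10, 48),
    last_recovered_data_at=datetime(2024, 7, 30, 9, 53),
)
report = evaluate_drill(drill, rto=timedelta(hours=1), rpo=timedelta(minutes=10))
print(report["rto_met"], report["rpo_met"])  # True True
```

Keeping a report like this for every drill gives auditors a record of measured values next to the committed targets.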


Summary:

  • Applications should be evenly distributed across fully independent, physically distanced datacenters.

  • Use multi-site data replication for DR.

  • You will need dedicated interlinks between datacenters to achieve the sub-1 ms latency needed for data replication.

  • BCP and DR Plans should be tested.

  • Have a well defined and detailed NDA with cloud providers on data privacy and localisations.


A lot goes unasked behind the easy accessibility of infrastructure as a service. It only comes back to haunt you when you have to prove its resilience and sustainability.

That’s why in the 10-Factor Infrastructure, DR is not just a plan; it’s a first-class member of infrastructure design.


If you like this article, I am sure you will find the 10-Factor Infrastructure even more useful. It compiles all these tried and tested methodologies, design patterns & best practices into a complete framework for building secure, scalable and resilient modern infrastructure.  


 

Don’t let your best-selling product suffer due to an unstable, vulnerable & mutable infrastructure.



 


Thanks & Regards

Kamalika Majumder


©2024 by Staxa LLP. All Rights Reserved.
