![Comparing Disaster Recovery Plans: Cloud vs On-Premise](https://static.wixstatic.com/media/981170_0c1d78c7cded43f9aa9c58d5f3dbfa03~mv2.png/v1/fill/w_980,h_565,al_c,q_90,usm_0.66_1.00_0.01,enc_avif,quality_auto/981170_0c1d78c7cded43f9aa9c58d5f3dbfa03~mv2.png)
Disaster recovery is the process of recovering from unforeseen, unplanned events that impact the business. Several factors influence a disaster recovery plan; the most critical are the physical distance and the latency between the DC (primary data center) and the DRC (disaster recovery site).
The DC and DRC sites must not reside in the same disaster zone. For example, as per ISO 27001, the DC and DRC must be at least 40 km away from each other. Some availability zones on clouds might not abide by this rule. If you have a DC/DRC requirement, be sure to validate the physical distance between your cloud provider's availability zones.
Likewise, data replication or mirroring between multiple sites is a must for faster recovery and lower data loss in any disaster, and replication over the network only works well when the latency between sender and receiver is low.
That is why the latency between DC and DRC must be less than 1 ms. This can be achieved with dedicated interlinks between both sites. Ensure that the availability zones on your cloud are connected via dedicated links. Do not rely on site-to-site VPNs, as they do not guarantee bandwidth or latency.
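The distance and latency rules pull in opposite directions, so it can help to check both for a candidate site pair. The sketch below is only illustrative: it assumes light travels through optical fibre at roughly 200 km per millisecond, treats the 1 ms budget as a round trip, and ignores switching and routing overhead, so it gives a lower bound rather than a measured latency. The function names and default thresholds are hypothetical.

```python
import math

# Assumption: light in optical fibre covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time from propagation delay alone."""
    return 2 * distance_km / FIBER_KM_PER_MS


def check_site_pair(distance_km, min_distance_km=40.0, max_latency_ms=1.0):
    """Check a DC/DRC pair against the distance and latency rules."""
    rtt = min_round_trip_ms(distance_km)
    return {
        "distance_ok": distance_km >= min_distance_km,
        "latency_feasible": rtt < max_latency_ms,
        "min_rtt_ms": rtt,
    }
```

Under these assumptions, two sites 40 km apart have a propagation floor of about 0.4 ms round trip, which leaves little headroom for network equipment. Real latency should always be measured on the actual link.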
The Disaster Recovery Checklist:
1. BCP or Business Continuity Plan:
A business continuity plan (BCP) is a document that outlines how a business will continue operating during an unplanned disruption in service.
This plan must document the potential risk factors and the corresponding mitigation policies. It must be reviewed regularly to keep it in line with changes in the architecture.
The BCP is a key element of the DR process, as it defines the availability level the business requires and hence that of the software infrastructure.
Cloud-Based BCP:
Cloud solutions integrate seamlessly with BCP due to their flexibility and accessibility. Cloud providers offer automated failover and failback processes, ensuring business continuity with minimal manual intervention.
The ability to access data and applications from anywhere with an internet connection also supports remote work scenarios, enhancing overall business resilience.
On-Premise BCP:
On-premise DR can be effective if well-planned and regularly updated. However, it requires meticulous coordination and significant internal resources to manage failover processes and maintain backup sites.
This can be challenging, especially for smaller organizations with limited IT staff. The reliance on physical infrastructure can also be a vulnerability if the primary site is compromised.
2. RTO/RPO:
RTO or Recovery Time Objective is the goal your organisation sets for the maximum length of time it should take to restore normal operations following an outage or data loss.
RPO or Recovery Point Objective is your goal for the maximum amount of data the organisation can tolerate losing. In short:
RTO means - How fast must you recover
RPO means - How much data can you afford to lose
For example, an SLA of 99.99% allows a yearly downtime of 52m 35s. Your RTO therefore has to be under 1 hr: in the event of a disaster, you must be capable of recovering within that time. Likewise, an RPO of, let's say, 5 mins means you must be able to recover the data up to 5 mins before the disaster occurred; in other words, at most 5 mins of data can be lost.
The good news is that you decide the RPO and RTO numbers. So commit only to what you can actually deliver, or else you might default on legal and regulatory terms if you cannot prove the numbers with actual results.
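The downtime figure in the SLA example can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming a year of 365.25 days; the function names are illustrative and not part of any library.

```python
def allowed_downtime_seconds(sla_percent, period_days=365.25):
    """Downtime budget, in seconds, implied by an availability SLA."""
    return (1 - sla_percent / 100.0) * period_days * 24 * 3600


def format_minutes_seconds(seconds):
    """Render a duration as whole minutes and seconds, e.g. '52m 35s'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m {secs}s"


budget = allowed_downtime_seconds(99.99)
print(format_minutes_seconds(budget))  # 52m 35s
```

If a single outage is allowed to consume the whole yearly budget, that budget is the upper limit for the RTO. Anything longer breaks the SLA in one incident.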
Cloud-Based RTO/RPO:
Cloud DR solutions often excel in providing lower RTO and RPO due to their inherent scalability and resource availability. Cloud providers use geographically distributed data centers to ensure high availability and redundancy. This means that in the event of a disaster, resources can be quickly reallocated, and data can be restored from the nearest data center, minimising downtime and data loss.
However, this might not be true for all cloud regions. Newly launched regions, especially in South East Asia, have been found not to comply with the basic requirement of physically separated DC/DR. In one past project, GCP Indonesia had only one region launched, with its AZs located physically within the same IT park.
Availability zones and regions can mean different things for different cloud providers. DR must be tested as regularly on cloud as it is on premise.
On-Premise RTO/RPO:
On-premise solutions, while potentially offering control over RTO and RPO, often require significant investment in infrastructure and ongoing maintenance. Achieving low RTO and RPO with on-premise solutions involves having redundant systems and high-speed backups, which can be cost-prohibitive for many businesses. Additionally, recovery speed can be affected by the physical state of the hardware and the local environment.
3. Backups:
To meet DR compliance you will need online, scheduled backups or off-site backups for critical systems and data. A weekly full backup, daily differentials and 2-hourly transaction log backups, or better, must be in place.
Backups must be encrypted if necessary. Support for low-cost encrypted archives should be considered if available. If required, the backup policy must include specific provisions for transactional databases and auth systems to ensure consistency at restore.
These backups must be tested by restoring regularly to prove they work.
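To see what the weekly full, daily differential and 2-hourly transaction log schedule means at restore time, the sketch below picks the backups needed to restore to a given moment and reports how much data would be lost. It is a simplified model: the backup kinds and the tuple layout are assumptions made for illustration, not the format of any particular backup tool.

```python
from datetime import datetime, timedelta


def restore_chain(backups, target):
    """Select the backups needed to restore to `target`, in apply order.

    `backups` is a list of (kind, taken_at) tuples, where kind is
    "full", "diff" or "log" (hypothetical labels for this sketch).
    """
    usable = sorted((b for b in backups if b[1] <= target), key=lambda b: b[1])
    fulls = [b for b in usable if b[0] == "full"]
    if not fulls:
        raise ValueError("no full backup available before the target time")
    full = fulls[-1]
    # Only the most recent differential taken after the full is needed.
    diffs = [b for b in usable if b[0] == "diff" and b[1] > full[1]]
    base = diffs[-1] if diffs else full
    # Every transaction log backup after the base must be replayed.
    logs = [b for b in usable if b[0] == "log" and b[1] > base[1]]
    return [full] + ([base] if diffs else []) + logs


def data_loss_window(chain, target):
    """Time between the last backup in the chain and the target moment."""
    return target - chain[-1][1]
```

With this schedule the data loss window from backups alone can approach 2 hours, so a tighter RPO has to come from replication between sites rather than from scheduled backups.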
Cloud-Based Backups:
Cloud backup solutions offer automated, frequent backups with minimal disruption to operations. Data is often stored in multiple locations, providing redundancy and reducing the risk of data loss. Additionally, cloud backups are scalable, allowing businesses to adjust storage capacity as needed without investing in additional hardware.
However, organisations with data localisation and confidentiality obligations must consider where the cloud provider stores these backups. Storing backups outside the country, or sharing them with cloud partners, might break compliance with regulations like OJK or ISAE 3000. That's where a mandatory NDA comes into play.
On-Premise Backups:
On-premise backups require significant investment in hardware and software. They also necessitate rigorous processes to ensure data is backed up regularly and stored securely. Physical backups can be vulnerable to local disasters, such as fires or floods, unless they are regularly moved to an off-site location. The management and maintenance of these backups can be resource-intensive.
4. Compliance:
Technically, a DR site can be anywhere as long as it provides the required connectivity and latency. For instance, GCP Singapore can be the DR site for GCP Jakarta, and technically there is nothing wrong with that. But it becomes wrong when you are in an industry like banking, where OJK (the financial services regulator of Indonesia) requires that data must not leave the country.
That is why you must abide by the law of the land. Have a well-defined and detailed NDA with cloud providers on data privacy and localisation.
Ensure your DR plan is in line with compliance and regulatory needs such as:
Data Residency and Localisation
Data Privacy & Confidentiality
Data Sharing Policies
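A data residency rule like the one above can be checked automatically against an inventory of where data lives. The following is a minimal sketch: the `DataStore` type, the store names and the region-to-country table are hypothetical and would have to come from your own inventory. The two region identifiers shown are the GCP names for Jakarta and Singapore.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    name: str
    region: str                  # provider region identifier
    holds_regulated_data: bool


# Illustrative mapping from region identifiers to country codes.
REGION_COUNTRY = {
    "asia-southeast2": "ID",  # GCP Jakarta
    "asia-southeast1": "SG",  # GCP Singapore
}


def residency_violations(stores, allowed_country):
    """Return names of regulated stores located outside `allowed_country`.

    A region missing from the mapping counts as a violation, so unknown
    locations fail the check instead of passing silently.
    """
    violations = []
    for store in stores:
        if not store.holds_regulated_data:
            continue
        if REGION_COUNTRY.get(store.region) != allowed_country:
            violations.append(store.name)
    return violations


stores = [
    DataStore("primary-db", "asia-southeast2", True),
    DataStore("dr-replica", "asia-southeast1", True),
    DataStore("public-assets", "asia-southeast1", False),
]
print(residency_violations(stores, "ID"))  # ['dr-replica']
```

A check like this only covers residency. Privacy, confidentiality and data sharing terms still need to be settled in the contract and NDA with the provider.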
DR Compliance On Cloud:
Leading cloud providers offer compliance certifications and adhere to industry standards such as GDPR, HIPAA, and ISO/IEC 27001. They provide tools and features to help businesses meet regulatory requirements, including data encryption, audit logs, and secure access controls. This can significantly reduce the compliance burden on businesses.
However, support for country-specific regulations, especially for PII data, is still lacking on cloud. One of my financial services clients in Indonesia migrated back to on-premise because the cloud provider did not have a region there.
Though the provider will tell you they follow the rules of the land, it's better to be self-prepared with all the right configurations, policies and documents.
DR Compliance On-Premise:
On-premise solutions require businesses to ensure their infrastructure and processes comply with relevant regulations. This involves implementing robust security measures, conducting regular audits, and maintaining detailed records. While this approach offers control, it also places the full compliance responsibility on the organisation, which can be challenging and resource-intensive.
5. DR Drills:
DR drills are mandatory at least once a year and must be conducted to test and prove the RTO/RPO numbers. Regularly tested BCP and DR plans on evenly distributed, fully independent sites need to be recorded and certified by auditors, especially for applications dealing with essential services like banking and healthcare. This also builds confidence in in-house processes.
DR Drills On Cloud:
Cloud-based DR drills can be more straightforward and less disruptive. Many cloud providers offer simulation tools that allow businesses to test their DR plans in a controlled environment. These simulations can be automated and scheduled regularly, providing valuable insights into potential weaknesses and ensuring readiness.
DR Drills On-Premise:
On-premise DR drills require careful planning and coordination, often involving significant downtime and resource allocation. These drills are critical to ensure all components of the DR plan work as intended, but they can be disruptive to normal operations. Additionally, they need to be performed more frequently to account for changes in the IT environment.
Irrespective of the hosting, an organisation gets DR certified only when the RTO and RPO it defined in its DR plan are proven in real DR drills.
If you have defined your RTO/RPO as 1hr/10mins, you must ensure that you can recover from any disaster within 1 hr and that you can recover the data up to 10 mins before the time of the disaster.
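The result of a drill can be reduced to three timestamps and compared against the targets. The sketch below assumes you record when the simulated disaster started, the time of the last data that was successfully replicated or backed up, and when service was restored. The function and field names are illustrative only.

```python
from datetime import datetime, timedelta


def evaluate_drill(disaster_at, last_replicated_at, service_restored_at,
                   rto_target, rpo_target):
    """Compare the RTO/RPO achieved in a drill with the defined targets."""
    achieved_rto = service_restored_at - disaster_at
    achieved_rpo = disaster_at - last_replicated_at
    return {
        "achieved_rto": achieved_rto,
        "achieved_rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto_target,
        "rpo_met": achieved_rpo <= rpo_target,
    }


result = evaluate_drill(
    disaster_at=datetime(2024, 6, 1, 10, 0),
    last_replicated_at=datetime(2024, 6, 1, 9, 53),
    service_restored_at=datetime(2024, 6, 1, 10, 48),
    rto_target=timedelta(hours=1),
    rpo_target=timedelta(minutes=10),
)
print(result["rto_met"], result["rpo_met"])  # True True
```

Keeping these records for every drill gives auditors the evidence they ask for and shows whether the numbers drift as the architecture changes.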
Conclusion:
Some of the key factors that will decide your DR certification are:
The application should be evenly distributed across fully independent, physically distanced data centres.
Multi-site data replication for DR.
Dedicated interlinks between data centres to achieve the < 1 ms latency needed for data replication.
BCP and DR plans should be tested regularly.
Have a well defined and detailed NDA with cloud providers on data privacy and localisations.
When disaster strikes, whether due to natural calamities, cyber-attacks, or human errors, the ability to quickly recover data and resume operations is critical. This is where disaster recovery (DR) comes into play.
Cloud-based DR offers flexibility, scalability, and often superior RTO and RPO, making it an attractive option for many businesses. It also simplifies compliance and reduces the burden of regular DR drills. On the other hand, on-premise DR provides control and can be tailored to specific organisational needs, but it requires significant investment and resources.
Ultimately, the choice between cloud and on-premise DR depends on factors such as budget, regulatory requirements, and the specific needs of the business. In many cases, a hybrid approach, leveraging the strengths of both, can offer the most robust protection against disasters.
If you like this article do like 👍 and share ♻ it in your network and follow Kamalika Majumder for more.