A Journey to Multi-Cloud – Disaster Recovery

Girish Raja

May 5, 2022

This is Girish Raja again, resuming the series on VMware Cloud from where I left off in my previous post.

In the previous post we saw how VMware Cloud addresses the challenges enterprises face in adopting multi-cloud, helping them land there quickly and then expand in the cloud. Now let’s go one level deeper and look at the use cases VMware Cloud suits, so that we can relate them to customer business requirements. Broadly speaking, we advocate four areas for VMware Cloud adoption, each of which breaks down into further use cases. Those areas are:

  1. Disaster Recovery
  2. Data-center Migration
  3. Data-center Expansion
  4. Application Modernization
Let’s explore them one by one. This series will concentrate on Disaster Recovery and the offerings surrounding it. My initial posts will give an overview, and later posts will dive deeper into each offering.

Disaster Recovery :

For every enterprise, Disaster Recovery is an important part of availability planning. Disasters and cyber-attacks are prevalent, but many organizations are not ready to recover from them. Organizations have gone bankrupt in the past because their DR strategy was ineffective or missing altogether.

Typically, Disaster Recovery is only one part of the picture. Business Continuity Planning together with Disaster Recovery (BCP-DR) is the strategy enterprises look for when it comes to production-grade applications. BCP-DR has two parts. BCP deals with availability, or the amount of time within which the business can be brought back online; in simple terms, the RTO (Recovery Time Objective). DR deals with recoverability and how recent the data available for recovery will be after an incident, which is the RPO (Recovery Point Objective).

RPO depends on factors such as the incremental change in data, the amount of data to be transferred, the bandwidth between sites, the number of backups per day (when backup is the data mover), bandwidth optimization, and compression. RTO depends on the manual effort needed to bring the recovered data online and get the business running again. A BCP-DR solution is architected around these two parameters to align with the organization's strategic requirements.
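To make these dependencies concrete, here is a minimal Python sketch that estimates the worst-case RPO for a periodic replication or backup job. The function names, parameters, and sample numbers are illustrative assumptions and do not come from any VMware sizing tool. The model assumes changes are spread evenly over the day and that the link is dedicated to replication.

```python
def transfer_seconds(delta_gib: float, bandwidth_mbps: float,
                     compression_ratio: float = 1.0) -> float:
    """Seconds needed to send one incremental delta over the inter-site link."""
    if bandwidth_mbps <= 0 or compression_ratio <= 0:
        raise ValueError("bandwidth and compression ratio must be positive")
    bits_on_wire = delta_gib * 1024**3 * 8 / compression_ratio
    return bits_on_wire / (bandwidth_mbps * 1_000_000)


def worst_case_rpo_minutes(daily_change_gib: float, syncs_per_day: int,
                           bandwidth_mbps: float,
                           compression_ratio: float = 1.0) -> float:
    """Worst-case age of the newest recoverable copy, in minutes.

    Data written just after a sync starts is only safe once the next sync
    has finished transferring, so the exposure is one full interval plus
    one transfer time.
    """
    interval_s = 24 * 3600 / syncs_per_day
    delta_gib = daily_change_gib / syncs_per_day
    xfer_s = transfer_seconds(delta_gib, bandwidth_mbps, compression_ratio)
    if xfer_s > interval_s:
        # The link cannot drain one delta before the next sync is due.
        raise ValueError("link cannot keep up with the change rate")
    return (interval_s + xfer_s) / 60


if __name__ == "__main__":
    # Hypothetical workload: 240 GiB changed per day, hourly syncs,
    # 100 Mbps link, 2:1 compression.
    rpo = worst_case_rpo_minutes(240, 24, 100, 2.0)
    print(f"worst-case RPO: {rpo:.1f} minutes")
```

With these sample numbers the hourly sync ships about 10 GiB in roughly 7 minutes, so the worst-case exposure is about 67 minutes rather than the nominal 60. Halving the bandwidth or losing compression pushes the figure up, which is why both appear in the list of RPO factors.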

Disaster Recovery as a scenario is just one, but when we dive deeper we can see many scenarios where an enterprise can benefit from the VMware Cloud offering. Some of the DR scenarios that I can visualize are:

  1. New DR Infrastructure: Deploying a new DR infrastructure for an enterprise with no existing DR
  2. Complement Existing DR: Building another DR infrastructure that complements the existing DR in order to expand it
  3. Replace Existing DR: Moving existing DR investments to cloud
  4. DR On-Premises: Running production on cloud or on-premises and looking to have the DR on on-premises infrastructure
  5. DR on Another Region / Hyperscaler: Deploying DR infrastructure in another region of the same cloud where the production infrastructure runs, or on another hyperscaler

Recoverability is an important factor for DR strategy. An application can fail due to many reasons like DC Failure, Hardware failures, Rack Failures, Application / DB Failures, OS crashes, Virus Attacks, Ransomware, File System corruption, Admin Errors, etc. An effective DR solution must be able to overcome each of these failures and be able to recover the data to a specific point in time and bring back the applications for businesses to resume operations.

Today most enterprises have a DR strategy and investments in place, but most of them are not confident enough to trigger their DR plans. This leaves them feeling vulnerable about business continuity and availability should the need arise. The main reasons are manual DR processes, unreliable scripts, infrequent testing, and worst-case sizing of the DR infrastructure. To overcome these problems, a DR orchestration tool is required that can replace age-old DR runbooks and manual processes with workflows that are created once and reused many times. Automated workflows also minimize administrative overhead and can reduce the RTO, bringing applications back sooner.
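The idea of a workflow that is created once and reused can be sketched in a few lines of Python. This is a toy model, not the API of Site Recovery Manager or any other VMware product: `Step`, `RecoveryPlan`, and the step names are hypothetical, and the actions are placeholders where real automation would power on VMs or update DNS.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # placeholder for real recovery automation


@dataclass
class RecoveryPlan:
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self) -> dict:
        """Run the steps in order and stop at the first failure."""
        started = time.monotonic()
        completed: List[str] = []
        failed_step = None
        error = None
        for step in self.steps:
            try:
                step.action()
            except Exception as exc:  # report the failure instead of crashing
                failed_step, error = step.name, str(exc)
                break
            completed.append(step.name)
        return {
            "plan": self.name,
            "ok": failed_step is None,
            "completed": completed,
            "failed_step": failed_step,
            "error": error,
            "elapsed_s": time.monotonic() - started,
        }


if __name__ == "__main__":
    plan = RecoveryPlan("tier-1-apps", [
        Step("power on database tier", lambda: None),
        Step("power on application tier", lambda: None),
        Step("repoint DNS to DR site", lambda: None),
    ])
    # The same plan object serves a quarterly test and a real failover.
    print(plan.run())
```

Because the plan is data rather than a document, every test run exercises exactly the steps a real failover would, and the elapsed time of each run gives a measured figure to compare against the RTO target.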

The VMware Disaster Recovery portfolio aligns with all of these scenarios and addresses the BCP-DR challenges enterprises face. It consists of three solutions, each addressing a specific customer requirement.

Let's go over them one by one.

VMware Cloud Disaster Recovery :

VMware Cloud Disaster Recovery (VCDR) is an on-demand, SaaS-based disaster recovery as a service offering from VMware that brings the benefits of cloud economics. It lets customers protect vSphere virtual machines running on-premises by replicating them to the cloud and recovering them on a VMware Cloud Software-Defined Data Center (SDDC) if the need arises. The important point is that the infrastructure does not have to be kept provisioned on the cloud while replicating.

Instead, the infrastructure can be provisioned only when the DR site needs to be brought up, either for testing or during an actual disaster. As a result, only storage is paid for during normal replication, and the DR infrastructure cost is incurred only while hosts are up for testing or for running workloads as part of a DR exercise.

VCDR is billed per TiB of scale-out cloud storage used, as either a monthly or a yearly subscription. This makes it a good fit for enterprises that treat the DR site as insurance for their primary on-premises infrastructure and can accept an RTO of more than 4 hours. Alternatively, enterprises can keep a pilot light infrastructure provisioned and running on VMware Cloud, so that workloads can be brought up in the DR site immediately.
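The trade-off between on-demand hosts and a pilot light can be illustrated with a deliberately simplified cost model in Python. Every price and quantity below is a made-up placeholder, not a VMware list price, and real pricing may include components that this sketch does not model. The function and parameter names are likewise hypothetical.

```python
def annual_dr_cost(protected_tib: float,
                   price_per_tib_month: float,
                   host_price_per_hour: float,
                   on_demand_hosts: int,
                   on_demand_hours_per_year: float,
                   pilot_light_hosts: int = 0) -> dict:
    """Split a year's DR spend into storage, on-demand and pilot light hosts."""
    # Storage is paid all year, whether or not any host is running.
    storage = protected_tib * price_per_tib_month * 12
    # On-demand hosts are paid only for DR tests and real failovers.
    on_demand = on_demand_hosts * host_price_per_hour * on_demand_hours_per_year
    # Pilot light hosts stay up around the clock to shorten the RTO.
    pilot_light = pilot_light_hosts * host_price_per_hour * 24 * 365
    return {
        "storage": storage,
        "on_demand_hosts": on_demand,
        "pilot_light_hosts": pilot_light,
        "total": storage + on_demand + pilot_light,
    }


if __name__ == "__main__":
    # Placeholder inputs: 50 TiB protected, 3 hosts spun up for 48 hours
    # of testing per year, with and without a 2-host pilot light.
    print(annual_dr_cost(50, 100.0, 10.0, 3, 48))
    print(annual_dr_cost(50, 100.0, 10.0, 3, 48, pilot_light_hosts=2))
```

Even with invented numbers the shape of the result is instructive: without a pilot light the bill is dominated by storage, while always-on hosts quickly become the largest line item. That is the price paid for a shorter RTO.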

Typical scenarios for VMware Cloud Disaster Recovery are as below:

  1. Ransomware Protection
  2. Higher RPO and RTO Workloads
  3. Organizations looking to invest in DR for the first time
  4. Replace existing DR Infrastructure to reduce DC Footprint and optimize the resources with cloud economics
  5. DR for RoBo (Remote Office Branch Office) Sites
  6. DR on VMware Cloud on AWS

VMware Site Recovery Service :

VMware Site Recovery is Site Recovery Manager offered as a service on VMware Cloud. It is activated on demand on VMware Cloud on AWS and uses the vSphere Replication engine as the data mover to replicate individual virtual machines from on-premises to VMware Cloud. vSphere Replication lets enterprises replicate at an RPO anywhere from as low as 5 minutes up to 24 hours, and it compresses and encrypts the data in transit to secure it as it leaves the datacenter.

VMware Site Recovery Manager sits on top of the replication engine and provides orchestration features such as DR plans, resource mappings (storage, network, etc.), online DR testing, cleanup, and failback. Because replication happens at the virtual machine level, VMware Site Recovery is storage agnostic: vSphere virtual machines can be replicated to the DR site on cloud regardless of the underlying storage.

The point to understand when choosing VMware Site Recovery Service is that the SDDC must be provisioned and running for replication to happen. In return, the RTO and RPO of the workloads are kept to a minimum, and workloads can be failed over to the DR infrastructure with a single click.

Scenarios where VMware Site Recovery Service fits:

1. Lower RPO and RTO
2. Complement Existing DR which is running Site Recovery Manager
3. DR on Cloud but with lower RPO and RTO for Mission-Critical Production Workloads
4. DR for RoBo Sites

VMware Site Recovery Manager :

VMware Site Recovery Manager is the traditional DR solution from VMware. It is not provided as a service and must be installed and configured manually in the private or public cloud datacenters. Site Recovery Manager uses vSphere Replication as the data mover to asynchronously replicate vSphere workloads to the cloud, or back to an on-premises datacenter, at an RPO as low as 5 minutes. Between private clouds, it also supports storage-based replication and can orchestrate it.

Typical scenarios where Site Recovery Manager is a fit are as follows:

1. DR on Cloud (VMC on AWS, Azure VMware Solution, Oracle Cloud VMware Solution, Google Cloud VMware Engine, Alibaba Cloud VMware Service, IBM Cloud, etc.)
2. DR from Cloud back to On-Premises
3. On-Premises to On-Premises DR Strategy
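The scenario lists for the three solutions can be condensed into a rough first-pass selector. The Python sketch below is only a simplification of the guidance in this post, not an official VMware decision tree: the function name, the target labels, and the use of the 4-hour RTO figure as a hard threshold are all assumptions made for illustration.

```python
VALID_TARGETS = {"vmc-on-aws", "other-cloud", "on-premises"}


def suggest_dr_solution(target: str, rto_hours: float,
                        storage_based_replication: bool = False) -> str:
    """Return a first-pass suggestion for where a workload's DR could land.

    target: where the DR copy should run (hypothetical labels).
    rto_hours: the longest outage the business can accept.
    storage_based_replication: True when array replication between
        private clouds must be orchestrated.
    """
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    # Other hyperscalers, on-premises targets and storage-based
    # replication are listed only under Site Recovery Manager.
    if storage_based_replication or target != "vmc-on-aws":
        return "VMware Site Recovery Manager"
    # On VMware Cloud on AWS, a relaxed RTO allows on-demand hosts.
    if rto_hours > 4:
        return "VMware Cloud Disaster Recovery"
    # A tight RTO needs an SDDC that is already provisioned and running.
    return "VMware Site Recovery"


if __name__ == "__main__":
    print(suggest_dr_solution("vmc-on-aws", rto_hours=8))
    print(suggest_dr_solution("vmc-on-aws", rto_hours=1))
    print(suggest_dr_solution("on-premises", rto_hours=1))
```

A real selection would weigh more than these three inputs, for example ransomware recovery needs, existing Site Recovery Manager investments, or a pilot light deployment that shortens the RTO of VMware Cloud Disaster Recovery.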

That’s it for this episode. Up next, I will write about another interesting topic: cloud migrations with VMware Cloud. Till then, stay safe and stay healthy.

