Ten Tips for Disaster Recovery in the Public Cloud Era

As information becomes increasingly valuable to companies worldwide, creating and implementing an advanced disaster recovery (DR) plan is becoming critical, and for innovative organizations that path increasingly leads to adopting Disaster Recovery as a Service (DRaaS).

The traditional approach to backup and DR is unequipped to tackle the challenges faced by evolving organizations. Data retention policies and security requirements, designed to facilitate and support business continuity plans, continue to evolve rapidly. These factors make it essential for IT and data management strategies to move beyond traditional backups, tape, and other outdated methods so the business can survive beyond the point of a disaster, all while keeping cybersecurity intact. Here are some tips to consider as part of your strategy. You’ll find that some of them are not typical in DR planning, but then again, we’re not in typical times.

COVID-19 & Social Unrest: The elephants in the room

Let’s face it, it’s not often that businesses face a pandemic. It’s far more likely that a business will deal with natural disasters, cybersecurity threats such as malware attacks, and the typical network or data outages before it faces another pandemic of this magnitude. If an organization wishes to ensure a smooth recovery process and continuity of operations, it is critical to complete a DR risk assessment and rehearse the recovery strategy defined in its Business Continuity Planning (BCP)/DR procedures.

Let’s start by asking ourselves the right questions. As an IT leader, what guarantees do you have in place that give you confidence that your plan will work? How can you be sure that your infrastructure, data, and applications will recover? Are you confident your storage and retention plans meet the needs of the business to operate at a reasonable level of business continuity? What about the specific needs of your employees in light of events such as COVID-19 and external social impacts that have changed the world? Does your plan leave room for this new dynamic? Are your employees and their connectivity to your infrastructure part of your DR plans once you no longer control speed and bandwidth?

Furthermore, it is critical to determine how you will manage strategic vendor relationships during this time. Technical representation from major OEMs may be limited, or at least prioritized based on contract SLAs that may not always be favorable to you. On-site dispatch times will likely lengthen, and supply lines for replacement parts may be adversely affected.

Here are some things to keep in mind as you develop your strategy while considering innovation:

1. How will current socioeconomic conditions and considerations impact your disaster recovery and business continuity plans? Create a living, people-powered plan.

2020 started with a bang. By now, we have all learned more about the unfortunate and still ongoing impacts of COVID-19 on families and businesses around the world. This is all quite a bit more than we bargained for when we set our 2020 New Year’s resolutions. If that wasn’t enough, all of us have witnessed, through social media and possibly within our own neighborhoods, massive amounts of social unrest. Specifically, we’re seeing the fallout of systemic racism and its effects on the Black community, which at its core is a human issue. This is heartbreaking, and it is challenging the very fabric of our nation to think beyond the borders of our self-contained ideology, to find its moral compass, and to truly entertain the first people-focused paradigm shift of this generation.

With all of that said, it stands to reason that your strategies in 2020 should leave room to actively identify, be open to, and address the needs of your employees, and to support them through both the disasters you are prepared for and the ones you are not. Moments matter, and people matter most, now more than ever.

Understand and respect that, while the business wishes to recover and technology and the cloud may enhance the speed and efficacy of that recovery, your people are ultimately the real value of the business. People power innovation, and in the end your people decide whether your business survives the disaster.

How do you do this? Being attuned to and respectful of the personal impacts, sacrifices, and factors most important to your employees, their families, and loved ones before the moment you declare a disaster should be your first task as a business when laying out your DR plan. Of this, I am certain.

In the world of technology and operations, we’ve all heard of and referenced the ITIL (Information Technology Infrastructure Library) methodology of People, Process, Technology. Why would your BCP/DR plans be distanced from this people-first methodology in the face of a disaster? Think about it.

2. Simplify your approach: Work with business stakeholders on your SLAs (recovery point objective and recovery time objective)

First, let’s get over the idea that DR plans should be complicated. Say it with me: disaster recovery leverages complex technology, but it does not need to be a complicated process. Complex technology should not equate to complicated solutions, nor to complicated execution. A good approach is to constantly engage the business to understand and align on its tolerance for data loss and its requirement to operate and to maintain or generate revenue as a primary focus. Establishing the minimum and optimum levels of required operational efficiency will ensure that you are measuring what truly matters should the plan need to be enacted. Understanding the recovery time objective (RTO), how long the business can be down, and the recovery point objective (RPO), how much recent data it can afford to lose, will help you anticipate the downtime and data loss you can sustain.
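To keep the RPO/RTO conversation concrete, here is a minimal Python sketch of how a team might check both targets for a single system. The system name, the target values, and the `check_targets` helper are all hypothetical and exist only for illustration; your own numbers should come from the business stakeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryTarget:
    """Targets agreed with business stakeholders (hypothetical example values)."""
    name: str
    rpo: timedelta  # maximum tolerable data loss, measured as time
    rto: timedelta  # maximum tolerable time to restore service


def check_targets(target, last_good_backup, measured_restore_time, now):
    """Return a list of findings; an empty list means both targets are met."""
    findings = []
    # Anything written since the last good backup is what would be lost today.
    exposure = now - last_good_backup
    if exposure > target.rpo:
        findings.append(
            f"{target.name}: RPO missed, {exposure} of data at risk (limit {target.rpo})"
        )
    # Restore time should come from the most recent DR test, not a guess.
    if measured_restore_time > target.rto:
        findings.append(
            f"{target.name}: RTO missed, restore took {measured_restore_time} (limit {target.rto})"
        )
    return findings


# Hypothetical system: tolerate 1 hour of data loss and 4 hours of downtime.
billing = RecoveryTarget("billing-db", rpo=timedelta(hours=1), rto=timedelta(hours=4))
now = datetime(2020, 7, 1, 12, 0)
findings = check_targets(
    billing,
    last_good_backup=datetime(2020, 7, 1, 9, 30),  # 2.5 hours before "now"
    measured_restore_time=timedelta(hours=3),
    now=now,
)
for line in findings:
    print(line)
```

In this example the restore time is within the RTO, but the last good backup is older than the RPO allows, so only the RPO finding is reported.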

3. The COVID-19 factor: Are your remote workforce’s connectivity and collaboration requirements part of your RPO/RTO planning? They should be.

In the current and eventually post-COVID-19 world, connectivity will continue to be the new currency of cloud. Overnight, entire technology industries and applications have been deemed critical to maintaining productivity in remote settings. Great examples are Microsoft Teams and Cisco Webex Teams, both of which have facilitated remote collaboration within various organizations quite literally overnight. The quality of your users’ internet connections and their ability to connect to your applications or infrastructure will govern how effectively they execute their functions. This level of access and collaboration will also dictate user experience and how quickly your teams can be ready to serve your clients. Don’t overcomplicate things with file servers and even more access requirements for your team to manage in the middle of your DR. Other file management solutions also require you to keep data and files in sync so teams can collaborate, and email is not the way, so why not simplify this effort? This is where collaboration software like Microsoft Teams and Cisco Webex Teams shines. Your users will thank you for one less thing to worry about in the middle of a disaster.

What about the edge of your network and application delivery for teams no longer centralized in your offices? As technologists and business leaders, we often calculate business continuity in orders of magnitude. However, it has become increasingly clear that when your remote workforce’s edge connectivity isn’t a factor in your plans, productivity inevitably drops significantly. Take a serious look at SD-WAN and application edge computing technology, and have this discussion with your technical partners and team as part of your disaster recovery plan.

4. Validate your data: Data corruption and recovery

Check your backups and check them often! If a data store comes up short when you need it, the cause is often irregularities that indicate a loss of data integrity, particularly in data that has been replicated. Good news: cloud-ready tools already exist to validate this for you. Use them every day rather than waiting until you need the data. Ensuring that your teams have implemented strong data validation, alerting, and response processes before you declare a disaster will be critical to your success. Just keep your plans simple and always ask the commonsense questions: Is the data you need to recover present? Is it valid? When was the last time you tested a known good copy of this data?
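As a rough illustration of the "is it present, is it valid" questions, the following Python sketch compares a backup file against a checksum recorded at backup time. It is not a substitute for the validation built into your backup tooling; the file names and the `validate_backup` helper are made up for the example.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backup files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_backup(path, expected_sha256):
    """Check that the backup is present, non-empty, and matches its recorded checksum."""
    path = Path(path)
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    if sha256_of(path) != expected_sha256:
        return "corrupt"
    return "ok"


# Self-contained demonstration using a temporary directory (hypothetical file names).
with tempfile.TemporaryDirectory() as workdir:
    backup = Path(workdir) / "orders.bak"
    backup.write_bytes(b"known good copy")
    recorded = sha256_of(backup)  # in practice, stored when the backup is taken

    status_before = validate_backup(backup, recorded)
    backup.write_bytes(b"known good c0py")  # simulate silent corruption
    status_after = validate_backup(backup, recorded)
    status_missing = validate_backup(Path(workdir) / "absent.bak", recorded)

print(status_before, status_after, status_missing)
```

A check like this only proves the file is unchanged since the checksum was recorded. Proving that it restores into a working application still takes a real restore test.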

5. Put your plan to the test & include your 3rd party maintenance vendors like Cisco & Microsoft.

Don’t rely solely on the traditional twice-yearly DR testing strategy. In the public cloud era, cloud platforms combined with powerful third-party tools can test and load test your environment. Run your tabletop exercises regularly using only your business-critical data and applications: monthly if possible, quarterly at worst. There is no such thing as too much preparation when it comes to developing the technical and muscle memory required to execute well when the moment arises.

6. Disrupt the traditional IT stack: Decentralize your production infrastructure

COVID-19 has changed the way we think about DR; therefore, it stands to reason it should change the way you structure and manage your production environments, which then impacts how you prepare for and recover from a disaster. Invest time in understanding the challenges in your IT environment, but never stop asking “why.” Challenge your teams to think beyond your server rooms, your VMs, and your maintenance and cooling plans. Why do you need these servers and the data this close to your users? Why do you need this database to be on-premises, which increases the opportunity for data corruption and data loss during replication? Why should we accept the argument that adopting cloud as part of our network and infrastructure plans means losing control over the quality of replication, speed, or ETL? Challenge the arguments we’ve all come to know as “truths” and you may find that much of what your traditional approach treats as essential to your production environment and your DR plans rests on misconceptions that are quickly cleared up by educating your teams about cloud.

7. Make Disaster Recovery as a Service (DRaaS) part of your strategy

If the introduction of a cloud platform was an innovative disruption to your backup and DR strategies, consider DRaaS a lightning bolt. As a technology leader, there’s one thing I never encounter: an organization that adopts DRaaS, using something like Microsoft Azure Site Recovery as part of its cloud solution, and then decides that life was better before. It just doesn’t happen in my experience. All of the previous considerations discussed in this article have associated hard and soft costs, so let’s talk about DR costs for just a moment. By far, the number one topic you’ll encounter as experts talk to you about DRaaS is the ability to move from fixed DR costs to a variable cost model, while automatically limiting how much of your data needs to be touched by identifying only the changes and keeping those deltas in sync. This is a valid benefit and a discussion you should pay close attention to. You also gain the benefit of broad interoperability, which helps you and your teams solve technology compatibility challenges as your organization begins to run applications from another data center or cloud region, while automatically and consistently testing your data and application integrity. As a result, you gain more assurance that your applications will work once you declare a disaster.
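To show why syncing only the deltas keeps costs down, here is a toy Python sketch that compares two versions of some data block by block and reports which blocks changed. This illustrates the general idea only. It is an assumption for the sake of the example and not a description of how Azure Site Recovery or any other DRaaS product works internally; real products track changes far more efficiently.

```python
import hashlib


def block_hashes(data, block_size):
    """Split data into fixed-size blocks and hash each one."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size=4):
    """Return indexes of blocks in `new` that differ from, or do not exist in, `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or digest != old_hashes[index]
    ]


# Tiny 4-byte blocks keep the example readable; real block sizes are much larger.
primary_yesterday = b"AAAABBBBCCCCDDDD"
primary_today = b"AAAABxBBCCCCDDDDEEEE"

delta = changed_blocks(primary_yesterday, primary_today)
print(delta)  # [1, 4]: only 2 of the 5 blocks need to be sent to the replica
```

The less data that changes between syncs, the less you transfer and store, which is where the variable cost model starts to pay off.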

8. Secure & monitor your backups

Discussing DR plans in 2020 without anticipating that the disaster you will most likely face is security-related is among the biggest mistakes you might make as an organization. Making security and alerting part of your plan might seem like a no-brainer to many; however, never underestimate how often attention goes to the most complex aspects of disaster recovery while the very foundations of basic security strategy are omitted.

You need to be informed if and when a portion of your data may be compromised or at increased risk. A good approach would be:

  1. Ensuring your backups are encrypted, with multi-factor authentication (MFA) enabled.
  2. Ensuring the right Identity and Access Management (IAM) privileges are assigned to the right people. Always challenge why someone needs access to your data and for how long.
  3. Configuring alerts and notifications for critical operations that affect the overall availability of the data you will depend on in a disaster.
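The three points above can be turned into a simple recurring audit. The Python sketch below runs against a made-up configuration dictionary; the keys, the alert names, and the `audit_backup_security` helper are all hypothetical and do not correspond to any real cloud API. In practice you would pull this information from your backup and identity platforms.

```python
from datetime import date

# Hypothetical names for the critical operations you want to be alerted about.
REQUIRED_ALERTS = {"backup_deleted", "backup_failed", "retention_changed"}


def audit_backup_security(config, today):
    """Return a list of findings for the three checks: encryption/MFA, access, alerts."""
    findings = []

    # 1. Encryption and MFA
    if not config.get("encrypted"):
        findings.append("backups are not encrypted")
    if not config.get("mfa_required"):
        findings.append("MFA is not required for backup operations")

    # 2. Access: every grant should have an expiry date that is still in the future
    for grant in config.get("access_grants", []):
        expires = grant.get("expires")
        if expires is None:
            findings.append(f"{grant['user']}: access has no expiry date")
        elif expires < today:
            findings.append(f"{grant['user']}: access expired on {expires} but is still listed")

    # 3. Alerts for critical operations
    for operation in sorted(REQUIRED_ALERTS - set(config.get("alerts", []))):
        findings.append(f"no alert configured for '{operation}'")

    return findings


# Made-up configuration for illustration only.
example_config = {
    "encrypted": True,
    "mfa_required": False,
    "access_grants": [
        {"user": "alice", "expires": date(2020, 12, 31)},
        {"user": "contractor", "expires": None},
    ],
    "alerts": ["backup_failed"],
}

findings = audit_backup_security(example_config, today=date(2020, 7, 1))
for line in findings:
    print(line)
```

With this example configuration the audit reports four findings: MFA is off, the contractor’s access never expires, and two of the three alerts are missing.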

9. Don’t lose focus on replication: Put your data in multiple Microsoft Azure regions

The cloud is not infallible; you still need to have a replication plan. You and your cloud service provider must ensure your DR strategy accounts for even the most unlikely disaster, which in this case is the unavailability of your primary IaaS data stores (which covers more than your local infrastructure region). First, understand what Azure LRS, GRS, and RA-GRS are. In simple terms:

  1. Locally redundant storage (LRS) ensures that your data is replicated three times within a single Azure data center. This is the most basic and lowest-cost redundancy option.
  2. Geo-redundant storage (GRS) comes with all the features of LRS, and you also get a secondary copy of your data in a paired Azure region. However, your data will not be readable from this secondary region; it is only written there as a means of added data loss prevention.
  3. Read-access geo-redundant storage (RA-GRS), depending on the scale of your business, can be a critical success factor. RA-GRS allows that second copy (which is part of standard GRS) to be readable. Well, what does that mean in the event of a disaster? Glad you asked! It means that in the event of a disaster or Azure region availability issues, endpoints and applications, if configured properly, will be able to read data from the RA-GRS secondary region. What about SLAs? Write access SLAs remain at 99.9%, but read access for hot data stores and applications that are configured to use RA-GRS bumps up to 99.99%. Which of the three you choose will depend on your organization’s needs and tolerance during a disaster.

It is important to note that you won’t be able to choose the GRS/RA-GRS secondary region; Microsoft chooses it for you. However, you’ll have visibility of and access to the data. Lastly, this replication is asynchronous, so the secondary copy can lag behind the primary, which means you’ll need to factor your RPO/RTO requirements into the plan.
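To put the 99.9% and 99.99% SLA figures in perspective, this short Python sketch converts an availability percentage into the downtime it permits. The 30-day month is a simplifying assumption for the arithmetic, and the function name is made up; check your provider’s SLA for how it actually measures availability.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def allowed_downtime_minutes(sla_percent, period_minutes=MINUTES_PER_30_DAY_MONTH):
    """Minutes of unavailability a given SLA percentage permits over the period."""
    return round((100 - sla_percent) / 100 * period_minutes, 2)


for label, sla in [("write access (99.9%)", 99.9), ("RA-GRS read access (99.99%)", 99.99)]:
    print(f"{label}: up to {allowed_downtime_minutes(sla)} minutes per 30-day month")
```

That works out to roughly 43 minutes a month at 99.9% versus about 4 minutes at 99.99%, which is a useful number to bring to the RTO discussion with your stakeholders.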

This got mildly technical. Just remember, you and your service provider need to have these discussions. As the business, all you need to focus on is what your RPO/RTO requirements are. A fantastic service provider like Vology (shameless plug) can help you!

10. Be realistic in your recovery expectations: Define success in terms of disaster recovery

Never feel that your DR plan is locked in and cannot change. Technology changes rapidly, yet disaster recovery is still required to be a precise endeavor; as such, the approach must be precise. However, this does not mean that it should not be adjusted for reality based on the needs of your business and your expectations of what success looks like. If rapid deployment, low-cost retention, ease of use, and data portability are important to your business, an interoperable supporting tool such as Veeam, paired with Azure Backup, would be a strong candidate: you can isolate and secure backup data from production while maintaining data integrity across multiple geo-redundant regions and subscriptions.

Why is this important? Speed to recovery and setting expectations for success are arguably among the most important parts of your disaster recovery plan. You should feel free to use all applicable cheat codes (integrated and native tools like Veeam) when and where possible! If connectivity and access are the new currencies of cloud, then data is its yield. Finally, set expectations within your organization around recovery and share what success looks like. Doing so will ensure everyone is measuring what matters in terms of recovering from a disaster.

For more details or to have one of our DR specialists get in touch with you, please visit our website at