Keeping the lights on

For many years I was responsible for what we used to call DRP – Disaster Recovery Planning. Back then, technology platforms were likely to be stand-alone, and while they were important, our business could keep going for a few hours until a failed system was restored. So, the process really was about planning what to do in the event of a technology disaster.

Jump ahead a couple of decades and things have changed. Now we have multiple technology platforms feeding data to and from each other, and business volumes make it impossible to keep the operation current if any piece of the technology fails. And where at one time only high-volume transaction-processing systems were in scope, reliance on other technology for decision-support roles means that the entire technology suite has become business-critical.

The focus today is on BCP – Business Continuity Planning. This is so much more than a plan for how to “keep the lights on” no matter what. It is a detailed analysis of each piece of technology and the associated infrastructure to determine how operations will be maintained under virtually any circumstance. To support these plans, firms might implement alternate working locations, real-time system mirroring, multiple processors, dual data centres, redundant network connections, uninterruptible power supplies, diesel generators and more.

Offsite operations are required if the primary workspace is inaccessible or unusable for some reason, and most firms have an alternate site where the business can continue to operate if that primary site is not functional. For companies with operations spread across multiple locations, that alternate site might be one of these other locations, but for large firms it is common to have a dedicated offsite business recovery centre used only for emergency situations. In this case, the functionality of systems at the offsite location must be tested regularly, and generally, at least once a year, employees will have to work in the offsite location to make sure they are familiar with the location and the functionality available there.

These dedicated offsite recovery centres may be less critical now that so many employees can work at home, but a thorough business continuity plan is still likely to include a recovery centre in a location some distance from the primary work location.

A thorough BCP will contemplate every conceivable event and provide a detailed plan for supporting business operations until the event has been resolved or mitigated. It will then go further and plan for a cascading sequence of events all occurring at the same time.

Beyond a certain point, it may be impossible to maintain all business functions, in which case a triage process is required to determine which functions must be performed, and which can be deferred with the least impact.
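For readers who like to see the logic written down, here is a minimal sketch of one way such a triage could work. It is an illustration only, not a prescribed method: the function names, the 1-to-5 deferral impact scores and the staffing numbers are all hypothetical, and a real plan would weigh many more factors.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str             # hypothetical function name, for illustration only
    deferral_impact: int  # assumed score: 1 (low) to 5 (severe) if deferred
    staff_needed: int     # people required to perform the function


def triage(functions, staff_available):
    """Split functions into those to perform now and those to defer.

    Greedy approach: consider functions from highest to lowest deferral
    impact and perform each one that still fits within the available staff.
    """
    perform, defer = [], []
    remaining = staff_available
    for fn in sorted(functions, key=lambda f: f.deferral_impact, reverse=True):
        if fn.staff_needed <= remaining:
            perform.append(fn.name)
            remaining -= fn.staff_needed
        else:
            defer.append(fn.name)
    return perform, defer


# Hypothetical example data.
functions = [
    BusinessFunction("Trade settlement", 5, 4),
    BusinessFunction("Client call centre", 4, 6),
    BusinessFunction("Monthly performance reporting", 2, 3),
    BusinessFunction("Marketing analytics", 1, 2),
]

perform, defer = triage(functions, staff_available=10)
print("Perform:", perform)
print("Defer:", defer)
```

With ten people available, the two highest-impact functions use all of the capacity and the other two are deferred. A simple greedy pass like this is easy to explain during a crisis, but it will not always make the best possible use of the available staff.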

And if multiple events occur concurrently, or a major catastrophic event occurs, then some or all operations will be suspended, and now there is a disaster to recover from. Many readers may be too young to remember the collapse of the electrical power grid across the Eastern Seaboard in August 2003, but I am sure you all remember the Rogers outage on July 8, 2022, which left 12 million Rogers clients without cell phone and internet access for a day or more.

Human resources planning was not really in scope in the early days of disaster recovery planning, but it has become vital, as employees are as critical to business continuity planning and testing as technology and infrastructure are. Ensuring employee safety and dealing with epidemics and pandemics are part of a thorough Business Continuity Plan.

Emergency evacuation plans are needed and must be tested at least twice a year. Weather concerns, potential protest marches, dangerous substance leaks, bomb threats and so on need to be monitored continuously, and plans to ensure employee safety while keeping the business running must be put into action as events arise.

Managing outbreaks of flu and other ailments is easier now that so many employees can work at home, but it still makes sense to segregate employees into teams – some working in the office, some at home and some at the offsite business continuity site – to reduce the spread of infection throughout the organization. If the epidemic or pandemic is serious enough that infected employees are too unwell to work for days or weeks, then a triage plan is needed to allocate critical tasks to those who are well enough to work and have the skills required to complete them.

These plans are complex to maintain and need to be tested at least annually. If your business is subject to regulatory oversight, your regulator(s) may want to review and approve your BCP to make sure it is sufficiently robust.

Even the most comprehensive and well-tested business continuity plan needs to be re-assessed to reflect changes in the business model, regulation, business volumes and client expectations. For example, which teams are “business-critical”? At one time, only those managing trading functions may have been in scope, but those providing trading decision-support data must now be included as well. Call centre operations are always critical, but these teams may need to be expanded during unusual market conditions and business disruptions to deal with higher numbers of calls from anxious clients. Business continuity plans need to be modified on an ongoing basis to recognize these needs.

All the planning in the world cannot anticipate every possible interruption, or series of interruptions, that may occur. The goal is to prepare employees to plan and act quickly and effectively to maintain critical business operations when adverse conditions arise. One way to instill this mindset is to run theoretical simulations with groups of employees, in which a series of catastrophic events is presented to them and they must discuss and agree on the steps they would take to deal with these disruptions. This can increase employees’ ability to respond calmly and effectively during challenging times.

The costs of planning, testing and managing operations during challenging times are very high, but not nearly as high as those resulting from being unable to maintain business operations during these times. Once clients lose faith in an organization’s ability to manage their business effectively, the reputational harm is very difficult to recover from.

Against this backdrop it’s easy to understand why firms may be reluctant to change technology platforms, as each change will necessitate a review of the firm’s Business Continuity Plan. However, if new technology offers better features and reduces the impact on the Business Continuity Plan, the decision becomes much easier to make.

The Accio Analytics ecosystem fits this profile. Functionally, it is a comprehensive, powerful and flexible portfolio analytics platform, able to gather and organize data from multiple sources, check for potential errors, and generate a broad range of analytical metrics.

Also, because Accio Analytics is a SaaS (Software as a Service) offering, it reduces a client’s BCP footprint. The service is hosted by Microsoft in its Azure cloud, and all data back-ups, physical and logical redundancy, and every other aspect of keeping the application accessible are managed by Microsoft. The client’s responsibility is limited to ensuring that the data feeds into Accio Analytics are maintained and that users have access to the application via a web-based connection.