A little bit of a technology post today: backups, including redundant solutions, are increasingly important in organisations as we seek to keep our IT services up and running, both for our own internal users and for external users, clients and customers. This might mean taking backup copies of data to tape, having a redundant firewall or internet connection, or having a cloud-based service available to replicate on-premises services in the event of a disaster. My concern, however, is that we can feel better simply for having these solutions in place, happy in the knowledge that we are better protected than we would be without them. The issue is that this sense of additional protection can be false. Just having a backup solution of one type or another doesn't mean that it will work when things go wrong.
We also need to be cognisant of the fact that when things do go wrong, the result is often stress and urgency as we seek to restore services while under pressure from users, business leaders and process owners, among others. We need to adopt a scientific mindset and test the backup solution to make sure it works as intended. It is much better to test our backup solutions to a timetabled plan than to have the first test of a solution be a full-blown, real-life incident where failure of the system could cause difficulties for the organisation.
We also need to bear in mind that just because a solution worked on the day it was put in place, or even works today, doesn't mean it will work in a week's or a month's time, or in a year's time when we truly need it. We need a robust programme of testing our backup solutions to ensure that they work, that we understand how they work and what the implications are, and that those who need to use them are comfortable doing so. Only by doing this can we be more confident that, when something does go wrong, we have a solution in place and are ready to put it to use.
The perfect example of the above, for me, was a recent test of our own backup solutions. This included a service which indicated that recovery to a redundant system would be complete within 4 hours and would be based on regularly taken data backups. Upon testing the solution we found that the 4-hour recovery period was exceeded due to issues with the backup, and that the restored data was 3 days old. We also found that there were implications for other systems when the tested failure occurred.
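For those who like to see this sort of thing written down, the comparison we made in that test can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of our actual tooling: the class and function names are made up, the 24-hour limit on data age is an assumed target (the service only promised "regular" backups), and the 6-hour recovery time is a placeholder for "more than 4 hours". Only the 4-hour target and the 3-day-old data come from the test itself.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """What the backup service is expected to deliver (hypothetical structure)."""
    max_recovery_time: timedelta  # how long a restore is allowed to take
    max_data_age: timedelta       # how old the restored data is allowed to be


@dataclass
class RestoreTestResult:
    """What was actually measured during a restore test."""
    recovery_time: timedelta
    data_age: timedelta


def evaluate_restore_test(targets: RecoveryTargets, result: RestoreTestResult) -> list[str]:
    """Return a list of findings; an empty list means the test met its targets."""
    findings = []
    if result.recovery_time > targets.max_recovery_time:
        findings.append(
            f"Recovery took {result.recovery_time}, target was {targets.max_recovery_time}"
        )
    if result.data_age > targets.max_data_age:
        findings.append(
            f"Restored data was {result.data_age} old, target was {targets.max_data_age}"
        )
    return findings


# 4 hours is the vendor's stated recovery time; 24 hours is an ASSUMED data-age target.
targets = RecoveryTargets(
    max_recovery_time=timedelta(hours=4),
    max_data_age=timedelta(hours=24),
)

# 3 days is what the test found; 6 hours is an ILLUSTRATIVE stand-in for "over 4 hours".
result = RestoreTestResult(
    recovery_time=timedelta(hours=6),
    data_age=timedelta(days=3),
)

findings = evaluate_restore_test(targets, result)
for finding in findings:
    print(finding)
```

The point of writing the targets down explicitly is that a test then has something concrete to pass or fail against, rather than a general feeling that the backup "seemed to work".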
It might be tempting to look on the above in a wholly negative fashion, focussing on why the solution didn't work. I want to avoid this, however, and intend to focus more on the positive side of things. We now at least know that the solution didn't perform as anticipated, and we know more about the implications of the tested failure area. In short, we are more knowledgeable than we were before the test. We will therefore now work internally, and with the backup solution vendor, to arrive at solutions that better meet our needs and are hopefully more robust and reliable.
The moral of the story: nothing works until you have tested it to confirm that it does, so test your backup provision, and test it often.