How to handle your system during an AWS S3 outage

The Amazon S3 outage shook many companies' data systems, and they learned the hard way that an outage is always a possibility. But you can still keep your apps running even when the Amazon cloud isn't.

Last year's Amazon S3 outage on the East Coast took a large number of Java cloud apps down with it. Nike's website slowed down, but on the other hand, Netflix was not affected. Achieving that kind of resilience for your Java cloud apps when an Amazon outage occurs requires a large upfront investment in automating recovery.

The following are ways of protecting your apps and data from an Amazon S3 outage:

1. First, figure out the infrastructure your applications run on, and then identify which applications depend on which. Keystone applications are the ones used by a variety of other apps. In the S3 outage, the loss of the metadata management system killed Amazon S3 across the entire region. So check how much reliability and redundancy each of your applications has, so that a single failure doesn't turn into a much larger problem.
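One way to spot keystone applications is to write the dependencies down as a map and count how many other services would be affected if each one failed. The sketch below assumes a hand-maintained inventory; the service names in `DEPENDS_ON` are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def count_dependents(depends_on):
    """Return {service: number of services that depend on it, directly or indirectly}."""
    # Invert the edges: for each service, record who depends on it directly.
    dependents = defaultdict(set)
    for service, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(service)

    counts = {}
    all_services = set(depends_on) | set(dependents)
    for service in all_services:
        seen = set()
        stack = [service]
        # Walk upward through everything that would be affected by a failure.
        while stack:
            current = stack.pop()
            for parent in dependents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        seen.discard(service)  # ignore self-references caused by cycles
        counts[service] = len(seen)
    return counts


# Hypothetical inventory: each service maps to the services it calls.
DEPENDS_ON = {
    "web-frontend": ["auth", "catalog"],
    "catalog": ["object-storage", "metadata-index"],
    "auth": ["metadata-index"],
    "reports": ["catalog"],
    "object-storage": ["metadata-index"],
}

counts = count_dependents(DEPENDS_ON)
# Services with the most dependents come first; these are the keystone candidates.
ranked = sorted(counts, key=lambda name: (-counts[name], name))
```

In this made-up inventory, `metadata-index` ends up at the top of `ranked` because every other service relies on it in some way, which makes it the first place to look for missing redundancy.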

2. The best strategy is to constantly break things on purpose with a tool such as Chaos Monkey. The idea is to take production systems, services, cloud availability zones, and databases offline and evaluate the performance impact on the other systems. This helps you identify the breaking points and the chains of failure that can damage the system. It is also easier to test this in the cloud by spinning up an accurate duplicate of the entire infrastructure. For apps running on expensive hardware, you can create models using a method such as service virtualization to achieve the same objective. It is also important to check whether the application depends on third-party services for data.
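The same idea can be tried on a small scale before touching real infrastructure. The sketch below is not Chaos Monkey itself, only a minimal illustration of fault injection: a wrapper makes a fraction of calls to a dependency fail, and the experiment counts how often the caller degrades gracefully. The functions `fetch_profile` and `render_page` are hypothetical stand-ins for your own code.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real outage during an experiment."""


def with_chaos(func, failure_rate, rng):
    """Wrap func so that roughly failure_rate of its calls raise InjectedFailure."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFailure(f"injected failure in {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


def fetch_profile(user_id):
    # Stand-in for a call to a remote dependency.
    return {"id": user_id, "name": "example"}


def render_page(fetch, user_id):
    """Caller under test: it should degrade instead of crashing."""
    try:
        profile = fetch(user_id)
        return f"Hello, {profile['name']}"
    except InjectedFailure:
        return "Hello, guest"


def run_experiment(trials=1000, failure_rate=0.3, seed=42):
    """Return (number of degraded responses, number of trials)."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    flaky_fetch = with_chaos(fetch_profile, failure_rate, rng)
    results = [render_page(flaky_fetch, i) for i in range(trials)]
    return results.count("Hello, guest"), trials


degraded, trials = run_experiment()
print(f"{degraded} of {trials} requests were served in degraded mode")
```

If `render_page` did not catch the failure, the experiment would crash instead of reporting a count, which is exactly the kind of breaking point this step is meant to expose.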

3. Breaking the application infrastructure on purpose helps you understand how the application fails over and recovers after a failure. Copying critical apps to hot backups in other availability zones or with other cloud providers can be a good idea, but you will have to invest in data storage, networking, and unused capacity. Less critical apps can simply restart on another instance with the same cloud provider. There are tools for automating the provisioning of new instances that can dramatically reduce recovery time. Knowing how to use these tools also makes it much easier to meet unexpected demand.
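On the read path, a hot backup usually shows up in code as a fallback: try the primary store first, then the copy in another zone or provider. The sketch below assumes a very simple client interface; `InMemoryStore` is a hypothetical stand-in for a real storage client, and the store names are only illustrative labels.

```python
class StoreUnavailable(Exception):
    """Raised when a store cannot be reached."""


class InMemoryStore:
    """Stand-in for a storage client in one zone or provider (hypothetical)."""

    def __init__(self, name, data, available=True):
        self.name = name
        self.data = dict(data)
        self.available = available

    def get(self, key):
        if not self.available:
            raise StoreUnavailable(self.name)
        return self.data[key]


def read_with_failover(stores, key):
    """Try each store in priority order and return (value, name of store used)."""
    errors = []
    for store in stores:
        try:
            return store.get(key), store.name
        except StoreUnavailable as exc:
            errors.append(str(exc))  # remember which stores were down
    raise StoreUnavailable("all stores failed: " + ", ".join(errors))


# Simulate an outage of the primary while the hot backup stays up.
primary = InMemoryStore("primary-east", {"logo.png": b"image-bytes"}, available=False)
backup = InMemoryStore("backup-west", {"logo.png": b"image-bytes"})

value, source = read_with_failover([primary, backup], "logo.png")
print(f"served logo.png from {source}")
```

The fallback only helps if the backup actually holds a current copy of the data, which is where the storage and networking costs mentioned above come from.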

4. When a cloud service goes down, you can still provide users with some level of functionality. This is easier in a native mobile app that keeps its own data. But very few developers take advantage of the caching mechanisms of modern browsers.

This is harder for mobile apps that depend on sporadic networks. Caching also improves the performance and reliability of desktop applications. To make this possible, developers need strategies for managing data locally and for automatically synchronizing it in the background.
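Managing data locally and synchronizing later can be sketched as a cache with a queue of pending writes. This is a minimal illustration under simple assumptions, not a complete sync strategy: it ignores conflicts, and both `OfflineCache` and `FakeRemote` are hypothetical names. A real app would call `sync()` from a background task or timer.

```python
class FakeRemote:
    """Stand-in for a cloud API that can be switched offline (hypothetical)."""

    def __init__(self):
        self.online = True
        self.data = {}

    def get(self, key):
        if not self.online:
            raise ConnectionError("remote unreachable")
        return self.data.get(key)

    def put(self, key, value):
        if not self.online:
            raise ConnectionError("remote unreachable")
        self.data[key] = value


class OfflineCache:
    """Keeps a local copy of the data and queues writes while the remote is down."""

    def __init__(self, remote):
        self.remote = remote
        self.local = {}
        self.pending = []  # writes not yet confirmed by the remote

    def read(self, key):
        try:
            value = self.remote.get(key)
            self.local[key] = value  # refresh the local copy
            return value
        except ConnectionError:
            return self.local.get(key)  # fall back to the last known value

    def write(self, key, value):
        self.local[key] = value
        self.pending.append((key, value))
        self.sync()

    def sync(self):
        """Push queued writes in order, stopping at the first failure."""
        pushed = 0
        while self.pending:
            key, value = self.pending[0]
            try:
                self.remote.put(key, value)
            except ConnectionError:
                break
            self.pending.pop(0)
            pushed += 1
        return pushed
```

While the remote is offline, reads return the last locally known value and writes stay in `pending`; once it is reachable again, a single `sync()` call replays them in the order they were made.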

Thursday, 6 July 2017