Business Cloud News
Azure’s outages raise questions about the suitability of public cloud platforms for mission-critical applications

Microsoft Azure has been hit by yet another global outage, this time impacting even its own online Office services and websites. The outage raises serious questions about whether these kinds of public cloud platforms are ready for mission-critical workloads.

Early this morning or late yesterday, depending on where you’re based, Azure experienced connectivity issues affecting services including storage, websites and Visual Studio Online, with some users also having trouble accessing SharePoint Online and other Office apps. Xbox Live seems to have been hit with similar outages, too.

The outage has seemingly impacted every region except Australia, its most recently added region, and has even briefly affected some of Microsoft’s own websites.

The company said it’s still in the process of investigating and resolving the issue. Most of the services are back online except for Application Insights for app monitoring, which is down globally. In North and Western Europe, customers are still experiencing issues with their virtual machines.

This is the second large-scale Azure outage in less than three months. In August the company’s cloud services experienced significant outages in the US, Japan and Brazil simultaneously, which went on for nearly a week in some cases.

What are enterprises to make of this? Recently published research from Infosys suggests about four in five enterprises plan to move their mission-critical workloads into the cloud, which, in light of recent events, is a worrying sign.

It has been about 12 hours since the outages began, and if you’re hosting an application in a virtual machine in Europe, you’re likely still experiencing severe latency or an outright outage, despite the fact that the Azure status page currently leads with “All good! Everything is running great”. Everything is not running great.

Most SLAs for Azure allow for just over 50 minutes of downtime annually, a limit that in many cases has already been exceeded several times over, depending on which geography a user’s workload is hosted in. Microsoft hasn’t yet commented on how it will compensate impacted users, but going by its standard SLAs and previous outages, service credits are likely on the table.
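To put that downtime allowance in perspective, here is a rough back-of-the-envelope sketch. It assumes a 99.99% availability target, which is an inference from the "just over 50 minutes" figure rather than something stated above, and it uses the roughly 12-hour European outage mentioned earlier as the comparison. The function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def annual_downtime_budget_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def budget_multiples(outage_minutes: float, availability_percent: float) -> float:
    """How many years' worth of downtime allowance a single outage consumes."""
    return outage_minutes / annual_downtime_budget_minutes(availability_percent)


# Assumption: 99.99% is inferred from the "just over 50 minutes" figure.
ASSUMED_AVAILABILITY = 99.99

budget = annual_downtime_budget_minutes(ASSUMED_AVAILABILITY)
multiples = budget_multiples(12 * 60, ASSUMED_AVAILABILITY)

print(f"Annual downtime budget: {budget:.2f} minutes")
print(f"A 12-hour outage uses about {multiples:.1f} years of that budget")
```

Under that assumption, the annual allowance works out to about 52.6 minutes, so a single 12-hour outage would consume well over a decade's worth of it. A different SLA percentage would change the numbers but not the basic picture.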

Microsoft certainly isn’t alone in experiencing global outages of its cloud services – AWS users have had their fair share. Still, with so many active regions, one would think the level of redundancy built into Azure would be more robust. And there will come a time when the service credit approach, which is tantamount to saying “we’re sorry for the crap service – here’s some more crap service,” simply won’t placate customers.

At a time when so much commercial activity relies so heavily on online channels, when an increasing number of internal applications – for sales, marketing, HR, and productivity – are deployed via the cloud, and when so many of these applications still include complex service chains, these outages raise serious questions about the limits of public cloud, for all its seemingly infinite scalability and elasticity, as a home for mission-critical applications.

  • David November 19, 2014 at 3:10 pm

    It’s truly dumbfounding that this scale of outage is even possible in a global infrastructure spanning numerous data centers. Microsoft have something very wrong in their procedures and/or infrastructure design. Bear in mind they sell high availability server configurations in Azure that are apparently rendered pointless in this kind of scenario.

    • jimbo0117 November 19, 2014 at 5:49 pm

      Why is it that when AWS or Google have outages it’s attributed to “this stuff is hard, and stuff happens”. But whenever it has to do with MS it’s always attributed to something that MS has done wrong?

      • CAT November 20, 2014 at 11:23 am

        Because MS ads claim “We can do your business better” ;-)

  • Chris Conder November 20, 2014 at 9:32 am

    Everyone knows the internet is holding together with wet string.

  • vicky November 24, 2014 at 4:41 am

    Perhaps the architecture of the cloud platform itself is very complicated, and when there are technical issues it takes lots of time to sort out. But I think this is irrespective of MS, Google or AWS.
