Cloud Computing - Channel Insider
    Google Apps Cloud Outage Lessons Learned


    Google's latest cloud outage caused chaos among Java-based applications on Google App Engine. It is the most recent cloud outage to grab headlines, and not Google's first.


    Google has confirmed that the Java runtime of Google App Engine suffered an outage on the evening of July 14, 2011, causing chaos among Java-based applications on the platform for about four and a half hours.

    The outage began at 7 pm PT, at which point affected applications experienced high latency and error rates. According to Google, approximately 1.9 percent of App Engine traffic was affected at peak. On the Google Developer Blog, the App Engine team noted that the outage began shortly after a scheduled maintenance period, but Google assured developers that the scheduled maintenance and the unexpected outage were unrelated.

    The outage “gradually increased in magnitude over time” before Google engineers were dispatched to deal with it. It took Google’s engineers 2.5 hours, until 9:30 pm PT, to begin repairs to Java App Engine, initially with the aim of reducing the outage's impact.

    The Java side of Google App Engine wasn’t fully back online until 11:30 pm PT, when all Java App Engine applications had been restored to normal operation. Google apologized for the outage and promised to review its procedures to improve performance in the future.

    “Overall reliability, quick return to service, and fast, accurate communication to our customers are some of the core goals of Google App Engine's service offering. While we restored service relatively quickly, it's clear to us that we fell short in prompt communication of status updates,” posted Wesley Chun, a member of the Google App Engine team, to the blog.

    The team is still investigating the causes of the outage, but the blog post noted that it already has a preliminary understanding of what went wrong. More information is promised once the investigation has been completed.

    This isn’t the first time that Google’s platform for developing cloud applications in its managed data centers has experienced an outage (an unfortunate reality in a business environment where cloud computing service providers are dodging accusations of unreliability).

    Here’s a quick (and incomplete) history lesson on Google App Engine’s failures in recent years:

    On February 24, 2010, Google App Engine applications ran in a degraded state for varying amounts of time (from 20 minutes to two hours) between 7:48 am and 10:09 pm PT. The cause? A power failure in the primary data center. Engineers said the scenario had been planned for, but not everyone on staff was aware of the procedures.

    On July 2, 2009, an outage between 6:45 am PT and 12:35 pm PT caused varying degrees of chaos for Google App Engine applications, from partial to complete application outages. The cause was a bug on the GFS Master server that, according to Google, was triggered by another client in the data center. An improperly formed file handle hadn’t been sanitized on the server side and caused a stack overflow when it was processed. Google later discovered the bug had been live for at least a year.

    On June 17, 2008, Google App Engine was hit by a datastore outage at 6:30 am PT. According to Google, only a small number of requests returned errors at first, but the error count kept rising throughout the morning until engineers isolated the incident at 1:40 pm PT. Problem solved, and another bug (this one affecting datastore servers) was found and dealt with.

    In other areas of Google's cloud computing business, the company has often had to deal with surly customers complaining about Gmail or Google Apps outages, but that is hardly a new tale in the realm of cloud. Google certainly isn’t the only cloud computing service provider that experiences its share of outages, and fingers can easily be pointed at unreliability in a variety of directions.

    >>> More Cloud Computing Articles          >>> More By Chris Talbot