From Bhavin Turakhia’s Desk:
I would like to start off with an apology for the unprecedented downtime we are currently experiencing on our OrderBox systems. This post contains some notes on the status, as well as additional information.
This downtime was the result of an unexpected hardware failure on one of our storage systems. Despite adequate redundancy and failover, this resulted in a data inconsistency that needed to be fixed. There has never been a single point of failure in our architecture, and yet this inexplicable situation resulted in an unanticipated error. The problem resolution consisted of isolating the failed hardware, replacing it, and then repairing the data inconsistency. Our entire team, including myself, has been working day and night to resolve this.
We have been working closely with contributors to PostgreSQL, our hardware manufacturer, and other experts to ensure that the steps our team has determined are appropriate and will not result in any data inconsistencies. Fortunately, all the disaster recovery steps we had in place have ensured that we have adequate backup measures.
We are well on our way to fixing the problem. In fact, at this stage we already have a running system in the backend. However, we would like to run a battery of tests to ensure that the system is stable, consistent and operational. These tests and the associated safety measures have been taking a considerable amount of time. As a result, we have had to push our ETA twice.
At this point we believe we should be up and running in another hour (at 8:00 am GMT). I would like to personally apologize for the inconsistency in our ETA, which is due to a scenario that we are facing for the very first time. This is the first time in 8 years of running our business that we have faced a hardware malfunction of this nature, one which transcended our redundancy measures. My team and I are deeply sorry about the loss of business that this will have caused all of you. Currently, we have our hands full and have been working incessantly towards dotting all i’s and crossing all t’s.
However, as soon as we are operational, I personally intend to announce certain programs in an effort to make up for the business loss that you have experienced during this downtime.
Additionally, in the course of resolving this problem, we have also commissioned, on an immediate basis, additional measures to protect against a repeat of such an event. I will be detailing these in a separate email.
I thank you all for your patience. Many of you have sent in personal notes that demonstrate tremendous understanding and great confidence in our services.
I do not wish to spend additional time on further details. We will be continuing to work in the background to restore services quickly.
Once again, I apologize for all the inconvenience and assure you of my personal commitment that we will make up for this, and that within 2-3 weeks we will have measures in place to prevent this type of situation from ever taking place again.