We have performed a root cause analysis of yesterday's host crashes.
Findings identified thus far:
The issue was traced to legacy storage nodes with deduplication enabled, which were identified last year as unstable when drive deletion is enabled. Those storage nodes were removed from allocation at that time but remain in production because legacy customer drives are still stored on them.
The instability identified last year was traced to a bug related to iSCSI connections that can cause a full compute host crash when storage nodes are slow to respond during drive delete operations involving deduplicated data.
A recent patch to the agent software running on the storage nodes accidentally re-enabled deletion on these nodes. Unfortunately, drives from these storage nodes were mounted across a significant proportion of the compute nodes in our Zurich cloud.
In short, human error resulted in a rollback of configuration settings on a small number of storage nodes, which created significant instability in the cloud.
Current Lessons Learned & Action Taken: Going forward, we have modified our procedures so that equipment scheduled for decommission is NOT updated, since updates might cause unpredictable behaviour, unless leaving it unpatched presents a security risk to client computing. We are also accelerating the decommissioning of the limited number of storage nodes with known problems in relation to deduplication.
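As an illustration only, the revised update rule could be expressed roughly as follows. This is a minimal sketch, not our actual tooling: the names StorageNode, Patch, should_apply and rollout_targets are hypothetical and chosen purely to show the rule that nodes scheduled for decommission are skipped unless the patch is a security fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageNode:
    # Hypothetical record for a storage node; field names are assumptions.
    name: str
    scheduled_for_decommission: bool


@dataclass(frozen=True)
class Patch:
    # Hypothetical record for an agent software patch.
    version: str
    is_security_fix: bool


def should_apply(patch: Patch, node: StorageNode) -> bool:
    """Return True if the patch may be rolled out to this node."""
    if node.scheduled_for_decommission:
        # Nodes awaiting decommission only receive security fixes.
        return patch.is_security_fix
    return True


def rollout_targets(patch: Patch, nodes: list[StorageNode]) -> list[str]:
    """Names of the nodes that the patch is allowed to touch."""
    return [node.name for node in nodes if should_apply(patch, node)]
```

Under this sketch, a routine agent patch would skip every node flagged for decommission, so its configuration would stay as it was.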
As this outage was due directly to human error, we do not expect a repeat of the instability as a result of this issue.
Please accept our sincere apologies for the inconvenience caused.
While it is important for CloudSigma to perform maintenance such as this to ensure the quality of our services, we try to do everything possible to minimize any inconvenience to our customers. We appreciate your patience and welcome any feedback.