[ZRH] Partial Network outage/packet loss
Incident Report for CloudSigma
Postmortem

We are now able to provide a root cause analysis of the network issue that occurred earlier today in Zurich.

ADDITIONAL INFORMATION: Earlier today we suffered an extensive DDoS attack, which was mitigated by our automated DDoS protection system. The resulting high volume of network traffic filled the nf_conntrack (connection tracking) table on some of our hosts, because the configured nf_conntrack limit on those hosts had been reset by a reboot.
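
On Linux, once the connection tracking table reaches its limit, the kernel drops packets for new flows, which is visible externally as packet loss. The sketch below shows one way an operator might watch for this condition. It assumes a Linux host with the nf_conntrack module loaded, reads the standard procfs counters nf_conntrack_count and nf_conntrack_max, and uses a hypothetical 90% alert threshold; it is an illustration, not CloudSigma's actual monitoring.

```python
from pathlib import Path

# Standard procfs entries exposed when the nf_conntrack module is loaded.
CONNTRACK_COUNT = Path("/proc/sys/net/netfilter/nf_conntrack_count")
CONNTRACK_MAX = Path("/proc/sys/net/netfilter/nf_conntrack_max")


def conntrack_utilization(count: int, maximum: int) -> float:
    """Return the fraction of the conntrack table currently in use."""
    if maximum <= 0:
        raise ValueError("nf_conntrack_max must be positive")
    return count / maximum


def is_near_full(count: int, maximum: int, threshold: float = 0.9) -> bool:
    """True when usage is at or above `threshold` (a hypothetical alert level)."""
    return conntrack_utilization(count, maximum) >= threshold


def read_host_usage() -> tuple[int, int]:
    """Read the live entry count and the configured limit from procfs (Linux only)."""
    count = int(CONNTRACK_COUNT.read_text().strip())
    maximum = int(CONNTRACK_MAX.read_text().strip())
    return count, maximum


if __name__ == "__main__":
    if CONNTRACK_COUNT.exists() and CONNTRACK_MAX.exists():
        count, maximum = read_host_usage()
        pct = 100 * conntrack_utilization(count, maximum)
        print(f"{count}/{maximum} entries ({pct:.1f}%)")
        if is_near_full(count, maximum):
            print("WARNING: conntrack table nearly full; new flows may be dropped")
    else:
        print("nf_conntrack does not appear to be loaded on this host")
```

Keeping the threshold check as a pure function of the two counters makes it easy to reuse from whatever monitoring agent already polls the host.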

Current Lessons Learned & Action Taken: Our operations team corrected the nf_conntrack table limits and added them to the configuration script so that they persist if another reboot is required. In addition, we are migrating from AMD to Intel hosts, which will further increase the network throughput capacity of the compute nodes.
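
The distinction behind this fix is that a value written to /proc/sys at runtime is lost on reboot, while an entry in a file under /etc/sysctl.d/ is reapplied at boot. The sketch below illustrates that pattern for the net.netfilter.nf_conntrack_max key. The drop-in file name and any limit you pass in are hypothetical, since the values CloudSigma chose are not stated here, and writing either path requires root.

```python
from pathlib import Path

SYSCTL_KEY = "net.netfilter.nf_conntrack_max"
RUNTIME_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")
# Hypothetical drop-in name; *.conf files under /etc/sysctl.d/ are read at boot.
DROPIN_PATH = Path("/etc/sysctl.d/90-conntrack.conf")


def render_dropin(max_entries: int) -> str:
    """Return sysctl.d file content that pins the conntrack limit."""
    if max_entries <= 0:
        raise ValueError("max_entries must be positive")
    return f"{SYSCTL_KEY} = {max_entries}\n"


def apply_limit(max_entries: int,
                dropin: Path = DROPIN_PATH,
                runtime: Path = RUNTIME_PATH) -> None:
    """Persist the limit for future boots, then set it on the running kernel."""
    dropin.write_text(render_dropin(max_entries))  # survives a reboot
    runtime.write_text(f"{max_entries}\n")         # effective now, lost on reboot
```

Writing both locations avoids a window where the running value and the boot-time value disagree. On some systems the key only exists once the nf_conntrack module is loaded, so it may be worth checking that the module loads before the sysctl settings are applied at boot.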

Please accept our sincere apologies for the disruption this situation has caused. We believe we have identified and mitigated the source of the problem to prevent a recurrence.

CloudSigma strives to do everything possible to minimize any inconvenience to our customers. We appreciate your patience and welcome any feedback.

Thank you for your understanding.

Posted Jul 23, 2015 - 15:42 UTC

Resolved
All routes are up and accessible without packet loss. We are investigating why the packet loss occurred. Running cloud servers were not affected apart from the loss of external network availability.
Posted Jul 23, 2015 - 09:14 UTC
Identified
We've identified the problematic traffic flows and are currently mitigating them. Most client servers should be fully accessible again; however, a few routes remain overloaded, and we are continuing to work on them.
Posted Jul 23, 2015 - 08:50 UTC
Investigating
Some routes are experiencing heavy packet loss. We are currently investigating with our upstream carriers.
Posted Jul 23, 2015 - 08:40 UTC