We received alarms that fl1ovz01 had become unresponsive, and we were unable to access the server. Because both the server and the remote management network were unavailable, we had the data center power cycle the server. The VPSs are still coming back online; if you do not wish to wait for the automated process, you can power on your VPS manually via SolusVM. We are still investigating the cause and will provide an RFO once it is determined. Only fl1ovz01 was impacted by this outage; all other nodes in Tampa remained online.

It appears the outage was caused by a DDoS attack against one of our clients on the node. Unfortunately, our network monitoring stopped working at the time, so we are unsure of the size of the attack, but the outage coincides with the moment a VPS was suspended by our automated system for exceeding our Packets Per Second limit. It appears that our primary NIC failed for an unknown reason, which made the backup NIC the active one. The backup switch (100Mbps) was unable to handle the size of the attack, which resulted in the outage. Rebooting the server forced it back onto the primary NIC, which is on our primary router (1Gbps). As soon as the server came back online, the client received another DDoS attack, but the IP was null-routed before it caused any downtime.
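
For anyone who wants to check which NIC a bonded interface is currently using, here is a minimal Python sketch. It assumes the text format the Linux bonding driver exposes under /proc/net/bonding/, and the SAMPLE text is a hypothetical illustration, not output captured from fl1ovz01. The function name active_slave_speed is our own.

```python
import re

# Hypothetical sample in the style of /proc/net/bonding/bond0.
# This is NOT output captured from the affected node.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0
Currently Active Slave: eth1
MII Status: up

Slave Interface: eth0
MII Status: down
Speed: Unknown

Slave Interface: eth1
MII Status: up
Speed: 100 Mbps
"""


def active_slave_speed(bonding_text):
    """Return (active_slave, speed_mbps) parsed from bonding status text.

    speed_mbps is None when the speed is missing or not a number.
    """
    active = re.search(r"^Currently Active Slave:\s*(\S+)", bonding_text, re.M)
    if not active:
        return None, None
    name = active.group(1)
    # Each slave has its own section starting with "Slave Interface: <name>".
    sections = re.split(r"^Slave Interface:\s*", bonding_text, flags=re.M)[1:]
    for section in sections:
        lines = section.splitlines()
        if lines and lines[0].strip() == name:
            speed = re.search(r"^Speed:\s*(\d+)\s*Mbps", section, re.M)
            return name, (int(speed.group(1)) if speed else None)
    return name, None


if __name__ == "__main__":
    print(active_slave_speed(SAMPLE))
```

On a live Linux host the same function could be given the contents of /proc/net/bonding/bond0; an active slave reporting 100 Mbps would indicate the node is running on a slower backup link.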

Outage started: Fri Dec 06 2013 18:08:48 GMT-5.0
Outage resolved: Fri Dec 06 2013 18:39:15 GMT-5.0
Total downtime for the node: ~30 minutes
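
The downtime figure can be checked from the two timestamps above. This is a small Python sketch; the EST offset of -5 hours is taken from the "GMT-5.0" suffix, and the helper name parse is our own.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset, matching the "GMT-5.0" suffix on the timestamps.
EST = timezone(timedelta(hours=-5))
FMT = "%a %b %d %Y %H:%M:%S"


def parse(stamp):
    """Parse a timestamp such as 'Fri Dec 06 2013 18:08:48' as UTC-5."""
    return datetime.strptime(stamp, FMT).replace(tzinfo=EST)


started = parse("Fri Dec 06 2013 18:08:48")
resolved = parse("Fri Dec 06 2013 18:39:15")
downtime = resolved - started

print(downtime)  # 0:30:27
```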

Dec  6 18:07:25 fl1ovz01 kernel: [6166168.098461] igb 0000:05:00.0: eth0: Reset adapter
Dec  6 18:07:25 fl1ovz01 kernel: [6166168.166158] bonding: bond0: link status definitely down for interface eth0, disabling it
Dec  6 18:07:25 fl1ovz01 kernel: [6166168.166167] bonding: bond0: making interface eth1 the new active one.
Dec  6 18:07:26 fl1ovz01 kernel: [6166169.774455] CT: 1340: stopped
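
The failover in the log excerpt above can be picked out automatically. The following Python sketch scans kernel log lines for the two bonding messages shown; the regular expressions are written against the wording of these specific lines, and the function name failover_events is our own.

```python
import re

# Patterns written against the bonding messages in the excerpt above.
DOWN = re.compile(
    r"bonding: (\S+): link status definitely down for interface (\S+?),"
)
ACTIVE = re.compile(
    r"bonding: (\S+): making interface (\S+) the new active one"
)

LOG = [
    "Dec  6 18:07:25 fl1ovz01 kernel: [6166168.098461] igb 0000:05:00.0: eth0: Reset adapter",
    "Dec  6 18:07:25 fl1ovz01 kernel: [6166168.166158] bonding: bond0: link status definitely down for interface eth0, disabling it",
    "Dec  6 18:07:25 fl1ovz01 kernel: [6166168.166167] bonding: bond0: making interface eth1 the new active one.",
    "Dec  6 18:07:26 fl1ovz01 kernel: [6166169.774455] CT: 1340: stopped",
]


def failover_events(lines):
    """Return (kind, bond, interface) tuples for bonding failover messages."""
    events = []
    for line in lines:
        match = DOWN.search(line)
        if match:
            events.append(("down", match.group(1), match.group(2)))
            continue
        match = ACTIVE.search(line)
        if match:
            events.append(("active", match.group(1), match.group(2)))
    return events


if __name__ == "__main__":
    for event in failover_events(LOG):
        print(event)
```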

As of 20:47 EST, we still have a handful of VPSs offline that require manual intervention, and we are seeing additional DDoS attacks against other clients. We will continue to monitor the network as we work on these VPSs and hope to have an e-mail out to clients once everything is resolved and stable. We apologize for this outage, which was compounded by a malfunctioning remote management network. That has since been fixed, and we will be putting proper tests and monitoring in place to ensure it is accessible when we need it. Thank you.

-The Secure Dragon Staff


Friday, December 6, 2013




