This morning at 10:46 AM, the packet-forwarding engine failed on a distribution switch that services a large part of the datacentre network.
For reasons not yet understood, this failure also affected the secondary (backup) distribution switch, which exists solely to take over if the primary distribution switch on that network segment fails.
As a result, a number of dedicated servers, VPSs and retail hosting services experienced degraded connectivity, resulting in either a slowdown or unavailability of service.
Once datacentre network engineers had identified the cause of the issue, both switches were restarted and the packet-forwarding engine resumed operation, restoring network connectivity to affected services by 11:18 AM.
TIMELINE OF EVENTS:
10:46 – Monitoring alerted the technical operations team to a network issue.
10:47 – Datacentre network engineers commenced investigating the issue.
10:52 – Cause of issue identified; troubleshooting process commenced.
10:59 – Both switches restarted to restore connectivity.
11:00 – Datacentre engineers on standby in case hardware replacement is required.
11:03 – Reboot in progress; the process takes around 15 minutes.
11:18 – Restart of both primary and backup switches completed. Issue resolved.
Datacentre engineers have advised they are confident the issue is now resolved and are working closely with the hardware vendor so that further incidents can be avoided.
Of particular interest to us is the cause of the secondary backup distribution switch failure. Datacentre engineers and the hardware vendor are working together to understand the fault and to put measures in place so that the secondary backup unit does what it is there to do.
We would like to sincerely apologise for the inconvenience this incident has caused.
If you have any further concerns, please do not hesitate to contact our support team.