Service issue – RCA (root cause analysis)

We apologize for last night's service disruption, which caused “server not available” errors to be returned for a proportion of the traffic processed through the Bango system. We have been monitoring system traffic closely throughout the day, and everything is operating normally and stably.

The system actively load balances traffic across a range of servers to maximize responsiveness and to handle variations in traffic volume. For a period yesterday, requests directed to one server range backed up, generating the “server not available” errors that some of you saw. Although this did not affect all traffic, many of you will have experienced it.

As soon as we had diagnosed the problem yesterday, we manually deployed the additional capacity at our disposal to process the backed-up traffic until the system stabilized. Timings for the initial problem, the diagnosis and the resolution are given in the earlier blog entries below. The additional capacity deployed yesterday remains in place, and more is being added, so that in the unlikely event of a future load-balancing issue, a traffic back-up will not become a problem.

Our apologies once more, and please get in touch if you would like to discuss this further.

