Website outage
Incident Report for Bottlenose
Postmortem

First of all, I would like to personally apologize for this evening's outage. I understand how important a highly available service is to your businesses.

We recently changed the orders dashboard to make it easier to navigate. After deploying the change, we found a bug in the code and executed a standard rollback. During the rollback, however, the navigation caches were not primed properly, so requests that would normally be served from cache went to the database servers and overloaded them. Once the databases were overloaded, it was difficult to recover and restore service.
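
For readers curious about the failure mode, here is a minimal sketch of one common guard against it, in which concurrent cache misses for the same key share a single database load. It is an illustration under stated assumptions, not a description of our actual code or of the mitigation we have adopted; `SingleFlightCache`, `loader`, and `prime` are hypothetical names.

```python
import threading


class SingleFlightCache:
    """In-memory cache where concurrent misses for one key share one load."""

    def __init__(self, loader):
        # loader is a hypothetical function key -> value, e.g. a query that
        # builds the navigation for one site from the database.
        self._loader = loader
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()  # protects the two dicts above

    def prime(self, keys):
        """Warm the cache for the given keys before serving traffic."""
        for key in keys:
            self.get(key)

    def get(self, key):
        with self._guard:
            if key in self._values:
                return self._values[key]
            # One lock per key, so only one caller loads a missing key.
            key_lock = self._key_locks.setdefault(key, threading.Lock())
        with key_lock:
            # Re-check: another caller may have loaded it while we waited.
            with self._guard:
                if key in self._values:
                    return self._values[key]
            value = self._loader(key)
            with self._guard:
                self._values[key] = value
            return value
```

With a guard like this, a cold cache costs at most one load per key rather than one per waiting request, which limits how hard an unprimed cache can hit the database.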

We have had incidents like this before, though infrequently. We have now identified a way to mitigate this situation faster should it occur again.

Sincerely,

William Carr

Posted Apr 10, 2020 - 18:46 EDT

Resolved
This incident has been resolved.
Posted Apr 10, 2020 - 18:19 EDT
Update
We are continuing to monitor for any further issues.
Posted Apr 10, 2020 - 18:15 EDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 10, 2020 - 18:15 EDT
Identified
During a routine deployment, a cache priming routine that loads the navigation for the sites failed to run correctly. This overloaded the database and caused a race condition that prevented the site from loading properly. We are working as quickly as we can to restore service.
Posted Apr 10, 2020 - 17:51 EDT
This incident affected: eCommerce (Homepage, Site Search, API, Admin Dashboard) and Management Portal, Site Search.