Mail platform outage

Incident Report for Bottlenose

Postmortem

We would like to apologize for the interruption of email services caused by this incident.

At approximately 4:00 PM Eastern Time, the instances hosting all email-related services became unavailable. This caused major interruptions to SMTP (outbound email service), POP and IMAP (inbound email service), and email marketing services.

Bottlenose staff, working with the Amazon Web Services (AWS) technical support team, determined that the cause of the problem was degraded hardware on the hosts serving the email service instances. Once the instances were redeployed on different hardware, the email services started to become available. However, the ssh service was not running on the instances, which required further maintenance. Just before 7:00 PM Eastern Time, all services were restored.

We could have reduced the outage period significantly had our monitoring processes performed better. Our monitoring system detected the problem immediately but failed to deliver alerts to the on-call engineer. We will review our notification and alerting system to identify what can be improved going forward.

Posted Aug 30, 2018 - 22:44 EDT

Resolved

This incident has been resolved.
Posted Aug 30, 2018 - 22:29 EDT

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Aug 30, 2018 - 18:57 EDT

Update

We are continuing to work on a fix for this issue.
Posted Aug 30, 2018 - 18:56 EDT

Identified

Amazon Web Services reports a hardware failure with the hosts running our mail service instances. We have worked with the AWS support team to get the instances running on new hardware. We anticipate a resolution to this issue shortly.
Posted Aug 30, 2018 - 18:55 EDT

Update

All of the instances the mail services run on have become unresponsive. We are opening a case with our service provider, Amazon Web Services.
Posted Aug 30, 2018 - 18:25 EDT

Investigating

We have seen a major outage with our email service platform. We have had numerous reports from customers. We are seeing problems in SMTP (outbound email), POP and IMAP (inbound email), as well as email marketing and transactional email services.

We do not know the cause of the issue yet. We are investigating as a top priority. We will update this incident as soon as we have more information.
Posted Aug 30, 2018 - 18:18 EDT
This incident affected: Email (SMTP Email, Transactional Emails, Email Marketing, POP3/IMAP Email).