Increased error rate
Incident Report for Happeo
Postmortem

Start time: 2019-11-04 09:59

Resolution time: 2019-11-04 10:23

Outage time: 24 minutes
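
As a cross-check of the outage duration, here is a minimal sketch that derives it from the reported start and resolution times. It assumes both timestamps fall on 2019-11-04 (the posting date of this report) and share a timezone; the variable names are illustrative only.

```python
from datetime import datetime

# Timestamps taken from this report.
# Assumption: both are on 2019-11-04 and in the same timezone.
FMT = "%Y-%m-%d %H:%M"
start = datetime.strptime("2019-11-04 09:59", FMT)
resolved = datetime.strptime("2019-11-04 10:23", FMT)

# Whole minutes between start and resolution.
outage_minutes = int((resolved - start).total_seconds() // 60)
print(outage_minutes)  # 24
```

Under these assumptions the window from 09:59 to 10:23 is 24 minutes.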

Problem:

Happeo was inaccessible from web and mobile due to the main authentication database being down.

Affected:

All customers, all users.

Root cause:

Google Cloud SQL crashed at 08:46:47 and started recovery at 08:53:09. With Cloud SQL down, authentication did not work, so all requests were stopped at the API Gateway.

Posted Nov 04, 2019 - 12:23 UTC

Resolved
We identified an increased error rate in our application affecting some parts of Happeo, including login, notifications, and admin panel functions. The error rate increased from 0 to 8.3% on average and peaked at 19.51%. The increased error rate lasted 24 minutes, from 09:59 to 10:23.
Posted Nov 04, 2019 - 11:54 UTC