Issue description and impact:
On Dec 5th, between 11:39 and 14:58 UTC, an incident affected some 360dialog customers' use of the WhatsApp API.
Cause of the issue:
Due to an issue with our hosting provider, Google Cloud Platform, all spot pools in the Netherlands (europe-west4 region) experienced sporadic reboots of cluster instances.
Mitigation Steps:
When we detected the issues with the spot nodes, we completely removed all spot instances and switched to a backup pool with dedicated instances.
Prevention of recurrence:
After reviewing the case internally, we have adjusted our procedure to immediately replace malfunctioning spot nodes with dedicated instance nodes.