Meta have applied an exception to our app.
(The exception was applied incorrectly on Fri 27/09/24, but Meta's engineering team has since fixed it.) We do not expect the issue to recur.
360dialog - Root Cause Analysis
Issue description and impact:
On 12 Sept 2024 at ~14:45 UTC+1, the following WABA management operations were blocked:
Template creation
Template syncing
Number registration
WABA profile management
Migrations
Onboarding
Channel updates
Cause of the issue:
The issue arose because Meta set the rate limit for one of our applications too low. This threshold is controlled and enforced by Meta at the application level, and reaching it disrupted our service. 360dialog has significantly more active WABAs than any other BSP, and Meta did not raise the limit in line with our strong growth. While we strive to maintain optimal performance, the constraints set by Meta directly impacted the functionality of our application in this instance.
Mitigation Steps:
Sept 12, 14:45 - Issue started
Sept 13, 07:46 - Issue escalated to Meta
Sept 13, 15:20 - Attempted fix
No response from Meta
Sept 14, 13:32 - Initiated workaround
Sept 14, 15:40 - App unblocked. Issue mitigated temporarily.
—
Sept 16, 06:51 - Issue reoccurred
Sept 16, 13:30 - Meta provided support that led to the root cause diagnosis
Sept 16, 14:46 - Issue mitigated temporarily
—
Sept 25, 12:55 - Issue reoccurred
Sept 27, 13:30 - Workaround measure deployed
Sept 27, 15:10 - Issue mitigated temporarily
Sept 27, 17:00 - Meta confirmed they had applied an exception to our app
—
Sept 30, 14:23 - Issue reoccurred
Sept 30, 16:53 - Meta confirmed the exception had not been applied correctly on their side and that they had fixed it
Issue Resolved
Prevention of recurrence & suggestions:
We have obtained an adequate rate limit again and are working with Meta on a process to increase our limit in line with our growth in the future. Additionally, we have completed an internal review with the relevant teams and stakeholders, and we are adjusting our system architecture to accommodate Meta's app rate limiting practices.
Separately, Meta is conducting a thorough investigation to understand where the escalation process failed and how they can improve it, particularly during weekends.
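As an illustration of what accommodating an application-level rate limit can look like on the client side, the sketch below combines a token bucket (to pace outgoing management calls) with an exponential backoff schedule (to space out retries after a rate-limit error). It is a generic sketch and not a description of 360dialog's actual architecture: the class and function names, the rate, the capacity and the delay values are all hypothetical, and Meta's real limits are not stated in this report.

```python
import time


class TokenBucket:
    """Hypothetical client-side limiter: about `rate` calls per second, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second (assumed value)
        self.capacity = float(capacity)  # maximum burst size (assumed value)
        self.clock = clock               # injectable clock, useful for testing
        self.tokens = float(capacity)    # start with a full bucket
        self.updated = clock()

    def try_acquire(self):
        """Return True if a call may be sent now, False if it should be deferred."""
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Refill in proportion to the time passed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5):
    """Delays in seconds between retries after a rate-limit error, capped at `cap`."""
    return [min(cap, base * factor ** i) for i in range(attempts)]
```

A caller would check `try_acquire()` before each management operation (for example a template sync) and queue the operation when it returns False, so that low-priority work is deferred rather than consuming the shared application budget.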