360dialog - Webhook events not arriving for On-prem numbers – Incident details

Resolved
Degraded performance
Started 6 months ago

Affected

Meta Infrastructure

Degraded performance from 11:00 PM to 2:30 PM, Operational from 11:00 PM to 12:00 AM

WhatsApp On-Premise API (Message Delivery)

Degraded performance from 11:00 PM to 2:30 PM, Operational from 11:00 PM to 12:00 AM

Updates
  • Update
    The webhook functionality for On-Premise configurations has been fully restored.
  • Resolved

    Post-Mortem Incident Summary

    Date and Time of Incident:
    ● Start: 27/06/2024, 11:15 UTC
    ● End: 27/06/2024, 15:30 UTC

    Description of Incident:
    360dialog experienced problems delivering and receiving messages to and from Meta servers. This was due to an outage in the callbacker component handling the messaging webhooks for On-Premise WhatsApp Business API numbers. The outage occurred during a maintenance procedure.

    After the cause of the problem was identified, the callbacker component was migrated to an environment with additional resources to ensure messaging continuity.

    General Remarks:

    The On-Premise WABA solution is complex and presents numerous management and maintenance challenges. A thorough investigation involving Meta is necessary to identify the underlying issues accurately for an in-depth root cause analysis (RCA).

    Given this context, 360dialog strongly recommends migrating all accounts to the Cloud API. The Cloud API offers several advantages, including improved reliability and easier scalability. Transitioning to the Cloud API ensures that your operations remain smooth and well-supported, while also aligning with the latest technological advancements and best practices in the industry.

    Robert Konopka,

    360dialog GmbH

  • Investigating
    We are aware of an issue affecting the delivery of webhook events for On-prem numbers. Our engineering team is currently investigating. We do not yet have an ETA for resolution, but we will post another update as soon as possible.
  • Monitoring
    Most functionality has been restored. Webhook events have started arriving at the callback URLs.