Product: QStash
Impact: Degraded performance, delayed processing of events, and duplicate event deliveries for some customers
QStash experienced a sudden, extreme load on its servers. Performance degraded, and event processing latency became extremely high for all users. Some events were also delivered more than once to some users. As an initial mitigation, we increased capacity while the investigation proceeded. We then confirmed fixes for the underlying issues against a reproducer and deployed them to production.
In a certain usage pattern, failure handling for failureFunction could trigger recursive calls. These calls caused a leak in the task queue, which put severe load on the QStash servers. The same condition also triggered an edge case that caused some events to be delivered multiple times.
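The general failure mode can be illustrated with a toy model. This is only a sketch and not QStash's actual implementation: the `Task` type, the `run` function, and the `guard` flag are hypothetical names introduced for illustration. It assumes every task fails and that each failure schedules a failure callback. Without a guard, a failing callback schedules another callback for itself, so the queue never drains.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    is_failure_callback: bool = False


def run(tasks, guard, max_steps=1000):
    """Process tasks that always fail; return (steps_processed, tasks_left)."""
    queue = deque(tasks)
    steps = 0
    while queue and steps < max_steps:
        task = queue.popleft()
        steps += 1
        # In this toy model every task fails, including failure callbacks.
        if guard and task.is_failure_callback:
            # Guarded: a failed failure callback is dropped, not handled again.
            continue
        # Failure handling schedules a callback for the failed task.
        queue.append(Task(f"on_failure({task.name})", is_failure_callback=True))
    return steps, len(queue)


unguarded = run([Task("job")], guard=False)  # stops only at the step limit
guarded = run([Task("job")], guard=True)     # drains after two steps
```

In the unguarded run the loop ends only because of the artificial `max_steps` limit, with work still queued. In the guarded run the original task and its single callback are processed and the queue empties.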
Two hotfixes to the QStash processes were deployed.
High event processing latency was observed for all users. Some users received duplicate event deliveries. No events were lost; every event was delivered, consistent with our "at least once delivery" guarantee. Customers do not need to take any corrective action: workflows have returned to normal and preventive fixes have been deployed.
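For context on what "at least once" delivery implies for a receiver: the same event can arrive more than once, so handlers are commonly written to be idempotent. The following is a minimal sketch, assuming each delivery carries a stable message identifier. The `IdempotentHandler` class, the `message_id` parameter, and the in-memory set are hypothetical and for illustration only; a real receiver would likely keep seen identifiers in a durable store.

```python
class IdempotentHandler:
    """Processes each message ID at most once, ignoring repeat deliveries."""

    def __init__(self):
        self._seen = set()    # assumption: in-memory only, lost on restart
        self.processed = []   # payloads that were actually handled

    def handle(self, message_id, payload):
        """Return True if the payload was processed, False if it was a duplicate."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.processed.append(payload)
        return True


handler = IdempotentHandler()
results = [
    handler.handle("msg-1", "a"),
    handler.handle("msg-1", "a"),  # duplicate delivery of the same event
    handler.handle("msg-2", "b"),
]
```

Here the second delivery of `msg-1` is recognised and skipped, so the duplicate has no additional effect.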