QStash US Region: Schedule Degradation

Incident Report for Upstash Status

Postmortem

Incident Postmortem: Scheduled Jobs Inconsistency in US Region

On May 1, 2026, we experienced an incident affecting a subset of schedules in the US region following a recent infrastructure update.

The issue has been resolved, and all affected schedules have been restored.

Summary

As part of an ongoing scalability improvement, we recently updated scheduling infrastructure in the US region to a new architecture. During this transition, a legacy execution path remained in the codebase as a fallback mechanism.

On May 1, a bug caused the system to revert to the legacy path. As a result, schedule state became inconsistent between the old and new scheduling systems for some users.

Impact

The incident affected a limited number of users in the US region.

Most users were not affected, and the vast majority of schedules continued operating normally throughout the incident.

Users who made no schedule changes during the transition window were not affected.

A subset of users who created, edited, paused, or deleted schedules between April 24 and May 1 may have experienced one or more of the following:

  • Schedule updates not being reflected
  • Paused schedules becoming active again
  • Deleted schedules reappearing
  • Newly created schedules not executing as expected

Schedules created after the transition may have stopped executing briefly before recovery.

Root Cause

During the transition, the new scheduling infrastructure became the source of truth for schedule state.

Due to a bug, the system unexpectedly reverted traffic to the legacy scheduling path, which began accepting updates independently from the new system.

This caused the two systems to diverge and resulted in inconsistent schedule state for affected users.
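To make the divergence described above concrete, here is a minimal sketch of how two schedule stores that accept writes independently can be compared. It is an illustration only, not our internal tooling: the `Schedule` type, its fields, and the `diff_stores` function are hypothetical names, and each store is modeled as a plain dictionary keyed by schedule ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Schedule:
    # Hypothetical, simplified view of a schedule's state.
    schedule_id: str
    cron: str
    paused: bool = False


def diff_stores(new_store, legacy_store):
    """Group schedule IDs by how the two stores disagree."""
    report = {"only_in_new": [], "only_in_legacy": [], "conflicting": []}
    # Walk the union of IDs so that schedules missing on either side are seen.
    for sid in sorted(new_store.keys() | legacy_store.keys()):
        in_new = sid in new_store
        in_legacy = sid in legacy_store
        if in_new and not in_legacy:
            report["only_in_new"].append(sid)
        elif in_legacy and not in_new:
            report["only_in_legacy"].append(sid)
        elif new_store[sid] != legacy_store[sid]:
            # Present on both sides, but state (cron, paused flag) differs.
            report["conflicting"].append(sid)
    return report
```

In this model, a schedule deleted in only one system appears under `only_in_new` or `only_in_legacy`, and a schedule paused or edited in only one system appears under `conflicting`, which corresponds to the symptoms listed under Impact.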

Resolution

After identifying the issue, we:

  1. Restored the new scheduling system as the active source of truth
  2. Reconciled data between the legacy and new systems
  3. Applied missing schedule changes to the new infrastructure
  4. Performed conflict resolution to preserve user data and schedule continuity

In some cases, schedules that had previously been paused or deleted were restored to avoid permanent data loss.
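The reconciliation steps above can be sketched as a merge that never drops a schedule. This is a simplified illustration under stated assumptions, not the exact procedure we ran: `ScheduleRecord`, its `updated_at` field, and the last-write-wins rule for conflicts are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleRecord:
    # Hypothetical record shape; updated_at is an assumed Unix timestamp.
    schedule_id: str
    cron: str
    paused: bool
    updated_at: int


def reconcile(new_store, legacy_store):
    """Merge two stores, keeping every schedule found in either one."""
    merged = {}
    for sid in new_store.keys() | legacy_store.keys():
        # Collect the copies that exist, listing the new system first.
        candidates = [store[sid] for store in (new_store, legacy_store) if sid in store]
        # Assumed policy: the most recently updated copy wins.
        # max() returns the first maximal item, so ties favor the new system.
        merged[sid] = max(candidates, key=lambda record: record.updated_at)
    return merged
```

Because a merge like this has no record of deletions, a schedule deleted in one system but still present in the other is kept. That is the conservative trade-off described above: restoring a schedule the user no longer wanted is recoverable, while dropping one they still needed is not.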

Preventive Measures

We are implementing several changes to prevent similar incidents:

  • Removing obsolete fallback execution paths after transitions complete
  • Adding automated safeguards and alerts for unexpected system fallback behavior
  • Improving consistency validation between systems
  • Expanding rollback and reconciliation testing
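As an illustration of the fallback safeguard listed above, the sketch below refuses to route to a legacy path once a migration is marked complete and logs an error that an alerting pipeline could pick up. The names `Path`, `UnexpectedFallbackError`, `select_path`, and the `migration_complete` flag are hypothetical; they stand in for whatever routing logic a real system uses.

```python
import logging
from enum import Enum

logger = logging.getLogger("scheduler.routing")


class Path(Enum):
    NEW = "new"
    LEGACY = "legacy"


class UnexpectedFallbackError(RuntimeError):
    """Raised when routing would fall back to a retired execution path."""


def select_path(requested: Path, migration_complete: bool) -> Path:
    """Return the path to use, blocking legacy fallback after migration."""
    if requested is Path.LEGACY and migration_complete:
        # Fail loudly instead of silently accepting writes on the old path.
        logger.error("blocked fallback to legacy scheduling path after migration")
        raise UnexpectedFallbackError("legacy path is retired; refusing fallback")
    return requested
```

The design choice here is to fail closed: an explicit error on one request is easier to detect and recover from than two systems quietly accepting writes in parallel.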

We apologize for the disruption and appreciate everyone’s patience while we resolved the issue.

Posted May 06, 2026 - 08:56 UTC

Resolved

This incident has been resolved.
Posted May 06, 2026 - 08:05 UTC

Update

Main schedule functionality is back to normal. We are verifying that previously created schedules are being delivered as expected before marking the incident as resolved.
Posted May 05, 2026 - 21:31 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted May 05, 2026 - 15:44 UTC

Update

We are continuing to work on a fix for this issue.
Posted May 05, 2026 - 14:29 UTC

Identified

We are currently experiencing issues in the US region.

- Duplicate Deliveries: During this period, some scheduled jobs may be executed twice.
- Schedule Disruption: Schedules created between April 24, 2026 and May 2, 2026 are currently not running.

Our team is actively working on a fix. Once the migration is complete, affected schedules will resume normal operation.

We will provide updates as progress continues.
Posted May 05, 2026 - 14:24 UTC
This incident affected: QStash (US-EAST-1).