Files
medusa-store/packages/modules
Sebastian Rindom bad0858348 fix: prevent jobId collisions on workflow step retries (#13786)
## Summary

**What** — What changes are introduced in this PR?

This PR fixes a bug where async workflow steps with retry intervals would get stuck after the first retry attempt due to Bull queue jobId collisions preventing retry jobs from executing.

**Why** — Why are these changes relevant or necessary?  

Workflows using async steps with retry configurations (e.g., `retryInterval: 1`, `maxRetries: 5`) would fail once, schedule a retry, but the retry job would never execute, causing workflows to hang indefinitely.

**How** — How have these changes been implemented?

**Root Cause:** Bull queue was rejecting retry jobs because they had identical jobIds to the async execution jobs that already completed. Both used the format: `retry:workflow:transaction:step_id:attempts`.

**Solution:** Modified `getJobId()` in `workflow-orchestrator-storage.ts` to append a `:retry` suffix when `interval > 0`, creating unique jobIds:
- Async execution (interval=0): `retry:...:step_id:1`
- Retry scheduling (interval>0): `retry:...:step_id:1:retry`

Updated methods: `getJobId()`, `scheduleRetry()`, `removeJob()`, and `clearRetry()` to pass and handle the interval parameter.

**Testing** — How have these changes been tested, or how can the reviewer test the feature?

Added integration test `retry-interval.spec.ts` that verifies:
1. Step with `retryInterval: 1` and `maxRetries: 3` executes 3 times
2. Retry intervals are approximately 1 second between attempts
3. Workflow completes successfully after retries
4. Uses proper async workflow completion pattern with `subscribe()` and `onFinish` event

---

## Examples

```ts
// Example workflow step that would previously get stuck
export const testRetryStep = createStep(
  {
    name: "test-retry-step",
    async: true,
    retryInterval: 1, // 1 second retry interval
    maxRetries: 3,
  },
  async (input: any) => {
    // Simulate failure on first 2 attempts
    if (attempts < 3) {
      throw new Error("Temporary failure - will retry")
    }
    return { success: true }
  }
)

// Before fix: Step would fail once, schedule retry, but retry job never fired (jobId collision)
// After fix: Step properly retries up to 3 times with 1-second intervals
```

---

## Checklist

Please ensure the following before requesting a review:

- [ ] I have added a **changeset** for this PR
    - Every non-breaking change should be marked as a **patch**
    - To add a changeset, run `yarn changeset` and follow the prompts
- [ ] The changes are covered by relevant **tests**
- [ ] I have verified the code works as intended locally
- [ ] I have linked the related issue(s) if applicable

---

## Additional Context
-

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
2025-10-21 18:27:21 +00:00
..
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00
2025-10-21 09:24:59 +02:00