21 Commits

Author SHA1 Message Date
Adrien de Peretti
1ea932a56f fix(): Identify step that should require a save on checkpoint (#14037)
* fix(): Identify step that force save checkpoint

* Create sour-peas-provide.md
2025-11-17 12:47:57 +01:00
Sebastian Rindom
bad0858348 fix: prevent jobId collisions on workflow step retries (#13786)
## Summary

**What** — What changes are introduced in this PR?

This PR fixes a bug where async workflow steps with retry intervals would get stuck after the first retry attempt due to Bull queue jobId collisions preventing retry jobs from executing.

**Why** — Why are these changes relevant or necessary?  

Workflows using async steps with retry configurations (e.g., `retryInterval: 1`, `maxRetries: 5`) would fail once, schedule a retry, but the retry job would never execute, causing workflows to hang indefinitely.

**How** — How have these changes been implemented?

**Root Cause:** Bull queue was rejecting retry jobs because they had identical jobIds to the async execution jobs that already completed. Both used the format: `retry:workflow:transaction:step_id:attempts`.

**Solution:** Modified `getJobId()` in `workflow-orchestrator-storage.ts` to append a `:retry` suffix when `interval > 0`, creating unique jobIds:
- Async execution (interval=0): `retry:...:step_id:1`
- Retry scheduling (interval>0): `retry:...:step_id:1:retry`

Updated methods: `getJobId()`, `scheduleRetry()`, `removeJob()`, and `clearRetry()` to pass and handle the interval parameter.

**Testing** — How have these changes been tested, or how can the reviewer test the feature?

Added integration test `retry-interval.spec.ts` that verifies:
1. Step with `retryInterval: 1` and `maxRetries: 3` executes 3 times
2. Retry intervals are approximately 1 second between attempts
3. Workflow completes successfully after retries
4. Uses proper async workflow completion pattern with `subscribe()` and `onFinish` event

---

## Examples

```ts
// Example workflow step that would previously get stuck
export const testRetryStep = createStep(
  {
    name: "test-retry-step",
    async: true,
    retryInterval: 1, // 1 second retry interval
    maxRetries: 3,
  },
  async (input: any) => {
    // Simulate failure on first 2 attempts
    if (attempts < 3) {
      throw new Error("Temporary failure - will retry")
    }
    return { success: true }
  }
)

// Before fix: Step would fail once, schedule retry, but retry job never fired (jobId collision)
// After fix: Step properly retries up to 3 times with 1-second intervals
```

---

## Checklist

Please ensure the following before requesting a review:

- [ ] I have added a **changeset** for this PR
    - Every non-breaking change should be marked as a **patch**
    - To add a changeset, run `yarn changeset` and follow the prompts
- [ ] The changes are covered by relevant **tests**
- [ ] I have verified the code works as intended locally
- [ ] I have linked the related issue(s) if applicable

---

## Additional Context
-

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
2025-10-21 18:27:21 +00:00
Adrien de Peretti
d7692100e7 chore(orchestration): add support for autoRetry, maxAwaitingRetries, retryStep (#13391)
RESOLVES CORE-1163
RESOLVES CORE-1164

**What**

### Add support for non auto retryable steps.

When marking a step with `maxRetries`, when it will fail it will be marked as temporary failure and then retry itself automatically. Thats the default behaviour, if you now add `autoRetry: false`, when the step will fail it will be marked as temporary failure but not retry automatically. you can now call the workflow engine run to resume the workflow from the failing step to be retried.

### Add support for `maxAwaitingRetries`

When setting `retyIntervalAwaiting` a step that is awaiting will be retried after the specified interval without maximun retry. Now you can set `maxAwaitingRetries` to force a maximum awaiting retry number

### Add support to manually retry an awaiting step

In some scenario, either a machine dies while a step is executing or a step is taking longer than expected, you can now call `retryStep` on the workflow engine to force a retry of the step that is supposedly stucked
2025-09-08 12:46:30 +00:00
Carlos R. L. Rodrigues
9412669e65 chore: idempotent cart operations (#13236)
* chore(core-flows): idempotent cart operations

* changeset

* add tests

* revert

* revert route

* promo test

* skip bugs

* fix test

* tests

* avoid workflow name conflict

* prevent nested workflow from being deleted until the top level parent finishes

* remove unused setTimeout

* update changeset

* rm comments

---------

Co-authored-by: adrien2p <adrien.deperetti@gmail.com>
2025-08-28 15:04:00 +02:00
Adrien de Peretti
1a78476608 fix(workflow-sdk): Async/nested runAsStep propagation (#12675)
FIXES CLO-524

**What**
Add hidden stepDefinition object as part of the step argument and ensure the runAsStep handlers rely on the latest definition when config is being used on the returned step in order to ensure async configuration propagation and nested configuration
2025-06-10 07:23:12 +00:00
Adrien de Peretti
399dddc0c7 test(workflow-engine-redist): Update tests to ensure behaviour validation properly (#12514) 2025-05-16 15:03:38 +02:00
Adrien de Peretti
7fdbf2a965 fix(workflows-sdk): Miss match context usage within run as step (#12449)
**What**

Currently, runAsStep keep reference of the workflow context that is being run as step, except that the step is composed for the current workflow composition and not the workflow being run as a step. Therefore, the context are currently miss matched leading to wrong configuration being used in case of async workflows.

**BUG**
This fix allow the runAsStep to use the current composition context to configure the step for the sub workflow to be run

**BUG BREAKING**
fix the step config wrongly used to wrap async step handlers. Now steps configured async through .config that returns a new step response will indeed marked itself as success without the need for background execution or calling setStepSuccess (as it was expected originally)

**FEATURE**
This pr also add support for cancelling running transaction, the transaction will be marked as being cancelled, once the current step finished, it will cancel the transaction to start compensating all previous steps including itself

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
2025-05-14 13:28:16 +00:00
Adrien de Peretti
80007f3afd feat(workflows-*): Allow to re run non idempotent but stored workflow with the same transaction id if considered done (#12362) 2025-05-06 17:17:49 +02:00
Adrien de Peretti
8618e6ee38 fix(): Properly handle workflow as step now that events are fixed entirely (#12196)
**What**
Now that all events management are fixed in the workflows life cycle, the run as step needs to leverage the workflow engine if present (which should always be the case for async workflows) in order to ensure the continuation and the ability to mark parent step in parent workflow as success or failure

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
2025-04-17 09:34:19 +00:00
Adrien de Peretti
13e159d8ad fix(workflow-engine-*): Prevent passing shared context reference (#11873)
* fix(workflow-engine-*): Prevent passing shared context reference

* fix(workflow-engine-*): Prevent passing shared context reference

* prevent tests from hanging

* fix event handling

* add integration tests

* use interval for scheduled in tests

* skip tests for now

* Create silent-glasses-enjoy.md

* fix cancel

* changeset

* push multiple aliases

* test multiple field alias

* increase wait time to index on test

---------

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
Co-authored-by: Carlos R. L. Rodrigues <rodrigolr@gmail.com>
2025-04-09 10:39:29 +02:00
Carlos R. L. Rodrigues
0625f76cd4 chore(workflow-engine): export cancel method (#11844)
What:
  * Workflow engine exports the method `cancel` to revert a workflow.
2025-03-17 12:59:09 +00:00
Adrien de Peretti
72d2cf9207 fix(workflow-engines): race condition when retry interval is used (#11771) 2025-03-12 09:53:34 -03:00
Adrien de Peretti
7d8f6cf39f fix(): Workflow cancellation + gracefully handle non serializable state (#10674)
FIXES FRMW-2852

**What**
A workflow distributed transaction expect any response and error to be serializable. When it is not the case, the distributed transaction might fail during the save checkpoint that occurs for async steps. This can lead to unexpected behaviour.

With this pr, we introduce a way to handle non serialazable object in a more sustainable manner, this means the following:

- If a workflow throw any non serialazable error (e.g AWS error that contains full IncomingMessage object that related to network communication, think of req/res) then we identify that this object is not serialzable and we clean up the object to make it serializable without loosing the main information, add a new error to the workflow to informed of this issue and can be handled by the user.
- If a response is not serializable (which should not happen at this point because it is handled before by the value resolver), in that case, we wont be able to reuse that response to continue the workflow which means that the workflow is in a non runnable state. In that case we throw a specific error stating that a non serializable context is being provided

**second what**
This pr refactor the `runAsStep` to add better support for workflow cancelation, especially async ones
2025-01-05 13:30:17 +00:00
Carlos R. L. Rodrigues
90ae187e09 fix(workflows-sdk): name for when/then step (#10459) 2024-12-05 15:47:42 -03:00
Carlos R. L. Rodrigues
1eef324af3 fix(orchestration): fix set step failure (#10031)
What:
 - copy data before saving checkpoint
 - removed unused data format function
 - properly handle registerStepFailure to not throw
 - emit onFinish event even when execution failed
2024-11-12 10:06:36 +00:00
Adrien de Peretti
6937d74252 chore(): Move dependencies around (#9278)
RESOLVES FRMW-2716 [1]

**What**
First stab at re organising dependencies in the modules and their usage.
2024-09-24 14:58:04 +00:00
Carlos R. L. Rodrigues
ef8dc4087e feat: run nested async workflows (#9119) 2024-09-16 13:06:45 +00:00
Stevche Radevski
bcba5b3a6c chore: Make scheduled job tests less likely to fail (#8241) 2024-07-23 16:59:27 +02:00
Stevche Radevski
26d600b6db feat: Ensure async workflow executions have access to shared container (#8157)
* feat: Ensure async workflow executions have access to shared container

* fix: Register workflow worker on application start
2024-07-17 12:17:48 +02:00
Stevche Radevski
69410162f6 feat: Add support for scheduled workflows (#7651)
We still need to:
But wanted to open the PR for early feedback on the approach
2024-06-10 14:49:52 +00:00
Adrien de Peretti
4eae25e1ef chore(): Reorganize modules (#7210)
**What**
Move all modules to the modules directory
2024-05-02 15:33:34 +00:00