Harminder Virk cf0297f74a feat: implement stream based processing of the files (#12574)

Fixes: FRMW-2960

This PR adds support for processing large CSV files by breaking them into chunks and processing one chunk at a time. Here is how it works in a nutshell.

- The CSV file is read as a stream and each chunk of the stream is one CSV row.
- We read up to 1000 rows per chunk (plus a few more, so that the variants of a single product are never split across chunks).
- Each chunk is then normalized using the `CSVNormalizer` and validated using zod schemas. If there is an error, the entire process will be aborted and the existing chunks will be deleted.
- Each chunk is written to a JSON file, so that we can process it later (after the user confirms) without re-processing or re-validating the CSV file.
- The confirmation process will start consuming one chunk at a time and create/update products using the `batchProducts` workflow.
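
The chunk-boundary rule from the list above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `Row` type, the `handle` field used to identify a product, and the `chunkRows` function are all hypothetical, and the sketch assumes that rows belonging to the same product are adjacent in the CSV.

```typescript
// Hypothetical row shape: only `handle` matters for the boundary rule.
type Row = { handle: string; [column: string]: string }

/**
 * Groups rows into chunks of roughly `chunkSize` rows. A chunk is closed only
 * when the next row belongs to a different product, so a chunk may grow past
 * `chunkSize` to keep all rows of one product together.
 * Assumes rows of the same product are adjacent in the input.
 */
function* chunkRows(rows: Iterable<Row>, chunkSize = 1000): Generator<Row[]> {
  let chunk: Row[] = []
  let lastHandle: string | undefined

  for (const row of rows) {
    const chunkIsFull = chunk.length > 0 && chunk.length >= chunkSize
    if (chunkIsFull && row.handle !== lastHandle) {
      yield chunk
      chunk = []
    }
    chunk.push(row)
    lastHandle = row.handle
  }

  // Emit whatever is left once the input is exhausted.
  if (chunk.length > 0) {
    yield chunk
  }
}

// Usage: with a chunk size of 2, the three "shirt" rows still stay together.
const sampleRows: Row[] = [
  { handle: "shirt" },
  { handle: "shirt" },
  { handle: "shirt" },
  { handle: "pants" },
  { handle: "hat" },
]
const chunkSizes = Array.from(chunkRows(sampleRows, 2)).map((c) => c.length)
```

Because the function accepts any iterable, the same rule could be applied to rows coming from a stream parser without holding the whole file in memory.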

## To resume or not to resume processing of chunks

Let's imagine that, while processing chunks, we find that chunk 3 leads to a database error. By this point, however, we have already processed the first two chunks. How do we deal with this situation? Options are:

- We store the chunk at which we failed and, during the re-upload, skip the chunks before the failed one. In my conversation with @olivermrbl we discovered that resuming would have to rely on certain assumptions if we decide to implement it:
  - What if a user updates CSV rows that are part of the already processed chunks? These changes will be ignored and the user will never notice.
  - Resuming works only if the file name is still the same. What if they made changes and saved the file with "Save as - New name"? In that case we will process the entire file anyway.
  - We will have to fetch the old workflow from the workflow engine using some `ilike` search, so that we can see at which chunk the last run failed for the given file.
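
To make the resume option concrete, here is a rough sketch of what storing the failed chunk and skipping earlier chunks on retry could look like. Nothing here is taken from the PR: `processChunks`, `RunResult`, and the `resumeFrom` parameter are hypothetical names, and the sketch deliberately ignores the assumptions listed above (edited rows, renamed files, looking up the previous run).

```typescript
// Hypothetical handler: creates/updates the products of one chunk.
type ChunkHandler<T> = (chunk: T, index: number) => Promise<void>

// Either every chunk succeeded, or we record the index of the failed chunk.
type RunResult = { completed: true } | { completed: false; failedAt: number }

/**
 * Processes chunks one at a time, starting at `resumeFrom`.
 * Stops at the first failing chunk and reports its index so that a later
 * run could pass it back in as `resumeFrom`.
 */
async function processChunks<T>(
  chunks: T[],
  handle: ChunkHandler<T>,
  resumeFrom = 0
): Promise<RunResult> {
  for (const [index, chunk] of chunks.entries()) {
    if (index < resumeFrom) {
      continue // already processed in a previous run
    }
    try {
      await handle(chunk, index)
    } catch {
      return { completed: false, failedAt: index }
    }
  }
  return { completed: true }
}
```

A second run would call `processChunks(chunks, handle, failedAt)`, which is exactly where the listed assumptions bite: the skipped chunks are trusted to be unchanged since the first run.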

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>

2025-05-29 05:42:16 +00:00

| Name | Last commit | Date |
| --- | --- | --- |
| `.changeset` | feat: implement stream based processing of the files (#12574) | 2025-05-29 05:42:16 +00:00 |
| `.github` | chore(test-cli-with-database): Update cli tag from latest to preview (#12529) | 2025-05-19 09:59:00 +02:00 |
| `.yarn` | chore: Patch changesets to avoid major bumps (#12151) | 2025-04-11 10:18:31 +02:00 |
| `integration-tests` | feat(core-flows,js-sdk,medusa,types): draft order delete (#12172) | 2025-05-28 14:37:00 +02:00 |
| `packages` | feat: implement stream based processing of the files (#12574) | 2025-05-29 05:42:16 +00:00 |
| `scripts` | test: use shared as integration-tests level (#12278) | 2025-04-28 18:41:19 +05:30 |
| `www` | docs: fix broken links utility + uncaught broken links (#12637) | 2025-05-28 17:13:27 +03:00 |
| `_tsconfig.base.json` | feat(framework,medusa): Ensure publishable key middleware is set for all store endpoints (#9429) | 2024-10-02 18:01:50 +02:00 |
| `.eslintignore` | fea(providers): locking postgres (#9545) | 2024-10-17 13:11:39 +00:00 |
| `.eslintrc.js` | feat(ui,dashboard): Add DataTable block (#10024) | 2025-01-20 13:26:12 +00:00 |
| `.gitattributes` | … | |
| `.gitignore` | [Fix] Update Repository Directory Paths for All Packages (#10910) | 2025-01-13 12:49:50 +01:00 |
| `.prettierrc` | feat(ui,dashboard): Add DataTable block (#10024) | 2025-01-20 13:26:12 +00:00 |
| `.vale.ini` | … | |
| `.yarnrc.yml` | … | |
| `CHANGELOG.md` | … | |
| `CODEOWNERS` | … | |
| `CONTRIBUTING.md` | chore: fix links in contribution guidelines (#11829) | 2025-03-24 07:08:16 +01:00 |
| `define_jest_config.js` | chore: swc/jest source maps config to inline instead of true (#9531) | 2024-10-11 09:40:51 +02:00 |
| `jest.config.js` | … | |
| `LICENSE` | … | |
| `package.json` | test: use shared as integration-tests level (#12278) | 2025-04-28 18:41:19 +05:30 |
| `README.md` | Update README.md (#10825) | 2025-01-06 08:23:36 +00:00 |
| `SECURITY.md` | … | |
| `turbo.json` | test: use shared as integration-tests level (#12278) | 2025-04-28 18:41:19 +05:30 |
| `yarn.lock` | feat: implement stream based processing of the files (#12574) | 2025-05-29 05:42:16 +00:00 |

README.md

Medusa

Documentation | Website

Building blocks for digital commerce

Getting Started

Visit the Documentation to set up a Medusa application.

What is Medusa

Medusa is an ecommerce platform with a built-in framework for customization that allows you to build custom commerce applications without reinventing core commerce logic. The framework and modules can be used to build advanced B2B or DTC ecommerce stores, marketplaces, PoS systems, service businesses, or any product that needs foundational commerce primitives. All commerce modules are open-source and freely available on npm.

Learn more about Medusa’s architecture and commerce modules in the Docs.

Upgrades & Integrations

Follow the Release Notes to keep your Medusa project up-to-date.

Check out all available Medusa integrations.

Community & Contributions

The community and core team are available in GitHub Discussions, where you can ask for support, discuss roadmap, and share ideas.

Our Contribution Guide describes how to contribute to the codebase and Docs.

Join our Discord server to meet other community members.

Other channels

License

Licensed under the MIT License.
