Harminder Virk cf0297f74a feat: implement stream based processing of the files (#12574)
Fixes: FRMW-2960

This PR adds support for processing large CSV files by breaking them into chunks and processing one chunk at a time. This is how it works in nutshell.

- The CSV file is read as a stream and each chunk of the stream is one CSV row.
- We read upto 1000 rows (plus a few more to ensure product variants of a product are not split into multiple chunks).
- Each chunk is then normalized using the `CSVNormalizer` and validated using zod schemas. If there is an error, the entire process will be aborted and the existing chunks will be deleted.
- Each chunk is written to a JSON file, so that we can process them later (after user confirms) without re-processing or validating the CSV file.
- The confirmation process will start consuming one chunk at a time and create/update products using the `batchProducts` workflow.

## Resume or not to resume processing of chunks

Let's imagine during processing of chunks, we find that chunk 3 leads to a database error. However, till this time we have processed the first two chunks already. How do we deal with this situation? Options are:

- We store at which chunk we failed and then during the re-upload we ignore chunks before the failed one. In my conversation with @olivermrbl we discovered that resuming will have to work with certain assumptions if we decide to implement it.
   - What if a user updates the CSV rows which are part of the already processed chunks? These changes will be ignored and they will never notice it.
   - Resuming works if the file name is still the same. What if they made changes and saved the file with "Save as - New name". In that case we will anyways process the entire file.
   - We will have to fetch the old workflow from the workflow engine using some `ilike` search, so that we can see at which chunk the last run failed for the given file.

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
2025-05-29 05:42:16 +00:00
2025-01-06 08:23:36 +00:00

Medusa logo

Medusa

Documentation | Website

Building blocks for digital commerce

Medusa is released under the MIT license. PRs welcome!

Follow @medusajs Discord Chat

Getting Started

Visit the Documentation to set up a Medusa application.

What is Medusa

Medusa is an ecommerce platform with a built-in framework for customization that allows you to build custom commerce applications without reinventing core commerce logic. The framework and modules can be used to build advanced B2B or DTC ecommerce stores, marketplaces, PoS systems, service businesses, or any product that needs foundational commerce primitives. All commerce modules are open-source and freely available on npm.

Learn more about Medusas architecture and commerce modules in the Docs.

Upgrades & Integrations

Follow the Release Notes to keep your Medusa project up-to-date.

Check out all available Medusa integrations.

Community & Contributions

The community and core team are available in GitHub Discussions, where you can ask for support, discuss roadmap, and share ideas.

Our Contribution Guide describes how to contribute to the codebase and Docs.

Join our Discord server to meet other community members.

Other channels

License

Licensed under the MIT License.

Description
No description provided
Readme 539 MiB
Languages
TypeScript 84.9%
JavaScript 14.8%
Shell 0.2%