medusa-store/packages/core/core-flows/package.json
Harminder Virk cf0297f74a feat: implement stream based processing of the files (#12574)
Fixes: FRMW-2960

This PR adds support for processing large CSV files by breaking them into chunks and processing one chunk at a time. In a nutshell, it works as follows (a rough code sketch follows the list):

- The CSV file is read as a stream, where each record emitted by the stream is one CSV row.
- We read up to 1000 rows per chunk (plus a few more when needed, so that the variants of a product are not split across multiple chunks).
- Each chunk is then normalized using the `CSVNormalizer` and validated using Zod schemas. If there is an error, the entire process is aborted and the already-written chunks are deleted.
- Each chunk is written to a JSON file, so that we can process it later (after the user confirms) without re-parsing or re-validating the CSV file.
- On confirmation, the chunks are consumed one at a time and products are created/updated using the `batchProducts` workflow.
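
Below is a minimal sketch of the chunking idea described above, using the `csv-parse` streaming parser that this PR adds as a dependency. The chunk size, the `Product Handle` grouping column, and the helper/file names are illustrative assumptions, not the actual implementation.

```ts
import { createReadStream } from "node:fs"
import { writeFile } from "node:fs/promises"
import { parse } from "csv-parse"

const CHUNK_SIZE = 1000 // illustrative: rows per chunk before flushing

// Read the CSV as a stream and flush groups of rows to JSON chunk files.
// Rows that belong to the same product (same handle) are kept in the same
// chunk so a product's variants are never split across chunks.
async function chunkCsv(filePath: string): Promise<string[]> {
  const parser = createReadStream(filePath).pipe(
    parse({ columns: true, skip_empty_lines: true })
  )

  const chunkPaths: string[] = []
  let rows: Record<string, string>[] = []
  let lastHandle: string | undefined

  const flush = async () => {
    if (!rows.length) return
    const path = `${filePath}.chunk-${chunkPaths.length}.json`
    await writeFile(path, JSON.stringify(rows))
    chunkPaths.push(path)
    rows = []
  }

  for await (const row of parser) {
    // Flush only when we cross the size limit AND the next row starts a
    // new product, so variants of the current product stay together.
    if (rows.length >= CHUNK_SIZE && row["Product Handle"] !== lastHandle) {
      await flush()
    }
    rows.push(row)
    lastHandle = row["Product Handle"]
  }

  await flush()
  return chunkPaths
}
```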

## To resume or not to resume processing of chunks

Let's imagine that, while processing chunks, chunk 3 fails with a database error. By this point the first two chunks have already been processed. How do we deal with this situation? The options are:

- We store which chunk failed and, during the re-upload, skip the chunks before the failed one. In my conversation with @olivermrbl we concluded that resuming would have to rely on certain assumptions if we decide to implement it (a hypothetical sketch follows this list):
   - What if a user updates CSV rows that are part of the already-processed chunks? Those changes would be silently ignored and they would never notice it.
   - Resuming only works if the file name stays the same. If they make changes and save the file with "Save as" under a new name, we will process the entire file anyway.
   - We would have to fetch the old workflow from the workflow engine using some `ilike` search to find out at which chunk the last run failed for the given file.
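
To make the trade-off concrete, here is a purely hypothetical sketch of what the resume bookkeeping could look like; none of these types or functions exist in the codebase, they only illustrate the assumptions listed above.

```ts
// Hypothetical bookkeeping for resumable chunk processing (not real code).
type ChunkProgress = {
  fileName: string // resuming only works while the file name is unchanged
  lastCompletedChunk: number
}

// On re-upload, look up the previous run for the same file (e.g. via an
// `ilike` search against the workflow engine) and skip chunks that already
// succeeded, silently ignoring any edits made to rows in those chunks.
function chunksToProcess(
  progress: ChunkProgress | undefined,
  totalChunks: number
): number[] {
  const start = progress ? progress.lastCompletedChunk + 1 : 0
  return Array.from({ length: totalChunks - start }, (_, i) => start + i)
}
```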

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
2025-05-29 05:42:16 +00:00


{
  "name": "@medusajs/core-flows",
  "version": "2.8.3",
  "description": "Set of workflow definitions for Medusa",
  "main": "dist/index.js",
  "exports": {
    ".": "./dist/index.js"
  },
  "repository": {
    "type": "git",
    "url": "https://github.com/medusajs/medusa",
    "directory": "packages/core-flows"
  },
  "publishConfig": {
    "access": "public"
  },
  "engines": {
    "node": ">=20"
  },
  "files": [
    "dist",
    "!dist/**/__tests__",
    "!dist/**/__mocks__",
    "!dist/**/__fixtures__"
  ],
  "author": "Medusa",
  "license": "MIT",
  "devDependencies": {
    "@medusajs/framework": "2.8.3",
    "@mikro-orm/core": "6.4.3",
    "@mikro-orm/knex": "6.4.3",
    "@mikro-orm/migrations": "6.4.3",
    "@mikro-orm/postgresql": "6.4.3",
    "@swc/core": "^1.7.28",
    "@swc/jest": "^0.2.36",
    "awilix": "^8.0.1",
    "expect-type": "^0.20.0",
    "jest": "^29.7.0",
    "pg": "^8.13.0",
    "rimraf": "^5.0.1",
    "typescript": "^5.6.2"
  },
  "dependencies": {
    "csv-parse": "^5.6.0",
    "json-2-csv": "^5.5.4"
  },
  "peerDependencies": {
    "@medusajs/framework": "2.8.3",
    "awilix": "^8.0.1"
  },
  "scripts": {
    "build": "rimraf dist && tsc --build",
    "watch": "tsc --build --watch",
    "test": "jest --runInBand --bail --forceExit --passWithNoTests"
  }
}