feat: implement stream based processing of the files (#12574)

Fixes: FRMW-2960

This PR adds support for processing large CSV files by breaking them into chunks and processing one chunk at a time. Here is how it works in a nutshell:

- The CSV file is read as a stream and each chunk of the stream is one CSV row.
- We read up to 1000 rows per chunk (plus a few more, so that the variants of a single product are not split across chunks).
- Each chunk is then normalized using the `CSVNormalizer` and validated using zod schemas. If there is an error, the entire process will be aborted and the existing chunks will be deleted.
- Each chunk is written to a JSON file, so that we can process it later (after the user confirms) without re-processing or re-validating the CSV file.
- The confirmation step then consumes one chunk at a time and creates or updates products using the `batchProducts` workflow.
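
The chunking rule described above (a soft limit of 1000 rows that may be exceeded so that a product's variants stay together) can be sketched roughly as follows. This is an illustration, not the code from this PR: the `RowChunker` class, the `Row` type, and the `"Product Handle"` grouping column are hypothetical, and the sketch assumes that all rows of one product are adjacent in the CSV.

```typescript
// Hypothetical row shape: one parsed CSV row, keyed by column name.
type Row = Record<string, string>

// Collects rows into chunks of roughly `softLimit` rows. A chunk is only
// closed when the next row belongs to a different group (product), so the
// limit can be exceeded by "a few more" rows to keep variants together.
class RowChunker {
  private current: Row[] = []
  private readonly softLimit: number
  private readonly groupKey: string

  constructor(softLimit: number = 1000, groupKey: string = "Product Handle") {
    this.softLimit = softLimit
    this.groupKey = groupKey
  }

  // Adds a row. Returns the finished chunk if this row starts a new one,
  // otherwise null.
  push(row: Row): Row[] | null {
    const last = this.current[this.current.length - 1]
    const sameGroup =
      last !== undefined && last[this.groupKey] === row[this.groupKey]

    if (this.current.length >= this.softLimit && !sameGroup) {
      const done = this.current
      this.current = [row]
      return done
    }

    this.current.push(row)
    return null
  }

  // Returns the remaining rows once the stream has ended, or null if empty.
  flush(): Row[] | null {
    if (this.current.length === 0) {
      return null
    }
    const done = this.current
    this.current = []
    return done
  }
}
```

In a streaming setup, each row emitted by the CSV parser would be passed to `push`, every non-null return value would be normalized, validated, and written to its own JSON file, and `flush` would be called once when the stream ends.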

## To resume or not to resume processing of chunks

Let's imagine that, while processing the chunks, we find that chunk 3 leads to a database error. By that point, however, the first two chunks have already been processed. How do we deal with this situation? Options are:

- We store the chunk at which we failed and then, during the re-upload, skip the chunks before the failed one. In my conversation with @olivermrbl we discovered that resuming would have to rely on certain assumptions if we decide to implement it:
  - What if a user updates CSV rows that are part of the already processed chunks? These changes will be ignored and the user will never notice.
  - Resuming only works if the file name stays the same. What if they made changes and saved the file with "Save as - New name"? In that case we will process the entire file anyway.
  - We will have to fetch the old workflow from the workflow engine using some `ilike` search, so that we can see at which chunk the last run failed for the given file.

Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>

Harminder Virk, 2025-05-29 11:12:16 +05:30 (committed by GitHub)

parent 40e73c6ea2
commit cf0297f74a

12 changed files with 360 additions and 141 deletions

									
packages/core/utils/src/common/index.ts (+1)

    @@ -88,3 +88,4 @@ export * from "./upper-case-first"
    export * from "./validate-handle"
    export * from "./validate-module-name"
    export * from "./wrap-handler"
    export * from "./normalize-csv-value"

									
packages/core/utils/src/common/normalize-csv-value.ts (+10)

    @@ -0,0 +1,10 @@
    /**
     * Normalizes a CSV value by removing the trailing "\r" from the
     * value.
     */
    export function normalizeCSVValue<T>(value: T): T {
      if (typeof value === "string") {
        return value.replace(/\\r$/, "").trim() as T
      }
      return value
    }
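
As a quick illustration of how this helper behaves, the sketch below repeats the function from the diff (without the `export`) and runs it on a few made-up values. The sample strings are assumptions for illustration, not data from this PR. Note that the pattern `/\\r$/` matches a literal backslash followed by `r` at the end of the string, while an actual carriage return character is removed by `trim()`.

```typescript
// Same body as in the diff above, repeated so the example is self-contained.
function normalizeCSVValue<T>(value: T): T {
  if (typeof value === "string") {
    return value.replace(/\\r$/, "").trim() as T
  }
  return value
}

// A real carriage return and surrounding spaces are removed by trim().
const trimmed = normalizeCSVValue("  Medusa T-Shirt \r")
console.log(JSON.stringify(trimmed)) // "Medusa T-Shirt"

// A literal backslash + "r" suffix is removed by the regular expression.
const escaped = normalizeCSVValue("red\\r")
console.log(JSON.stringify(escaped)) // "red"

// Non-string values are returned unchanged.
const untouched = normalizeCSVValue(42)
console.log(untouched) // 42
```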