Fixes: FRMW-2960
This PR adds support for processing large CSV files by breaking them into chunks and processing one chunk at a time. This is how it works in nutshell.
- The CSV file is read as a stream and each chunk of the stream is one CSV row.
- We read upto 1000 rows (plus a few more to ensure product variants of a product are not split into multiple chunks).
- Each chunk is then normalized using the `CSVNormalizer` and validated using zod schemas. If there is an error, the entire process will be aborted and the existing chunks will be deleted.
- Each chunk is written to a JSON file, so that we can process them later (after user confirms) without re-processing or validating the CSV file.
- The confirmation process will start consuming one chunk at a time and create/update products using the `batchProducts` workflow.
## Resume or not to resume processing of chunks
Let's imagine during processing of chunks, we find that chunk 3 leads to a database error. However, till this time we have processed the first two chunks already. How do we deal with this situation? Options are:
- We store at which chunk we failed and then during the re-upload we ignore chunks before the failed one. In my conversation with @olivermrbl we discovered that resuming will have to work with certain assumptions if we decide to implement it.
- What if a user updates the CSV rows which are part of the already processed chunks? These changes will be ignored and they will never notice it.
- Resuming works if the file name is still the same. What if they made changes and saved the file with "Save as - New name". In that case we will anyways process the entire file.
- We will have to fetch the old workflow from the workflow engine using some `ilike` search, so that we can see at which chunk the last run failed for the given file.
Co-authored-by: Carlos R. L. Rodrigues <37986729+carlos-r-l-rodrigues@users.noreply.github.com>
Fixes: FRMW-2974
Currently during the product imports, we create multiple chunks that must be deleted after the import has finished (either successfully or with an error). Deleting files one by one leads to multiple network calls and slows down everything.
The `bulkDelete` method deletes multiple files (with their fileKey) in one go
* feat: Add an analytics module and local and posthog providers
* fix: Add tests and wire up in missing places
* fix: Address feedback and add missing module typing
* fix: Address feedback and add missing module typing
---------
Co-authored-by: Adrien de Peretti <adrien.deperetti@gmail.com>
Co-authored-by: Oli Juhl <59018053+olivermrbl@users.noreply.github.com>
Fixes: FRMW-2965
In this PR we replace/remove the existing step to normalize a CSV file with the newly written CSV normalizer and also we validate the file contents further using a Zod schema.
I have duplicated the schema for now. But it is makes sense to re-use the schema for CSV validating and `/admin/products/batch`, then I can keep one source of truth under utils and re-export it. WDYT?
**Screenshots of some errors after validating the file strictly**


* add failing test for upsertWithReplace order
* reproduce prices update shuffling issue
* fix: fix order of returned updates in updateMany
* fix: fix order of returned updates in ProductService
* fix: reset test count to 1
* Create tame-insects-marry.md
---------
Co-authored-by: Oli Juhl <59018053+olivermrbl@users.noreply.github.com>
* fix: Plugin admin folder loading with backslash on Windows
* fix: Plugin admin folder loading with backslash on Windows - Add changeset
---------
Co-authored-by: Oli Juhl <59018053+olivermrbl@users.noreply.github.com>
* feat: implement direct upload
* feat: add direct-upload endpoint
* refactor: implement feedback
* refactor: have a dedicated endpoint for direct uploads
* refactor: convert responses to snakecase
* refactor: rename method to createImport
* test: add tests for the presigned-urls endpoint
Move fulfillment workflow events to be with other workflow events.
Could be considered a breaking change for users using the previously `FulfillmentEvents` variable
* feat(index): add filterable fields to link definition
* rm comment
* break recursion
* validate read only links
* validate filterable
* gql schema array
* link parents
* isInverse
* push id when not present
* Fix ciruclar relationships and add tests to ensure proper behaviour (part 1)
* log and fallback to entity.alias
* cleanup and fixes
* cleanup and fixes
* cleanup and fixes
* fix get attributes
* gql type
* unit test
* array inference
* rm only
* package.json
* pacvkage.json
* fix link retrieval on duplicated entity type and aliases + tests
* link parents as array
* Match only parent entity
* rm comment
* remove hard coded schema
* extend types
* unit test
* test
* types
* pagination type
* type
* fix integration tests
* Improve performance of in selection
* use @@ to filter property
* escape jsonPath
* add Event Bus by default
* changeset
* rm postgres analyze
* estimate count
* new query
* parent aliases
* inner query w/ filter and sort relations
* address comments
---------
Co-authored-by: adrien2p <adrien.deperetti@gmail.com>
Co-authored-by: Oli Juhl <59018053+olivermrbl@users.noreply.github.com>
**What**
I have removed the check for the context key where it was fetching all attributes available and then stripping out the one that does not exists.. On big dataset these would remove multiple hundreds of ms of query execution
**What**
First iteration to prevent events from overwhelming the systems.
- Group emitted event ids when possible instead of creating a message per id which leads to reduced amount of events to process massively in cases of import for example
- Update the index engine to process event data in batches of 100
- Update event handling by the index engine to be able to upsert by batch as well
- Fix index engine build config for intermediate listeners inferrence
* feat: add createMultiple flag to enforce inApp link uniqueness
* changes
* mocks
* default
* many to many
---------
Co-authored-by: Carlos R. L. Rodrigues <rodrigolr@gmail.com>