Story: https://linear.app/n8n/issue/PAY-839
This is a longstanding bug, fixed now so that the S3 backend for binary
data can use execution IDs as part of the filename.
To reproduce:
1. Set up a workflow with a POST Webhook node that accepts binary data.
2. Activate the workflow and call it, sending a binary file, e.g.
`curl -X POST -F "file=@/path/to/binary/file/test.jpg" http://localhost:5678/webhook/uuid`
3. Check `~/.n8n/binaryData`. The binary data and metadata files will be
missing the execution ID, e.g. `11869055-83c4-4493-876a-9092c4708b9b`
instead of `39011869055-83c4-4493-876a-9092c4708b9b`.
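The naming in step 3 can be sketched as follows. This is a minimal illustration, assuming the file name is the execution ID followed directly by a UUID, as the example above suggests (`390` followed by `11869055-...`). The helper `toBinaryDataFileId` is hypothetical and not part of n8n's API.

```typescript
// Hypothetical helper, for illustration only: builds a binary data file ID
// by prefixing a UUID with the execution ID (assumed format, inferred from
// the example file names in the reproduction steps).
function toBinaryDataFileId(executionId: string, uuid: string): string {
  return `${executionId}${uuid}`;
}

// With the bug, the execution ID was missing, so only the UUID part was used.
const fileId = toBinaryDataFileId('390', '11869055-83c4-4493-876a-9092c4708b9b');
// fileId is "39011869055-83c4-4493-876a-9092c4708b9b"
```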
Based on #7065 | Story: https://linear.app/n8n/issue/PAY-771
In filesystem mode, n8n marks binary data for deletion on manual
execution deletion, on unsaved execution completion, and on every
execution pruning cycle. The marked binary data is later pruned in a
separate cycle via these marker files, based on the configured TTL. In
the context of introducing an S3 client to manage binary data, the
filesystem mode's mark-and-prune setup is too tightly coupled to the
general binary data management client interface.
This PR...
- Ensures the deletion of an execution causes the deletion of any binary
data associated with it. This does away with the need for binary data TTL
and simplifies the filesystem mode's mark-and-prune setup.
- Refactors all execution deletions (including pruning) to cause soft
deletions, hard-deletes soft-deleted executions based on the existing
pruning config, and adjusts execution endpoints to filter out
soft-deleted executions. This reduces DB load, and keeps binary data
around long enough for users to access it when building workflows with
unsaved executions.
- Moves all execution pruning work from an execution lifecycle hook to
`execution.repository.ts`. This keeps related logic in a single place.
- Removes all marking logic from the binary data manager. This
simplifies the interface that the S3 client will meet.
- Adds basic sanity-check tests to pruning logic and execution deletion.
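The soft-delete and hard-delete flow described in the list above can be sketched roughly as below. This is an in-memory illustration under stated assumptions, not the code in `execution.repository.ts`: the class, its method names, the `deletedAt` field, and the cutoff-based hard deletion are all hypothetical.

```typescript
// Hypothetical in-memory stand-in for the execution repository.
interface ExecutionRecord {
  id: string;
  deletedAt: Date | null; // null means the execution is not soft-deleted
}

class InMemoryExecutionStore {
  private records = new Map<string, ExecutionRecord>();
  // Execution IDs whose binary data was removed, kept for demonstration.
  readonly binaryDataDeletedFor: string[] = [];

  add(id: string): void {
    this.records.set(id, { id, deletedAt: null });
  }

  // Soft deletion only marks the record; data stays available for a while.
  softDelete(id: string, now: Date): void {
    const record = this.records.get(id);
    if (record && record.deletedAt === null) record.deletedAt = now;
  }

  // Endpoints filter out soft-deleted executions.
  findVisible(): ExecutionRecord[] {
    return Array.from(this.records.values()).filter((r) => r.deletedAt === null);
  }

  // Hard deletion removes the record and its binary data together.
  hardDeleteSoftDeletedBefore(cutoff: Date): void {
    for (const record of Array.from(this.records.values())) {
      if (record.deletedAt !== null && record.deletedAt.getTime() <= cutoff.getTime()) {
        this.binaryDataDeletedFor.push(record.id);
        this.records.delete(record.id);
      }
    }
  }
}

const store = new InMemoryExecutionStore();
store.add('1');
store.add('2');
store.softDelete('1', new Date('2023-01-01T00:00:00Z'));
const visibleIds = store.findVisible().map((r) => r.id);
store.hardDeleteSoftDeletedBefore(new Date('2023-01-02T00:00:00Z'));
```

Tying binary data removal to the hard deletion step is what makes a separate binary data TTL unnecessary in this sketch.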
Out of scope:
- Improving existing pruning logic.
- Improving existing execution repository logic.
- Adjusting dir structure for filesystem mode.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
# Motivation
In Queue mode, finished executions would cause the main instance to
always pull all execution data from the database, unflatten it and then
use it to send out event log events and telemetry events, as well as
required returns to Respond to Webhook nodes etc.
This could cause OOM errors when the data was large, since it had to be
fully unpacked and transformed on the main instance’s side, using up a
lot of memory (and time).
This PR attempts to limit this behaviour to only happen in those
required cases where the data has to be forwarded to some waiting
webhook, for example.
# Changes
Execution data is only required in cases where the active execution has
a `postExecutePromise` attached to it. These usually forward the data to
some other endpoint (e.g. a listening webhook connection).
By adding a helper `getPostExecutePromiseCount()`, we can decide that in
cases where there is nothing listening at all, there is no reason to
pull the data on the main instance.
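The decision described above could look roughly like this. Only the helper name `getPostExecutePromiseCount()` comes from this PR; its signature, the surrounding class, and `handleJobFinished` are assumptions made for the sake of a small, self-contained sketch, and data loading is synchronous here for brevity.

```typescript
// Hypothetical stand-in for the active executions registry.
type Listener = (data: unknown) => void;

class ActiveExecutionsSketch {
  private listeners = new Map<string, Listener[]>();

  attachPostExecutePromise(executionId: string, listener: Listener): void {
    const current = this.listeners.get(executionId) ?? [];
    this.listeners.set(executionId, [...current, listener]);
  }

  // Number of consumers waiting for this execution's data.
  getPostExecutePromiseCount(executionId: string): number {
    return this.listeners.get(executionId)?.length ?? 0;
  }

  resolveAll(executionId: string, data: unknown): void {
    for (const listener of this.listeners.get(executionId) ?? []) listener(data);
    this.listeners.delete(executionId);
  }
}

// Called on the main instance when a worker reports a finished job.
// Returns true only if the execution data was actually loaded.
function handleJobFinished(
  active: ActiveExecutionsSketch,
  executionId: string,
  loadExecutionData: () => unknown,
): boolean {
  if (active.getPostExecutePromiseCount(executionId) === 0) {
    return false; // nothing is listening, so skip the expensive load
  }
  active.resolveAll(executionId, loadExecutionData());
  return true;
}

let loadCount = 0;
const loadExecutionData = () => {
  loadCount += 1;
  return { finished: true };
};

const active = new ActiveExecutionsSketch();
const loadedWithoutListener = handleJobFinished(active, '1', loadExecutionData);

let received: unknown = null;
active.attachPostExecutePromise('2', (data) => {
  received = data;
});
const loadedWithListener = handleJobFinished(active, '2', loadExecutionData);
```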
Previously, there would always be postExecutePromises because the
telemetry events were called. These calls have now been moved into the
workers: the various InternalHooks calls have been added to the workers’
hook function arrays, so the workers themselves issue these telemetry
and event calls.
As a result, all event log messages are now written to the worker’s
event log, and the worker’s eventbus is the one that sends events out to
destinations. The main event log does…pretty much nothing.
We are not logging executions on the main event log any more, because
this would require all events to be replicated 1:1 from the workers to
the main instance(s) (this IS possible and implemented, see the worker’s
`replicateToRedisEventLogFunction` - but it is not enabled to reduce the
amount of traffic over redis).
Partial events in the main log could confuse the recovery process and
would result in, ironically, the recovery corrupting the execution data
by considering them crashed.
# Refactor
I have also used the opportunity to reduce duplicate code and move some
of the hook functionality into
`packages/cli/src/executionLifecycleHooks/shared/sharedHookFunctions.ts`
in preparation for a future full refactor of the hooks.
* refactor: Set up ownership service
* refactor: Specify cache keys and values
* refactor: Replace util with service calls
* test: Mock service in tests
* refactor: Use dependency injection
* test: Write tests
* refactor: Apply feedback from Omar and Micha
* test: Fix tests
* test: Fix missing spot
* refactor: Return user entity from cache
* refactor: More dependency injection!
* first commit for postgres migration
* (not working)
* sqlite migration
* quicksave
* fix tests
* fix pg test
* fix postgres
* fix variables import
* fix execution saving
* add user settings fix
* change migration to single lines
* patch preferences endpoint
* cleanup
* improve variable import
* cleanup unused code
* Update packages/cli/src/PublicApi/v1/handlers/workflows/workflows.handler.ts
Co-authored-by: Omar Ajoue <krynble@gmail.com>
* address review notes
* fix var update/import
* refactor: Separate execution data to its own table (#6323)
* wip: Temporary migration process
* refactor: Create boilerplate repository methods for executions
* fix: Lint issues
* refactor: Added search endpoint to repository
* refactor: Make the execution list work again
* wip: Updating how we create and update executions everywhere
* fix: Lint issues and remove most of the direct access to execution model
* refactor: Remove includeWorkflowData flag and fix more tests
* fix: Lint issues
* fix: Fixed ordering of executions for FE, removed transaction when saving execution and removed unnecessary update
* refactor: Add comment about missing feature
* refactor: Refactor counting executions
* refactor: Add migration for other dbms and fix issues found
* refactor: Fix lint issues
* refactor: Remove unnecessary comment and auto inject repo to internal hooks
* refactor: remove type assertion
* fix: Fix broken tests
* fix: Remove unnecessary import
* Remove unnecessary toString() call
Co-authored-by: Iván Ovejero <ivov.src@gmail.com>
* fix: Address comments after review
* refactor: Remove unused import
* fix: Lint issues
* fix: Add correct migration files
---------
Co-authored-by: Iván Ovejero <ivov.src@gmail.com>
* remove null values from credential export
* fix: Fix an issue with queue mode where all running execution would be returned
* fix: Update n8n node to allow for workflow ids with letters
* set upstream on set branch
* remove typo
* add nodeAccess to credentials
* fix unsaved run check for undefined id
* fix(core): Rename version control feature to source control (#6480)
* rename versionControl to sourceControl
* fix source control tooltip wording
---------
Co-authored-by: Romain Minaud <romain.minaud@gmail.com>
* fix(editor): Pay 548 hide the set up version control button (#6485)
* feat(DebugHelper Node): Fix and include in main app (#6406)
* improve node a bit
* fixing continueOnFail() to contain error in json
* improve pairedItem
* fix random data returning object results
* fix nanoId length typo
* update pnpm-lock file
---------
Co-authored-by: Marcus <marcus@n8n.io>
* fix(editor): Remove setup source control CTA button
---------
Co-authored-by: Michael Auerswald <michael.auerswald@gmail.com>
Co-authored-by: Marcus <marcus@n8n.io>
* fix(editor): Update source control docs links (#6488)
* feat(DebugHelper Node): Fix and include in main app (#6406)
* improve node a bit
* fixing continueOnFail() to contain error in json
* improve pairedItem
* fix random data returning object results
* fix nanoId length typo
* update pnpm-lock file
---------
Co-authored-by: Marcus <marcus@n8n.io>
* feat(editor): Replace root events with event bus events (no-changelog) (#6454)
* feat: replace root events with event bus events
* fix: prevent cypress from replacing global with globalThis in import path
* feat: remove emitter mixin
* fix: replace component events with event bus
* fix: fix linting issue
* fix: fix breaking expression switch
* chore: prettify ndv e2e suite code
* fix(editor): Update source control docs links
---------
Co-authored-by: Michael Auerswald <michael.auerswald@gmail.com>
Co-authored-by: Marcus <marcus@n8n.io>
Co-authored-by: Alex Grozav <alex@grozav.com>
* fix tag endpoint regex
---------
Co-authored-by: Omar Ajoue <krynble@gmail.com>
Co-authored-by: Iván Ovejero <ivov.src@gmail.com>
Co-authored-by: Romain Minaud <romain.minaud@gmail.com>
Co-authored-by: Csaba Tuncsik <csaba@n8n.io>
Co-authored-by: Marcus <marcus@n8n.io>
Co-authored-by: Alex Grozav <alex@grozav.com>
* remove unnecessary Db re-initialization
this is from before we added `Db.init` in `WorkflowRunnerProcess`
* feat(core): Improved health check
* make health check not care about DB connections
* close DB connections, and shutdown the timer
* wip: workflow execution filtering
* fix: import type failing to build
* fix: remove console.logs
* feat: execution metadata migrations
* fix(editor): Move global executions filter to its own component
* fix(editor): Using the same filter component in workflow level
* fix(editor): a small housekeeping
* checking workflowId in filter applied
* fix(editor): update filter after resolving merge conflicts
* fix(editor): unify empty filter status
* feat(editor): add datetime picker to filter
* feat(editor): add meta fields
* fix: fix button override in datepicker panel
* feat(editor): add filter metadata
* feat(core): add 'startedBefore' execution filter prop
* feat(core): add 'tags' execution query filter
* Revert "feat(core): add 'tags' execution query filter"
This reverts commit a7b968081c.
* feat(editor): add translations and tooltip and counting selected filter props
* fix(editor): fix label layouts
* fix(editor): update custom data docs link
* fix(editor): update custom data tooltip position
* fix(editor): update tooltip text
* refactor: Ignore metadata if not enabled by license
* fix(editor): Add paywall states to advanced execution filter
* refactor: Save custom data also for worker mode
* fix: Remove duplicate migration name from list
* fix(editor): Reducing filter complexity and add debounce to text inputs
* fix(editor): Remove unused import, add comment
* fix(editor): simplify event listener
* fix: Prevent error when there are running executions
* test(editor): Add advanced execution filter basic unit test
* test(editor): Add advanced execution filter state change unit test
* fix: Small lint issue
* feat: Add indices to speed up queries
* feat: add customData limits
* refactor: put metadata save in transaction
* chore: remove unneeded comment
* test: add tests for execution metadata
* fix(editor): Fixes after merge conflict
* fix(editor): Remove unused import
* wordings and ui fixes
* fix(editor): type fixes
* feat: add code node autocompletions for customData
* fix: Prevent transaction issues and ambiguous ID in sql clauses
* fix(editor): Suppress requesting current executions if metadata is used in filter (#5739)
* fix(editor): Suppress requesting current executions if metadata is used in filter
* fix(editor): Fix arrows for select in popover
* refactor: Improve performance by correcting database indices
* fix: Lint issue
* test: Fix broken test
* fix: Broken test
* test: add call data check for saveExecutionMetadata test
---------
Co-authored-by: Valya Bullions <valya@n8n.io>
Co-authored-by: Alex Grozav <alex@grozav.com>
Co-authored-by: Omar Ajoue <krynble@gmail.com>
Co-authored-by: Romain Minaud <romain.minaud@gmail.com>
* use typedi for UserManagementMailer
* use typedi for SamlService
* fix typos
* use typedi for Queue
* use typedi for License
* convert some more code to use typedi
* fix(core): Execution pruning delete query should use the `OR` operator
* fix(core): Prune executions in a chunk to avoid sqlite error "Expression tree is too large"
* reduce the memory usage during execution pruning
* add typedi
* convert ActiveWorkflowRunner into an injectable service
* convert ExternalHooks into an injectable service
* convert InternalHooks into an injectable service
* convert LoadNodesAndCredentials into an injectable service
* convert NodeTypes and CredentialTypes into an injectable service
* convert ActiveExecutions into an injectable service
* convert WaitTracker into an injectable service
* convert Push into an injectable service
* convert ActiveWebhooks and TestWebhooks into an injectable services
* handle circular references, and log errors when a circular dependency is found
* Prune execution data when more than configured limit
* use stricter typings
* use `pruneDataMaxCount`
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
* adds ExecutionEvents view modal to ExecutionList
* fix time rendering and remove wf column
* checks for unfinished executions and fails them
* prevent re-setting stoppedAt for execution
* some cleanup / manually create rundata after crash
* quicksave
* remove Threads lib, log worker rewrite
* cleanup comment
* fix sentry destination return value
* test for tests...
* run tests with single worker
* fix tests
* remove console log
* add endpoint for execution data recovery
* lint cleanup and some refactoring
* fix accidental recursion
* remove cyclic imports
* add rundata recovery to Workflowrunner
* remove comments
* cleanup and refactor
* adds a status field to executions
* setExecutionStatus on queued worker
* fix onWorkflowPostExecute
* set waiting from worker
* get crashed status into frontend
* remove comment
* merge fix
* cleanup
* catch empty rundata in recovery
* refactor IExecutionsSummary and inject nodeExecution Errors
* reduce default event log size to 10mb from 100mb
* add per node execution status
* lint fix
* merge and lint fix
* phrasing change
* improve preview rendering and messaging
* remove debug
* Improve partial rundata recovery
* fix labels
* fix line through
* send manual rundata to ui at crash
* some type and msg push fixes
* improve recovered item rendering in preview
* update workflowStatistics on recover
* merge fix
* review fixes
* merge fix
* notify eventbus when ui is back up
* add a small timeout to make sure the UI is back up
* increase reconnect timeout to 30s
* adjust recover timeout and ui connection lost msg
* do not stop execution in editor after x reconnects
* add executionRecovered push event
* fix recovered connection not green
* remove reconnect toast and merge existing rundata
* merge editor and recovered data for own mode
* 🔨 - Remove `shared` key from execution save data
* 👕 - Using import type where needed
* remove console.log
* 🔨 - Create new clean workflowData instead of removing shared
If IWorkflowBase changes in future, TS will error out here ensuring it's kept up to date
* 🔨 - use lodash.pick for less verbosity
* 🔨 - fix lodash imports
* fix: Stop OOM crashes in Execution Data pruning
Currently, while pruning execution data, we load all the data into memory. For instances with thousands of executions, this causes the container to run out of memory.
Since the IDs are all we need, we should only query for IDs.
* query for Executions only when ids are actually needed for pruning binary data
in default mode the binary data is in the database, and will get pruned along with the executions.
* ✨ Create rule `no-unneeded-backticks`
* 👕 Enable rule
* ⚡ Run rule on `cli`
* ⚡ Run rule on `core`
* ⚡ Run rule on `workflow`
* ⚡ Run rule on `design-system`
* ⚡ Run rule on `node-dev`
* ⚡ Run rule on `editor-ui`
* ⚡ Run rule on `nodes-base`
* fix: Prevent workflows with only manual trigger from being activated
* fix: Fix workflow id when sharing from workflows list
* fix: Update sharing modal translations
* fix: Allow sharees to disable workflows and fix issue with unique key when removing a user
* refactor: Improve error messages and change logging level to be less verbose
* fix: Broken user removal transfer issue
* feat: Implement workflow sharing BE telemetry
* chore: temporarily add sharing env vars
* feat: Implement BE telemetry for workflow sharing
* fix: Prevent issues with possibly missing workflow id
* feat: Replace WorkflowSharing flag references (no-changelog) (#4918)
* ci: Block all external network calls in tests (no-changelog) (#4930)
* setup nock to prevent tests from making any external requests
* mock all calls to posthog sdk
* feat: Replace WorkflowSharing flag references (no-changelog)
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <netroy@users.noreply.github.com>
* refactor: Remove temporary feature flag for workflow sharing
* refactor: add sharing_role to both manual and node executions
* refactor: Allow changing name, position and disabled of read only nodes
* feat: Overhaul dynamic translations for local and cloud (#4943)
* feat: Overhaul dynamic translations for local and cloud
* fix: remove type casting
* chore: remove unused translations
* fix: fix workflow sharing translation
* test: Fix broken test
* refactor: remove unnecessary import
* refactor: Minor code improvements
* refactor: rename dynamicTranslations to contextBasedTranslationKeys
* fix: fix type imports
* refactor: Consolidate sharing feature check
* feat: update cred sharing unavailable translations
* feat: update upgrade message when user management not available
* fix: rename plan names to Pro and Power
* feat: update translations to no longer contain plan names
* wip: subworkflow permissions
* feat: add workflowsFromSameOwner caller policy
* feat: Fix subworkflow permissions
* shared entities should check for role when deleting users
* refactor: remove circular dependency
* role filter shouldn't be an array
* fixed role issue
* fix: Corrected behavior when removing users
* feat: show instance owner credential sharing message only if isn't sharee
* feat: update workflow caller policy caller ids labels
* feat: update upgrade plan links to contain instance ids
* fix: show check errors below creds message only to owner
* fix(editor): Hide usage page on cloud
* fix: update credential validation error message for sharee
* fix(core): Remove duplicate import
* fix(editor): Extending deployment types
* feat: Overhaul contextual translations (#4992)
feat: update how contextual translations work
* refactor: improve messaging for subworkflow permissions
* test: Fix issue with user deletion and transfer
* fix: Explicitly throw error message so it can be displayed in UI
Co-authored-by: Alex Grozav <alex@grozav.com>
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <netroy@users.noreply.github.com>
Co-authored-by: freyamade <freya@n8n.io>
Co-authored-by: Csaba Tuncsik <csaba@n8n.io>
* Mark binary data to be deleted when pruning executions
* eslint
* make pruneExecutionData async
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
* feat(cli): Setup error tracking using Sentry
* make error reporting available in the workflows package
* address some of the PR comments
* create an ErrorReporterProxy like LoggerProxy
* remove the `captureError` helper. use ErrorReporterProxy directly
* fix linting issues
* remove ErrorReporterProxy warnings in tests
* check for NODE_ENV === 'production' instead
* IErrorReporter -> ErrorReporter
* ErrorReporterProxy.getInstance() -> ErrorReporter
* allow capturing stacks in warnings as well
* make n8n debugging consistent with `npm start`
* IReportingOptions -> ReportingOptions
* use consistent signature for `error` and `warn`
* use Logger instead of console.log