This change kept coming up in #6713, #7773, and #8135.
So this PR moves the existing code without changing any behavior,
to help get rid of some of the circular dependencies.
## Summary
Add `ShutdownService` and an `OnShutdown` decorator for a more unified
way to shut down different components. Use this new mechanism in the
following components:
- HTTP(S) server
- Pruning service
- Push connection
- License
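Conceptually, the decorator registers a component method with a central `ShutdownService`, which runs every registered handler when the process exits. A minimal sketch of the pattern, with simplified signatures that are not the exact n8n API:

```typescript
type ShutdownHandler = () => Promise<void> | void;

class ShutdownService {
	private handlers: ShutdownHandler[] = [];

	register(handler: ShutdownHandler) {
		this.handlers.push(handler);
	}

	// Run all registered handlers, tolerating individual failures.
	async shutdown() {
		await Promise.allSettled(this.handlers.map((h) => h()));
	}
}

const shutdownService = new ShutdownService();

// Method decorator (legacy/experimental decorators): registers the decorated
// method as a shutdown handler at class-definition time. Real DI would resolve
// the instance; calling through the prototype keeps the sketch short.
function OnShutdown() {
	return (target: object, _prop: string, descriptor: PropertyDescriptor) => {
		shutdownService.register(() => descriptor.value.call(target));
	};
}

class PushConnection {
	@OnShutdown()
	onShutdown() {
		console.log('closing push connections');
	}
}
```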
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
## Summary
We accidentally made some functions `async` in
https://github.com/n8n-io/n8n/pull/7846
This PR reverts that change.
Add a generic N8N_GRACEFUL_SHUTDOWN_TIMEOUT which controls how long the
n8n process will wait for a graceful exit before exiting forcefully.
This variable replaces the QUEUE_WORKER_TIMEOUT variable that was used
for the worker process.
DEPRECATED: QUEUE_WORKER_TIMEOUT
The QUEUE_WORKER_TIMEOUT environment variable has been replaced with
N8N_GRACEFUL_SHUTDOWN_TIMEOUT.
If the process doesn't shut down within the time limit, exit with an
error code, because:
1. Conceptually, something timing out is an error.
2. On a successful exit we close the DB connection gracefully. On an
exit timeout we'd rather not do that, since it would wait for any active
connections to close and could block the exit.
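A minimal sketch of that behavior (`stopAllComponents` is a stand-in for the app's shutdown logic, and the 30-second default is an assumption):

```typescript
const GRACEFUL_SHUTDOWN_TIMEOUT = Number(process.env.N8N_GRACEFUL_SHUTDOWN_TIMEOUT ?? 30); // seconds

async function onTerminationSignal(stopAllComponents: () => Promise<void>) {
	// Force-exit with an error code if graceful shutdown takes too long.
	// Deliberately skip the graceful DB close on this path: waiting for
	// active connections to drain could block the exit.
	const forceExit = setTimeout(() => {
		console.error(`Shutdown timed out after ${GRACEFUL_SHUTDOWN_TIMEOUT}s, exiting with error code`);
		process.exit(1);
	}, GRACEFUL_SHUTDOWN_TIMEOUT * 1000);

	await stopAllComponents(); // graceful path: components and DB close cleanly
	clearTimeout(forceExit);
	process.exit(0);
}
```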
We added ID-less workflow reporting in #8031, which has already produced
multiple reports from internal users, enough info to tackle [this
story](https://linear.app/n8n/issue/PAY-1147). To prevent an
overwhelming number of reports from cloud, this PR removes the reporting
for now.
When setting up queue mode, it is easy to overlook that not exporting
Postgres env vars will default the worker to use sqlite, which will fail
during execution with a non-obvious error. Hence, this PR adds warnings
when starting a worker with an incompatible DB type.
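A sketch of the kind of startup check this adds (DB_TYPE and EXECUTIONS_MODE are n8n's env vars, but the exact message and check location are assumptions):

```typescript
const dbType = process.env.DB_TYPE ?? 'sqlite'; // sqlite is the default

if (process.env.EXECUTIONS_MODE === 'queue' && dbType === 'sqlite') {
	console.warn(
		'Queue mode workers must share the main instance database. sqlite is ' +
			'process-local, so executions will fail with non-obvious errors. ' +
			'Configure Postgres (DB_TYPE=postgresdb) instead.',
	);
}
```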
We're initializing the queue twice because of a [bad
merge](2c63474538).
There are no known associated bugs, but there is no need to init the
queue twice. We should follow up by investigating whether any pending
bugs can be attributed to this.
Ensure all errors in `cli` are `ApplicationError` or children of it and
contain no variables in the message, to continue normalizing all the
errors we report to Sentry.
Follow-up to: https://github.com/n8n-io/n8n/pull/7839
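For illustration, the pattern looks roughly like this (`ApplicationError` here is a stand-in for the idea, not the exact class):

```typescript
class ApplicationError extends Error {
	constructor(
		message: string,
		readonly extra: Record<string, unknown> = {},
	) {
		super(message);
	}
}

function activateWorkflow(workflowId: string) {
	// Bad: `Failed to activate workflow ${workflowId}` would create a separate
	// Sentry group for every workflow ID.
	// Good: constant message; the variable travels as structured metadata.
	throw new ApplicationError('Failed to activate workflow', { workflowId });
}
```

Keeping the message constant lets Sentry group identical failures, while the metadata still carries the specifics.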
Ensure all errors in `cli` inherit from `ApplicationError` to continue
normalizing all the errors we report to Sentry.
Follow-up to: https://github.com/n8n-io/n8n/pull/7820
Follow-up to #7566 | Story: https://linear.app/n8n/issue/PAY-926
### Manual workflow activation and deactivation
In a multi-main scenario, if the user manually activates or deactivates
a workflow, the process (whether leader or follower) that handles the
PATCH request and updates its internal state should send a message into
the command channel, so that all other main processes update their
internal state accordingly:
- Add to `ActiveWorkflows` if activating
- Remove from `ActiveWorkflows` if deactivating
- Remove and re-add to `ActiveWorkflows` if the update did not change
activation status.
After updating their internal state, if activating or deactivating, the
recipient main processes should push a message to all connected
frontends so that these can update their stores and so reflect the value
in the UI.
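A sketch of this coordination, assuming a hypothetical channel name and message shape (node-redis v4 API):

```typescript
import { createClient } from 'redis';

// Hypothetical message shape; the real command messages differ in detail.
interface WorkflowCommand {
	senderId: string; // lets instances ignore their own messages
	command: 'add-workflow' | 'remove-workflow';
	workflowId: string;
}

const publisher = createClient();

// After handling the PATCH request and updating the local `ActiveWorkflows`:
async function broadcastActivationChange(cmd: WorkflowCommand) {
	if (!publisher.isOpen) await publisher.connect();
	await publisher.publish('n8n.commands', JSON.stringify(cmd));
}

// Every other main process subscribes to the channel, updates its own
// `ActiveWorkflows`, then pushes the new state to its connected frontends.
```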
### Workflow activation errors
On failure to activate a workflow, the main instance should record the
error in Redis - main instances should always pull activation errors
from Redis in a multi-main scenario.
### Leadership change
On leadership change...
- The old leader should stop pruning and the new leader should start
pruning.
- The old leader should remove trigger- and poller-based workflows and
the new leader should add them.
Story: https://linear.app/n8n/issue/PAY-926
This PR coordinates workflow activation on instance startup and on
leadership change in a multi-main scenario in the internal API. Part 3,
on manual workflow activation and deactivation, will be a separate PR.
### Part 1: Instance startup
In a multi-main scenario, on starting an instance...
- [x] If the instance is the leader, it should add webhooks, triggers
and pollers.
- [x] If the instance is the follower, it should not add webhooks,
triggers or pollers.
- [x] Unit tests.
### Part 2: Leadership change
In a multi-main scenario, if the main instance leader dies…
- [x] The new main instance leader must activate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] The old main instance leader must deactivate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] Unit tests.
To test, start two instances and check behavior on startup and
leadership change:
```
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug npm run start
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug N8N_PORT=5679 npm run start
```
This PR ensures `MultiMainInstancePublisher` is initialized before
checking if the instance is leader or follower. Followers skip license
init, license check, and pruning start and stop.
Due to a recent change, the credentials import command calls the core's
Credential object through its prototype. This caused the Credential's
cipher variable to not be set, so no cipher service was available during
import. This PR catches this edge case and fixes it.
https://linear.app/n8n/issue/PAY-933/set-up-leader-selection-for-multiple-main-instances
- [x] Set up new envs
- [x] Add config and license checks
- [x] Implement `MultiMainInstancePublisher`
- [x] Expand `RedisServicePubSubPublisher` to support
`MultiMainInstancePublisher`
- [x] Init `MultiMainInstancePublisher` on startup and destroy on
shutdown
- [x] Add to sandbox plans
- [x] Test manually
Note: This is only for setup - coordinating in reaction to leadership
changes will come in later PRs.
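For context, a common way to implement such leader selection is a Redis key with a TTL; a minimal sketch under that assumption (key name, interval, and instance id are illustrative, not necessarily what n8n does):

```typescript
import { createClient } from 'redis';

const client = createClient(); // call client.connect() before use
const instanceId = `main-${process.pid}`; // illustrative id
const LEADER_KEY = 'n8n:leader'; // assumed key name
const TTL_SECONDS = 10; // assumed interval

// SET NX with expiry: only one instance can hold the key at a time. The
// leader keeps renewing it; if the leader dies, the key expires and a
// follower's next attempt succeeds.
async function tryBecomeLeader(): Promise<boolean> {
	return (await client.set(LEADER_KEY, instanceId, { NX: true, EX: TTL_SECONDS })) === 'OK';
}

async function renewLeadership(): Promise<void> {
	if ((await client.get(LEADER_KEY)) === instanceId) {
		await client.expire(LEADER_KEY, TTL_SECONDS);
	}
}
```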
This change ensures that things like `encryptionKey` and `instanceId`
are always available directly where they are needed, instead of being
passed around throughout the code.
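A minimal sketch of the pattern with typedi, which the `cli` package uses (the service and field names here are illustrative):

```typescript
import 'reflect-metadata';
import { Container, Service } from 'typedi';

@Service()
class InstanceSettings {
	// Illustrative values; the real service reads these from config/env.
	readonly encryptionKey = process.env.N8N_ENCRYPTION_KEY ?? 'dev-key';
	readonly instanceId = 'instance-id';
}

@Service()
class CredentialsHelper {
	constructor(private readonly settings: InstanceSettings) {}

	encrypt(plaintext: string) {
		// The key is available right where it is needed, no parameter threading.
		return `${plaintext} (encrypted with ${this.settings.encryptionKey})`;
	}
}

const helper = Container.get(CredentialsHelper); // typedi wires the dependency
```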
In a rare edge case an undefined queue could be returned; this should
not happen, so an error is now thrown.
This PR also takes the opportunity to remove a cyclic dependency from
the Queue.
This PR adds a message for queue mode which triggers an external secrets
provider reload inside the workers if the configuration has changed on
the main instance.
It also refactors some of the message handler code to remove cyclic
dependencies, and removes unnecessary duplicate Redis clients inside
services (possible now that the cyclic deps are gone).
All commands sent between the main instance and workers need to contain
a sender ID, to prevent senders from reacting to their own messages and
causing loops.
This PR makes sure all sent messages contain a sender ID by default, as
part of constructing a sending Redis client.
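A minimal sketch of the idea (names illustrative):

```typescript
// A sending client stamps every message with its sender id, and handlers
// drop messages the same process sent.
const senderId = `worker-${process.pid}`; // generated once per process

function send(publish: (serialized: string) => void, payload: object) {
	publish(JSON.stringify({ senderId, ...payload }));
}

function onMessage(raw: string, handle: (msg: Record<string, unknown>) => void) {
	const msg = JSON.parse(raw) as Record<string, unknown>;
	if (msg.senderId === senderId) return; // ignore our own messages, avoid loops
	handle(msg);
}
```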
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Depends on: #7092 | Story:
[PAY-768](https://linear.app/n8n/issue/PAY-768)
This PR:
- Generalizes the `IBinaryDataManager` interface.
- Adjusts `Filesystem.ts` to satisfy the interface.
- Sets up an S3 client stub to be filled in in the next PR.
- Turns `BinaryDataManager` into an injectable service.
- Adjusts the config schema and adds new validators.
Note that the PR looks large but all the main changes are in
`packages/core/src/binaryData`.
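For a rough idea of the generalized contract, a simplified sketch (the real interface in `packages/core/src/binaryData` differs in names and detail):

```typescript
import { Readable } from 'stream';

interface BinaryDataClient {
	init(): Promise<void>;
	store(binaryDataId: string, data: Buffer): Promise<void>;
	getAsStream(binaryDataId: string): Promise<Readable>;
	deleteMany(binaryDataIds: string[]): Promise<void>;
}

// Filesystem mode (fs.client.ts) satisfies this today; the S3 client stub
// will implement the same contract, so consumers depend only on the interface.
```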
Out of scope:
- `BinaryDataManager` (now `BinaryDataService`) and `Filesystem.ts` (now
`fs.client.ts`) were slightly refactored for maintainability, but fully
overhauling them is **not** the focus of this PR, which is meant to
clear the way for the S3 implementation. Future improvements for these
two should include setting up a backwards-compatible dir structure that
makes it easier to locate binary data files to delete, removing
duplication, simplifying cloning methods, using integers for binary data
size instead of `prettyBytes()`, writing tests for existing binary data
logic, etc.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
---------
Co-authored-by: Omar Ajoue <krynble@gmail.com>
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Based on #7065 | Story: https://linear.app/n8n/issue/PAY-771
n8n in filesystem mode marks binary data to delete on manual execution
deletion, on unsaved execution completion, and on every execution
pruning cycle. We later prune binary data in a separate cycle via these
marker files, based on the configured TTL. In the context of introducing
an S3 client to manage binary data, the filesystem mode's mark-and-prune
setup is too tightly coupled to the general binary data management
client interface.
This PR...
- Ensures the deletion of an execution causes the deletion of any binary
data associated with it. This does away with the need for a binary data
TTL and simplifies the filesystem mode's mark-and-prune setup.
- Refactors all execution deletions (including pruning) into soft
deletions, hard-deletes soft-deleted executions based on the existing
pruning config, and adjusts execution endpoints to filter out
soft-deleted executions (see the sketch after this list). This reduces
DB load, and keeps binary data around long enough for users to access it
when building workflows with unsaved executions.
- Moves all execution pruning work from an execution lifecycle hook to
`execution.repository.ts`. This keeps related logic in a single place.
- Removes all marking logic from the binary data manager. This
simplifies the interface that the S3 client will meet.
- Adds basic sanity-check tests to pruning logic and execution deletion.
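A minimal sketch of the soft-delete flow with TypeORM, under assumed entity and config names:

```typescript
import {
	DeleteDateColumn,
	Entity,
	LessThan,
	PrimaryGeneratedColumn,
	Repository,
} from 'typeorm';

@Entity()
class ExecutionEntity {
	@PrimaryGeneratedColumn()
	id!: number;

	@DeleteDateColumn()
	deletedAt?: Date; // stamped by softDelete(); find() skips stamped rows by default
}

// "Deleting" an execution only stamps deletedAt, so its binary data stays
// accessible while users build workflows with unsaved executions.
async function deleteExecution(repo: Repository<ExecutionEntity>, id: number) {
	await repo.softDelete(id);
}

// A separate pruning pass hard-deletes rows soft-deleted before the cutoff;
// binary data is removed alongside. `graceMs` stands in for the pruning config.
async function hardDeletePrunable(repo: Repository<ExecutionEntity>, graceMs: number) {
	const cutoff = new Date(Date.now() - graceMs);
	const stale = await repo.find({
		where: { deletedAt: LessThan(cutoff) },
		withDeleted: true, // include soft-deleted rows in this query
	});
	if (stale.length) await repo.delete(stale.map((e) => e.id));
}
```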
Out of scope:
- Improving existing pruning logic.
- Improving existing execution repository logic.
- Adjusting dir structure for filesystem mode.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
This PR implements the updated license SDK so that worker and webhook
instances no longer auto-renew licenses.
Instead, they receive a `reloadLicense` command via the Redis client,
which fetches the updated license after it has been saved on the main
instance.
This also contains some refactoring that moves the Redis sub and pub
clients directly into the event bus, to prevent cyclic dependency
issues.
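A sketch of the worker-side handling (the channel name and service shape are assumptions; only the `reloadLicense` command name comes from this PR):

```typescript
import { createClient } from 'redis';

// Hypothetical stand-in for the worker's license handling.
const license = {
	async reload() {
		/* re-fetch the license that the main instance saved */
	},
};

async function listenForLicenseReload() {
	const subscriber = createClient();
	await subscriber.connect();
	await subscriber.subscribe('n8n.commands', async (raw) => {
		const message = JSON.parse(raw);
		if (message.command === 'reloadLicense') {
			await license.reload(); // no auto-renewal; just reload the saved license
		}
	});
}
```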
# Motivation
In queue mode, finished executions would cause the main instance to
always pull all execution data from the database, unflatten it, and then
use it to send out event log events and telemetry events, as well as the
returns required by Respond to Webhook nodes etc.
This could cause OOM errors when the data was large, since it had to be
fully unpacked and transformed on the main instance's side, using up a
lot of memory (and time).
This PR limits this behaviour to the cases where it is required, e.g.
where the data has to be forwarded to some waiting webhook.
# Changes
Execution data is only required in cases where the active execution has
a `postExecutePromise` attached to it. These usually forward the data to
some other endpoint (e.g. a listening webhook connection).
By adding a helper `getPostExecutePromiseCount()`, we can decide that in
cases where nothing is listening at all, there is no reason to pull the
data on the main instance.
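A minimal sketch of that decision (types simplified; only `getPostExecutePromiseCount()` is named in this PR):

```typescript
class ActiveExecutions {
	private postExecutePromises = new Map<string, Array<(data: unknown) => void>>();

	getPostExecutePromiseCount(executionId: string): number {
		return this.postExecutePromises.get(executionId)?.length ?? 0;
	}
}

// When a queued execution finishes on a worker: only pull and unflatten the
// full execution data if something (e.g. a webhook response) is waiting on it.
function mainInstanceNeedsFullData(active: ActiveExecutions, executionId: string): boolean {
	return active.getPostExecutePromiseCount(executionId) > 0;
}
```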
Previously, there would always be postExecutePromises because the
telemetry events were called. Now these have been moved into the
workers, whose hook function arrays have been given the various
InternalHooks calls, so the workers themselves issue these telemetry and
event calls.
This results in all event log messages now being logged on the worker's
event log, and in the worker's eventbus being the one to send the events
out to destinations. The main event log does…pretty much nothing.
We are not logging executions on the main event log any more, because
this would require all events to be replicated 1:1 from the workers to
the main instance(s) (this IS possible and implemented, see the worker’s
`replicateToRedisEventLogFunction` - but it is not enabled to reduce the
amount of traffic over redis).
Partial events in the main log could confuse the recovery process and
would, ironically, result in the recovery corrupting the execution data
by considering those executions crashed.
# Refactor
I have also used the opportunity to reduce duplicate code and move some
of the hook functionality into
`packages/cli/src/executionLifecycleHooks/shared/sharedHookFunctions.ts`,
in preparation for a future full refactor of the hooks.
This PR adds new endpoints to the REST API:
`/orchestration/worker/status` and `/orchestration/worker/id`
Currently these just trigger the return of status / IDs from the workers
via the Redis back channel; this still needs to be handled and passed
through to the frontend.
It also adds the eventbus to each worker, and triggers a reload of those
eventbus instances when the configuration changes on the main instances.
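A rough sketch of the endpoint flow (express-style handler; the channel name and response shape are assumptions):

```typescript
import express from 'express';
import { createClient } from 'redis';

const app = express();
const redisPublisher = createClient();

// Hitting the endpoint just publishes a command; workers answer on the Redis
// back channel, and forwarding those answers to the frontend is still TODO.
app.get('/orchestration/worker/status', async (_req, res) => {
	if (!redisPublisher.isOpen) await redisPublisher.connect();
	await redisPublisher.publish('n8n.commands', JSON.stringify({ command: 'getStatus' }));
	res.sendStatus(202); // accepted; worker responses arrive asynchronously
});
```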
* refactor: Set up ownership service
* refactor: Specify cache keys and values
* refactor: Replace util with service calls
* test: Mock service in tests
* refactor: Use dependency injection
* test: Write tests
* refactor: Apply feedback from Omar and Micha
* test: Fix tests
* test: Fix missing spot
* refactor: Return user entity from cache
* refactor: More dependency injection!
* ci: Fix test workflows (no-changelog)
We removed `pdf-parse` in #6640, so we need to get these test PDF files from the `test-workflows` repo instead ([which has been updated to include these files](0f6ef1c804))
* remove `\n` from ids and skipList text files
* fix(core): Make node execution order configurable, and backward-compatible
* ⚡ Also add new Merge-Node behaviour
* ⚡ Fix typo
* Fix lint issue
* update labels
* rename legacy to v0
* remove the unnecessary log
* default all new workflows to use v1 execution-order
* remove the controller changes
* clone default settings to avoid it getting modified
---------
Co-authored-by: Jan Oberhauser <jan.oberhauser@gmail.com>
* feat(core): Change node execution order (most top-left one first)
* ⚡ Fix issue with multi-output-nodes
* ⚡ Remove not needed meta-entry in test
* fix the e2e test
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
* first commit for postgres migration
* (not working)
* sqlite migration
* quicksave
* fix tests
* fix pg test
* fix postgres
* fix variables import
* fix execution saving
* add user settings fix
* change migration to single lines
* patch preferences endpoint
* cleanup
* improve variable import
* cleanup unusued code
* Update packages/cli/src/PublicApi/v1/handlers/workflows/workflows.handler.ts
Co-authored-by: Omar Ajoue <krynble@gmail.com>
* address review notes
* fix var update/import
* refactor: Separate execution data to its own table (#6323)
* wip: Temporary migration process
* refactor: Create boilerplate repository methods for executions
* fix: Lint issues
* refactor: Added search endpoint to repository
* refactor: Make the execution list work again
* wip: Updating how we create and update executions everywhere
* fix: Lint issues and remove most of the direct access to execution model
* refactor: Remove includeWorkflowData flag and fix more tests
* fix: Lint issues
* fix: Fixed ordering of executions for FE, removed transaction when saving execution and removed unnecessary update
* refactor: Add comment about missing feature
* refactor: Refactor counting executions
* refactor: Add migration for other dbms and fix issues found
* refactor: Fix lint issues
* refactor: Remove unnecessary comment and auto inject repo to internal hooks
* refactor: remove type assertion
* fix: Fix broken tests
* fix: Remove unnecessary import
* Remove unnecessary toString() call
Co-authored-by: Iván Ovejero <ivov.src@gmail.com>
* fix: Address comments after review
* refactor: Remove unused import
* fix: Lint issues
* fix: Add correct migration files
---------
Co-authored-by: Iván Ovejero <ivov.src@gmail.com>
* remove null values from credential export
* fix: Fix an issue with queue mode where all running executions would be returned
* fix: Update n8n node to allow for workflow ids with letters
* set upstream on set branch
* remove typo
* add nodeAccess to credentials
* fix unsaved run check for undefined id
* fix(core): Rename version control feature to source control (#6480)
* rename versionControl to sourceControl
* fix source control tooltip wording
---------
Co-authored-by: Romain Minaud <romain.minaud@gmail.com>
* fix(editor): Pay 548 hide the set up version control button (#6485)
* feat(DebugHelper Node): Fix and include in main app (#6406)
* improve node a bit
* fixing continueOnFail() to contain error in json
* improve pairedItem
* fix random data returning object results
* fix nanoId length typo
* update pnpm-lock file
---------
Co-authored-by: Marcus <marcus@n8n.io>
* fix(editor): Remove setup source control CTA button
* fix(editor): Remove setup source control CTA button
---------
Co-authored-by: Michael Auerswald <michael.auerswald@gmail.com>
Co-authored-by: Marcus <marcus@n8n.io>
* fix(editor): Update source control docs links (#6488)
* feat(DebugHelper Node): Fix and include in main app (#6406)
* improve node a bit
* fixing continueOnFail() to contain error in json
* improve pairedItem
* fix random data returning object results
* fix nanoId length typo
* update pnpm-lock file
---------
Co-authored-by: Marcus <marcus@n8n.io>
* feat(editor): Replace root events with event bus events (no-changelog) (#6454)
* feat: replace root events with event bus events
* fix: prevent cypress from replacing global with globalThis in import path
* feat: remove emitter mixin
* fix: replace component events with event bus
* fix: fix linting issue
* fix: fix breaking expression switch
* chore: prettify ndv e2e suite code
* fix(editor): Update source control docs links
---------
Co-authored-by: Michael Auerswald <michael.auerswald@gmail.com>
Co-authored-by: Marcus <marcus@n8n.io>
Co-authored-by: Alex Grozav <alex@grozav.com>
* fix tag endpoint regex
---------
Co-authored-by: Omar Ajoue <krynble@gmail.com>
Co-authored-by: Iván Ovejero <ivov.src@gmail.com>
Co-authored-by: Romain Minaud <romain.minaud@gmail.com>
Co-authored-by: Csaba Tuncsik <csaba@n8n.io>
Co-authored-by: Marcus <marcus@n8n.io>
Co-authored-by: Alex Grozav <alex@grozav.com>
* remove unnecessary Db re-initialization
this is from before we added `Db.init` in `WorkflowRunnerProcess`
* feat(core): Improved health check
* make health check not care about DB connections
* close DB connections, and shutdown the timer
* refactor: Add deprecation notice for MySQL and MariaDB
* Update packages/cli/src/commands/BaseCommand.ts
Co-authored-by: Cornelius Suermann <cornelius@n8n.io>
---------
Co-authored-by: Cornelius Suermann <cornelius@n8n.io>
If you install a community node with `polling: true`, activating a workflow with that node fails with an error: `WorkflowActivationError: There was a problem activating the workflow: "Could not get parameter "pollTimes"!"`.
You can test this by installing `n8n-nodes-rss-feed-trigger`, creating a workflow with the `RSS Trigger` node, and then trying to activate it. Activation will fail on `master`, but work as expected on this branch.
* use typedi for UserManagementMailer
* use typedi for SamlService
* fix typos
* use typedi for Queue
* use typedi for License
* convert some more code to use typedi
* add typedi
* convert ActiveWorkflowRunner into an injectable service
* convert ExternalHooks into an injectable service
* convert InternalHooks into an injectable service
* convert LoadNodesAndCredentials into an injectable service
* convert NodeTypes and CredentialTypes into an injectable service
* convert ActiveExecutions into an injectable service
* convert WaitTracker into an injectable service
* convert Push into an injectable service
* convert ActiveWebhooks and TestWebhooks into injectable services
* handle circular references, and log errors when a circular dependency is found
* feat(editor): roll out schema view
* feat(editor): add posthog tracking
* refactor: use composables
* refactor: clean up console log
* refactor: clean up impl
* chore: clean up impl
* fix: fix demo var
* chore: add comment
* refactor: clean up
* chore: wrap error func
* refactor: clean up import
* refactor: make store
* feat: enable rudderstack usebeacon, move event to unload
* chore: clean up alert
* refactor: move tracking from hooks
* fix: reload flags on login
* fix: add func to setup
* fix: clear duplicate import
* chore: add console to test
* chore: add console to test
* fix: try reload
* chore: randomize instance id for testing
* chore: randomize instance id for testing
* chore: add console logs for testing
* chore: move random id to fe
* chore: use query param for testing
* feat: update PostHog api endpoint
* feat: update rs host
* feat: update rs host
* feat: update rs endpoints
* refactor: use api host for BE events as well
* refactor: refactor out posthog client
* feat: add feature flags to login
* feat: add feature flags to login
* feat: get feature flags to work
* feat: add created at to be events
* chore: add todos
* chore: clean up store
* chore: add created at to identify
* feat: add posthog config to settings
* feat: add bootstrapping
* chore: clean up
* chore: fix build
* fix: get dates to work
* fix: get posthog to recognize dates
* chore: refactor
* fix: update back to number
* fix: update key
* fix: get experiment evals to work
* feat: add posthog to signup router
* feat: add feature flags on sign up
* chore: clean up
* fix: fix import
* chore: clean up loading script
* feat: add timeout, fix: script loader
* fix: test timeout and get working on 8080
* refactor: move out posthog
* feat: add experiment tracking
* fix: clear tracked on reset
* fix: fix signup bug
* fix: handle errors when telemetry is disabled
* refactor: remove redundant await
* fix: add back posthog to telemetry
* test: fix test
* test: fix test
* test: add tests for posthog client
* lint: fix
* fix: fix issue with slow decide endpoint
* lint: fix
* lint: fix
* lint: fix
* lint: fix
* chore: address PR feedback
* chore: address PR feedback
* feat: add onboarding experiment
* adds ExecutionEvents view modal to ExecutionList
* fix time rendering and remove wf column
* checks for unfinished executions and fails them
* prevent re-setting stoppedAt for execution
* some cleanup / manually create rundata after crash
* quicksave
* remove Threads lib, log worker rewrite
* cleanup comment
* fix sentry destination return value
* test for tests...
* run tests with single worker
* fix tests
* remove console log
* add endpoint for execution data recovery
* lint cleanup and some refactoring
* fix accidental recursion
* remove cyclic imports
* add rundata recovery to WorkflowRunner
* remove comments
* cleanup and refactor
* adds a status field to executions
* setExecutionStatus on queued worker
* fix onWorkflowPostExecute
* set waiting from worker
* get crashed status into frontend
* remove comment
* merge fix
* cleanup
* catch empty rundata in recovery
* refactor IExecutionsSummary and inject nodeExecution Errors
* reduce default event log size to 10mb from 100mb
* add per node execution status
* lint fix
* merge and lint fix
* phrasing change
* improve preview rendering and messaging
* remove debug
* Improve partial rundata recovery
* fix labels
* fix line through
* send manual rundata to ui at crash
* some type and msg push fixes
* improve recovered item rendering in preview
* update workflowStatistics on recover
* merge fix
* review fixes
* merge fix
* notify eventbus when ui is back up
* add a small timeout to make sure the UI is back up
* increase reconnect timeout to 30s
* adjust recover timeout and ui connection lost msg
* do not stop execution in editor after x reconnects
* add executionRecovered push event
* fix recovered connection not green
* remove reconnect toast and merge existing rundata
* merge editor and recovered data for own mode