Followup to #7566 | Story: https://linear.app/n8n/issue/PAY-926
### Manual workflow activation and deactivation
In a multi-main scenario, if the user manually activates or deactivates
a workflow, the process (whether leader or follower) that handles the
PATCH request and updates its internal state should send a message into
the command channel, so that all other main processes update their
internal state accordingly:
- Add to `ActiveWorkflows` if activating
- Remove from `ActiveWorkflows` if deactivating
- Remove and re-add to `ActiveWorkflows` if the update did not change
activation status.
After updating their internal state, if activating or deactivating, the
recipient main processes should push a message to all connected
frontends so that these can update their stores and so reflect the value
in the UI.
### Workflow activation errors
On failure to activate a workflow, the main instance should record the
error in Redis - main instances should always pull activation errors
from Redis in a multi-main scenario.
### Leadership change
On leadership change...
- The old leader should stop pruning and the new leader should start
pruning.
- The old leader should remove trigger- and poller-based workflows and
the new leader should add them.
https://linear.app/n8n/issue/PAY-933/set-up-leader-selection-for-multiple-main-instances
- [x] Set up new envs
- [x] Add config and license checks
- [x] Implement `MultiMainInstancePublisher`
- [x] Expand `RedisServicePubSubPublisher` to support
`MultiMainInstancePublisher`
- [x] Init `MultiMainInstancePublisher` on startup and destroy on
shutdown
- [x] Add to sandbox plans
- [x] Test manually
Note: This is only for setup - coordinating in reaction to leadership
changes will come in later PRs.
This PR adds a message for queue mode which triggers an external secrets
provider reload inside the workers if the configuration has changed on
the main instance.
It also refactors some of the message handler code to remove cyclic
dependencies, as well as remove unnecessary duplicate redis clients
inside services (thanks to no more cyclic deps)
all commands sent between main instance and workers need to contain a
server id to prevent senders from reacting to their own messages,
causing loops
this PR makes sure all sent messages contain a sender id by default as
part of constructing a sending redis client.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
This PR implements the updated license SDK so that worker and webhook
instances do not auto-renew licenses any more.
Instead, they receive a `reloadLicense` command via the Redis client
that will fetch the updated license after it was saved on the main
instance
This also contains some refactoring with moving redis sub and pub
clients into the event bus directly, to prevent cyclic dependency
issues.
This PR adds new endpoints to the REST API:
`/orchestration/worker/status` and `/orchestration/worker/id`
Currently these just trigger the return of status / ids from the workers
via the redis back channel, this still needs to be handled and passed
through to the frontend.
It also adds the eventbus to each worker, and triggers a reload of those
eventbus instances when the configuration changes on the main instances.
Issue: during startup, unfinished executions trigger a recovery process
that, under certain circumstances, can in itself crash the instance
(e.g. by running our of memory), resulting in an infinite recovery loop
This PR aims to change this behaviour by writing a flag file when the
recovery process starts, and removing it when it finishes. In the case
of a crash, this flag will persist and upon the next attempt, the
recovery will instead do the absolute minimal (marking executions as
'crashed'), without attempting any 'crashable' actions.
* adds ExecutionEvents view modal to ExecutionList
* fix time rendering and remove wf column
* checks for unfinished executions and fails them
* prevent re-setting stoppedAt for execution
* some cleanup / manually create rundata after crash
* quicksave
* remove Threads lib, log worker rewrite
* cleanup comment
* fix sentry destination return value
* test for tests...
* run tests with single worker
* fix tests
* remove console log
* add endpoint for execution data recovery
* lint cleanup and some refactoring
* fix accidental recursion
* remove cyclic imports
* add rundata recovery to Workflowrunner
* remove comments
* cleanup and refactor
* adds a status field to executions
* setExecutionStatus on queued worker
* fix onWorkflowPostExecute
* set waiting from worker
* get crashed status into frontend
* remove comment
* merge fix
* cleanup
* catch empty rundata in recovery
* refactor IExecutionsSummary and inject nodeExecution Errors
* reduce default event log size to 10mb from 100mb
* add per node execution status
* lint fix
* merge and lint fix
* phrasing change
* improve preview rendering and messaging
* remove debug
* Improve partial rundata recovery
* fix labels
* fix line through
* send manual rundata to ui at crash
* some type and msg push fixes
* improve recovered item rendering in preview
* update workflowStatistics on recover
* merge fix
* review fixes
* merge fix
* notify eventbus when ui is back up
* add a small timeout to make sure the UI is back up
* increase reconnect timeout to 30s
* adjust recover timeout and ui connection lost msg
* do not stop execution in editor after x reconnects
* add executionRecovered push event
* fix recovered connection not green
* remove reconnect toast and merge existing rundata
* merge editor and recovered data for own mode
* create prometheus metrics from events
* feat(core): Add more Prometheus metrics (experimental) (#5187)
* refactor(core): Add Prometheus labels to relevant metrics
* feat(core): Add more Prometheus metrics (experimental)
* add 'v' prefix to value of version label
Co-authored-by: Cornelius Suermann <cornelius@n8n.io>
* adds ExecutionEvents view modal to ExecutionList
* fix time rendering and remove wf column
* checks for unfinished executions and fails them
* prevent re-setting stoppedAt for execution
* removing UI changes but keeping eventbus fixes
* remove comment