- Enable two-way communication with web sockets
- Enable sending push messages to specific users
- Add collaboration service for managing active users for workflow
Missing things:
- State is currently kept only in memory, making this not work in
multi-master setups
- Removing a user from active users in situations where they go inactive
or we miss the "workflow closed" message
- I think a timer based solution for this would cover most edge cases.
I.e. have FE ping every X minutes, BE removes the user unless they have
received a ping in Y minutes, where Y > X
- FE changes to be added later by @MiloradFilipovic
Github issue / Community forum post (link here to close automatically):
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Story: https://linear.app/n8n/issue/PAY-926
This PR coordinates workflow activation on instance startup and on
leadership change in multiple main scenario in the internal API. Part 3
on manual workflow activation and deactivation will be a separate PR.
### Part 1: Instance startup
In multi-main scenario, on starting an instance...
- [x] If the instance is the leader, it should add webhooks, triggers
and pollers.
- [x] If the instance is the follower, it should not add webhooks,
triggers or pollers.
- [x] Unit tests.
### Part 2: Leadership change
In multi-main scenario, if the main instance leader dies…
- [x] The new main instance leader must activate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] The old main instance leader must deactivate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] Unit tests.
To test, start two instances and check behavior on startup and
leadership change:
```
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug npm run start
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug N8N_PORT=5679 npm run start
```
Due to a change, during the credentials import command, the core's
Credential object is being called through its prototype. This caused the
Credential's cipher variable to not be set, thus no cipher service being
available during import. This fix catches this edge case and provides a
fix.
https://linear.app/n8n/issue/PAY-933/set-up-leader-selection-for-multiple-main-instances
- [x] Set up new envs
- [x] Add config and license checks
- [x] Implement `MultiMainInstancePublisher`
- [x] Expand `RedisServicePubSubPublisher` to support
`MultiMainInstancePublisher`
- [x] Init `MultiMainInstancePublisher` on startup and destroy on
shutdown
- [x] Add to sandbox plans
- [x] Test manually
Note: This is only for setup - coordinating in reaction to leadership
changes will come in later PRs.
This PR converts the hard-deletion interval to a timeout:
- to prevent the interval from not being restored when hard deletion
throws, and
- to prevent a long-running hard deletion from leading to duplicate
deletions.
This change ensures that things like `encryptionKey` and `instanceId`
are always available directly where they are needed, instead of passing
them around throughout the code.
This is related to an issue with how Bull handles stalled jobs, see
https://github.com/OptimalBits/bull/issues/1415 for reference.
CPU intensive workflows can in certain cases take a long while to finish
up, thereby blocking the thread and causing Bull queue to think the job
has stalled, even though it finished successfully. In these cases the
error handling could then overwrite the successful execution data with
the error message.
This fixes a bug in the pruning (soft-delete). The pruning was a bit too
aggressive, as it also pruned executions that weren't in an end state
yet. This only becomes an issue if there are long-running executions
(e.g. workflow with Wait node) or the prune parameters are set to keep
only a tiny number of executions.
This PR adds a message for queue mode which triggers an external secrets
provider reload inside the workers if the configuration has changed on
the main instance.
It also refactors some of the message handler code to remove cyclic
dependencies, as well as remove unnecessary duplicate redis clients
inside services (thanks to no more cyclic deps)
all commands sent between main instance and workers need to contain a
server id to prevent senders from reacting to their own messages,
causing loops
this PR makes sure all sent messages contain a sender id by default as
part of constructing a sending redis client.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Depends on: https://github.com/n8n-io/n8n/pull/7195 | Story:
[PAY-837](https://linear.app/n8n/issue/PAY-837/implement-object-store-manager-for-binary-data)
This PR includes `workflowId` in binary data writes so that the S3
manager can support this filepath structure
`/workflows/{workflowId}/executions/{executionId}/binaryData/{binaryFilename}`
to easily delete binary data for workflows. Also all binary data service
and manager methods that take `workflowId` and `executionId` are made
consistent in arg order.
Note: `workflowId` is included in filesystem mode for compatibility with
the common interface, but `workflowId` will remain unused by filesystem
mode until we decide to restructure how this mode stores data.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>