Story: https://linear.app/n8n/issue/PAY-1188
- Implement Redis hashes on the caching service, based on Micha's work
in #7747, adapted from `node-cache-manager-ioredis-yet`. Optimize
workflow ownership lookups and manual webhook lookups with Redis hashes.
- Simplify the caching service by removing all currently unused methods
and options: `enable`, `disable`, `getCache`, `keys`, `keyValues`,
`refreshFunctionEach`, `refreshFunctionMany`, `refreshTtl`, etc.
- Remove the flag `N8N_CACHE_ENABLED`. Currently some features on
`master` are broken with caching disabled, and test webhooks now rely
entirely on caching, for multi-main setup support. We originally
introduced this flag to protect against excessive memory usage, but
total cache usage is low enough that we decided to drop this setting.
Apparently this flag was also never documented.
- Overall caching service refactor: use generics, reduce branching, add
discriminants for cache kinds for better type safety, type caching
events, improve readability, remove outdated docs, etc. Also refactor
and expand caching service tests.
Follow-up to: https://github.com/n8n-io/n8n/pull/8176
---------
Co-authored-by: Michael Auerswald <michael.auerswald@gmail.com>
In a multi-main setup, we have the following issue. The user's client
connects to main A and runs a test webhook, so main A starts listening
for a webhook call. A third-party service sends a request to the test
webhook URL. The request is forwarded by the load balancer to main B,
who is not listening for this webhook call. Therefore, the webhook call
is unhandled.
To start addressing this, cache test webhook registrations, using Redis
for queue mode and in-memory for regular mode. When the third-party
service sends a request to the test webhook URL, the request is
forwarded by the load balancer to main B, who fetches test webhooks from
the cache and, if it finds a match, executes the test webhook. This
should be transparent - test webhook behavior should remain the same as
so far.
Notes:
- Test webhook timeouts are not cached. A timeout is only relevant to
the process it was created in, so another process retrieving from Redis
a "foreign" timeout will be unable to act on it. A timeout also has
circular references, so `cache-manager-ioredis-yet` is unable to
serialize it.
- In a single-main scenario, the timeout remains in the single process
and is cleared on test webhook expiration, successful execution, and
manual cancellation - all as usual.
- In a multi-main scenario, we will need to have the process who
received the webhook call send a message to the process who created the
webhook directing this originating process to clear the timeout. This
will likely be implemented via execution lifecycle hooks and Redis
channel messages checking session ID. This implementation is out of
scope for this PR and will come next.
- Additional data in test webhooks is not cached. From what I can tell,
additional data is not needed for test webhooks to be executed.
Additional data also has circular references, so
`cache-manager-ioredis-yet` is unable to serialize it.
Follow-up to: #8155
This change kept coming up in #6713, #7773, and #8135.
So this PR moves the existing code without actually changing anything,
to help get rid of some of the circular dependencies.
## Review / Merge checklist
- [x] PR title and summary are descriptive.
## Summary
Add `ShutdownService` and `OnShutdown` decorator for more unified way to
shutdown different components. Use this new way in the following
components:
- HTTP(S) server
- Pruning service
- Push connection
- License
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Github issue / Community forum post (link here to close automatically):
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Co-authored-by: Giulio Andreini <andreini@netseven.it>
I have observed that the next hard deletion timeout is not scheduled if
the `hardDeleteOnPruningCycle` function throws when fetching the data
from the database. That is because the thrown error is not caught and
the `scheduleHardDeletion` method is not called.
This PR moves the call to `scheduleHardDeletion` into the
`scheduleHardDeletion` for better cohesion, and ensures that it is
called even if `hardDeleteOnPruningCycle` throws.
## Summary
Ensure `ownedBy` and `sharedWith` are present and uniform for
credentials and workflows.
Details in story: https://linear.app/n8n/issue/PAY-987
Ensure all errors in `cli` are `ApplicationError` or children of it and
contain no variables in the message, to continue normalizing all the
errors we report to Sentry
Follow-up to: https://github.com/n8n-io/n8n/pull/7839
Ensure all errors in `cli` inherit from `ApplicationError` to continue
normalizing all the errors we report to Sentry
Follow-up to: https://github.com/n8n-io/n8n/pull/7820
https://linear.app/n8n/issue/PAY-985
```
PATCH /users/:id/role
unauthenticated user
✓ should receive 401 (349 ms)
member
✓ should fail to demote owner to member (349 ms)
✓ should fail to demote owner to admin (359 ms)
✓ should fail to demote admin to member (381 ms)
✓ should fail to promote other member to owner (353 ms)
✓ should fail to promote other member to admin (377 ms)
✓ should fail to promote self to admin (354 ms)
✓ should fail to promote self to owner (371 ms)
admin
✓ should receive 400 on invalid payload (351 ms)
✓ should receive 404 on unknown target user (351 ms)
✓ should fail to demote owner to admin (349 ms)
✓ should fail to demote owner to member (347 ms)
✓ should fail to promote member to owner (384 ms)
✓ should fail to promote admin to owner (350 ms)
✓ should be able to demote admin to member (354 ms)
✓ should be able to demote self to member (350 ms)
✓ should be able to promote member to admin (349 ms)
owner
✓ should be able to promote member to admin (349 ms)
✓ should be able to demote admin to member (349 ms)
✓ should fail to demote self to admin (348 ms)
✓ should fail to demote self to member (354 ms)
```
Followup to #7566 | Story: https://linear.app/n8n/issue/PAY-926
### Manual workflow activation and deactivation
In a multi-main scenario, if the user manually activates or deactivates
a workflow, the process (whether leader or follower) that handles the
PATCH request and updates its internal state should send a message into
the command channel, so that all other main processes update their
internal state accordingly:
- Add to `ActiveWorkflows` if activating
- Remove from `ActiveWorkflows` if deactivating
- Remove and re-add to `ActiveWorkflows` if the update did not change
activation status.
After updating their internal state, if activating or deactivating, the
recipient main processes should push a message to all connected
frontends so that these can update their stores and so reflect the value
in the UI.
### Workflow activation errors
On failure to activate a workflow, the main instance should record the
error in Redis - main instances should always pull activation errors
from Redis in a multi-main scenario.
### Leadership change
On leadership change...
- The old leader should stop pruning and the new leader should start
pruning.
- The old leader should remove trigger- and poller-based workflows and
the new leader should add them.
1. Reduce a lot of code duplication
2. Move more endpoints out of `Server.ts`
3. Move all query-param parsing and validation into a middleware to make
the route handlers simpler.
This PR:
- Creates `InvitationController`
- Moves `POST /users` to `POST /invitations` and move related test to
`invitations.api.tests`
- Moves `POST /users/:id` to `POST /invitations/:id/accept` and move
related test to `invitations.api.tests`
- Adjusts FE to use new endpoints
- Moves all the invitation logic to the `UserService`
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
This change expands on the command channel communication introduced
lately between the main instance(s) and the workers. The frontend gets a
new menu entry "Workers" which will, when opened, trigger a regular call
to getStatus from the workers. The workers then respond via their
response channel to the backend, which then pushes the status to the
frontend.
This introduces the use of ChartJS for metrics.
This feature is still in MVP state and thus disabled by default for the
moment.
Story: https://linear.app/n8n/issue/PAY-926
This PR coordinates workflow activation on instance startup and on
leadership change in multiple main scenario in the internal API. Part 3
on manual workflow activation and deactivation will be a separate PR.
### Part 1: Instance startup
In multi-main scenario, on starting an instance...
- [x] If the instance is the leader, it should add webhooks, triggers
and pollers.
- [x] If the instance is the follower, it should not add webhooks,
triggers or pollers.
- [x] Unit tests.
### Part 2: Leadership change
In multi-main scenario, if the main instance leader dies…
- [x] The new main instance leader must activate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] The old main instance leader must deactivate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] Unit tests.
To test, start two instances and check behavior on startup and
leadership change:
```
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug npm run start
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug N8N_PORT=5679 npm run start
```
This PR ensures `MultiMainInstancePublisher` is initialized before
checking if the instance is leader or follower. Followers skip license
init, license check, and pruning start and stop.
https://linear.app/n8n/issue/PAY-933/set-up-leader-selection-for-multiple-main-instances
- [x] Set up new envs
- [x] Add config and license checks
- [x] Implement `MultiMainInstancePublisher`
- [x] Expand `RedisServicePubSubPublisher` to support
`MultiMainInstancePublisher`
- [x] Init `MultiMainInstancePublisher` on startup and destroy on
shutdown
- [x] Add to sandbox plans
- [x] Test manually
Note: This is only for setup - coordinating in reaction to leadership
changes will come in later PRs.
This change ensures that things like `encryptionKey` and `instanceId`
are always available directly where they are needed, instead of passing
them around throughout the code.
This PR adds a message for queue mode which triggers an external secrets
provider reload inside the workers if the configuration has changed on
the main instance.
It also refactors some of the message handler code to remove cyclic
dependencies, as well as remove unnecessary duplicate redis clients
inside services (thanks to no more cyclic deps)
all commands sent between main instance and workers need to contain a
server id to prevent senders from reacting to their own messages,
causing loops
this PR makes sure all sent messages contain a sender id by default as
part of constructing a sending redis client.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
This PR implements the updated license SDK so that worker and webhook
instances do not auto-renew licenses any more.
Instead, they receive a `reloadLicense` command via the Redis client
that will fetch the updated license after it was saved on the main
instance
This also contains some refactoring with moving redis sub and pub
clients into the event bus directly, to prevent cyclic dependency
issues.
This PR adds new endpoints to the REST API:
`/orchestration/worker/status` and `/orchestration/worker/id`
Currently these just trigger the return of status / ids from the workers
via the redis back channel, this still needs to be handled and passed
through to the frontend.
It also adds the eventbus to each worker, and triggers a reload of those
eventbus instances when the configuration changes on the main instances.
In scope:
- Consolidate `WorkflowService.getMany()`.
- Support non-entity field `ownedBy` for `select`.
- Support `tags` for `filter`.
- Move `addOwnerId` to `OwnershipService`.
- Remove unneeded check for `filter.id`.
- Simplify DTO validation for `filter` and `select`.
- Expand tests for `GET /workflows`.
Workflow list query DTOs:
```
filter → name, active, tags
select → id, name, active, tags, createdAt, updatedAt, versionId, ownedBy
```
Out of scope:
- Migrate `shared_workflow.roleId` and `shared_credential.roleId` to
string IDs.
- Refactor `WorkflowHelpers.getSharedWorkflowIds()`.
* fix handle empty keys in cache service
* add test
* add cache mock test
* add simpler mocking, and add tests for all the updated methods
* don't use RedisStore specifically in the mock
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
* refactor: Set up ownership service
* refactor: Specify cache keys and values
* refactor: Replace util with service calls
* test: Mock service in tests
* refactor: Use dependency injection
* test: Write tests
* refactor: Apply feedback from Omar and Micha
* test: Fix tests
* test: Fix missing spot
* refactor: Return user entity from cache
* refactor: More dependency injection!
* use jwt to reset password
* increase expiration time to 1d
* drop user id query string
* refactor
* use service instead of package in tests
* sqlite migration
* postgres migration
* mysql migration
* remove unused properties
* remove userId from FE
* fix test for users.api
* move migration to the common folder
* move type assertion to the jwt.service
* Add jwt secret as a readonly property
* use signData instead of sign in user.controller
* remove base class
* remove base class
* add tests