Story: https://linear.app/n8n/issue/PAY-1188
- Implement Redis hashes on the caching service, based on Micha's work
in #7747, adapted from `node-cache-manager-ioredis-yet`. Optimize
workflow ownership lookups and manual webhook lookups with Redis hashes.
- Simplify the caching service by removing all currently unused methods
and options: `enable`, `disable`, `getCache`, `keys`, `keyValues`,
`refreshFunctionEach`, `refreshFunctionMany`, `refreshTtl`, etc.
- Remove the flag `N8N_CACHE_ENABLED`. Currently some features on
`master` are broken with caching disabled, and test webhooks now rely
entirely on caching, for multi-main setup support. We originally
introduced this flag to protect against excessive memory usage, but
total cache usage is low enough that we decided to drop this setting.
Apparently this flag was also never documented.
- Overall caching service refactor: use generics, reduce branching, add
discriminants for cache kinds for better type safety, type caching
events, improve readability, remove outdated docs, etc. Also refactor
and expand caching service tests.
Follow-up to: https://github.com/n8n-io/n8n/pull/8176
---------
Co-authored-by: Michael Auerswald <michael.auerswald@gmail.com>
Add generic N8N_GRACEFUL_SHUTDOWN_TIMEOUT which controls how long n8n
process will wait for graceful exit before exitting forcefully. This
variables replaces the QUEUE_WORKER_TIMEOUT variable that was used for
worker process.
DEPRECATED: QUEUE_WORKER_TIMEOUT deprected
QUEUE_WORKER_TIMEOUT environment variable has been replaced with
N8N_GRACEFUL_SHUTDOWN_TIMEOUT.
If the process doesn't shutdown within a time limit, exit with error
code.
1. conceptually something timing out is an error.
2. on successful exit we close down the DB connection gracefully. On an
exit timeout we rather not do that, since it will wait for any active
connections to close and would possible block the exit.
## Summary
Hashicorp Vault prefers a `LIST` HTTP method to be used when fetching
secrets but not all environments will allow custom http methods through
WAFs. This PR adds `N8N_EXTERNAL_SECRETS_PREFER_GET` which when set to
`true` will use GET instead of LIST to fetch secrets.
## Review / Merge checklist
- [x] PR title and summary are descriptive. **Remember, the title
automatically goes into the changelog. Use `(no-changelog)` otherwise.**
([conventions](https://github.com/n8n-io/n8n/blob/master/.github/pull_request_title_conventions.md))
Github issue / Community forum post (link here to close automatically):
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Co-authored-by: Giulio Andreini <andreini@netseven.it>
## Summary
Adds `N8N_EXTERNAL_SECRETS_UPDATE_INTERVAL` to allow enterprise users to
tweak the update internal for importing new secrets.
If using a config file the value is:
```
"externalSecrets": {
"updateInterval": 300
}
```
#### How to test the change:
1. Run as normal and check that the secret is updated every 5 minutes
2. Set `N8N_EXTERNAL_SECRETS_UPDATE_INTERVAL` to 10
3. Check the secret is reloaded after 10 seconds
## Review / Merge checklist
- [x] PR title and summary are descriptive. **Remember, the title
automatically goes into the changelog. Use `(no-changelog)` otherwise.**
([conventions](https://github.com/n8n-io/n8n/blob/master/.github/pull_request_title_conventions.md))
- [x] [Docs updated](https://github.com/n8n-io/n8n-docs) or follow-up
ticket created.
Ensure all errors in `cli` are `ApplicationError` or children of it and
contain no variables in the message, to continue normalizing all the
errors we report to Sentry
Follow-up to: https://github.com/n8n-io/n8n/pull/7839
Ensure all errors in `cli` inherit from `ApplicationError` to continue
normalizing all the errors we report to Sentry
Follow-up to: https://github.com/n8n-io/n8n/pull/7820
When we upgrade typeorm in #5151, we switched from no pooling to a
default pool-size of 10. This somehow significantly deteriorates the
performance of queries when the application is under load.
Followup to #7566 | Story: https://linear.app/n8n/issue/PAY-926
### Manual workflow activation and deactivation
In a multi-main scenario, if the user manually activates or deactivates
a workflow, the process (whether leader or follower) that handles the
PATCH request and updates its internal state should send a message into
the command channel, so that all other main processes update their
internal state accordingly:
- Add to `ActiveWorkflows` if activating
- Remove from `ActiveWorkflows` if deactivating
- Remove and re-add to `ActiveWorkflows` if the update did not change
activation status.
After updating their internal state, if activating or deactivating, the
recipient main processes should push a message to all connected
frontends so that these can update their stores and so reflect the value
in the UI.
### Workflow activation errors
On failure to activate a workflow, the main instance should record the
error in Redis - main instances should always pull activation errors
from Redis in a multi-main scenario.
### Leadership change
On leadership change...
- The old leader should stop pruning and the new leader should start
pruning.
- The old leader should remove trigger- and poller-based workflows and
the new leader should add them.
Github issue / Community forum post (link here to close automatically):
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <netroy@users.noreply.github.com>
https://linear.app/n8n/issue/PAY-933/set-up-leader-selection-for-multiple-main-instances
- [x] Set up new envs
- [x] Add config and license checks
- [x] Implement `MultiMainInstancePublisher`
- [x] Expand `RedisServicePubSubPublisher` to support
`MultiMainInstancePublisher`
- [x] Init `MultiMainInstancePublisher` on startup and destroy on
shutdown
- [x] Add to sandbox plans
- [x] Test manually
Note: This is only for setup - coordinating in reaction to leadership
changes will come in later PRs.
Github issue / Community forum post (link here to close automatically):
---------
Signed-off-by: Oleg Ivaniv <me@olegivaniv.com>
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
This PR allows users to configure the settings to Bull, possibly
reducing the errors with `maxStalledCount` and other issues, that
usually happen either when a worker crashes or when the event loop is
super busy. Increasing the lease time and the `maxStalledCount` settings
might improve UX.
Github issue / Community forum post (link here to close automatically):
This change ensures that things like `encryptionKey` and `instanceId`
are always available directly where they are needed, instead of passing
them around throughout the code.
This fixes a bug in the pruning (soft-delete). The pruning was a bit too
aggressive, as it also pruned executions that weren't in an end state
yet. This only becomes an issue if there are long-running executions
(e.g. workflow with Wait node) or the prune parameters are set to keep
only a tiny number of executions.
all commands sent between main instance and workers need to contain a
server id to prevent senders from reacting to their own messages,
causing loops
this PR makes sure all sent messages contain a sender id by default as
part of constructing a sending redis client.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Depends on: #7092 | Story:
[PAY-768](https://linear.app/n8n/issue/PAY-768)
This PR:
- Generalizes the `IBinaryDataManager` interface.
- Adjusts `Filesystem.ts` to satisfy the interface.
- Sets up an S3 client stub to be filled in in the next PR.
- Turns `BinaryDataManager` into an injectable service.
- Adjusts the config schema and adds new validators.
Note that the PR looks large but all the main changes are in
`packages/core/src/binaryData`.
Out of scope:
- `BinaryDataManager` (now `BinaryDataService`) and `Filesystem.ts` (now
`fs.client.ts`) were slightly refactored for maintainability, but fully
overhauling them is **not** the focus of this PR, which is meant to
clear the way for the S3 implementation. Future improvements for these
two should include setting up a backwards-compatible dir structure that
makes it easier to locate binary data files to delete, removing
duplication, simplifying cloning methods, using integers for binary data
size instead of `prettyBytes()`, writing tests for existing binary data
logic, etc.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Github issue / Community forum post (link here to close automatically):
---------
Co-authored-by: Omar Ajoue <krynble@gmail.com>
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Based on #7065 | Story: https://linear.app/n8n/issue/PAY-771
n8n on filesystem mode marks binary data to delete on manual execution
deletion, on unsaved execution completion, and on every execution
pruning cycle. We later prune binary data in a separate cycle via these
marker files, based on the configured TTL. In the context of introducing
an S3 client to manage binary data, the filesystem mode's mark-and-prune
setup is too tightly coupled to the general binary data management
client interface.
This PR...
- Ensures the deletion of an execution causes the deletion of any binary
data associated to it. This does away with the need for binary data TTL
and simplifies the filesystem mode's mark-and-prune setup.
- Refactors all execution deletions (including pruning) to cause soft
deletions, hard-deletes soft-deleted executions based on the existing
pruning config, and adjusts execution endpoints to filter out
soft-deleted executions. This reduces DB load, and keeps binary data
around long enough for users to access it when building workflows with
unsaved executions.
- Moves all execution pruning work from an execution lifecycle hook to
`execution.repository.ts`. This keeps related logic in a single place.
- Removes all marking logic from the binary data manager. This
simplifies the interface that the S3 client will meet.
- Adds basic sanity-check tests to pruning logic and execution deletion.
Out of scope:
- Improving existing pruning logic.
- Improving existing execution repository logic.
- Adjusting dir structure for filesystem mode.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
PR adds a new field to the SourceControlPreferences as well as to the
POST parameters for the `source-control/preferences` and
`source-control/generate-key-pair` endpoints. Both now accept an
optional string parameter `keyGeneratorType` of `'ed25519' | 'rsa'`
Calling the `source-control/generate-key-pair` endpoint with the
parameter set, it will also update the stored preferences accordingly
(so that in the future new keys will use the same method)
By default ed25519 is being used. The default may be changed using a new
environment parameter:
`N8N_SOURCECONTROL_DEFAULT_SSH_KEY_TYPE` which can be `rsa` or `ed25519`
RSA keys are generated with a length of 4096 bytes.
- For a saved execution, we write to disk binary data and metadata.
These two are only ever deleted via `POST /executions/delete`. No marker
file, so untouched by pruning.
- For an unsaved execution, we write to disk binary data, binary data
metadata, and a marker file at `/meta`. We later delete all three during
pruning.
- The third flow is legacy. Currently, if the execution is unsaved, we
actually store it in the DB while running the workflow and immediately
after the workflow is finished during the `onWorkflowPostExecute()` hook
we delete that execution, so the second flow applies. But formerly, we
did not store unsaved executions in the DB ("ephemeral executions") and
so we needed to write a marker file at `/persistMeta` so that, if the
ephemeral execution crashed after the step where binary data was stored,
we had a way to later delete its associated dangling binary data via a
second pruning cycle, and if the ephemeral execution succeeded, then we
immediately cleaned up the marker file at `/persistMeta` during the
`onWorkflowPostExecute()` hook.
This creation and cleanup at `/persistMeta` is still happening, but this
third flow no longer has a purpose, as we now store unsaved executions
in the DB and delete them immediately after. Hence the third flow can be
removed.