Story: https://linear.app/n8n/issue/PAY-1188
- Implement Redis hashes on the caching service, based on Micha's work
in #7747, adapted from `node-cache-manager-ioredis-yet`. Optimize
workflow ownership lookups and manual webhook lookups with Redis hashes
(see the sketch below).
- Simplify the caching service by removing all currently unused methods
and options: `enable`, `disable`, `getCache`, `keys`, `keyValues`,
`refreshFunctionEach`, `refreshFunctionMany`, `refreshTtl`, etc.
- Remove the flag `N8N_CACHE_ENABLED`. Currently some features on
`master` are broken with caching disabled, and test webhooks now rely
entirely on caching to support multi-main setups. We originally
introduced this flag to protect against excessive memory usage, but
total cache usage is low enough that we decided to drop this setting.
Apparently this flag was also never documented.
- Overall caching service refactor: use generics, reduce branching, add
discriminants for cache kinds for better type safety, type caching
events, improve readability, remove outdated docs, etc. Also refactor
and expand caching service tests.
Follow-up to: https://github.com/n8n-io/n8n/pull/8176
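A minimal sketch of the hash-based lookup pattern, assuming an `ioredis`
client; the key and helper names are hypothetical, not the actual
caching-service API:
```typescript
import Redis from 'ioredis';

const redis = new Redis();

// Stand-in for the real repository query.
async function fetchOwnerIdFromDb(workflowId: string): Promise<string | null> {
  return null;
}

// All ownership entries live under a single hash key, so one HGET serves a
// point lookup and one DEL invalidates the whole cache at once.
async function getWorkflowOwnerId(workflowId: string): Promise<string | null> {
  const cached = await redis.hget('cache:workflow-owner', workflowId);
  if (cached !== null) return cached;

  const ownerId = await fetchOwnerIdFromDb(workflowId);
  if (ownerId !== null) {
    await redis.hset('cache:workflow-owner', workflowId, ownerId);
  }
  return ownerId;
}
```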
---------
Co-authored-by: Michael Auerswald <michael.auerswald@gmail.com>
Add a generic `N8N_GRACEFUL_SHUTDOWN_TIMEOUT`, which controls how long the
n8n process will wait for a graceful exit before exiting forcefully. This
variable replaces the `QUEUE_WORKER_TIMEOUT` variable that was used for the
worker process.
DEPRECATED: `QUEUE_WORKER_TIMEOUT` is deprecated. The `QUEUE_WORKER_TIMEOUT`
environment variable has been replaced with `N8N_GRACEFUL_SHUTDOWN_TIMEOUT`.
If the process doesn't shut down within the time limit, exit with an error
code.
1. Conceptually, something timing out is an error.
2. On a successful exit we close down the DB connection gracefully. On an
exit timeout we'd rather not do that, since it would wait for any active
connections to close and could possibly block the exit.
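A sketch of the forced-exit pattern described above (service and handler
names are illustrative, not the actual n8n code):
```typescript
const gracefulShutdownTimeout = Number(process.env.N8N_GRACEFUL_SHUTDOWN_TIMEOUT ?? 30);

// Stand-in for closing servers, finishing in-flight work, and closing the DB.
async function stopServicesGracefully(): Promise<void> {}

process.on('SIGTERM', async () => {
  // On timeout, exit non-zero and skip the graceful DB close, since waiting
  // for active connections to close could block the exit.
  const forceExitTimer = setTimeout(() => {
    console.error(`Failed to shut down within ${gracefulShutdownTimeout}s, exiting forcefully`);
    process.exit(1);
  }, gracefulShutdownTimeout * 1000);
  forceExitTimer.unref(); // don't let the timer itself keep the process alive

  await stopServicesGracefully();
  clearTimeout(forceExitTimer);
  process.exit(0);
});
```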
## Summary
HashiCorp Vault prefers the `LIST` HTTP method for fetching secrets, but
not all environments allow custom HTTP methods through WAFs. This PR adds
`N8N_EXTERNAL_SECRETS_PREFER_GET`, which, when set to `true`, uses GET
instead of LIST to fetch secrets.
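Vault documents GET with `?list=true` as the equivalent of a LIST request,
which is what makes the fallback possible; a sketch of the switch (the
exact n8n wiring is an assumption):
```typescript
const preferGet = process.env.N8N_EXTERNAL_SECRETS_PREFER_GET === 'true';

async function listVaultSecrets(vaultUrl: string, path: string, token: string) {
  // GET with `?list=true` is Vault's documented stand-in for the custom
  // LIST verb, which some WAFs reject.
  const url = preferGet ? `${vaultUrl}/v1/${path}?list=true` : `${vaultUrl}/v1/${path}`;
  const response = await fetch(url, {
    method: preferGet ? 'GET' : 'LIST',
    headers: { 'X-Vault-Token': token },
  });
  return (await response.json()) as { data: { keys: string[] } };
}
```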
## Review / Merge checklist
- [x] PR title and summary are descriptive. **Remember, the title
automatically goes into the changelog. Use `(no-changelog)` otherwise.**
([conventions](https://github.com/n8n-io/n8n/blob/master/.github/pull_request_title_conventions.md))
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Co-authored-by: Giulio Andreini <andreini@netseven.it>
## Summary
Adds `N8N_EXTERNAL_SECRETS_UPDATE_INTERVAL` to allow enterprise users to
tweak the update interval (in seconds) for importing new secrets.
If using a config file the value is:
```
"externalSecrets": {
"updateInterval": 300
}
```
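A minimal sketch of what such an interval-driven refresh looks like (the
refresh function is a stand-in, not the actual implementation):
```typescript
const updateIntervalSeconds = Number(process.env.N8N_EXTERNAL_SECRETS_UPDATE_INTERVAL ?? 300);

// Stand-in for re-importing secrets from the configured provider.
async function reloadExternalSecrets(): Promise<void> {}

// Re-import secrets on the configured cadence (default: every 5 minutes).
setInterval(() => void reloadExternalSecrets(), updateIntervalSeconds * 1000);
```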
#### How to test the change:
1. Run as normal and check that the secret is updated every 5 minutes
2. Set `N8N_EXTERNAL_SECRETS_UPDATE_INTERVAL` to 10
3. Check the secret is reloaded after 10 seconds
## Review / Merge checklist
- [x] PR title and summary are descriptive. **Remember, the title
automatically goes into the changelog. Use `(no-changelog)` otherwise.**
([conventions](https://github.com/n8n-io/n8n/blob/master/.github/pull_request_title_conventions.md))
- [x] [Docs updated](https://github.com/n8n-io/n8n-docs) or follow-up
ticket created.
When we upgraded TypeORM in #5151, we switched from no pooling to a
default pool size of 10. This somehow significantly degraded the
performance of queries when the application is under load.
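For context, a sketch of where the pool size lives in a TypeORM Postgres
`DataSource`; the `extra` options are passed through to `pg`'s `Pool`
(values illustrative):
```typescript
import { DataSource } from 'typeorm';

const dataSource = new DataSource({
  type: 'postgres',
  host: 'localhost',
  database: 'n8n',
  extra: {
    max: 10, // pg connection-pool size; the default that hurt under load
  },
});
```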
Follow-up to #7566 | Story: https://linear.app/n8n/issue/PAY-926
### Manual workflow activation and deactivation
In a multi-main scenario, if the user manually activates or deactivates
a workflow, the process (whether leader or follower) that handles the
PATCH request and updates its internal state should send a message into
the command channel, so that all other main processes update their
internal state accordingly:
- Add to `ActiveWorkflows` if activating
- Remove from `ActiveWorkflows` if deactivating
- Remove and re-add to `ActiveWorkflows` if the update did not change
activation status.
After updating their internal state, if activating or deactivating, the
recipient main processes should push a message to all connected frontends
so that they can update their stores and reflect the value in the UI.
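A sketch of that command-channel fan-out; the channel name and payload
shape are assumptions, not the actual n8n message format:
```typescript
import Redis from 'ioredis';

interface WorkflowActivationCommand {
  command: 'add-workflow' | 'remove-workflow' | 'reload-workflow';
  senderId: string; // lets recipients ignore their own messages
  workflowId: string;
}

// The process handling the PATCH request publishes; every other main process
// is subscribed, updates its `ActiveWorkflows` state, and then pushes the
// new activation state to its connected frontends.
async function broadcastActivationChange(redis: Redis, msg: WorkflowActivationCommand) {
  await redis.publish('n8n.commands', JSON.stringify(msg));
}
```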
### Workflow activation errors
On failure to activate a workflow, the main instance should record the
error in Redis; in a multi-main scenario, main instances should always
pull activation errors from Redis.
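A sketch of that error store, assuming a Redis hash keyed by workflow ID
(the key name is hypothetical):
```typescript
import Redis from 'ioredis';

const ERRORS_KEY = 'n8n:activation-errors'; // hypothetical key

async function setActivationError(redis: Redis, workflowId: string, message: string) {
  await redis.hset(ERRORS_KEY, workflowId, message);
}

// Any main instance can serve the error state, regardless of which
// instance attempted the activation.
async function getActivationErrors(redis: Redis): Promise<Record<string, string>> {
  return redis.hgetall(ERRORS_KEY);
}
```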
### Leadership change
On leadership change...
- The old leader should stop pruning and the new leader should start
pruning.
- The old leader should remove trigger- and poller-based workflows and
the new leader should add them.
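A sketch of that hand-off, with event and service names that are
illustrative only:
```typescript
import { EventEmitter } from 'node:events';

// Illustrative stand-ins for the real coordinator and services.
const multiMainSetup = new EventEmitter();
const pruningService = { startPruning: async () => {}, stopPruning: async () => {} };
const activeWorkflows = {
  addTriggerAndPollerWorkflows: async () => {},
  removeTriggerAndPollerWorkflows: async () => {},
};

multiMainSetup.on('leader-takeover', async () => {
  // The new leader takes over pruning and trigger/poller workflows.
  await pruningService.startPruning();
  await activeWorkflows.addTriggerAndPollerWorkflows();
});

multiMainSetup.on('leader-stepdown', async () => {
  // The old leader relinquishes both responsibilities.
  await pruningService.stopPruning();
  await activeWorkflows.removeTriggerAndPollerWorkflows();
});
```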
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <netroy@users.noreply.github.com>
https://linear.app/n8n/issue/PAY-933/set-up-leader-selection-for-multiple-main-instances
- [x] Set up new envs
- [x] Add config and license checks
- [x] Implement `MultiMainInstancePublisher`
- [x] Expand `RedisServicePubSubPublisher` to support
`MultiMainInstancePublisher`
- [x] Init `MultiMainInstancePublisher` on startup and destroy on
shutdown
- [x] Add to sandbox plans
- [x] Test manually
Note: This is only for setup. Coordinating in reaction to leadership
changes will come in later PRs.
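For intuition, one common way to select a leader over Redis is a
short-lived claim key; a sketch under that assumption (key name and TTL
are hypothetical):
```typescript
import Redis from 'ioredis';

const LEADER_KEY = 'n8n:main-instance:leader'; // hypothetical key
const LEADER_TTL_SECONDS = 10;

// Each main instance periodically tries to claim the key; whoever holds it
// is the leader and must keep renewing it before the TTL lapses.
async function tryBecomeLeader(redis: Redis, instanceId: string): Promise<boolean> {
  const claimed = await redis.set(LEADER_KEY, instanceId, 'EX', LEADER_TTL_SECONDS, 'NX');
  return claimed === 'OK';
}
```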
This PR allows users to configure Bull's settings, possibly reducing
`maxStalledCount` errors and other issues that usually happen either when
a worker crashes or when the event loop is very busy. Increasing the lock
lease time and the `maxStalledCount` setting might improve UX.
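For reference, these map onto Bull's advanced queue settings; a sketch
with illustrative values (see Bull's `AdvancedSettings` for exact
semantics):
```typescript
import Bull from 'bull';

const queue = new Bull('jobs', {
  settings: {
    lockDuration: 30000,    // ms a worker holds a job's lock before it counts as stalled
    stalledInterval: 30000, // ms between checks for stalled jobs
    maxStalledCount: 1,     // how many times a job may stall before failing outright
  },
});
```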
This change ensures that values like `encryptionKey` and `instanceId` are
always available directly where they are needed, instead of being passed
around throughout the code.
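A sketch of the dependency-injection pattern this enables (typedi-style
decorators; class names and property derivations are assumptions):
```typescript
import 'reflect-metadata';
import { Container, Service } from 'typedi';

@Service()
export class InstanceSettings {
  // Illustrative: in reality these are derived once at startup.
  readonly encryptionKey = process.env.N8N_ENCRYPTION_KEY ?? '';
  readonly instanceId = 'derived-at-startup';
}

@Service()
export class SomeService {
  // Injected where needed, instead of threading the values through calls.
  constructor(readonly instanceSettings: InstanceSettings) {}
}

const { encryptionKey, instanceId } = Container.get(SomeService).instanceSettings;
```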