## Summary
Extend the existing user types in the E2E database. Currently, we only
have owner and member, but we also need admin.
---------
Co-authored-by: Val <68596159+valya@users.noreply.github.com>
I have observed that the next hard deletion timeout is not scheduled if
the `hardDeleteOnPruningCycle` function throws when fetching the data
from the database. That is because the thrown error is not caught and
the `scheduleHardDeletion` method is not called.
This PR moves the call to `scheduleHardDeletion` into
`hardDeleteOnPruningCycle` for better cohesion, and ensures that it is
called even if `hardDeleteOnPruningCycle` throws.
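A minimal sketch of the rescheduling pattern described above, assuming a `try`/`finally` block is used. Only the two method names come from the description; the class name, the injected `fetchAndDelete` callback, the `scheduled` counter and the delay are hypothetical.

```typescript
// Hypothetical sketch: only `scheduleHardDeletion` and
// `hardDeleteOnPruningCycle` are names taken from the description.
class PruningSketch {
  scheduled = 0; // counts how often the next cycle was queued (for illustration)
  private timeout?: ReturnType<typeof setTimeout>;

  constructor(
    private readonly fetchAndDelete: () => Promise<void>,
    private readonly delayMs = 15 * 60 * 1000, // assumed delay
  ) {}

  scheduleHardDeletion(): void {
    this.scheduled += 1;
    this.timeout = setTimeout(() => {
      // Swallowed here only to keep the sketch free of unhandled rejections.
      this.hardDeleteOnPruningCycle().catch(() => {});
    }, this.delayMs);
  }

  async hardDeleteOnPruningCycle(): Promise<void> {
    try {
      await this.fetchAndDelete(); // may throw, e.g. on a database error
    } finally {
      // Runs on success and on failure, so the next cycle is always queued.
      this.scheduleHardDeletion();
    }
  }

  stop(): void {
    clearTimeout(this.timeout);
  }
}
```

The error still propagates to the caller; the `finally` block only guarantees that the next timeout exists.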
## Summary
Ensure `ownedBy` and `sharedWith` are present and uniform for
credentials and workflows.
Details in story: https://linear.app/n8n/issue/PAY-987
Ensure all errors in `cli` are `ApplicationError` or children of it and
contain no variables in the message, to continue normalizing all the
errors we report to Sentry.
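As an illustration of "no variables in the message", the sketch below keeps the message constant and moves variable data into `extra`. The base class here is a simplified, hypothetical stand-in and may differ from the real `ApplicationError`; `WorkflowNotFoundError` is an invented example name.

```typescript
// Hypothetical stand-in for the base class; the real one may differ.
class ApplicationError extends Error {
  readonly extra: Record<string, unknown>;

  constructor(message: string, options: { extra?: Record<string, unknown> } = {}) {
    super(message);
    this.name = this.constructor.name;
    this.extra = options.extra ?? {};
  }
}

// Child error: the message is constant, variable data travels in `extra`.
class WorkflowNotFoundError extends ApplicationError {
  constructor(workflowId: string) {
    super('Workflow not found', { extra: { workflowId } });
  }
}
```

Because the message is identical for every workflow ID, an error tracker that groups by message would see one group instead of one per ID.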
Follow-up to: https://github.com/n8n-io/n8n/pull/7839
extracted out of #7336
---------
Co-authored-by: Jan Oberhauser <jan.oberhauser@gmail.com>
Co-authored-by: Oleg Ivaniv <me@olegivaniv.com>
Co-authored-by: Alex Grozav <alex@grozav.com>
Ensure all errors in `cli` inherit from `ApplicationError` to continue
normalizing all the errors we report to Sentry.
Follow-up to: https://github.com/n8n-io/n8n/pull/7820
This PR continues the effort of moving logic inside execution lifecycle
hooks into standalone testable functions, as a stepping stone to
refactoring the hooks themselves.
Keep reporting [path-related
errors](https://n8nio.sentry.io/issues/4649493725) in Sentry but
consolidate them in a single error group.
Also, add `options.extra` as `meta` so they remain visible in debug
logs:
```
2023-11-24T11:50:54.852Z | error | ReportableError: Something went wrong "{ test: 123, file: 'LoggerProxy.js', function: 'exports.error' }"
```
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
https://linear.app/n8n/issue/PAY-985
```
PATCH /users/:id/role
  unauthenticated user
    ✓ should receive 401 (349 ms)
  member
    ✓ should fail to demote owner to member (349 ms)
    ✓ should fail to demote owner to admin (359 ms)
    ✓ should fail to demote admin to member (381 ms)
    ✓ should fail to promote other member to owner (353 ms)
    ✓ should fail to promote other member to admin (377 ms)
    ✓ should fail to promote self to admin (354 ms)
    ✓ should fail to promote self to owner (371 ms)
  admin
    ✓ should receive 400 on invalid payload (351 ms)
    ✓ should receive 404 on unknown target user (351 ms)
    ✓ should fail to demote owner to admin (349 ms)
    ✓ should fail to demote owner to member (347 ms)
    ✓ should fail to promote member to owner (384 ms)
    ✓ should fail to promote admin to owner (350 ms)
    ✓ should be able to demote admin to member (354 ms)
    ✓ should be able to demote self to member (350 ms)
    ✓ should be able to promote member to admin (349 ms)
  owner
    ✓ should be able to promote member to admin (349 ms)
    ✓ should be able to demote admin to member (349 ms)
    ✓ should fail to demote self to admin (348 ms)
    ✓ should fail to demote self to member (354 ms)
```
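The test names above imply a small rule set. The sketch below is inferred only from those names and is not the actual controller logic; `canChangeRole` is a hypothetical helper, and the 400/404 payload checks are out of scope.

```typescript
type Role = 'owner' | 'admin' | 'member';

// Assumed rules, inferred from the test names only.
function canChangeRole(actor: Role, target: Role, newRole: Role): boolean {
  if (actor === 'member') return false; // members cannot change any role
  if (target === 'owner') return false; // the owner cannot be demoted, even by themselves
  if (newRole === 'owner') return false; // nobody can be promoted to owner
  return true; // admins and owners may move users between admin and member
}
```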
This PR introduces the following changes:
- New Vue stores: `collaborationStore` and `pushConnectionStore`
- Front-end push connection handling overhaul: Keep only a single
connection open and handle it from the new store
- Add user avatars in the editor header when there are multiple users
working on the same workflow
- Send a heartbeat event to the back-end service periodically to confirm
that the user is still active
- Back-end overhauls (authored by @tomi):
  - Implement a cleanup procedure that removes inactive users
  - Refactor the current implementation of the collaboration service
---------
Co-authored-by: Tomi Turtiainen <10324676+tomi@users.noreply.github.com>
Validate first and last names before saving them to the database. This
should prevent security issues caused by unsanitized data ending up in
emails.
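A rough sketch of what such a check might look like. The actual validation rules are not stated in this description, so the patterns, the length limit and the `isSafeName` helper are all assumptions.

```typescript
// Hypothetical checks; the real rules are not given in the description.
const HTML_TAG = /<[^>]*>/;
const URL_LIKE = /\b(?:https?:\/\/|www\.)\S+/i;

function isSafeName(name: string): boolean {
  const trimmed = name.trim();
  // Assumed length limit of 32 characters.
  if (trimmed.length === 0 || trimmed.length > 32) return false;
  // Reject markup and links that could be rendered inside an email.
  return !HTML_TAG.test(trimmed) && !URL_LIKE.test(trimmed);
}
```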
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
When we upgraded typeorm in #5151, we switched from no pooling to a
default pool size of 10. This somehow significantly deteriorates the
performance of queries when the application is under load.
Followup to #7566 | Story: https://linear.app/n8n/issue/PAY-926
### Manual workflow activation and deactivation
In a multi-main scenario, if the user manually activates or deactivates
a workflow, the process (whether leader or follower) that handles the
PATCH request and updates its internal state should send a message into
the command channel, so that all other main processes update their
internal state accordingly:
- Add to `ActiveWorkflows` if activating
- Remove from `ActiveWorkflows` if deactivating
- Remove and re-add to `ActiveWorkflows` if the update did not change
activation status.
After updating their internal state, if activating or deactivating, the
recipient main processes should push a message to all connected
frontends so that these can update their stores and so reflect the value
in the UI.
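The flow above can be sketched as a handler that each recipient main process runs when a message arrives on the command channel. The message shape, class name and `pushed` log are hypothetical; the sketch also assumes that "remove and re-add" applies only to workflows that were and remain active.

```typescript
// Hypothetical message shape; field names are illustrative only.
type WorkflowActiveStateChanged = {
  workflowId: string;
  oldState: boolean;
  newState: boolean;
};

class ActiveWorkflowsSketch {
  readonly active = new Set<string>();
  // Stand-in for pushing a message to all connected frontends.
  readonly pushed: Array<{ workflowId: string; active: boolean }> = [];

  handle(msg: WorkflowActiveStateChanged): void {
    const { workflowId, oldState, newState } = msg;
    if (!oldState && newState) {
      this.active.add(workflowId);
      this.pushed.push({ workflowId, active: true });
    } else if (oldState && !newState) {
      this.active.delete(workflowId);
      this.pushed.push({ workflowId, active: false });
    } else if (oldState && newState) {
      // Updated while active: remove and re-add, no frontend push needed.
      this.active.delete(workflowId);
      this.active.add(workflowId);
    }
  }
}
```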
### Workflow activation errors
On failure to activate a workflow, the main instance should record the
error in Redis - main instances should always pull activation errors
from Redis in a multi-main scenario.
### Leadership change
On leadership change...
- The old leader should stop pruning and the new leader should start
pruning.
- The old leader should remove trigger- and poller-based workflows and
the new leader should add them.
1. Reduce a lot of code duplication
2. Move more endpoints out of `Server.ts`
3. Move all query-param parsing and validation into a middleware to make
the route handlers simpler.
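As a sketch of the third point, parsing and validation can live in one pure function that a middleware calls before the route handler runs. The parameter names `take`, `skip` and `filter`, the default of 50 and the function name are assumptions for illustration.

```typescript
// Hypothetical parser; parameter names and defaults are illustrative.
type ListQuery = { take: number; skip: number; filter: Record<string, unknown> };

function parseListQuery(query: Record<string, string | undefined>): ListQuery {
  const take = Number(query.take ?? 50);
  const skip = Number(query.skip ?? 0);
  if (!Number.isInteger(take) || take < 1) throw new Error('Invalid take');
  if (!Number.isInteger(skip) || skip < 0) throw new Error('Invalid skip');

  let filter: Record<string, unknown> = {};
  if (query.filter !== undefined) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(query.filter);
    } catch {
      throw new Error('Invalid filter');
    }
    // Only plain JSON objects are accepted as filters.
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
      throw new Error('Invalid filter');
    }
    filter = parsed as Record<string, unknown>;
  }
  return { take, skip, filter };
}
```

A middleware would call this once, attach the result to the request, and let handlers read already-validated values.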
This PR:
- Creates `InvitationController`
- Moves `POST /users` to `POST /invitations` and moves the related test to
`invitations.api.tests`
- Moves `POST /users/:id` to `POST /invitations/:id/accept` and moves the
related test to `invitations.api.tests`
- Adjusts FE to use new endpoints
- Moves all the invitation logic to the `UserService`
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
---------
Co-authored-by: Giulio Andreini <g.andreini@gmail.com>
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
This change expands on the command channel communication recently
introduced between the main instance(s) and the workers. The frontend gets a
new menu entry "Workers" which, when opened, triggers a regular call
to `getStatus` on the workers. The workers then respond via their
response channel to the backend, which then pushes the status to the
frontend.
This introduces the use of ChartJS for metrics.
This feature is still in MVP state and thus disabled by default for the
moment.
- Enable two-way communication with web sockets
- Enable sending push messages to specific users
- Add a collaboration service for managing the active users of a workflow
Missing things:
- State is currently kept only in memory, so this does not work in
multi-master setups
- Removing a user from active users in situations where they go inactive
or we miss the "workflow closed" message
  - I think a timer-based solution for this would cover most edge cases,
  i.e. have the FE ping every X minutes, and have the BE remove the user
  unless it has received a ping from them in the last Y minutes, where Y > X
- FE changes to be added later by @MiloradFilipovic
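The timer-based idea from the list above can be sketched as follows. This is not part of the PR: the class and method names are hypothetical, and timestamps are passed in explicitly so the logic can be exercised without real timers.

```typescript
// Hypothetical sketch of "remove users with no ping in the last Y".
class ActiveUsersSketch {
  private readonly lastSeen = new Map<string, number>();

  // `maxIdleMs` is Y; the frontend is assumed to ping every X < Y.
  constructor(private readonly maxIdleMs: number) {}

  ping(userId: string, now: number): void {
    this.lastSeen.set(userId, now);
  }

  // Removes and returns every user whose last ping is older than Y.
  removeInactive(now: number): string[] {
    const removed: string[] = [];
    this.lastSeen.forEach((seenAt, userId) => {
      if (now - seenAt > this.maxIdleMs) {
        this.lastSeen.delete(userId);
        removed.push(userId);
      }
    });
    return removed;
  }

  get users(): string[] {
    return Array.from(this.lastSeen.keys());
  }
}
```

Choosing Y larger than X means a single delayed ping does not drop an active user.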
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Story: https://linear.app/n8n/issue/PAY-926
This PR coordinates workflow activation on instance startup and on
leadership change in a multi-main scenario in the internal API. Part 3,
on manual workflow activation and deactivation, will be a separate PR.
### Part 1: Instance startup
In a multi-main scenario, on starting an instance...
- [x] If the instance is the leader, it should add webhooks, triggers
and pollers.
- [x] If the instance is a follower, it should not add webhooks,
triggers or pollers.
- [x] Unit tests.
### Part 2: Leadership change
In a multi-main scenario, if the main instance leader dies…
- [x] The new main instance leader must activate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] The old main instance leader must deactivate all trigger- and
poller-based workflows, excluding webhook-based workflows.
- [x] Unit tests.
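The two parts above can be sketched as a small state machine. Everything here is hypothetical (type names, method names, the `registered` set); it only mirrors the checklist and is not the actual implementation.

```typescript
type Kind = 'webhook' | 'trigger' | 'poller';
type WorkflowSketch = { id: string; kind: Kind };

class InstanceSketch {
  readonly registered = new Set<string>();

  constructor(private readonly workflows: WorkflowSketch[]) {}

  // Part 1: only the leader adds webhooks, triggers and pollers on startup.
  start(isLeader: boolean): void {
    if (!isLeader) return;
    this.workflows.forEach((w) => this.registered.add(w.id));
  }

  // Part 2: the new leader activates trigger- and poller-based workflows only.
  onLeaderTakeover(): void {
    this.workflows
      .filter((w) => w.kind !== 'webhook')
      .forEach((w) => this.registered.add(w.id));
  }

  // Part 2: the old leader deactivates trigger- and poller-based workflows only.
  onLeaderStepdown(): void {
    this.workflows
      .filter((w) => w.kind !== 'webhook')
      .forEach((w) => this.registered.delete(w.id));
  }
}
```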
To test, start two instances and check behavior on startup and
leadership change:
```
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug npm run start
EXECUTIONS_MODE=queue N8N_LEADER_SELECTION_ENABLED=true N8N_LICENSE_TENANT_ID=... N8N_LICENSE_ACTIVATION_KEY=... N8N_LOG_LEVEL=debug N8N_PORT=5679 npm run start
```
This PR ensures `MultiMainInstancePublisher` is initialized before
checking if the instance is leader or follower. Followers skip license
init, license check, and pruning start and stop.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <netroy@users.noreply.github.com>
To help debugging possible issues in startup and migrations, log the
executed migrations with log level 'info', instead of 'debug'.
Due to a change, the core's Credential object is called through its
prototype during the credentials import command. As a result, the
Credential's cipher variable was not set, so no cipher service was
available during import. This PR detects that edge case and makes the
cipher service available.
https://linear.app/n8n/issue/PAY-933/set-up-leader-selection-for-multiple-main-instances
- [x] Set up new envs
- [x] Add config and license checks
- [x] Implement `MultiMainInstancePublisher`
- [x] Expand `RedisServicePubSubPublisher` to support
`MultiMainInstancePublisher`
- [x] Init `MultiMainInstancePublisher` on startup and destroy on
shutdown
- [x] Add to sandbox plans
- [x] Test manually
Note: This is only for setup - coordinating in reaction to leadership
changes will come in later PRs.
---------
Signed-off-by: Oleg Ivaniv <me@olegivaniv.com>
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
This PR allows users to configure Bull's settings, possibly
reducing the errors related to `maxStalledCount` and other issues that
usually happen either when a worker crashes or when the event loop is
very busy. Increasing the lease time and the `maxStalledCount` setting
might improve UX.
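For orientation, the sketch below mirrors the shape of Bull's advanced queue `settings` object, assuming that "lease time" corresponds to `lockDuration`. The interface and the `buildQueueSettings` helper are hypothetical, the fallback values follow Bull's documented defaults as far as I know, and the way n8n exposes these values to users is not shown here.

```typescript
// Mirrors the shape of Bull's advanced `settings`; local type for the sketch.
interface QueueSettingsSketch {
  lockDuration: number; // ms a worker holds the job lock ("lease time", assumed mapping)
  lockRenewTime: number; // ms between lock renewals
  stalledInterval: number; // ms between checks for stalled jobs
  maxStalledCount: number; // how often a stalled job is re-queued before failing
}

function buildQueueSettings(
  overrides: Partial<QueueSettingsSketch> = {},
): QueueSettingsSketch {
  const lockDuration = overrides.lockDuration ?? 30_000;
  return {
    lockDuration,
    lockRenewTime: overrides.lockRenewTime ?? lockDuration / 2,
    stalledInterval: overrides.stalledInterval ?? 30_000,
    maxStalledCount: overrides.maxStalledCount ?? 1,
  };
}
```

The resulting object would be passed as the `settings` key of the queue options when the queue is created.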
This PR converts the hard-deletion interval to a timeout:
- to prevent the interval from not being restored when hard deletion
throws, and
- to prevent a long-running hard deletion from leading to duplicate
deletions.
Since we do not store which executions produced binary data, for pruning
on S3 we need to query for binary data items for each execution in order
to delete them. To minimize requests to S3, allow the user to skip
pruning requests when setting TTL at bucket level.
This change ensures that things like `encryptionKey` and `instanceId`
are always available directly where they are needed, instead of passing
them around throughout the code.
This is related to an issue with how Bull handles stalled jobs, see
https://github.com/OptimalBits/bull/issues/1415 for reference.
CPU intensive workflows can in certain cases take a long while to finish
up, thereby blocking the thread and causing Bull queue to think the job
has stalled, even though it finished successfully. In these cases the
error handling could then overwrite the successful execution data with
the error message.
In a rare edge case, an undefined queue could be returned. This should
not happen, so an error is now thrown instead.
This PR also takes the opportunity to remove a cyclic dependency from the Queue.
This fixes a bug in the pruning (soft-delete). The pruning was a bit too
aggressive, as it also pruned executions that weren't in an end state
yet. This only becomes an issue if there are long-running executions
(e.g. workflow with Wait node) or the prune parameters are set to keep
only a tiny number of executions.
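The corrected behavior can be sketched as a filter that considers only finished executions. The status names, the `ExecutionSketch` type and the `selectPrunable` helper are assumptions for illustration, and only age-based pruning is shown.

```typescript
// Hypothetical shapes; status names are assumed, not taken from the PR.
type ExecutionSketch = { id: string; status: string; stoppedAt?: Date };

const END_STATES = new Set(['success', 'error', 'crashed', 'canceled']);

function selectPrunable(
  executions: ExecutionSketch[],
  maxAgeMs: number,
  now: Date,
): string[] {
  return executions
    .filter((e) => END_STATES.has(e.status)) // skip running or waiting executions
    .filter((e) => e.stoppedAt !== undefined && now.getTime() - e.stoppedAt.getTime() > maxAgeMs)
    .map((e) => e.id);
}
```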
This PR adds a message for queue mode which triggers an external secrets
provider reload inside the workers if the configuration has changed on
the main instance.
It also refactors some of the message handler code to remove cyclic
dependencies, and removes unnecessary duplicate Redis clients
inside services, which is possible now that the cyclic dependencies are gone.
Depends on https://github.com/n8n-io/n8n/pull/7220 | Story:
[PAY-840](https://linear.app/n8n/issue/PAY-840/introduce-object-store-service-and-manager-for-binary-data)
This PR introduces an object store service for Enterprise edition. Note
that the service is tested but currently unused - it will be integrated
soon as a binary data manager, and later for execution data.
`amazonaws.com` in the host is temporarily hardcoded until we integrate
the service and test against AWS, Cloudflare and Backblaze, in the next
PR.
This is ready for review - the PR it depends on is approved and waiting
for CI.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
All commands sent between the main instance and workers need to contain a
sender ID to prevent senders from reacting to their own messages,
causing loops.
This PR makes sure all sent messages contain a sender ID by default as
part of constructing a sending Redis client.
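A minimal sketch of the idea, assuming the ID is bound once when the publisher is constructed. The message shape, class name and `shouldHandle` helper are hypothetical, and the `send` callback stands in for a real Redis publish call.

```typescript
// Hypothetical message shape for illustration.
type CommandMessage = { senderId: string; command: string };

class CommandPublisherSketch {
  constructor(
    private readonly senderId: string,
    private readonly send: (msg: CommandMessage) => void, // stand-in for Redis publish
  ) {}

  // Every message is stamped with the sender ID, callers cannot forget it.
  publish(command: string): void {
    this.send({ senderId: this.senderId, command });
  }
}

// Receivers drop messages they sent themselves, which prevents loops.
function shouldHandle(msg: CommandMessage, ownId: string): boolean {
  return msg.senderId !== ownId;
}
```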
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Depends on: https://github.com/n8n-io/n8n/pull/7195 | Story:
[PAY-837](https://linear.app/n8n/issue/PAY-837/implement-object-store-manager-for-binary-data)
This PR includes `workflowId` in binary data writes so that the S3
manager can support this filepath structure
`/workflows/{workflowId}/executions/{executionId}/binaryData/{binaryFilename}`
to easily delete binary data for workflows. Also all binary data service
and manager methods that take `workflowId` and `executionId` are made
consistent in arg order.
Note: `workflowId` is included in filesystem mode for compatibility with
the common interface, but `workflowId` will remain unused by filesystem
mode until we decide to restructure how this mode stores data.
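The path structure above can be sketched as a pair of helpers. The function names are hypothetical; only the path template comes from the description.

```typescript
// Builds the object key following the path template from the description.
function toBinaryDataKey(
  workflowId: string,
  executionId: string,
  binaryFilename: string,
): string {
  return `/workflows/${workflowId}/executions/${executionId}/binaryData/${binaryFilename}`;
}

// All binary data of a workflow shares this prefix, which is what makes
// deleting by workflow straightforward.
function toWorkflowPrefix(workflowId: string): string {
  return `/workflows/${workflowId}/`;
}
```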
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
Story: [PAY-846](https://linear.app/n8n/issue/PAY-846) | Related:
https://github.com/n8n-io/n8n/pull/7225
For the S3 backend for external storage of binary data and execution
data, the `getAsStream` method in the binary data manager interface used
by FS and S3 will need to become async. This is a breaking change for
nodes-base.