In a multi-main setup, we have the following issue. The user's client
connects to main A and runs a test webhook, so main A starts listening
for a webhook call. A third-party service sends a request to the test
webhook URL. The request is forwarded by the load balancer to main B,
who is not listening for this webhook call. Therefore, the webhook call
is unhandled.
To start addressing this, cache test webhook registrations, using Redis
for queue mode and in-memory for regular mode. When the third-party
service sends a request to the test webhook URL, the request is
forwarded by the load balancer to main B, who fetches test webhooks from
the cache and, if it finds a match, executes the test webhook. This
should be transparent - test webhook behavior should remain the same as
so far.
Notes:
- Test webhook timeouts are not cached. A timeout is only relevant to
the process it was created in, so another process retrieving from Redis
a "foreign" timeout will be unable to act on it. A timeout also has
circular references, so `cache-manager-ioredis-yet` is unable to
serialize it.
- In a single-main scenario, the timeout remains in the single process
and is cleared on test webhook expiration, successful execution, and
manual cancellation - all as usual.
- In a multi-main scenario, we will need to have the process who
received the webhook call send a message to the process who created the
webhook directing this originating process to clear the timeout. This
will likely be implemented via execution lifecycle hooks and Redis
channel messages checking session ID. This implementation is out of
scope for this PR and will come next.
- Additional data in test webhooks is not cached. From what I can tell,
additional data is not needed for test webhooks to be executed.
Additional data also has circular references, so
`cache-manager-ioredis-yet` is unable to serialize it.
Follow-up to: #8155
- Move webhook, poller and trigger activation logs closer to activation
event
- Enrich response of `/debug/multi-main-setup`
- Ensure workflow updates broadcast activation state changes only if
state changed
- Fix bug on workflow activation after leadership change
- Ensure debug controller is not available in production
---------
Co-authored-by: Omar Ajoue <krynble@gmail.com>
## Summary
A circular dependency between `WorkflowService` and
`ActiveWorkflowRunner` is sometimes causing `this.activeWorkflowRunner`
to be `undefined` in `WorkflowService`.
Breaking this circular dependency should hopefully fix this issue.
## Related tickets and issues
#8122
## Review / Merge checklist
- [x] PR title and summary are descriptive
- [ ] Tests included
Remove duplication, improve readability, and expand tests for
`TestWebhooks.ts` - in anticipation for storing test webhooks in Redis.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
## Summary
We accidentally made some functions `async` in
https://github.com/n8n-io/n8n/pull/7846
This PR reverts that change.
## Review / Merge checklist
- [x] PR title and summary are descriptive.
Refactor static workflow service classes into DI-compatible classes
Context: https://n8nio.slack.com/archives/C069HS026UF/p1702466571648889
Up next:
- Inject dependencies into workflow services
- Consolidate workflow controllers into one
- Make workflow controller injectable
- Inject dependencies into workflow controller
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
When performing actions such as renaming a workflow or updating its
settings, n8n errors with "Failed to save workflow version" in the
console although the saving process was successful. We are now correctly
checking whether `nodes` and `connections` exist and only then save a
snapshot.
Github issue / Community forum post (link here to close automatically):
## Summary
Ensure `ownedBy` and `sharedWith` are present and uniform for
credentials and workflows.
Details in story: https://linear.app/n8n/issue/PAY-987
Ensure all errors in `cli` are `ApplicationError` or children of it and
contain no variables in the message, to continue normalizing all the
errors we report to Sentry
Follow-up to: https://github.com/n8n-io/n8n/pull/7839
Ensure all errors in `cli` inherit from `ApplicationError` to continue
normalizing all the errors we report to Sentry
Follow-up to: https://github.com/n8n-io/n8n/pull/7820
Followup to #7566 | Story: https://linear.app/n8n/issue/PAY-926
### Manual workflow activation and deactivation
In a multi-main scenario, if the user manually activates or deactivates
a workflow, the process (whether leader or follower) that handles the
PATCH request and updates its internal state should send a message into
the command channel, so that all other main processes update their
internal state accordingly:
- Add to `ActiveWorkflows` if activating
- Remove from `ActiveWorkflows` if deactivating
- Remove and re-add to `ActiveWorkflows` if the update did not change
activation status.
After updating their internal state, if activating or deactivating, the
recipient main processes should push a message to all connected
frontends so that these can update their stores and so reflect the value
in the UI.
### Workflow activation errors
On failure to activate a workflow, the main instance should record the
error in Redis - main instances should always pull activation errors
from Redis in a multi-main scenario.
### Leadership change
On leadership change...
- The old leader should stop pruning and the new leader should start
pruning.
- The old leader should remove trigger- and poller-based workflows and
the new leader should add them.
Depends on: https://github.com/n8n-io/n8n/pull/7195 | Story:
[PAY-837](https://linear.app/n8n/issue/PAY-837/implement-object-store-manager-for-binary-data)
This PR includes `workflowId` in binary data writes so that the S3
manager can support this filepath structure
`/workflows/{workflowId}/executions/{executionId}/binaryData/{binaryFilename}`
to easily delete binary data for workflows. Also all binary data service
and manager methods that take `workflowId` and `executionId` are made
consistent in arg order.
Note: `workflowId` is included in filesystem mode for compatibility with
the common interface, but `workflowId` will remain unused by filesystem
mode until we decide to restructure how this mode stores data.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
In scope:
- Consolidate `WorkflowService.getMany()`.
- Support non-entity field `ownedBy` for `select`.
- Support `tags` for `filter`.
- Move `addOwnerId` to `OwnershipService`.
- Remove unneeded check for `filter.id`.
- Simplify DTO validation for `filter` and `select`.
- Expand tests for `GET /workflows`.
Workflow list query DTOs:
```
filter → name, active, tags
select → id, name, active, tags, createdAt, updatedAt, versionId, ownedBy
```
Out of scope:
- Migrate `shared_workflow.roleId` and `shared_credential.roleId` to
string IDs.
- Refactor `WorkflowHelpers.getSharedWorkflowIds()`.