# Motivation
In Queue mode, finished executions would cause the main instance to
always pull all execution data from the database, unflatten it and then
use it to send out event log events and telemetry events, as well as
required returns to Respond to Webhook nodes etc.
This could cause OOM errors when the data was large, since it had to be
fully unpacked and transformed on the main instance’s side, using up a
lot of memory (and time).
This PR attempts to limit this behaviour to only happen in those
required cases where the data has to be forwarded to some waiting
webhook, for example.
# Changes
Execution data is only required in cases, where the active execution has
a `postExecutePromise` attached to it. These usually forward the data to
some other endpoint (e.g. a listening webhook connection).
By adding a helper `getPostExecutePromiseCount()`, we can decide that in
cases where there is nothing listening at all, there is no reason to
pull the data on the main instance.
Previously, there would always be postExecutePromises because the
telemetry events were called. Now, these have been moved into the
workers, which have been given the various InternalHooks calls to their
hook function arrays, so they themselves issue these telemetry and event
calls.
This results in all event log messages to now be logged on the worker’s
event log, as well as the worker’s eventbus being the one to send out
the events to destinations. The main event log does…pretty much nothing.
We are not logging executions on the main event log any more, because
this would require all events to be replicated 1:1 from the workers to
the main instance(s) (this IS possible and implemented, see the worker’s
`replicateToRedisEventLogFunction` - but it is not enabled to reduce the
amount of traffic over redis).
Partial events in the main log could confuse the recovery process and
would result in, ironically, the recovery corrupting the execution data
by considering them crashed.
# Refactor
I have also used the opportunity to reduce duplicate code and move some
of the hook functionality into
`packages/cli/src/executionLifecycleHooks/shared/sharedHookFunctions.ts`
in preparation for a future full refactor of the hooks
This PR adds new endpoints to the REST API:
`/orchestration/worker/status` and `/orchestration/worker/id`
Currently these just trigger the return of status / ids from the workers
via the redis back channel, this still needs to be handled and passed
through to the frontend.
It also adds the eventbus to each worker, and triggers a reload of those
eventbus instances when the configuration changes on the main instances.
Until https://github.com/n8n-io/n8n/pull/7061 we had an edge case where
a manual unsaved workflow when run creates an orphan execution, i.e. a
saved execution not pointing to any workflow. This execution is only
ever visible to the instance owner (even if triggered by a member), and
is wrongly stored as unfinished and crashed. This PR enforces that the
DB disallows any such executions from making it into the DB.
This is needed also for the S3 client, which will include the
`workflowId` in the path-like filename.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <aditya@netroy.in>
- For a saved execution, we write to disk binary data and metadata.
These two are only ever deleted via `POST /executions/delete`. No marker
file, so untouched by pruning.
- For an unsaved execution, we write to disk binary data, binary data
metadata, and a marker file at `/meta`. We later delete all three during
pruning.
- The third flow is legacy. Currently, if the execution is unsaved, we
actually store it in the DB while running the workflow and immediately
after the workflow is finished during the `onWorkflowPostExecute()` hook
we delete that execution, so the second flow applies. But formerly, we
did not store unsaved executions in the DB ("ephemeral executions") and
so we needed to write a marker file at `/persistMeta` so that, if the
ephemeral execution crashed after the step where binary data was stored,
we had a way to later delete its associated dangling binary data via a
second pruning cycle, and if the ephemeral execution succeeded, then we
immediately cleaned up the marker file at `/persistMeta` during the
`onWorkflowPostExecute()` hook.
This creation and cleanup at `/persistMeta` is still happening, but this
third flow no longer has a purpose, as we now store unsaved executions
in the DB and delete them immediately after. Hence the third flow can be
removed.
Github issue / Community forum post (link here to close automatically):
For the upcoming workflow history feature, we're creating the necessary
database tables.
Also changes the schema for Postgres so the versionId column is now
properly a UUID. The `using` statement prevents losing data, basically
converting the strings to UUIDs.
---------
Co-authored-by: कारतोफ्फेलस्क्रिप्ट™ <netroy@users.noreply.github.com>
When ever we have migrations that use `.addColumn` or `.dropColumn`,
typeorm recreates tables for sqlite. so, we need to disable foreign key
enforcement for sqlite, or else data in some tables can get deleted
because of `ON DELETE CASCADE`
[This has happened in the
past](https://github.com/n8n-io/n8n/pull/6739), and we should really
come up with a way to prevent this from happening again.
---------
Signed-off-by: Oleg Ivaniv <me@olegivaniv.com>
Co-authored-by: Oleg Ivaniv <me@olegivaniv.com>