Ensure all errors in `cli` are `ApplicationError` or children of it and
contain no variables in the message, to continue normalizing all the
errors we report to Sentry.
Follow-up to: https://github.com/n8n-io/n8n/pull/7839
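A minimal sketch of the convention described above, assuming a simplified stand-in for `ApplicationError` (the real class in the codebase may have a different constructor signature). `WorkflowNotFoundError` and the `extra` field are hypothetical names used only for illustration: the message stays static so reports group together, and the variable part travels as structured data.

```typescript
// Hypothetical minimal stand-in for `ApplicationError`; the real class may differ.
class ApplicationError extends Error {
  readonly extra: Record<string, unknown>;

  constructor(message: string, options: { extra?: Record<string, unknown> } = {}) {
    super(message);
    // Keep `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
    this.extra = options.extra ?? {};
  }
}

// Hypothetical child error, for illustration only.
class WorkflowNotFoundError extends ApplicationError {
  constructor(workflowId: string) {
    // Static message: identical for every occurrence.
    // The variable (workflowId) is attached as metadata instead of being interpolated.
    super('Workflow not found', { extra: { workflowId } });
  }
}

const error = new WorkflowNotFoundError('123');
```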
This PR allows users to configure the settings passed to Bull, which may
reduce `maxStalledCount` errors and related issues that usually occur
either when a worker crashes or when the event loop is very busy.
Increasing the lease time and the `maxStalledCount` setting might
improve UX.
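As a rough sketch of how such settings could be assembled: the four field names below are Bull's advanced queue settings (`lockDuration` being the lease time), and the fallbacks are Bull's documented defaults. The environment variable names and the helpers `readInt` and `buildQueueSettings` are assumptions made for this example, not necessarily what the PR uses.

```typescript
// Subset of Bull's advanced settings; durations are in milliseconds.
interface WorkerQueueSettings {
  lockDuration: number;
  lockRenewTime: number;
  stalledInterval: number;
  maxStalledCount: number;
}

type Env = Record<string, string | undefined>;

// Hypothetical helper: parse an integer from the environment or use the fallback.
function readInt(env: Env, key: string, fallback: number): number {
  const parsed = parseInt(env[key] ?? '', 10);
  return isNaN(parsed) ? fallback : parsed;
}

// Variable names are assumed for illustration; fallbacks are Bull's defaults.
function buildQueueSettings(env: Env): WorkerQueueSettings {
  const lockDuration = readInt(env, 'QUEUE_WORKER_LOCK_DURATION', 30000);
  return {
    lockDuration,
    // Bull renews the lock at half the lock duration by default.
    lockRenewTime: readInt(env, 'QUEUE_WORKER_LOCK_RENEW_TIME', Math.floor(lockDuration / 2)),
    stalledInterval: readInt(env, 'QUEUE_WORKER_STALLED_INTERVAL', 30000),
    maxStalledCount: readInt(env, 'QUEUE_WORKER_MAX_STALLED_COUNT', 1),
  };
}

const settings = buildQueueSettings({
  QUEUE_WORKER_LOCK_DURATION: '60000',
  QUEUE_WORKER_MAX_STALLED_COUNT: '3',
});
```

The resulting object would be passed as the `settings` option when constructing the Bull queue.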
In a rare edge case an undefined queue could be returned - this should
not happen and now an error is thrown.
Also using the opportunity to remove a cyclic dependency from the Queue.
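The guard could look roughly like the following sketch. `QueueHolder`, `JobQueue`, and the error message are hypothetical; the point is only that the getter throws instead of handing back `undefined`.

```typescript
// Hypothetical stand-in for the real queue object.
interface JobQueue {
  name: string;
}

class QueueHolder {
  private jobQueue: JobQueue | undefined;

  init(name: string): void {
    this.jobQueue = { name };
  }

  // Never returns undefined: callers get a queue or an explicit error.
  getQueue(): JobQueue {
    if (this.jobQueue === undefined) {
      throw new Error('Queue is not initialized');
    }
    return this.jobQueue;
  }
}
```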
# Motivation
In Queue mode, finished executions would cause the main instance to
always pull all execution data from the database, unflatten it and then
use it to send out event log events and telemetry events, as well as
required returns to Respond to Webhook nodes etc.
This could cause OOM errors when the data was large, since it had to be
fully unpacked and transformed on the main instance’s side, using up a
lot of memory (and time).
This PR attempts to limit this behaviour to only happen in those
required cases where the data has to be forwarded to some waiting
webhook, for example.
# Changes
Execution data is only required in cases where the active execution has
a `postExecutePromise` attached to it. These usually forward the data to
some other endpoint (e.g. a listening webhook connection).
By adding a helper `getPostExecutePromiseCount()`, we can decide that in
cases where there is nothing listening at all, there is no reason to
pull the data on the main instance.
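A simplified sketch of the idea, not the actual implementation: `ActiveExecutionsSketch`, `RunData`, and `shouldFetchExecutionData` are hypothetical names, and only `getPostExecutePromiseCount()` is taken from the description above.

```typescript
// Hypothetical shape of the data a listener would receive.
type RunData = { finished: boolean };

interface Deferred<T> {
  resolve: (value: T) => void;
}

class ActiveExecutionsSketch {
  private readonly executions = new Map<
    string,
    { postExecutePromises: Array<Deferred<RunData | undefined>> }
  >();

  add(executionId: string): void {
    this.executions.set(executionId, { postExecutePromises: [] });
  }

  // Registers a listener that wants the run data once the execution finishes.
  getPostExecutePromise(executionId: string): Promise<RunData | undefined> {
    const execution = this.executions.get(executionId);
    if (execution === undefined) throw new Error('Execution is not active');
    return new Promise((resolve) => {
      execution.postExecutePromises.push({ resolve });
    });
  }

  // Number of listeners waiting on this execution; 0 for unknown executions.
  getPostExecutePromiseCount(executionId: string): number {
    return this.executions.get(executionId)?.postExecutePromises.length ?? 0;
  }
}

// The main instance only needs to load execution data if someone is waiting for it.
function shouldFetchExecutionData(active: ActiveExecutionsSketch, executionId: string): boolean {
  return active.getPostExecutePromiseCount(executionId) > 0;
}
```

With no listeners registered, the main instance can skip the database read and the unflatten step entirely.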
Previously, there would always be `postExecutePromises` because the
telemetry events were called. Now, these have been moved into the
workers, which have had the various InternalHooks calls added to their
hook function arrays, so they themselves issue these telemetry and event
calls.
As a result, all event log messages are now logged to the worker’s
event log, and the worker’s eventbus is the one that sends out
the events to destinations. The main event log does…pretty much nothing.
We are not logging executions on the main event log any more, because
this would require all events to be replicated 1:1 from the workers to
the main instance(s) (this IS possible and implemented, see the worker’s
`replicateToRedisEventLogFunction` - but it is not enabled to reduce the
amount of traffic over redis).
Partial events in the main log could confuse the recovery process and
would result in, ironically, the recovery corrupting the execution data
by considering them crashed.
# Refactor
I have also used the opportunity to reduce duplicate code and move some
of the hook functionality into
`packages/cli/src/executionLifecycleHooks/shared/sharedHookFunctions.ts`
in preparation for a future full refactor of the hooks.
* use typedi for UserManagementMailer
* use typedi for SamlService
* fix typos
* use typedi for Queue
* use typedi for License
* convert some more code to use typedi
* add typedi
* convert ActiveWorkflowRunner into an injectable service
* convert ExternalHooks into an injectable service
* convert InternalHooks into an injectable service
* convert LoadNodesAndCredentials into an injectable service
* convert NodeTypes and CredentialTypes into injectable services
* convert ActiveExecutions into an injectable service
* convert WaitTracker into an injectable service
* convert Push into an injectable service
* convert ActiveWebhooks and TestWebhooks into injectable services
* handle circular references, and log errors when a circular dependency is found
* ✨ Add Webhook-Response-Node
* ⚡ Replace callback function with promise
* ✨ Add support for Bull and binary-data
* ✨ Add string response option
* ⚡ Remove some comments
* ✨ Make more generic & fix multi-call issue in queue mode
* ⚡ Fix startup and eslint issues
* ⚡ Improvements to webhook response node and functionality
* ⚡ Replace data with more generic type
* ⚡ Make statusMessage optional
* ⚡ Change parameter order
* ⚡ Move Response Code underneath options
* ⚡ Hide Response Code on Webhook node if mode responseNode got selected
* ⚡ Minor improvements
* ⚡ Add missing file and fix lint issue
* ⚡ Fix some node linting issues
* ⚡ Apply feedback
* ⚡ Minor improvements
* Unify execution ID across executions
* Fix indentation and improved comments
* WIP: saving data after each node execution
* Added on/off to save data after each step, saving initial data and retries working
* Fixing lint issues
* Fixing more lint issues
* ✨ Add bull to execute workflows
* 👕 Fix lint issue
* ⚡ Add graceful shutdown to worker
* ⚡ Add loading staticData to worker
* 👕 Fix lint issue
* ⚡ Fix import
* Changed tables metadata to add nullable to stoppedAt
* Reload database on migration run
* Fixed reloading database schema for sqlite by reconnecting and fixing postgres migration
* Added checks to Redis and exiting process if connection is unavailable
* Fixing error with new installations
* Fix issue with data not being sent back to browser on manual executions with defined destination
* Merging bull and unify execution id branch fixes
* Main process will now get execution success from database instead of redis
* Omit execution duration if execution did not stop
* Fix issue with execution list displaying inconsistent information while a workflow is running
* Remove unused hooks to clarify for developers that these won't run in queue mode
* Added active polling to help recover from Redis crashes
* Lint issues
* Changing default polling interval to 60 seconds
* Removed unnecessary attributes from bull job
* Added webhooks service and setting to disable webhooks from main process
* Fixed executions list when running with queues. Now we get the list of actively running workflows from bull.
* Add option to disable deregistration of webhooks on shutdown
* Rename WEBHOOK_TUNNEL_URL to WEBHOOK_URL keeping backwards compat.
* Added auto refresh to executions list
* Improvements to workflow stop process when running with queues
* Refactor queue system to use a singleton and avoid code duplication
* Improve comments and remove unnecessary comments
* Remove console.log from vue file
* Blocking webhook process to run without queues
* Handling execution stop gracefully when possible
* Removing initialization of all workflows from webhook process
* Refactoring code to remove code duplication for job stop
* Improved execution list to be more fluid and less intrusive
* Fixing workflow name for current executions when auto updating
* ⚡ Right align autorefresh checkbox
Co-authored-by: Jan Oberhauser <jan.oberhauser@gmail.com>