Asynchronous Task Queue

ℹ️ About This Section

INFRAX runs background work through an asynchronous queue. This keeps the interface responsive while long jobs run, and you can follow them in Active tasks and Task history.

Section Contents

System Overview
Architecture
Task Lifecycle
Task Priorities
Retry Mechanism
Task States
Active Tasks
Task History
Queue Monitoring
Best Practices

System Overview

The INFRAX task queue handles both user-driven and scheduled background work. It covers jobs such as script execution, agent operations, backup tasks, scheduled monitoring checks, and other work that should not run in the main UI thread.

What the queue provides

Async execution - long jobs move to the background
Persistent storage - queue rows live in the database and survive restarts
Priorities - more important work is picked first
Retries - transient failures can return a task to the queue
Interrupts - running jobs can be stopped from task history
History - every job gets a result card and execution log

✅ Practical effect

You can start a long operation, continue working immediately, and review the result later in Active tasks or Task history.

Architecture

The queue is served by a dedicated background consumer. It pulls tasks from the database, hands them to the right handler, and updates task history and queue state when execution finishes.

Components

1. Queue table

Active tasks are stored in the database. A row tracks:

task ID
task type
queue status
priority
created time and next retry time
retry counter
start time and worker PID
interrupt flag and task payload
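
As a rough illustration, the fields above could be mirrored in code like this. It is a minimal sketch, not the actual INFRAX schema: `worker_pid`, `started_at`, `retry_count`, and `next_try_at` are named elsewhere in this section, while the class name and every other field name are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class QueueRow:
    """Hypothetical in-memory mirror of one queue row."""
    id: int                                   # task ID (assumed column name)
    type: str                                 # task type, e.g. "ExecuteScript"
    status: str = "pending"                   # pending, running, or cancelling
    priority: int = 0                         # higher values run first
    created_at: datetime = field(default_factory=datetime.now)
    next_try_at: Optional[datetime] = None    # earliest time a retry may start
    retry_count: int = 0                      # increases on every claim
    started_at: Optional[datetime] = None     # set when a worker claims the row
    worker_pid: Optional[int] = None          # PID of the claiming worker
    interrupt: bool = False                   # interrupt flag (assumed name)
    payload: Dict[str, Any] = field(default_factory=dict)


row = QueueRow(id=1, type="ExecuteScript", priority=10,
               payload={"title": "Nightly cleanup"})
```

A freshly created row starts as `pending` with no worker assigned, which matches the queueing stage described later in this section.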

2. Consumer service

The service in tasks.consumer.service.php starts InfraxTaskQueueService, registers task handlers, and runs the queue with a fixed worker pool.

it picks only pending tasks that are ready to run
it uses row locks with SKIP LOCKED so the same job is not claimed twice
it updates worker_pid, started_at, retry_count, and next_try_at
it can recover stale tasks after a worker disappears
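
The stale-task check in the last point can be pictured with the sketch below. It only detects rows whose worker is gone. What the service does with them afterwards is not described in this section, so that step is left out. The function name, the dict-based rows, and the injected `is_alive` callback are all assumptions for illustration.

```python
from typing import Callable, Dict, List


def find_stale(rows: List[Dict], is_alive: Callable[[int], bool]) -> List[int]:
    """Return IDs of running/cancelling rows whose worker PID is no longer alive."""
    stale = []
    for row in rows:
        if row["status"] not in ("running", "cancelling"):
            continue  # pending rows have no worker to lose
        pid = row["worker_pid"]
        if pid is not None and not is_alive(pid):
            stale.append(row["id"])
    return stale


# Example: only the worker with PID 101 is still alive (hypothetical data).
rows = [
    {"id": 1, "status": "running", "worker_pid": 101},
    {"id": 2, "status": "running", "worker_pid": 202},
    {"id": 3, "status": "cancelling", "worker_pid": 303},
    {"id": 4, "status": "pending", "worker_pid": None},
]
stale_ids = find_stale(rows, is_alive=lambda pid: pid == 101)
```

Passing the liveness check in as a callback keeps the sketch independent of how worker processes are actually tracked.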

3. Task handlers

Execution logic lives in api/Modules/Tasks/Handlers. For example, ExecuteScript creates a task history card, writes progress steps, and renders script results in the modal.

Queue settings

Setting	Value	Meaning
Worker threads	40	Up to 40 tasks can be processed in parallel
Retry delay	30 seconds	A fixed pause before a task becomes eligible again
Retry limit	10	After the tenth failed attempt, the task is removed from the queue
Backoff growth	Disabled	The pause stays constant instead of increasing after each retry
Stale task check	60 seconds	The service periodically checks running/cancelling tasks for dead workers

Task Lifecycle

Each task moves from creation to a final history status. While it is in the queue, its state is pending, running, or cancelling. Once it leaves the queue, it receives a user-facing history state.

Processing stages

1. Creation

A task is created by a user or by the scheduler. The payload usually contains the title, execution parameters, and the initiator. If the scheduler created it, task history shows Scheduler instead of a user.

2. Queueing

The row is stored as pending. For retries, next_try_at is written immediately so workers do not pick the task too early.

3. Claiming

The consumer selects ready tasks by priority, then by next retry time, then by ID. It then marks the row as running, increments the retry counter, and stores the worker PID.
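
The claiming rule can be sketched in a few lines. This is an in-memory illustration only: it does not model the row locks the real consumer relies on, the function and field layout are assumptions, and treating a missing `next_try_at` as "ready immediately" is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def make_row(task_id: int, priority: int,
             next_try_at: Optional[datetime] = None) -> Dict:
    """Build a hypothetical pending queue row as a plain dict."""
    return {"id": task_id, "status": "pending", "priority": priority,
            "next_try_at": next_try_at, "retry_count": 0,
            "worker_pid": None, "started_at": None}


def claim_next(rows: List[Dict], now: datetime, worker_pid: int) -> Optional[Dict]:
    """Claim the best ready row: highest priority, then earliest next_try_at, then lowest ID."""
    ready = [r for r in rows
             if r["status"] == "pending"
             and (r["next_try_at"] is None or r["next_try_at"] <= now)]
    if not ready:
        return None
    row = min(ready, key=lambda r: (-r["priority"],
                                    r["next_try_at"] or datetime.min,
                                    r["id"]))
    # Claiming marks the row as running and counts as one attempt.
    row.update(status="running", retry_count=row["retry_count"] + 1,
               worker_pid=worker_pid, started_at=now)
    return row


now = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    make_row(1, priority=5),
    make_row(2, priority=10, next_try_at=now + timedelta(seconds=30)),  # not ready yet
    make_row(3, priority=10, next_try_at=now - timedelta(seconds=10)),
    make_row(4, priority=10, next_try_at=now - timedelta(seconds=10)),
]
order = []
while (claimed := claim_next(rows, now, worker_pid=4242)) is not None:
    order.append(claimed["id"])
# order is [3, 4, 1]; row 2 stays pending until its next_try_at passes
```

Rows 3 and 4 tie on priority and retry time, so the lower ID wins. Row 2 has the same priority but is skipped because it is not yet eligible.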

4. Execution

The task-specific handler performs the actual work. Task history is created or refreshed with a Running status and a step-by-step log.

5. Result

The handler eventually returns SUCCESS, FAILED, or RETRY.

6. Cleanup

Successful and failed tasks are removed from the queue and recorded as Completed or Failed. RETRY puts the task back into pending, and an interrupted job ends up as Canceled in history.

ℹ️ Interrupt handling

If a running task is interrupted, the queue finishes cleanup as if the job had completed successfully, then writes the final history state as Canceled. The same rule applies to jobs stopped through the cancel signal.
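
The result and cleanup rules, including the interrupt case, can be summarized as a small decision function. This is a sketch under stated assumptions: the enum and function names are hypothetical, the retry limit is deliberately not modeled, and the history codes are taken from the Task States tables in this section.

```python
from enum import Enum
from typing import Optional, Tuple


class Result(Enum):
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    RETRY = "RETRY"


def finish(result: Result, interrupted: bool) -> Tuple[Optional[str], Optional[str]]:
    """Map a handler result to (queue_status, final_history_code).

    A queue_status of None means the row is removed from the queue.
    A history code of None means no final history state is written yet.
    """
    if interrupted:
        # Cleanup runs as for a successful job, but history records Canceled.
        return None, "Canceled"
    if result is Result.SUCCESS:
        return None, "Done"
    if result is Result.FAILED:
        return None, "Failed"
    # RETRY: the row goes back to pending and waits for its next attempt.
    return "pending", None


outcomes = {
    "success": finish(Result.SUCCESS, interrupted=False),
    "failed": finish(Result.FAILED, interrupted=False),
    "retry": finish(Result.RETRY, interrupted=False),
    "interrupted": finish(Result.SUCCESS, interrupted=True),
}
```

Checking the interrupt flag first reflects the rule above: an interrupted job always ends as Canceled, whatever the handler returned.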

Task Priorities

Higher numeric values run first. In Active tasks, the priority is shown as a colored badge.

UI label	Numeric range	How it is used
High	10 and above	Top-priority jobs that should run first
Medium	5-9	Regular jobs that take precedence over background work
Low	Below 5	Background work that can wait
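
The ranges in the table translate into a simple mapping. The function name below is hypothetical; only the thresholds come from the table.

```python
def priority_label(priority: int) -> str:
    """Map a numeric priority to the badge label shown in Active tasks."""
    if priority >= 10:
        return "High"
    if priority >= 5:
        return "Medium"
    return "Low"


labels = [priority_label(p) for p in (12, 10, 9, 5, 4, 0)]
```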

✅ Deterministic ordering

When priorities match, the backend compares next_try_at first and then the task ID. That keeps execution order stable and predictable.

Retry Mechanism

Retries are meant for transient problems such as a short network timeout, a busy resource, or a temporary overload that may disappear on the next pass.

Retry parameters

Maximum attempts - 10
Delay between attempts - 30 seconds
Delay mode - fixed, without exponential backoff

How it works

The handler returns RETRY or fails with a temporary error.
retry_count increases when a worker claims the task.
The task stays in the queue as pending and gets a new next_try_at.
After 30 seconds it becomes eligible again.
After 10 failed attempts, the queue removes it and history shows Failed.
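
The steps above can be sketched as a scheduling helper. This is an illustration, not the INFRAX implementation: the constant and function names are assumptions, and `retry_count` is taken to be the number of claims so far, as described in the steps.

```python
from datetime import datetime, timedelta
from typing import Optional

RETRY_DELAY = timedelta(seconds=30)  # fixed delay, no backoff growth
MAX_ATTEMPTS = 10                    # retry limit from the settings table


def schedule_retry(retry_count: int, now: datetime) -> Optional[datetime]:
    """Return the new next_try_at, or None when the attempts are exhausted.

    None stands for "remove the row from the queue and record Failed".
    """
    if retry_count >= MAX_ATTEMPTS:
        return None
    return now + RETRY_DELAY


now = datetime(2024, 1, 1, 12, 0, 0)
after_first_failure = schedule_retry(1, now)   # eligible again 30 seconds later
after_tenth_failure = schedule_retry(10, now)  # None: limit reached
```

Under this model, a task that fails on every attempt spends at least 9 × 30 = 270 seconds in retry pauses before it ends as Failed, in addition to the time the attempts themselves take.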

⚠️ When retries are not enough

If the root cause is not temporary, the queue will still retry the job until the limit is reached. Check the parameters, data, and logs before submitting the task again.

Situation	Typical outcome
Short network timeout	The next attempt often succeeds
Resource locked by another process	The queue waits for release and tries again
Temporary overload	The queue retries after a pause
Bad input data	Retries do not help and the task ends as Failed

Task States

The UI uses different labels for queue state and history state. A job first lives in the queue, then it gets a final history result.

Queue states

Code	Shown as	Meaning
`pending`	Pending	The task is waiting for a worker
`running`	Running	A worker is currently processing it
`cancelling`	Cancelling	The user requested an interrupt and cleanup is in progress

Task history states

Code	Shown as	Meaning
`InProgress`	Running	The task is still active and the card can refresh in real time
`Done`	Completed	The task finished successfully
`Failed`	Failed	The task exhausted retries or hit a fatal error
`Canceled`	Canceled	The task was stopped by the user or by the cancel signal
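
For scripts or integrations that read these codes, the two tables can be kept as lookup maps. The variable and function names are hypothetical. The codes and labels are copied from the tables above.

```python
QUEUE_LABELS = {
    "pending": "Pending",
    "running": "Running",
    "cancelling": "Cancelling",
}

HISTORY_LABELS = {
    "InProgress": "Running",
    "Done": "Completed",
    "Failed": "Failed",
    "Canceled": "Canceled",
}

# Every history code except InProgress is a final result.
FINAL_HISTORY_CODES = {"Done", "Failed", "Canceled"}


def is_final(history_code: str) -> bool:
    """Return True when a history code describes a finished task."""
    return history_code in FINAL_HISTORY_CODES
```

Note that `InProgress` in history and `running` in the queue are both shown as Running, so the code, not the label, tells the two apart.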

Active Tasks

Active tasks shows the live queue. It is the place to check what is waiting, what is already running, and what kind of workload the system is processing right now.

ℹ️ Where to find it

Open Administration → Automation → Active tasks. The table supports search, sorting, and a live view of the current queue state.

What the table shows

Column	What it means
Origin	Source marker: System or User. In the current UI, script execution jobs are tagged as User and the other rows are tagged as System.
ID	Unique queue row ID
Status	`Pending`, `Running`, or `Cancelling`
Task title	The task title, or Untitled if none was set
Created at	When the row was added to the queue
Started at	Execution start time; waiting tasks show a dash
Retry attempts	The counter that increases each time a worker claims the task
Priority	A colored badge: High, Medium, or Low
Type	The internal task code, such as `ExecuteScript`

Filtering and sorting

search by ID
search by task title
filter by created at
search by task type
sort by any visible column

⚠️ What to watch for

If the queue does not shrink and running or cancelling rows linger, check the consumer service, server load, and backend logs.

Task History

Task history shows running, completed, failed, and canceled work. It is the main screen for auditing, troubleshooting, and checking the result of background jobs.

Origin filter

User - tasks launched by a user
System - scheduled or automatic tasks
All - the full history list

The filter currently behaves asymmetrically: User hides system rows, but System and All both show scheduler and user entries.

Table information

Column	What it means
Status	The `TaskHistoryStatus` component shows `Running`, `Completed`, `Failed`, or `Canceled`
Created	Task creation time
Title	The task title or a type-derived fallback title
Initiator	The user name, or Scheduler when no initiator is set

Task details card

Clicking a row opens the Task details modal. The card includes:

task ID, title, status, and initiator
network node details when the task has a node_id
the Interrupt action for tasks that are still running
execution timeline with step-by-step log entries
script results for ExecuteScript jobs

💡 Refresh behavior

The history table refreshes automatically about every 3 seconds. While a task is still running, the details modal polls the server every second until the job reaches a final status.
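
The modal's polling behavior can be imitated by a client that waits for a task to finish. This is a minimal sketch: `fetch_status` stands in for whatever call returns the task's history code (no real INFRAX endpoint is implied), and the `max_polls` safety limit is an assumption added for the example.

```python
import time
from typing import Callable

FINAL_STATES = {"Done", "Failed", "Canceled"}


def poll_until_final(fetch_status: Callable[[], str],
                     interval: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep,
                     max_polls: int = 3600) -> str:
    """Poll once per interval until the task reaches a final history code."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in FINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError("task did not reach a final status")


# Example with canned responses and a no-op sleep, so it runs instantly.
responses = iter(["InProgress", "InProgress", "Done"])
final_status = poll_until_final(lambda: next(responses), sleep=lambda s: None)
```

Injecting `sleep` keeps the loop testable without waiting a real second between polls.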

Queue Monitoring

Queue health is easiest to judge when you read the queue views together with the consumer logs. The most useful signals are a growing pending backlog, stuck running/cancelling rows, and repeated errors in history.

Normal signs

tasks move from pending to running and then to a final state without long stalls
most jobs do not need many retries
history is dominated by Completed
interrupts happen only when a user actually requests them

Problem signs

🚨 Signs of overload or failure

the queue does not shrink for a long time
tasks often reach 10 attempts
running or cancelling rows stay visible for too long
history contains many Failed items
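
Some of these signs can be checked automatically against a snapshot of the queue. The sketch below is illustrative only: the function name, the dict-based rows, and the thresholds for "too many pending" and "running too long" are assumptions, because this section does not define exact limits. Only the retry limit of 10 comes from the queue settings.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def queue_warnings(rows: List[Dict], now: datetime,
                   max_pending: int = 100,
                   max_runtime: timedelta = timedelta(minutes=30),
                   retry_limit: int = 10) -> List[str]:
    """Return human-readable warnings for a snapshot of queue rows."""
    warnings = []
    pending = sum(1 for r in rows if r["status"] == "pending")
    if pending > max_pending:
        warnings.append(f"pending backlog: {pending} rows")
    stuck = [r["id"] for r in rows
             if r["status"] in ("running", "cancelling")
             and r["started_at"] is not None
             and now - r["started_at"] > max_runtime]
    if stuck:
        warnings.append(f"long-running rows: {stuck}")
    exhausted = [r["id"] for r in rows if r["retry_count"] >= retry_limit]
    if exhausted:
        warnings.append(f"rows at retry limit: {exhausted}")
    return warnings


now = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    {"id": 1, "status": "pending", "started_at": None, "retry_count": 0},
    {"id": 2, "status": "running", "started_at": now - timedelta(hours=2), "retry_count": 1},
    {"id": 3, "status": "running", "started_at": now - timedelta(minutes=1), "retry_count": 10},
]
found = queue_warnings(rows, now)
```

A long-running row is not necessarily broken, so a warning here is a prompt to look at the task card and logs, not proof of a failure.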

What to check

whether the queue consumer service is running
backend logs for the specific task type
database availability
CPU and memory on the server
whether a worker is stuck and needs a restart

Best Practices

For administrators

Operations

check Active tasks and Task history regularly
watch consumer logs for repeated failures of the same task type
make sure the queue service starts automatically after a server reboot
if load grows, investigate the bottleneck before changing worker settings

Performance

avoid launching too many heavy background jobs at the same time
confirm that the database and indexes can keep up with queue pressure
keep an eye on long-running jobs that may be blocked by external dependencies

For users

Working with tasks

do not create a duplicate task while the old one is still visible in Active tasks
if a job takes too long, open its card in Task history
use Interrupt only when the task is genuinely stuck or no longer needed
when a task fails, check the log and parameters before retrying it

✅ Takeaway

The asynchronous queue keeps INFRAX responsive during long operations. If you watch active tasks and task history, queue problems usually show up before they start affecting the rest of the system.