Asynchronous Task Queue

ℹ️ About This Section

INFRAX runs background work through an asynchronous queue. This keeps the interface responsive while long-running jobs are tracked in Active tasks and Task history.

System Overview

The INFRAX task queue handles both user-driven and scheduled background work. It covers jobs such as script execution, agent operations, backup tasks, scheduled monitoring checks, and other work that should not run in the main UI thread.

What the queue provides

  • Async execution - long jobs move to the background
  • Persistent storage - queue rows live in the database and survive restarts
  • Priorities - more important work is picked first
  • Retries - transient failures can return a task to the queue
  • Interrupts - running jobs can be stopped from task history
  • History - every job gets a result card and execution log
✅ Practical effect

You can start a long operation, continue working immediately, and review the result later in Active tasks or Task history.

Architecture

The queue is served by a dedicated background consumer. It pulls tasks from the database, hands them to the right handler, and updates task history and queue state when execution finishes.

Components

1. Queue table

Active tasks are stored in the database. A row tracks:

  • task ID
  • task type
  • queue status
  • priority
  • created time and next retry time
  • retry counter
  • start time and worker PID
  • interrupt flag and task payload
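As a rough illustration, a queue row could be modeled like this in Python. Only retry_count, next_try_at, started_at and worker_pid are column names taken from this page; the other field names and types are assumptions for the sketch, not the actual INFRAX schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional

# Hypothetical model of one queue row. Field names other than
# retry_count, next_try_at, started_at and worker_pid are assumptions.
@dataclass
class QueueRow:
    id: int
    type: str                                  # e.g. "ExecuteScript"
    status: str = "pending"                    # pending | running | cancelling
    priority: int = 0                          # higher values run first
    created_at: datetime = field(default_factory=datetime.now)
    next_try_at: Optional[datetime] = None     # earliest time a worker may claim it
    retry_count: int = 0                       # incremented on every claim
    started_at: Optional[datetime] = None      # set when a worker claims the row
    worker_pid: Optional[int] = None           # PID of the claiming worker
    interrupt: bool = False                    # set when a user requests an interrupt
    payload: dict[str, Any] = field(default_factory=dict)

row = QueueRow(id=1, type="ExecuteScript", priority=10,
               payload={"title": "Collect inventory"})
```

A freshly created row is pending, has not been claimed, and carries its title and parameters in the payload.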

2. Consumer service

The service in tasks.consumer.service.php starts InfraxTaskQueueService, registers task handlers, and runs the queue with a fixed worker pool.

  • it only picks pending tasks that are ready to run
  • it uses row locks with SKIP LOCKED so two workers never claim the same job
  • it updates worker_pid, started_at, retry_count, and next_try_at
  • it can recover stale tasks after a worker disappears
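Stale-task recovery depends on telling whether a worker process still exists. A minimal Python sketch, assuming the worker PID refers to a local process on a POSIX host, where sending signal 0 probes for the process without affecting it. The Row type and find_stale helper are hypothetical, and what the real service does with a stale row afterwards is not shown.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Row:
    id: int
    status: str                        # pending | running | cancelling
    worker_pid: Optional[int] = None

def pid_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks that the process exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True                    # exists, but owned by another user
    return True

def find_stale(rows: list[Row],
               is_alive: Callable[[int], bool] = pid_alive) -> list[int]:
    """Return IDs of running/cancelling rows whose worker is gone."""
    stale = []
    for row in rows:
        if row.status in ("running", "cancelling") and row.worker_pid is not None:
            if not is_alive(row.worker_pid):
                stale.append(row.id)
    return stale
```

Passing the liveness check as a parameter keeps the scan testable without real processes.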

3. Task handlers

Execution logic lives in api/Modules/Tasks/Handlers. For example, the ExecuteScript handler creates a task history card, writes progress steps, and renders script results in the Task details modal.

Queue settings

Setting | Value | Meaning
Worker threads | 40 | Up to 40 tasks can be processed in parallel
Retry delay | 30 seconds | A fixed pause before a task becomes eligible again
Retry limit | 10 | After the tenth failed attempt, the task is removed from the queue
Backoff growth | Disabled | The pause stays constant instead of increasing after each retry
Stale task check | 60 seconds | The service periodically checks running/cancelling tasks for dead workers
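A sketch of how these settings combine, in Python with hypothetical names. With backoff growth disabled, every retry waits the same 30 seconds; the doubling rule in the other branch is only an assumption about what growth could look like if it were enabled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueueSettings:
    worker_threads: int = 40
    retry_delay_s: int = 30
    retry_limit: int = 10
    backoff_growth: bool = False
    stale_check_s: int = 60

def retry_delay(settings: QueueSettings, attempt: int) -> int:
    """Seconds to wait before the next attempt (attempt counts from 1)."""
    if not settings.backoff_growth:
        return settings.retry_delay_s                  # fixed pause
    # Hypothetical growth rule: double the pause after each attempt.
    return settings.retry_delay_s * 2 ** (attempt - 1)
```

With the defaults shown, attempts 1, 2 and 5 all wait 30 seconds.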

Task Lifecycle

Each task moves from creation to a final history status. While it is in the queue, its state is pending, running, or cancelling; once it leaves the queue, it is recorded with a user-facing history state.

Processing stages

1. Creation

A task is created by a user or by the scheduler. The payload usually contains the title, execution parameters, and the initiator. If the scheduler created it, task history shows Scheduler instead of a user.

2. Queueing

The row is stored as pending. When a task is requeued for a retry, next_try_at is set at the same time so workers do not pick it up before the delay has passed.

3. Claiming

The consumer selects ready tasks ordered by priority (highest first), then by next_try_at, then by ID. It marks the chosen row as running, increments the retry counter, and stores the worker PID.
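The claiming step can be sketched in memory as below. The real consumer does this with a locked SELECT and an UPDATE in the database; this Python version only mirrors the ordering and the fields that change. Sorting next_try_at and ID in ascending order is an assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    id: int
    priority: int
    next_try_at: float                 # epoch seconds
    status: str = "pending"
    retry_count: int = 0
    started_at: Optional[float] = None
    worker_pid: Optional[int] = None

def claim(rows: list[Row], now: float, pid: int) -> Optional[Row]:
    """In-memory stand-in for the consumer's locked claim query."""
    ready = [r for r in rows if r.status == "pending" and r.next_try_at <= now]
    if not ready:
        return None
    # Highest priority first, then earliest next_try_at, then lowest ID.
    task = min(ready, key=lambda r: (-r.priority, r.next_try_at, r.id))
    task.status = "running"
    task.retry_count += 1              # every claim counts as an attempt
    task.started_at = now
    task.worker_pid = pid
    return task
```

Two rows with equal priority and equal next_try_at are always claimed in ID order, which is what makes the ordering deterministic.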

4. Execution

The task-specific handler performs the actual work. Task history is created or refreshed with a Running status and a step-by-step log.

5. Result

When the handler finishes, it returns SUCCESS, FAILED, or RETRY.

6. Cleanup

Successful and failed tasks are removed from the queue and recorded as Completed and Failed, respectively. RETRY returns the task to pending until the retry limit is reached, and an interrupted job ends up as Canceled in history.

ℹ️ Interrupt handling

If a running task is interrupted, the queue finishes cleanup as if the job had completed successfully, then writes the final history state as Canceled. The same rule applies to jobs stopped through the cancel signal.
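The result handling from the Result and Cleanup stages, together with the interrupt rule, can be condensed into one decision function. A Python sketch with hypothetical names; the history codes Done, Failed and Canceled come from the Task history states table, and letting the interrupt flag override the handler's result is an assumption based on the description above.

```python
from enum import Enum
from typing import Optional, Tuple

class Result(Enum):
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    RETRY = "RETRY"

RETRY_LIMIT = 10

def finalize(result: Result, retry_count: int,
             interrupted: bool) -> Tuple[str, Optional[str]]:
    """Return (queue action, history code); None means still queued."""
    if interrupted:
        # Same cleanup as a successful run, but history records Canceled.
        return ("remove", "Canceled")
    if result is Result.SUCCESS:
        return ("remove", "Done")
    if result is Result.RETRY and retry_count < RETRY_LIMIT:
        return ("requeue", None)       # back to pending with a new next_try_at
    # FAILED, or RETRY after the attempt budget is used up.
    return ("remove", "Failed")
```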

Task Priorities

Higher numeric values run first. In Active tasks, the priority is shown as a colored badge.

UI label | Numeric range | How it is used
High | 10 and above | Top-priority jobs that should run first
Medium | 5-9 | Regular jobs with a noticeable priority
Low | Below 5 | Background work that can wait
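The badge boundaries from the table, written as a small Python function (the name priority_label is hypothetical). Both lower bounds are inclusive: 10 is High and 5 is Medium.

```python
def priority_label(priority: int) -> str:
    """Map a numeric priority to the badge text shown in Active tasks."""
    if priority >= 10:
        return "High"
    if priority >= 5:
        return "Medium"
    return "Low"
```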
✅ Deterministic ordering

When priorities match, the backend compares next_try_at first and then the task ID. That keeps execution order stable and predictable.

Retry Mechanism

Retries are meant for transient problems such as a short network timeout, a busy resource, or a temporary overload that may disappear on the next pass.

Retry parameters

  • Maximum attempts - 10
  • Delay between attempts - 30 seconds
  • Delay mode - fixed, without exponential backoff

How it works

  1. The handler returns RETRY or fails with a temporary error.
  2. retry_count was already incremented when a worker claimed the task, so it counts every attempt.
  3. The task stays in the queue as pending and gets a new next_try_at.
  4. After 30 seconds it becomes eligible again.
  5. After 10 failed attempts, the queue removes it and history shows Failed.
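To make the arithmetic concrete, this Python sketch simulates a task whose handler returns RETRY every time. It assumes execution takes no time, so each attempt starts exactly 30 seconds after the previous one; the function name is hypothetical.

```python
RETRY_DELAY_S = 30
RETRY_LIMIT = 10

def retry_timeline(first_claim_at: float) -> tuple[list[float], str]:
    """Claim times of each attempt and the final history state
    for a task that returns RETRY on every attempt."""
    claims = []
    retry_count = 0
    t = first_claim_at
    while True:
        retry_count += 1               # incremented when a worker claims the row
        claims.append(t)
        if retry_count >= RETRY_LIMIT:
            return claims, "Failed"    # removed from the queue, history shows Failed
        t += RETRY_DELAY_S             # new next_try_at; row returns to pending
```

With a first claim at t = 0, the tenth and last attempt starts at t = 270 seconds (nine pauses of 30 seconds), after which the task is recorded as Failed.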
⚠️ When retries are not enough

If the root cause is not temporary, the queue will still retry the job until the limit is reached. Check the parameters, data, and logs before submitting the task again.

Situation | Typical outcome
Short network timeout | The next attempt often succeeds
Resource locked by another process | The queue retries after the delay, by which time the lock may have been released
Temporary overload | The queue retries after a pause
Bad input data | Retries do not help and the task ends as Failed

Task States

The UI uses different labels for queue state and history state. A job first lives in the queue, then it gets a final history result.

Queue states

Code | Shown as | Meaning
pending | Pending | The task is waiting for a worker
running | Running | A worker is currently processing it
cancelling | Cancelling | The user requested an interrupt and cleanup is in progress

Task history states

Code | Shown as | Meaning
InProgress | Running | The task is still active and the card can refresh in real time
Done | Completed | The task finished successfully
Failed | Failed | The task exhausted retries or hit a fatal error
Canceled | Canceled | The task was stopped by the user or by the cancel signal
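The two code sets and their UI labels, expressed as Python enums and lookup tables. The codes and labels come from the queue and history state tables; the Python names are illustrative.

```python
from enum import Enum

class QueueState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    CANCELLING = "cancelling"

class HistoryState(Enum):
    IN_PROGRESS = "InProgress"
    DONE = "Done"
    FAILED = "Failed"
    CANCELED = "Canceled"

# Labels the UI shows for each code.
QUEUE_LABELS = {
    QueueState.PENDING: "Pending",
    QueueState.RUNNING: "Running",
    QueueState.CANCELLING: "Cancelling",
}
HISTORY_LABELS = {
    HistoryState.IN_PROGRESS: "Running",
    HistoryState.DONE: "Completed",
    HistoryState.FAILED: "Failed",
    HistoryState.CANCELED: "Canceled",
}

# A history card stops refreshing once it reaches one of these.
FINAL_HISTORY_STATES = {HistoryState.DONE, HistoryState.FAILED, HistoryState.CANCELED}
```

Note that the label Running appears in both sets: for the running queue code and for the InProgress history code.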

Active Tasks

Active tasks shows the live queue. It is the place to check what is waiting, what is already running, and which kind of workload the system is processing right now.

ℹ️ Where to find it

Open Administration > Automation > Active tasks. The table supports search, sorting, and a live view of the current queue state.

What the table shows

Column | What it means
Origin | Source marker: System or User. In the current UI, script execution jobs are tagged as User and all other rows as System.
ID | Unique queue row ID
Status | Pending, Running, or Cancelling
Task title | The task title, or Untitled if none was set
Created at | When the row was added to the queue
Started at | Execution start time; waiting tasks show a dash
Retry attempts | The counter that increases each time a worker claims the task
Priority | A colored badge: High, Medium, or Low
Type | The internal task code, such as ExecuteScript
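The display rules in the table (Origin tagging, the Untitled fallback, and the dash for tasks that have not started) sketched in Python. The function names, the Backup task type used in the usage line, and the date format are assumptions.

```python
from datetime import datetime
from typing import Optional

def origin(task_type: str) -> str:
    # Current UI rule: script execution jobs are User, everything else System.
    return "User" if task_type == "ExecuteScript" else "System"

def display_row(task_type: str, title: Optional[str],
                started_at: Optional[datetime]) -> dict[str, str]:
    """Render the display-only columns of one Active tasks row."""
    return {
        "Origin": origin(task_type),
        "Task title": title or "Untitled",
        "Started at": started_at.strftime("%Y-%m-%d %H:%M:%S") if started_at else "-",
        "Type": task_type,
    }

waiting = display_row("ExecuteScript", None, None)
```

A script job that has not started yet renders as a User row titled Untitled with a dash under Started at.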

Filtering and sorting

  • search by ID
  • search by task title
  • filter by created at
  • search by task type
  • sort by any visible column
⚠️ What to watch for

If the queue does not shrink and running or cancelling rows hang around, check the consumer service, server load, and backend logs.

Task History

Task history shows completed, running, and interrupted work. It is the main screen for auditing, troubleshooting, and checking the result of background jobs.

Origin filter

  • User - tasks launched by a user
  • System - scheduled or automatic tasks
  • All - the full history list

The filter is currently asymmetric: User hides system rows, while System and All both show scheduler and user entries.
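The asymmetry is easiest to see as code. This Python sketch assumes a history row counts as a user row when it has an initiator, since scheduler-created tasks have none; the row shape and function name are hypothetical.

```python
def filter_history(rows: list[dict], origin: str) -> list[dict]:
    """Reproduce the current Origin filter: only "User" narrows the list."""
    if origin == "User":
        return [r for r in rows if r["initiator"] is not None]
    # "System" and "All" both return scheduler and user rows alike.
    return list(rows)
```

A System filter that really showed only scheduler rows would need its own branch keeping rows whose initiator is None.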

Table information

Column | What it means
Status | The TaskHistoryStatus component shows Running, Completed, Failed, or Canceled
Created | Task creation time
Title | The task title or a type-derived fallback title
Initiator | The user name, or Scheduler when no initiator is set

Task details card

Clicking a row opens the Task details modal. The card includes:

  • task ID, title, status, and initiator
  • network node details when the task has a node_id
  • an Interrupt action for tasks that are still running
  • execution timeline with step-by-step log entries
  • script results for ExecuteScript jobs
💡 Refresh behavior

The history table refreshes automatically about every 3 seconds. While a task is still running, the details modal polls the server every second until the job reaches a final status.
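The modal's polling behavior, sketched in Python with a hypothetical fetch_status callback standing in for the server request. The real client is part of the web UI; only the one-second interval and the stop condition (a final history state) are taken from this page.

```python
import time
from typing import Callable

FINAL_STATES = {"Done", "Failed", "Canceled"}

def poll_until_final(fetch_status: Callable[[], str],
                     interval_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll once per interval until the task reaches a final history state."""
    while True:
        status = fetch_status()
        if status in FINAL_STATES:
            return status
        sleep(interval_s)
```

Injecting sleep lets the loop be exercised without waiting: a task reported as InProgress twice and then Done causes exactly two one-second pauses.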

Queue Monitoring

Queue health is easiest to judge together with the consumer logs. The most useful signals are a growing pending backlog, stuck running/cancelling rows, and repeated errors in history.

Normal signs

  • tasks move from pending to running and then to a final state without long stalls
  • most jobs do not need many retries
  • history is dominated by Completed
  • interrupts happen only when a user actually requests them

Problem signs

🚨 Signs of overload or failure
  • the queue does not shrink for a long time
  • tasks often reach 10 attempts
  • running or cancelling rows stay visible for too long
  • history contains many Failed items
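A rough Python sketch of checking one queue snapshot for these signs. The 15-minute stuck threshold and the 20% Failed share are arbitrary example values, not INFRAX defaults, and a backlog that never shrinks can only be detected by comparing several snapshots over time, which this sketch does not cover.

```python
from dataclasses import dataclass
from typing import Optional

RETRY_LIMIT = 10

@dataclass
class Row:
    status: str                    # pending | running | cancelling
    retry_count: int
    started_at: Optional[float]    # epoch seconds, None while pending

def warning_signs(rows: list[Row], history: list[str], now: float,
                  stuck_after_s: float = 900, failed_ratio: float = 0.2) -> list[str]:
    """Flag overload signs in one snapshot; thresholds are illustrative."""
    signs = []
    stuck = [r for r in rows
             if r.status in ("running", "cancelling")
             and r.started_at is not None and now - r.started_at > stuck_after_s]
    if stuck:
        signs.append(f"{len(stuck)} running/cancelling row(s) older than {stuck_after_s:.0f}s")
    if any(r.retry_count >= RETRY_LIMIT for r in rows):
        signs.append("tasks at the retry limit")
    if history and history.count("Failed") / len(history) > failed_ratio:
        signs.append("high share of Failed in history")
    return signs
```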

What to check

  1. whether the queue consumer service is running
  2. backend logs for the specific task type
  3. database availability
  4. CPU and memory on the server
  5. whether a worker is stuck and needs a restart

Best Practices

For administrators

Operations

  • Check Active tasks and Task history regularly
  • Watch consumer logs for repeated failures of the same task type
  • Make sure the queue service starts automatically after a server reboot
  • If load grows, investigate the bottleneck before changing worker settings

Performance

  • avoid launching too many heavy background jobs at the same time
  • confirm that the database and indexes can keep up with queue pressure
  • keep an eye on long-running jobs that may be blocked by external dependencies

For users

Working with tasks

  • do not create a duplicate task while the old one is still visible in Active tasks
  • if a job takes too long, open its card in Task history
  • use Interrupt only when the task is genuinely stuck or no longer needed
  • when a task fails, check the log and parameters before retrying it
✅ Takeaway

The asynchronous queue keeps INFRAX responsive during long operations. If you watch active tasks and task history, queue problems usually show up before they start affecting the rest of the system.