Asynchronous Task Queue
INFRAX runs background work through an asynchronous queue. That keeps the interface responsive while long jobs move through Active tasks and Task history.
System Overview
The INFRAX task queue handles both user-driven and scheduled background work. It covers jobs such as script execution, agent operations, backup tasks, scheduled monitoring checks, and other work that should not run in the main UI thread.
What the queue provides
- Async execution - long jobs move to the background
- Persistent storage - queue rows live in the database and survive restarts
- Priorities - more important work is picked first
- Retries - transient failures can return a task to the queue
- Interrupts - running jobs can be stopped from task history
- History - every job gets a result card and execution log
You can start a long operation, continue working immediately, and review the result later in Active tasks or Task history.
Architecture
The queue is served by a dedicated background consumer. It pulls tasks from the database, hands them to the right handler, and updates task history and queue state when execution finishes.
Components
1. Queue table
Active tasks are stored in the database. A row tracks:
- task ID
- task type
- queue status
- priority
- created time and next retry time
- retry counter
- start time and worker PID
- interrupt flag and task payload
2. Consumer service
The service in tasks.consumer.service.php starts InfraxTaskQueueService, registers task handlers, and runs the queue with a fixed worker pool.
- it only picks `pending` tasks that are ready to run
- it uses row locks and `skip locked` so the same job is not claimed twice
- it updates `worker_pid`, `started_at`, `retry_count`, and `next_try_at`
- it can recover stale tasks after a worker disappears
3. Task handlers
Execution logic lives in api/Modules/Tasks/Handlers. For example, ExecuteScript creates a task history card, writes progress steps, and renders script results in the modal.
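
To make the three components easier to picture, here is a minimal Python sketch of a queue row and a handler registry. It is an illustration only, not the INFRAX code: the real consumer is a PHP service, and the names `QueueRow`, `HandlerRegistry`, the field names, and the behavior for unknown task types are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass
class QueueRow:
    """Hypothetical in-memory mirror of one queue table row."""
    task_id: int
    task_type: str
    status: str = "pending"  # pending | running | cancelling
    priority: int = 0
    created_at: datetime = field(default_factory=datetime.now)
    next_try_at: Optional[datetime] = None
    retry_count: int = 0
    started_at: Optional[datetime] = None
    worker_pid: Optional[int] = None
    interrupt: bool = False
    payload: dict = field(default_factory=dict)


# A handler receives the row and returns "SUCCESS", "FAILED", or "RETRY".
Handler = Callable[[QueueRow], str]


class HandlerRegistry:
    """Maps a task type code to the function that executes it."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, task_type: str, handler: Handler) -> None:
        self._handlers[task_type] = handler

    def dispatch(self, row: QueueRow) -> str:
        handler = self._handlers.get(row.task_type)
        if handler is None:
            # Assumption: a task type without a handler fails immediately.
            return "FAILED"
        return handler(row)


registry = HandlerRegistry()
registry.register("ExecuteScript", lambda row: "SUCCESS")  # stand-in handler
result = registry.dispatch(QueueRow(task_id=1, task_type="ExecuteScript"))
```

Keeping the registry separate from the consumer loop means a new task type only needs a new handler; the claiming and retry logic stays untouched.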
Queue settings
| Setting | Value | Meaning |
|---|---|---|
| Worker threads | 40 | Up to 40 tasks can be processed in parallel |
| Retry delay | 30 seconds | A fixed pause before a task becomes eligible again |
| Retry limit | 10 | After the tenth failed attempt, the task is removed from the queue |
| Backoff growth | Disabled | The pause stays constant instead of increasing after each retry |
| Stale task check | 60 seconds | The service periodically checks running/cancelling tasks for dead workers |
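
The stale task check from the table can be sketched as follows. This is a hedged Python illustration, not the service's code: the row layout, the `find_stale_tasks` name, and the injected `is_alive` callback are assumptions, and the sketch only detects stale rows because the recovery action itself is not described here.

```python
from typing import Callable, Dict, List

STALE_CHECK_INTERVAL_S = 60  # value from the settings table


def find_stale_tasks(rows: List[Dict], is_alive: Callable[[int], bool]) -> List[int]:
    """Return IDs of running/cancelling rows whose worker process is gone."""
    stale = []
    for row in rows:
        if row["status"] not in ("running", "cancelling"):
            continue  # pending rows have no worker to lose
        pid = row.get("worker_pid")
        if pid is None or not is_alive(pid):
            stale.append(row["id"])
    return stale


rows = [
    {"id": 1, "status": "running", "worker_pid": 100},
    {"id": 2, "status": "running", "worker_pid": 200},
    {"id": 3, "status": "pending", "worker_pid": None},
    {"id": 4, "status": "cancelling", "worker_pid": 300},
]
live_pids = {100}  # pretend only worker 100 is still alive
stale_ids = find_stale_tasks(rows, lambda pid: pid in live_pids)
```

Passing the liveness check in as a function keeps the sketch portable and testable; a real service would probe the operating system for the stored worker PID.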
Task Lifecycle
Each task moves from creation to a final history status. While it is in the queue it is pending, running, or cancelling; when it leaves the queue, history records a user-facing final state.
Processing stages
1. Creation
A task is created by a user or by the scheduler. The payload usually contains the title, execution parameters, and the initiator. If the scheduler created it, task history shows Scheduler instead of a user.
2. Queueing
The row is stored as pending. For retries, next_try_at is written immediately so workers do not pick the task too early.
3. Claiming
The consumer selects ready tasks by priority, then by next retry time, then by ID. It then marks the row as running, increments the retry counter, and stores the worker PID.
4. Execution
The task-specific handler performs the actual work. Task history is created or refreshed with a Running status and a step-by-step log.
5. Result
The handler eventually returns SUCCESS, FAILED, or RETRY.
6. Cleanup
Successful and failed tasks are removed from the queue and recorded as Completed or Failed. RETRY puts the task back into pending, and an interrupted job ends up as Canceled in history.
If a running task is interrupted, the queue finishes cleanup as if the job had completed successfully, then writes the final history state as Canceled. The same rule applies to jobs stopped through the cancel signal.
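
The result and cleanup stages above can be summarized as a small mapping. The Python sketch below is an assumption-laden illustration: `finish_task` is a hypothetical name, and returning `None` for the history state on RETRY reflects only that no final state exists yet.

```python
from typing import Optional, Tuple


def finish_task(result: str, interrupted: bool = False) -> Tuple[Optional[str], Optional[str]]:
    """Return (queue_status, final_history_code) after a handler run.

    queue_status None  -> the row is removed from the queue.
    history code None  -> no final history state yet.
    """
    if interrupted:
        # Cleanup runs as for a successful job, but history shows Canceled.
        return (None, "Canceled")
    if result == "SUCCESS":
        return (None, "Done")      # shown as Completed
    if result == "FAILED":
        return (None, "Failed")
    if result == "RETRY":
        return ("pending", None)   # back in the queue for another attempt
    raise ValueError(f"unknown handler result: {result}")


outcomes = {r: finish_task(r) for r in ("SUCCESS", "FAILED", "RETRY")}
```

The interrupt check comes first on purpose: per the text above, an interrupted job ends as Canceled regardless of what the handler returned.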
Task Priorities
Higher numeric values run first. In Active tasks, the priority is shown as a colored badge.
| UI label | Numeric range | How it is used |
|---|---|---|
| High | 10 and above | Top-priority jobs that should run first |
| Medium | 5-9 | Regular jobs with a noticeable priority |
| Low | Below 5 | Background work that can wait |
When priorities match, the backend compares next_try_at first and then the task ID. That keeps execution order stable and predictable.
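
As a rough illustration of this ordering, the Python sketch below sorts ready rows by priority (descending), then `next_try_at`, then ID, and maps numeric priorities to the UI labels from the table. The function names and the dictionary row layout are assumptions, and the sketch assumes every row has `next_try_at` set.

```python
from datetime import datetime
from typing import Dict, List


def priority_label(priority: int) -> str:
    """Map a numeric priority to the badge label used in Active tasks."""
    if priority >= 10:
        return "High"
    if priority >= 5:
        return "Medium"
    return "Low"


def claim_order(rows: List[Dict], now: datetime) -> List[Dict]:
    """Ready pending rows in the order a worker would pick them."""
    ready = [r for r in rows if r["status"] == "pending" and r["next_try_at"] <= now]
    # Higher priority first, then earlier next_try_at, then lower ID.
    return sorted(ready, key=lambda r: (-r["priority"], r["next_try_at"], r["id"]))


now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"id": 3, "priority": 5, "status": "pending", "next_try_at": datetime(2024, 1, 1, 11, 0)},
    {"id": 1, "priority": 10, "status": "pending", "next_try_at": datetime(2024, 1, 1, 11, 30)},
    {"id": 2, "priority": 10, "status": "pending", "next_try_at": datetime(2024, 1, 1, 11, 30)},
    {"id": 4, "priority": 12, "status": "pending", "next_try_at": datetime(2024, 1, 1, 13, 0)},
    {"id": 5, "priority": 10, "status": "pending", "next_try_at": datetime(2024, 1, 1, 11, 0)},
]
order = [r["id"] for r in claim_order(rows, now)]
```

Row 4 has the highest priority but is not yet eligible, so it is skipped; among the three priority-10 rows, the earlier `next_try_at` wins and the ID breaks the remaining tie.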
Retry Mechanism
Retries are meant for transient problems such as a short network timeout, a busy resource, or a temporary overload that may disappear on the next pass.
Retry parameters
- Maximum attempts - 10
- Delay between attempts - 30 seconds
- Delay mode - fixed, without exponential backoff
How it works
- The handler returns `RETRY` or fails with a temporary error.
- `retry_count` increases when a worker claims the task.
- The task stays in the queue as `pending` and gets a new `next_try_at`.
- After 30 seconds it becomes eligible again.
- After 10 failed attempts, the queue removes it and history shows Failed.
If the root cause is not temporary, the queue will still retry the job until the limit is reached. Check the parameters, data, and logs before submitting the task again.
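
The retry cycle can be walked through with a short simulation. This Python sketch assumes a handler that always asks for a retry; the helper names `claim` and `apply_retry` and the row layout are hypothetical, while the 30-second delay and the limit of 10 come from the parameters above.

```python
from datetime import datetime, timedelta
from typing import Dict

RETRY_DELAY = timedelta(seconds=30)  # fixed delay, no backoff growth
RETRY_LIMIT = 10                     # maximum attempts


def claim(row: Dict, now: datetime) -> None:
    """A worker claims the row; this is where retry_count increases."""
    row["status"] = "running"
    row["retry_count"] += 1
    row["started_at"] = now


def apply_retry(row: Dict, now: datetime) -> str:
    """Handle a RETRY result; returns 'requeued' or 'failed'."""
    if row["retry_count"] >= RETRY_LIMIT:
        return "failed"  # row leaves the queue, history shows Failed
    row["status"] = "pending"
    row["next_try_at"] = now + RETRY_DELAY  # same pause every time
    return "requeued"


# Simulate a task whose handler always returns RETRY.
row = {"status": "pending", "retry_count": 0, "next_try_at": None, "started_at": None}
now = datetime(2024, 1, 1, 12, 0, 0)
outcome = "requeued"
while outcome == "requeued":
    claim(row, now)
    outcome = apply_retry(row, now)
    now += RETRY_DELAY  # the next claim happens once the task is eligible

attempts = row["retry_count"]
```

Because the counter increases at claim time, the tenth claim is also the last one: the task is not requeued after it.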
| Situation | Typical outcome |
|---|---|
| Short network timeout | The next attempt often succeeds |
| Resource locked by another process | The queue waits for release and tries again |
| Temporary overload | The queue retries after a pause |
| Bad input data | Retries do not help and the task ends as Failed |
Task States
The UI uses different labels for queue state and history state. A job first lives in the queue, then it gets a final history result.
Queue states
| Code | Shown as | Meaning |
|---|---|---|
| `pending` | Pending | The task is waiting for a worker |
| `running` | Running | A worker is currently processing it |
| `cancelling` | Cancelling | The user requested an interrupt and cleanup is in progress |
Task history states
| Code | Shown as | Meaning |
|---|---|---|
| `InProgress` | Running | The task is still active and the card can refresh in real time |
| `Done` | Completed | The task finished successfully |
| `Failed` | Failed | The task exhausted retries or hit a fatal error |
| `Canceled` | Canceled | The task was stopped by the user or by the cancel signal |
Active Tasks
Active tasks shows the live queue. It is the place to check what is waiting, what is already running, and which kind of workload the system is processing right now.
Open Administration → Automation → Active tasks. The table supports search, sorting, and a live view of the current queue state.
What the table shows
| Column | What it means |
|---|---|
| Origin | Source marker: System or User. In the current UI, script execution jobs are tagged as User and the other rows are tagged as System. |
| ID | Unique queue row ID |
| Status | Pending, Running, or Cancelling |
| Task title | The task title, or Untitled if none was set |
| Created at | When the row was added to the queue |
| Started at | Execution start time; waiting tasks show a dash |
| Retry attempts | The counter that increases each time a worker claims the task |
| Priority | A colored badge: High, Medium, or Low |
| Type | The internal task code, such as ExecuteScript |
Filtering and sorting
- search by ID
- search by task title
- filter by created at
- search by task type
- sort by any visible column
If the queue does not shrink and running or cancelling rows hang around, check the consumer service, server load, and backend logs.
Task History
Task history shows running, completed, failed, and interrupted work. It is the main screen for auditing, troubleshooting, and checking the result of background jobs.
Origin filter
- User - tasks launched by a user
- System - scheduled or automatic tasks
- All - the full history list
The current filter behavior is asymmetric: User hides system rows, while System and All currently leave both scheduler and user entries visible.
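
The asymmetry is easier to see in code. The Python sketch below models the behavior as described, not the actual UI logic; `filter_history` is a hypothetical name, and treating a row without an initiator as a system row is an assumption.

```python
from typing import Dict, List


def is_system_row(row: Dict) -> bool:
    # Assumption: a row with no initiator was created by the scheduler.
    return row.get("initiator") is None


def filter_history(rows: List[Dict], origin_filter: str) -> List[Dict]:
    """Model the current Origin filter behavior."""
    if origin_filter == "User":
        return [r for r in rows if not is_system_row(r)]
    if origin_filter in ("System", "All"):
        return list(rows)  # both scheduler and user entries stay visible
    raise ValueError(f"unknown origin filter: {origin_filter}")


history = [
    {"id": 1, "initiator": "alice"},
    {"id": 2, "initiator": None},  # displayed as Scheduler
]
user_ids = [r["id"] for r in filter_history(history, "User")]
system_ids = [r["id"] for r in filter_history(history, "System")]
```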
Table information
| Column | What it means |
|---|---|
| Status | The TaskHistoryStatus component shows Running, Completed, Failed, or Canceled |
| Created | Task creation time |
| Title | The task title or a type-derived fallback title |
| Initiator | The user name or Scheduler when no initiator is set |
Task details card
Clicking a row opens the Task details modal. The card includes:
- task ID, title, status, and initiator
- network node details when the task has a node_id
- Interrupt for tasks that are still running
- execution timeline with step-by-step log entries
- script results for `ExecuteScript` jobs
The history table refreshes automatically about every 3 seconds. While a task is still running, the details modal polls the server every second until the job reaches a final status.
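
The modal's polling behavior follows a common pattern: fetch, check for a final status, wait, repeat. The Python sketch below shows that pattern under stated assumptions; `poll_until_final` and the injected `fetch_status` and `sleep` callables are hypothetical, and the real UI is not written in Python.

```python
import time
from typing import Callable, List

FINAL_STATUSES = {"Done", "Failed", "Canceled"}  # history codes that end polling


def poll_until_final(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until the task reaches a final status; return every status seen."""
    seen = []
    while True:
        status = fetch_status()
        seen.append(status)
        if status in FINAL_STATUSES:
            return seen
        sleep(interval_s)  # one second between requests by default


# Stand-in for the server: two in-progress answers, then a final one.
responses = iter(["InProgress", "InProgress", "Done"])
seen = poll_until_final(lambda: next(responses), sleep=lambda s: None)
```

Stopping as soon as a final status arrives matters: a finished task never changes again, so further requests would only add load.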
Queue Monitoring
Queue health is easiest to judge together with the consumer logs. The most useful signals are a growing pending backlog, stuck running/cancelling rows, and repeated errors in history.
Normal signs
- tasks move from `pending` to `running` and then to a final state without long stalls
- most jobs do not need many retries
- history is dominated by Completed
- interrupts happen only when a user actually requests them
Problem signs
- the queue does not shrink for a long time
- tasks often reach 10 attempts
- `running` or `cancelling` rows stay visible for too long
- history contains many Failed items
What to check
- whether the queue consumer service is running
- backend logs for the specific task type
- database availability
- CPU and memory on the server
- whether a worker is stuck and needs a restart
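
The problem signs above can be turned into a simple summary over queue rows. This Python sketch is only a starting point: `queue_health`, the row layout, and especially the 30-minute threshold for a "stuck" row are assumptions that should be tuned to the workload.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def queue_health(
    rows: List[Dict],
    now: datetime,
    max_running_age: timedelta = timedelta(minutes=30),  # hypothetical threshold
    retry_limit: int = 10,
) -> Dict[str, int]:
    """Count the signals that usually indicate queue trouble."""
    return {
        "pending_backlog": sum(1 for r in rows if r["status"] == "pending"),
        "stuck_rows": sum(
            1
            for r in rows
            if r["status"] in ("running", "cancelling")
            and r["started_at"] is not None
            and now - r["started_at"] > max_running_age
        ),
        "at_retry_limit": sum(1 for r in rows if r["retry_count"] >= retry_limit),
    }


now = datetime(2024, 1, 1, 12, 0)
rows = [
    {"status": "pending", "retry_count": 0, "started_at": None},
    {"status": "pending", "retry_count": 10, "started_at": None},
    {"status": "running", "retry_count": 1, "started_at": datetime(2024, 1, 1, 11, 0)},
    {"status": "running", "retry_count": 1, "started_at": datetime(2024, 1, 1, 11, 55)},
]
health = queue_health(rows, now)
```

A single snapshot says little on its own; the counts are most useful when compared over time, for example to see whether the pending backlog keeps growing.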
Best Practices
For administrators
Operations
- Check Active tasks and Task history regularly
- Watch consumer logs for repeated failures of the same task type
- Make sure the queue service starts automatically after a server reboot
- If load grows, investigate the bottleneck before changing worker settings
Performance
- avoid launching too many heavy background jobs at the same time
- confirm that the database and indexes can keep up with queue pressure
- keep an eye on long-running jobs that may be blocked by external dependencies
For users
Working with tasks
- do not create a duplicate task while the old one is still visible in Active tasks
- if a job takes too long, open its card in Task history
- use Interrupt only when the task is genuinely stuck or no longer needed
- when a task fails, check the log and parameters before retrying it
The asynchronous queue keeps INFRAX responsive during long operations. If you watch active tasks and task history, queue problems usually show up before they start affecting the rest of the system.