Error Handling
When you send an email notification to a customer, process a payment, or sync data with a third-party API, things can go wrong. The receiving service might be down, the network could be unstable, or rate limits might be hit. Taskhook automatically retries failed tasks and provides tools to handle persistent failures.
Failure Detection
Taskhook considers a task execution failed when:
- The callback request returns any non-2xx HTTP status code
- The request times out
- TLS/SSL errors occur
When a failure occurs, the task enters the retry cycle automatically unless it has exhausted all retry attempts or expired.
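For illustration, here is a minimal sketch of a callback endpoint using only Python's standard library. It assumes Taskhook delivers the task as a JSON-encoded HTTP POST (the payload shape here is an assumption); returning a 2xx status marks the execution successful, while any other response puts the task into the retry cycle.
```python
# Minimal callback-endpoint sketch; the JSON payload shape is an assumption.
# Any non-2xx response, a timeout, or a TLS error causes Taskhook to retry.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class TaskCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        task = json.loads(self.rfile.read(length) or b"{}")
        try:
            process(task)              # your business logic (hypothetical helper)
        except Exception:
            self.send_response(500)    # non-2xx: Taskhook will retry later
        else:
            self.send_response(200)    # 2xx: execution recorded as successful
        self.end_headers()

def process(task):
    """Placeholder for the real work, e.g. sending the notification the task describes."""
    print("processing task", task.get("id"))

if __name__ == "__main__":
    HTTPServer(("", 8080), TaskCallbackHandler).serve_forever()
```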
Retry Strategy
When a task fails, Taskhook retries it automatically with exponential backoff delays between attempts. The retry delay, in seconds, is calculated using the formula:
retry_count ** 4 + 15 + random(10) * (retry_count + 1)
This formula combines a rapidly growing backoff with a random component (jitter). The retry_count ** 4 term ensures that retries spread out more as failures persist, reducing load during extended outages. The random component prevents the "thundering herd" problem, where many failed tasks would otherwise retry at exactly the same time and overwhelm the target system as it recovers.
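As a worked example, the sketch below evaluates that formula directly, treating random(10) as a uniform draw of up to 10 seconds (so 5 seconds is the median jitter used in the reference table further down):
```python
import random

def retry_delay(retry_count, jitter=None):
    """Seconds to wait before the next attempt, per the backoff formula above."""
    if jitter is None:
        jitter = random.uniform(0, 10)          # the random(10) jitter component
    return retry_count ** 4 + 15 + jitter * (retry_count + 1)

# Reproduce the documented schedule using the median jitter of 5 seconds.
total = 0
for attempt in range(1, 25):                    # 24 attempts maximum
    delay = retry_delay(attempt, jitter=5)
    total += delay
    print(f"attempt {attempt:2d}: next retry in {delay:9.0f}s, cumulative {total:9.0f}s")
# Attempt 1 waits 26s; attempt 24 waits roughly 3d 20h, for about 20 days in total.
```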
- Maximum retries: 24 attempts
- Total retry period: Approximately 20 days
- Task expiration: Optional time limit that prevents further retries when reached
You can set a task expiration time when creating tasks. If a task reaches its expiration time before completing all retry attempts, it will be moved to the dead letter queue and no further retries will be attempted.
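For example, here is a hedged sketch of setting an expiration when creating a task. The endpoint URL and field names (url, payload, expires_at) are placeholders, not Taskhook's documented API; consult the task-creation reference for the actual parameters.
```python
# Hypothetical sketch: the endpoint and field names are placeholders,
# not Taskhook's documented API.
import json
from datetime import datetime, timedelta, timezone
from urllib import request

task = {
    "url": "https://example.com/webhooks/email",            # your callback endpoint
    "payload": {"customer_id": 42, "template": "welcome"},
    # Stop retrying 24 hours from now; after that the task moves to the DLQ.
    "expires_at": (datetime.now(timezone.utc) + timedelta(hours=24)).isoformat(),
}

req = request.Request(
    "https://api.taskhook.example/v1/tasks",                 # placeholder URL
    data=json.dumps(task).encode(),
    headers={"Content-Type": "application/json", "Authorization": "Bearer <token>"},
    method="POST",
)
with request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```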
For reference, here are the delays for all retry attempts, assuming median random jitter:
Attempt # | Next retry backoff | Total waiting time |
---|---|---|
1 | 26s | 26s |
2 | 46s | 1m 12s |
3 | 1m 56s | 3m 8s |
4 | 4m 56s | 8m 4s |
5 | 11m 10s | 19m 14s |
6 | 22m 26s | 41m 40s |
7 | 40m 56s | 1h 22m |
8 | 1h 9m | 2h 31m |
9 | 1h 50m | 4h 22m |
10 | 2h 47m | 7h 10m |
11 | 4h 5m | 11h 15m |
12 | 5h 46m | 17h 2m |
13 | 7h 57m | 1d 59m |
14 | 10h 41m | 1d 11h 41m |
15 | 14h 5m | 2d 1h 46m |
16 | 18h 13m | 2d 20h |
17 | 23h 13m | 3d 19h 14m |
18 | 1d 5h 11m | 5d 26m |
19 | 1d 12h 13m | 6d 12h 39m |
20 | 1d 20h 28m | 8d 9h 8m |
21 | 2d 6h 3m | 10d 15h 12m |
22 | 2d 17h 6m | 13d 8h 18m |
23 | 3d 5h 46m | 16d 14h 4m |
24 | 3d 20h 11m | 20d 10h 16m |
Dead Letter Queue
Tasks that exhaust all retry attempts or hit their expiration time are automatically moved to the dead letter queue (DLQ). The DLQ serves as a holding area for failed tasks that require investigation or manual intervention.
Managing Dead Letter Queue Tasks
You can manage DLQ tasks through both the API and the dashboard. For each task in the DLQ, you can do the following (an API sketch follows this list):
- Review the task's full execution history, including all retry attempts and error messages
- Requeue the task with a fresh retry cycle (resets attempt count)
- Permanently remove the task from the DLQ
- Update task parameters (target URL, headers, payload) before retrying
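As a rough illustration of the API path, the sketch below updates a DLQ task's target URL and then requeues it with a fresh retry cycle. The endpoints and field names are assumptions, not Taskhook's documented API; adapt them to the actual DLQ reference.
```python
# Hypothetical sketch: the DLQ endpoints and fields below are assumptions,
# not Taskhook's documented API.
import json
from urllib import request

BASE = "https://api.taskhook.example/v1"                     # placeholder base URL
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

def requeue_dlq_task(task_id, new_target_url=None):
    """Optionally patch the task's target URL, then requeue it with a fresh retry cycle."""
    if new_target_url:
        patch = request.Request(
            f"{BASE}/dlq/{task_id}",
            data=json.dumps({"url": new_target_url}).encode(),
            headers=HEADERS,
            method="PATCH",
        )
        request.urlopen(patch)
    requeue = request.Request(f"{BASE}/dlq/{task_id}/requeue", headers=HEADERS, method="POST")
    with request.urlopen(requeue) as resp:
        return resp.status

print(requeue_dlq_task("task_123", new_target_url="https://example.com/webhooks/email-v2"))
```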
Handling Duplicate Deliveries
In rare cases, Taskhook might send the same task twice to ensure it's not lost. For example, if your server processes a task but the success response doesn't reach Taskhook due to a network issue, Taskhook will retry the task. Your endpoint should handle these duplicate deliveries gracefully.
Implementing Idempotency
To handle potential duplicate deliveries:
- Use the `id` field provided in each task as an idempotency key
- Store processed task IDs in your application
- Skip processing if a task ID has already been handled
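A minimal sketch of that pattern follows. It assumes the task arrives as a dict parsed from the JSON payload; a real service should persist processed IDs in a durable store (database, Redis) rather than the in-memory set used here.
```python
# Idempotency sketch: skip any task whose id has already been handled.
# Replace the in-memory set with a durable store (database, Redis) in production.
processed_ids = set()

def handle_task(task):
    task_id = task["id"]               # the task's id doubles as the idempotency key
    if task_id in processed_ids:
        return "skipped (duplicate delivery)"
    send_welcome_email(task)           # hypothetical business logic
    processed_ids.add(task_id)         # record only after the work has succeeded
    return "processed"

def send_welcome_email(task):
    print("emailing customer", task.get("customer_id"))

# Simulate the same task being delivered twice.
task = {"id": "task_123", "customer_id": 42}
print(handle_task(task))   # -> processed
print(handle_task(task))   # -> skipped (duplicate delivery)
```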
Monitoring and Alerting
We recommend monitoring your task failure rates and DLQ size to catch systemic issues early. Taskhook provides several monitoring endpoints and integrations (a polling sketch follows this list):
- Webhook notifications for DLQ events
- Custom alert rules based on failure patterns
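As one possible starting point, the sketch below polls a metrics endpoint and flags elevated failures. The endpoint URL, response fields, and thresholds are assumptions for illustration, not Taskhook's documented API.
```python
# Hypothetical sketch: the metrics endpoint, fields, and thresholds are assumptions.
import json
from urllib import request

req = request.Request(
    "https://api.taskhook.example/v1/metrics",               # placeholder endpoint
    headers={"Authorization": "Bearer <token>"},
)
with request.urlopen(req) as resp:
    metrics = json.load(resp)

# Flag a growing DLQ or an elevated failure rate; tune thresholds to your traffic.
if metrics.get("dlq_size", 0) > 100 or metrics.get("failure_rate", 0) > 0.05:
    print("ALERT: elevated task failures, investigate the DLQ")
```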