Error Handling
When you send an email notification to a customer, process a payment, or sync data with a third-party API, things can go wrong. The receiving service might be down, the network could be unstable, or rate limits might be hit. Taskhook automatically retries failed tasks and provides tools to handle persistent failures.
Failure Detection
Taskhook considers a task execution failed when:
- The callback request returns any non-2xx HTTP status code
- The request times out
- TLS/SSL errors occur
When a failure occurs, the task enters the retry cycle automatically unless it has exhausted all retry attempts or expired.
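For illustration, here is a minimal sketch of a callback endpoint using only Python's standard library. It assumes Taskhook delivers the task as a JSON-encoded HTTP POST (the payload shape here is an assumption); returning a 2xx status marks the execution successful, while any other response puts the task into the retry cycle.
```python
# Minimal callback-endpoint sketch; the JSON payload shape is an assumption.
# Any non-2xx response, a timeout, or a TLS error causes Taskhook to retry.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class TaskCallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        task = json.loads(self.rfile.read(length) or b"{}")
        try:
            process(task)              # your business logic (hypothetical helper)
        except Exception:
            self.send_response(500)    # non-2xx: Taskhook will retry later
        else:
            self.send_response(200)    # 2xx: execution recorded as successful
        self.end_headers()

def process(task):
    """Placeholder for the real work, e.g. sending the notification the task describes."""
    print("processing task", task.get("id"))

if __name__ == "__main__":
    HTTPServer(("", 8080), TaskCallbackHandler).serve_forever()
```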
Retry Strategy
When a task fails, Taskhook retries it automatically with exponential backoff delays between attempts. The retry delay, in seconds, is calculated using the formula:
retry_count ** 4 + 15 + random(10) * (retry_count + 1)
This formula combines a rapidly growing backoff with a random component (jitter). The retry_count ** 4 term ensures that retries spread out more as failures persist, reducing load during extended outages. The random component prevents the "thundering herd" problem, where many failed tasks would otherwise retry at exactly the same time and overwhelm the target system as it recovers.
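As a worked example, the sketch below evaluates that formula directly, treating random(10) as a uniform draw of up to 10 seconds (so 5 seconds is the median jitter used in the reference table further down):
```python
import random

def retry_delay(retry_count, jitter=None):
    """Seconds to wait before the next attempt, per the backoff formula above."""
    if jitter is None:
        jitter = random.uniform(0, 10)          # the random(10) jitter component
    return retry_count ** 4 + 15 + jitter * (retry_count + 1)

# Reproduce the documented schedule using the median jitter of 5 seconds.
total = 0
for attempt in range(1, 25):                    # 24 attempts maximum
    delay = retry_delay(attempt, jitter=5)
    total += delay
    print(f"attempt {attempt:2d}: next retry in {delay:9.0f}s, cumulative {total:9.0f}s")
# Attempt 1 waits 26s; attempt 24 waits roughly 3d 20h, for about 20 days in total.
```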
- Maximum retries: 24 attempts
- Total retry period: Approximately 20 days
- Task expiration: Optional time limit that prevents further retries when reached
You can set a task expiration time when creating tasks. If a task reaches its expiration time before completing all retry attempts, it will be moved to the dead letter queue and no further retries will be attempted.
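For example, here is a hedged sketch of setting an expiration when creating a task. The endpoint URL and field names (url, payload, expires_at) are placeholders, not Taskhook's documented API; consult the task-creation reference for the actual parameters.
```python
# Hypothetical sketch: the endpoint and field names are placeholders,
# not Taskhook's documented API.
import json
from datetime import datetime, timedelta, timezone
from urllib import request

task = {
    "url": "https://example.com/webhooks/email",            # your callback endpoint
    "payload": {"customer_id": 42, "template": "welcome"},
    # Stop retrying 24 hours from now; after that the task moves to the DLQ.
    "expires_at": (datetime.now(timezone.utc) + timedelta(hours=24)).isoformat(),
}

req = request.Request(
    "https://api.taskhook.example/v1/tasks",                 # placeholder URL
    data=json.dumps(task).encode(),
    headers={"Content-Type": "application/json", "Authorization": "Bearer <token>"},
    method="POST",
)
with request.urlopen(req) as resp:
    print(resp.status, resp.read().decode())
```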
For reference, here are the delays for all retry attempts, assuming median random jitter:
Attempt # | Next retry backoff | Total waiting time |
---|---|---|
1 | 26s | 26s |
2 | 46s | 1m 12s |
3 | 1m 56s | 3m 8s |
4 | 4m 56s | 8m 4s |
5 | 11m 10s | 19m 14s |
6 | 22m 26s | 41m 40s |
7 | 40m 56s | 1h 22m |
8 | 1h 9m | 2h 31m |
9 | 1h 50m | 4h 22m |
10 | 2h 47m | 7h 10m |
11 | 4h 5m | 11h 15m |
12 | 5h 46m | 17h 2m |
13 | 7h 57m | 1d 59m |
14 | 10h 41m | 1d 11h 41m |
15 | 14h 5m | 2d 1h 46m |
16 | 18h 13m | 2d 20h |
17 | 23h 13m | 3d 19h 14m |
18 | 1d 5h 11m | 5d 26m |
19 | 1d 12h 13m | 6d 12h 39m |
20 | 1d 20h 28m | 8d 9h 8m |
21 | 2d 6h 3m | 10d 15h 12m |
22 | 2d 17h 6m | 13d 8h 18m |
23 | 3d 5h 46m | 16d 14h 4m |
24 | 3d 20h 11m | 20d 10h 16m |
Dead Letter Queue
Tasks that exhaust all retry attempts or hit their expiration time are automatically moved to the dead letter queue (DLQ). The DLQ serves as a holding area for failed tasks that require investigation or manual intervention.
Managing Dead Letter Queue Tasks
You can manage DLQ tasks through both the API and the dashboard. For each task in the DLQ, you can do the following (an API sketch follows this list):
- Review the task's full execution history, including all retry attempts and error messages
- Requeue the task with a fresh retry cycle (resets attempt count)
- Permanently remove the task from the DLQ
- Update task parameters (target URL, headers, payload) before retrying
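As a rough illustration of the API path, the sketch below updates a DLQ task's target URL and then requeues it with a fresh retry cycle. The endpoints and field names are assumptions, not Taskhook's documented API; adapt them to the actual DLQ reference.
```python
# Hypothetical sketch: the DLQ endpoints and fields below are assumptions,
# not Taskhook's documented API.
import json
from urllib import request

BASE = "https://api.taskhook.example/v1"                     # placeholder base URL
HEADERS = {"Authorization": "Bearer <token>", "Content-Type": "application/json"}

def requeue_dlq_task(task_id, new_target_url=None):
    """Optionally patch the task's target URL, then requeue it with a fresh retry cycle."""
    if new_target_url:
        patch = request.Request(
            f"{BASE}/dlq/{task_id}",
            data=json.dumps({"url": new_target_url}).encode(),
            headers=HEADERS,
            method="PATCH",
        )
        request.urlopen(patch)
    requeue = request.Request(f"{BASE}/dlq/{task_id}/requeue", headers=HEADERS, method="POST")
    with request.urlopen(requeue) as resp:
        return resp.status

print(requeue_dlq_task("task_123", new_target_url="https://example.com/webhooks/email-v2"))
```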
Handling Duplicate Deliveries
In rare cases, Taskhook might send the same task twice to ensure it's not lost. For example, if your server processes a task but the success response doesn't reach Taskhook due to a network issue, Taskhook will retry the task. Your endpoint should handle these duplicate deliveries gracefully.
Implementing Idempotency
To handle potential duplicate deliveries:
- Use the `id` field provided in each task as an idempotency key
- Store processed task IDs in your application
- Skip processing if a task ID has already been handled
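A minimal sketch of that pattern follows. It assumes the task arrives as a dict parsed from the JSON payload; a real service should persist processed IDs in a durable store (database, Redis) rather than the in-memory set used here.
```python
# Idempotency sketch: skip any task whose id has already been handled.
# Replace the in-memory set with a durable store (database, Redis) in production.
processed_ids = set()

def handle_task(task):
    task_id = task["id"]               # the task's id doubles as the idempotency key
    if task_id in processed_ids:
        return "skipped (duplicate delivery)"
    send_welcome_email(task)           # hypothetical business logic
    processed_ids.add(task_id)         # record only after the work has succeeded
    return "processed"

def send_welcome_email(task):
    print("emailing customer", task.get("customer_id"))

# Simulate the same task being delivered twice.
task = {"id": "task_123", "customer_id": 42}
print(handle_task(task))   # -> processed
print(handle_task(task))   # -> skipped (duplicate delivery)
```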
Monitoring and Alerting
We recommend monitoring your task failure rates and DLQ size to catch systemic issues early. Taskhook provides several monitoring endpoints and integrations (a polling sketch follows this list):
- Webhook notifications for DLQ events
- Custom alert rules based on failure patterns
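As one possible starting point, the sketch below polls a metrics endpoint and flags elevated failures. The endpoint URL, response fields, and thresholds are assumptions for illustration, not Taskhook's documented API.
```python
# Hypothetical sketch: the metrics endpoint, fields, and thresholds are assumptions.
import json
from urllib import request

req = request.Request(
    "https://api.taskhook.example/v1/metrics",               # placeholder endpoint
    headers={"Authorization": "Bearer <token>"},
)
with request.urlopen(req) as resp:
    metrics = json.load(resp)

# Flag a growing DLQ or an elevated failure rate; tune thresholds to your traffic.
if metrics.get("dlq_size", 0) > 100 or metrics.get("failure_rate", 0) > 0.05:
    print("ALERT: elevated task failures, investigate the DLQ")
```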