Error Handling

When you send an email notification to a customer, process a payment, or sync data with a third-party API, things can go wrong. The receiving service might be down, the network could be unstable, or rate limits might be hit. Taskhook automatically retries failed tasks and provides tools to handle persistent failures.

Failure Detection

Taskhook considers a task execution failed when any of the following occurs:

  • The callback request returns any non-2xx HTTP status code
  • The request times out
  • A TLS/SSL error occurs during the request

When a failure occurs, the task enters the retry cycle automatically unless it has exhausted all retry attempts or expired.
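
For example, a callback endpoint only needs to return a 2xx status to acknowledge success; anything else starts the retry cycle. Below is a minimal sketch of such an endpoint using Flask, where the /taskhook/callback path, the payload shape, and the send_notification helper are assumptions for illustration:

# Minimal callback endpoint sketch. The route, payload fields, and
# send_notification helper are illustrative, not part of the Taskhook spec.
from flask import Flask, request

app = Flask(__name__)

def send_notification(payload):
    """Placeholder for your business logic."""

@app.route("/taskhook/callback", methods=["POST"])
def handle_task():
    task = request.get_json()
    try:
        send_notification(task["payload"])
    except Exception:
        # Any non-2xx response tells Taskhook this execution failed,
        # so the task enters the retry cycle.
        return "processing failed", 500
    # A 2xx response marks the execution as successful.
    return "ok", 200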

Retry Strategy

When a task fails, Taskhook retries it automatically, waiting progressively longer between attempts. The delay before the next attempt, in seconds, is calculated using the formula:

retry_count ** 4 + 15 + random(10) * (retry_count + 1)

This formula combines a steeply growing backoff term with a random component (jitter). The retry_count ** 4 term ensures that retries spread out more as failures persist, reducing load during extended outages. The random component prevents the "thundering herd" problem, where many failed tasks would otherwise retry at exactly the same time and potentially overwhelm the target system as it recovers.
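
As a rough sanity check, the schedule shown in the table below can be reproduced by evaluating the formula directly. This sketch assumes the result is in seconds and that random(10) yields an integer between 0 and 9, with the jitter pinned near its median for determinism:

import random

def retry_delay_seconds(retry_count, jitter=None):
    # retry_count is the number of attempts made so far (1 for the first retry).
    # Pass jitter to make the result deterministic; otherwise a random 0-9
    # second component is used, mirroring random(10) in the formula above.
    if jitter is None:
        jitter = random.randrange(10)
    return retry_count ** 4 + 15 + jitter * (retry_count + 1)

# With the jitter pinned at 5s, the first delays are
# 26s, 46s, 116s (1m 56s), 296s (4m 56s), 670s (11m 10s), matching the table.
print([retry_delay_seconds(n, jitter=5) for n in range(1, 6)])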

  • Maximum retries: 24 attempts
  • Total retry period: Approximately 20 days
  • Task expiration: Optional time limit that prevents further retries when reached

You can set a task expiration time when creating tasks. If a task reaches its expiration time before completing all retry attempts, it will be moved to the dead letter queue and no further retries will be attempted.
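
For illustration only, creating a task with an expiration time might look like the sketch below; the endpoint, authentication header, and field names (url, payload, expires_at) are assumptions rather than the documented task-creation API:

# Hypothetical task-creation request; endpoint and field names are assumed
# purely to illustrate setting an expiration time.
from datetime import datetime, timedelta, timezone
import requests

expires_at = datetime.now(timezone.utc) + timedelta(hours=6)

resp = requests.post(
    "https://api.taskhook.example/v1/tasks",
    headers={"Authorization": "Bearer <API_KEY>"},
    json={
        "url": "https://your-app.example/taskhook/callback",
        "payload": {"order_id": 1234},
        # Once this time passes, no further retries are attempted and the
        # task moves to the dead letter queue.
        "expires_at": expires_at.isoformat(),
    },
)
resp.raise_for_status()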

For reference, here are the delays for all retry attempts, assuming median random jitter:

Attempt #    Next retry backoff    Total waiting time
1            26s                   26s
2            46s                   1m 12s
3            1m 56s                3m 8s
4            4m 56s                8m 4s
5            11m 10s               19m 14s
6            22m 26s               41m 40s
7            40m 56s               1h 22m
8            1h 9m                 2h 31m
9            1h 50m                4h 22m
10           2h 47m                7h 10m
11           4h 5m                 11h 15m
12           5h 46m                17h 2m
13           7h 57m                1d 59m
14           10h 41m               1d 11h 41m
15           14h 5m                2d 1h 46m
16           18h 13m               2d 20h
17           23h 13m               3d 19h 14m
18           1d 5h 11m             5d 26m
19           1d 12h 13m            6d 12h 39m
20           1d 20h 28m            8d 9h 8m
21           2d 6h 3m              10d 15h 12m
22           2d 17h 6m             13d 8h 18m
23           3d 5h 46m             16d 14h 4m
24           3d 20h 11m            20d 10h 16m

Dead Letter Queue

Tasks that exhaust all retry attempts or hit their expiration time are automatically moved to the dead letter queue (DLQ). The DLQ serves as a holding area for failed tasks that require investigation or manual intervention.

Managing Dead Letter Queue Tasks

You can manage DLQ tasks through both the API and the dashboard. From either, you can (see the API sketch after this list):

  • Review the task's full execution history, including all retry attempts and error messages
  • Requeue the task with a fresh retry cycle (resets attempt count)
  • Permanently remove the task from the DLQ
  • Update task parameters (target URL, headers, payload) before retrying
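
A rough sketch of what the API side of this might look like; the /v1/dlq paths and response fields below are assumptions for illustration, not a documented contract:

# Hypothetical DLQ management calls; paths and fields are assumed.
import requests

API = "https://api.taskhook.example/v1"
HEADERS = {"Authorization": "Bearer <API_KEY>"}

# List tasks currently sitting in the dead letter queue.
dead_tasks = requests.get(f"{API}/dlq/tasks", headers=HEADERS).json()

for task in dead_tasks:
    # Review the recorded execution history before deciding what to do.
    print(task["id"], task["last_error"])

    # Requeue with a fresh retry cycle, optionally updating parameters first.
    requests.post(
        f"{API}/dlq/tasks/{task['id']}/requeue",
        headers=HEADERS,
        json={"url": task["url"]},
    )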

Handling Duplicate Deliveries

In rare cases, Taskhook might send the same task twice to ensure it's not lost. For example, if your server processes a task but the success response doesn't reach Taskhook due to a network issue, Taskhook will retry the task. Your endpoint should handle these duplicate deliveries gracefully.

Implementing Idempotency

To handle potential duplicate deliveries (a sketch of this pattern follows these steps):

  1. Use the id field provided in each task as an idempotency key
  2. Store processed task IDs in your application
  3. Skip processing if a task ID has already been handled
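
A minimal sketch of this pattern using Flask and Redis; the id field comes from the task itself, while the Redis key prefix, the 30-day expiry on stored IDs, and the process helper are illustrative choices:

# Idempotent callback handling: skip tasks whose id was already processed.
import redis
from flask import Flask, request

app = Flask(__name__)
store = redis.Redis()

def process(payload):
    """Placeholder for your business logic."""

@app.route("/taskhook/callback", methods=["POST"])
def handle_task():
    task = request.get_json()
    # SET with nx=True only succeeds if the key does not exist yet,
    # so a falsy result means this task ID was already handled.
    first_time = store.set(f"taskhook:processed:{task['id']}", 1,
                           nx=True, ex=60 * 60 * 24 * 30)
    if not first_time:
        # Acknowledge duplicates with a 2xx so Taskhook stops retrying.
        return "already processed", 200
    process(task["payload"])
    return "ok", 200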

Monitoring and Alerting

We recommend monitoring your task failure rates and DLQ size to catch systemic issues early. Taskhook provides the following monitoring integrations (an example receiver follows the list):

  • Webhook notifications for DLQ events
  • Custom alert rules based on failure patterns
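
As an example of the first integration, a small receiver could forward DLQ webhook notifications to your alerting system; the event name, payload fields, and the notify_on_call helper below are assumptions for illustration:

# Receiver for DLQ event webhooks; the event name and payload fields are assumed.
from flask import Flask, request

app = Flask(__name__)

def notify_on_call(message):
    """Placeholder: forward to PagerDuty, Slack, email, etc."""
    print(message)

@app.route("/alerts/taskhook-dlq", methods=["POST"])
def dlq_event():
    event = request.get_json()
    if event.get("event") == "task.dead_lettered":
        notify_on_call(f"Task {event['task_id']} moved to DLQ: {event['error']}")
    return "", 204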
