Batches & Workflows
Batches in Taskhook allow you to group related tasks together and process them as a single unit. They provide powerful capabilities for parallel processing while maintaining oversight and control over the entire group of tasks. Batches can be used as a building block for complex workflows.
What are Batches?
A batch is a collection of tasks that are logically related and need to be monitored or managed together. When you need to process multiple items in parallel while tracking their collective progress or responding to group-level events, batches provide the perfect solution.
Key Features
- Parallel Processing: Execute multiple tasks simultaneously for improved performance
- Progress Tracking: Monitor the status and progress of the entire group
- Lifecycle Events: React to batch-level events through webhooks
- Atomic Operations: Create and manage multiple tasks as a single unit
- Dynamic Updates: Add tasks to existing batches as needed
- Expiration Control: Automatic cleanup of stale batches
Common Use Cases
Batches are particularly useful for:
- Processing large datasets in chunks
- Parallel data transformation tasks
- Distributed image or video processing
- Bulk update operations
- Multi-step workflow orchestration
How Batches Work
Progress Tracking
Taskhook provides statistics for each batch:
{
  "total": 100,
  "pending": 20,
  "completed": 75,
  "failed": 5,
  "completion_rate": 75.0
}
Event Notifications
Batches can trigger webhook notifications on the following lifecycle events:
- on_progress: When a task status change affects batch progress
- on_finish: When all tasks complete (success or failure)
- on_success: When all tasks succeed
- on_death: On the first task failure
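Each hook shares the same webhook shape used in the examples on this page: a target URL plus an HTTP method. A sketch of a notifications object subscribing to all four events (the URLs are placeholders for your own endpoints):

```javascript
// One webhook per lifecycle event; URLs are placeholders for your own endpoints.
const notifications = {
  on_progress: { url: 'https://example.com/hooks/progress', method: 'POST' },
  on_finish: { url: 'https://example.com/hooks/finish', method: 'POST' },
  on_success: { url: 'https://example.com/hooks/success', method: 'POST' },
  on_death: { url: 'https://example.com/hooks/death', method: 'POST' },
}
```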
Example: CSV Processing
Here's an example of using batches to process a CSV file where each row requires significant computation and the rows can be processed independently of each other.
const batch = await client.batches.create({
  description: 'Process Sales Data CSV',
  notifications: {
    on_finish: {
      url: 'https://example.com/notify-completion',
      method: 'POST',
    },
  },
  tasks: csvRows.map((row) => ({
    target: 'https://example.com/process-row',
    payload: { row },
  })),
})
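For very large CSVs, one task per row can produce an unwieldy batch; a common pattern is to group rows into fixed-size chunks and create one task per chunk. A sketch of building the tasks array that way (the chunk size, endpoint URL, and sample csvRows are illustrative):

```javascript
// Sample data standing in for parsed CSV rows.
const csvRows = Array.from({ length: 120 }, (_, i) => ({ id: i }))

// Group rows into fixed-size chunks so each task processes several rows.
function chunk(rows, size) {
  const chunks = []
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size))
  }
  return chunks
}

// One task per chunk of 50 rows instead of one task per row.
const tasks = chunk(csvRows, 50).map((rows) => ({
  target: 'https://example.com/process-rows',
  payload: { rows },
}))
```

This trades per-row retry granularity for fewer tasks, so pick a chunk size where re-running a whole chunk on failure is still acceptable.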
Best Practices
- Implement idempotent task processing
- Use the on_death callback for early failure detection
- Monitor the completion rate for batch health
- Design tasks to be independent and parallel-safe
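Idempotent task processing matters because the same task delivery can reach your endpoint more than once (retries, redeliveries): a duplicate should be acknowledged without re-running side effects. A minimal sketch keyed on a per-task identifier (the task_id field and the in-memory Set are illustrative; production code would use durable storage such as your database):

```javascript
// Tracks identifiers of deliveries already handled (illustrative; use durable storage in production).
const processed = new Set()

// Process a task delivery at most once, keyed by its identifier.
function handleTask(delivery) {
  if (processed.has(delivery.task_id)) {
    return 'duplicate' // Redelivery: acknowledge without re-running side effects.
  }
  processed.add(delivery.task_id)
  // ...the actual row processing would happen here...
  return 'processed'
}
```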
Multi-Step Workflow Orchestration
Taskhook's batch lifecycle events let you implement sophisticated multi-step, long-running asynchronous workflows directly in your application code. Instead of relying on an external workflow engine, you can orchestrate complex processes by creating and managing batches in response to lifecycle events.
Event-Driven Orchestration
Your application can listen to batch lifecycle events and trigger new batches or tasks based on the outcomes:
// Stage 1: Data Validation
const validationBatch = await client.batches.create({
  description: 'Validate uploaded files',
  notifications: {
    on_success: {
      url: 'https://example.com/workflow/process-validated-files',
      method: 'POST',
    },
    on_death: {
      url: 'https://example.com/workflow/handle-validation-failure',
      method: 'POST',
    },
  },
  tasks: files.map((file) => ({
    target: 'https://example.com/validate-file',
    payload: { fileId: file.id },
  })),
})
// Stage 2: File Processing Handler
app.post('/workflow/process-validated-files', async (req, res) => {
  const { batch } = req.body // Details of the batch that just succeeded

  // Create the next batch in the workflow
  const processingBatch = await client.batches.create({
    description: 'Process validated files',
    notifications: {
      on_finish: {
        url: 'https://example.com/workflow/send-notifications',
        method: 'POST',
      },
    },
    tasks: batch.files.map((file) => ({
      target: 'https://example.com/process-file',
      payload: { fileId: file.id },
    })),
  })

  res.sendStatus(200)
})
Benefits of Application-Based Orchestration
- Code Co-location: Workflow logic lives with your application code
- Natural Evolution: Workflows can be versioned and deployed with your application
- Flexible Control Flow: Implement complex conditional logic based on batch outcomes
- Testability: Unit test workflow logic alongside application code
- Familiar Tools: Use your existing logging and monitoring infrastructure
Best Practices for Complex Workflows
- Event Handling
  - Implement idempotent event handlers
  - Use event IDs to prevent duplicate processing
  - Include correlation IDs across workflow stages
- State Management
  - Pass necessary context between stages via batch descriptions or payloads
  - Consider using your application's database for workflow state
  - Keep batch payloads small and reference external data when needed
- Error Handling
  - Plan for partial failures
  - Implement compensating actions for rollbacks
  - Use dead letter queues for failed events
- Monitoring
  - Log workflow transitions
  - Track timing between stages
  - Monitor workflow completion rates
  - Set up alerts for stuck workflows
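The event-handling practices above can be combined in one handler: drop events already seen, and thread a correlation ID through every stage so logs and state from one workflow run can be tied together. A sketch (the event_id and correlation_id field names are illustrative assumptions, not part of the Taskhook payload spec, and the in-memory Set stands in for durable storage):

```javascript
// Identifiers of lifecycle events already handled (illustrative; use durable storage in production).
const seenEvents = new Set()

// Handle a batch lifecycle event idempotently and propagate the correlation ID.
function handleBatchEvent(event) {
  if (seenEvents.has(event.event_id)) {
    return null // Duplicate delivery: ignore.
  }
  seenEvents.add(event.event_id)

  // Carry the workflow's correlation ID into the next stage's context.
  return {
    correlation_id: event.correlation_id,
    stage: event.stage + 1,
  }
}
```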