# Batch processing with Claude API: handling hundreds of documents efficiently
By ThePromptEra Editorial
## Why Batch Processing Matters
You've got 500 customer reviews to analyze. Or 1,000 invoices to extract data from. Or 2,000 support tickets to categorize. Running these through Claude's standard API one at a time will either break your budget or take forever—or both.
This is where Claude's Batch API shines. Instead of juggling rate limits, retries, and real-time pricing, you submit your entire job queue, get 50% off your token costs, and Claude processes everything asynchronously. It's not just cheaper. It's fundamentally different from streaming interactions.
Let's walk through how to actually implement this.
## Understanding Batch API Fundamentals
The Batch API works in three phases:
**Submit phase:** You build a list of requests (the SDK serializes them as JSONL, one JSON object per line) containing all your Claude API calls. Each request gets a unique `custom_id` you generate.
**Processing phase:** Anthropic's servers process your batch asynchronously. Most batches finish within an hour, but processing can take up to 24 hours; any requests still unfinished after that expire rather than block the batch.
**Retrieve phase:** You poll for results or wait for a notification. Results come back in the same JSONL format, matched to your `custom_id` values.
The 50% cost reduction applies to both input and output tokens. A 1M token batch costs half what it would via the standard API.
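For reference, here's the shape of a single request entry as the Python SDK expects it (a minimal sketch; the `custom_id` and prompt are illustrative):
```python
# One entry in a batch: a custom_id you choose, plus standard
# Messages API parameters under "params".
request = {
    "custom_id": "doc-00042",  # must be unique within the batch
    "params": {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 200,
        "messages": [{"role": "user", "content": "Summarize this document: ..."}],
    },
}
```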
## Setting Up Your First Batch
Here's a practical example. Say you're analyzing product reviews:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

# Your data source
reviews = [
    "This product changed my life. Highly recommend!",
    "Broke after two weeks. Total waste of money.",
    "It's okay. Nothing special but does the job.",
    # ... 497 more reviews
]

# Build one request per review, each with a unique custom_id
requests = []
for i, review in enumerate(reviews):
    requests.append({
        "custom_id": f"review-{i:05d}",
        "params": {
            "model": "claude-3-5-sonnet-20241022",
            "max_tokens": 200,
            "messages": [
                {
                    "role": "user",
                    "content": f"""Analyze this review and provide:
1. Sentiment (positive/negative/neutral)
2. Key issue or benefit mentioned
3. Overall score (1-5)

Review: {review}""",
                }
            ],
        },
    })

# Submit the batch; the SDK handles JSONL serialization for you
response = client.messages.batches.create(requests=requests)

batch_id = response.id
print(f"Batch submitted: {batch_id}")
print(f"Requests queued: {response.request_counts.processing}")
```
That's it for submission. You now have a batch ID you can check later.
## Processing and Retrieval Patterns
Don't expect instant results. Most batches process within hours, but you should design for 24-hour waits.
**Polling approach (simple but not elegant):**
```python
import time

batch_id = "your-batch-id"

while True:
    batch = client.messages.batches.retrieve(batch_id)
    if batch.processing_status == "ended":
        break
    counts = batch.request_counts
    total = (counts.processing + counts.succeeded + counts.errored
             + counts.canceled + counts.expired)
    print(f"Status: {batch.processing_status}")
    print(f"Processed: {counts.succeeded}/{total}")
    time.sleep(30)  # check every 30 seconds
```
**Webhook approach (production-ready):**
Set up an endpoint that receives a notification when your batch completes. Note that `batches.create` itself takes no callback parameter; the notification has to come from outside the request, either Anthropic's webhook support (where available on your plan) or a small scheduler in your own infrastructure that checks `processing_status` and fires an event when it flips to `"ended"`.
Then when the webhook fires, retrieve your results immediately. No polling waste.
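Here's a minimal receiver sketch using Flask. The payload field name (`batch_id`) is hypothetical; adapt it to whatever your notification source actually sends:
```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/batch-complete", methods=["POST"])
def batch_complete():
    payload = request.get_json()
    batch_id = payload["batch_id"]  # hypothetical field name
    insights = process_batch_results(batch_id)  # defined in the next section
    print(f"Batch {batch_id}: {len(insights['errors'])} errors")
    return "", 204
```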
## Handling Results at Scale
Results come back as JSONL; the Python SDK's `results()` helper streams and decodes them for you. Each entry pairs your `custom_id` with a result object whose `type` tells you whether the request succeeded, errored, expired, or was canceled:
```python
def process_batch_results(batch_id):
    """Stream results without loading everything into memory."""
    insights = {
        "positive": [],
        "negative": [],
        "neutral": [],
        "errors": [],
    }

    # results() streams and decodes entries one at a time
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type != "succeeded":
            # Covers "errored", "canceled", and "expired"
            insights["errors"].append({
                "id": entry.custom_id,
                "type": entry.result.type,
            })
            continue

        # Extract sentiment from the response text
        content = entry.result.message.content[0].text
        if "positive" in content.lower():
            insights["positive"].append(entry.custom_id)
        elif "negative" in content.lower():
            insights["negative"].append(entry.custom_id)
        else:
            insights["neutral"].append(entry.custom_id)

    return insights
```
This processes results entry by entry instead of loading the entire file. With 10,000 requests, that can be the difference between holding tens of megabytes in memory and keeping a small, constant footprint.
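If you want per-request records rather than aggregate buckets, the same streaming pattern writes straight to disk. A sketch, with an illustrative output path and column layout:
```python
import csv

def export_batch_results(batch_id, path="results.csv"):
    """Stream batch results into a CSV, one row per request."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["custom_id", "status", "response_text"])
        for entry in client.messages.batches.results(batch_id):
            if entry.result.type == "succeeded":
                text = entry.result.message.content[0].text
                writer.writerow([entry.custom_id, "succeeded", text])
            else:
                writer.writerow([entry.custom_id, entry.result.type, ""])
```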
## Real-World Optimization Tactics
**Batch size strategy:** A single batch can hold up to 100,000 requests (256 MB of request data). Larger batches amortize submission overhead slightly better, so aim high within that cap. If you have millions of documents, split them across multiple batches submitted in parallel, as sketched below.
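A simple chunking pattern looks like this (a sketch; the 10,000-per-batch size is a conservative choice, not an API requirement):
```python
def submit_in_chunks(all_requests, chunk_size=10_000):
    """Split a large job across multiple batches and return their IDs."""
    batch_ids = []
    for start in range(0, len(all_requests), chunk_size):
        chunk = all_requests[start:start + chunk_size]
        response = client.messages.batches.create(requests=chunk)
        batch_ids.append(response.id)
        print(f"Submitted batch {response.id} with {len(chunk)} requests")
    return batch_ids
```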
**Token counting:** Use the `count_tokens` endpoint before batch submission to estimate costs. Count your longest prompt, multiply by the request count, and add a buffer:
```python
count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": system_prompt + longest_document}],
)
estimate = count.input_tokens * request_count * 1.1  # worst case + 10% buffer
cost_at_discount = (estimate / 1_000_000) * 1.50  # $3/M input, halved by the batch discount
```
**Error retry logic:** The Batch API reports errors per request (malformed messages, transient API errors, and so on). Don't re-submit the entire batch; extract the failed `custom_id` values and run only those again:
```python
failed_ids = {e.custom_id for e in client.messages.batches.results(batch_id)
              if e.result.type == "errored"}
retry_requests = [r for r in original_requests if r["custom_id"] in failed_ids]
# Submit the failures as a new batch
retry_batch = client.messages.batches.create(requests=retry_requests)
```
**Parallel extraction:** Once a batch completes, you can immediately submit the next one, and you can keep many batches in flight at once (exact concurrency limits depend on your usage tier). Use this for continuous pipelines, as in the sketch below.
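Here's a rough pipeline sketch; `fetch_next_chunk` and `build_requests` are hypothetical stand-ins for your own queue and request-building code:
```python
import time

def run_pipeline(fetch_next_chunk, build_requests):
    """Submit the next batch as soon as the previous one finishes."""
    while True:
        docs = fetch_next_chunk()  # hypothetical queue source
        if not docs:
            break
        batch = client.messages.batches.create(requests=build_requests(docs))
        # Block until this batch ends, then hand results off and loop
        while client.messages.batches.retrieve(batch.id).processing_status != "ended":
            time.sleep(60)
        process_batch_results(batch.id)
```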
## When NOT to Use Batches
Batches aren't for everything:
- **Interactive workflows:** Anything requiring user feedback. Batch API is fire-and-forget.
- **Real-time classification:** If you need results in seconds, use the standard API.
- **Single documents:** The overhead isn't worth it for 5-10 requests.
- **Testing prompts:** Use the standard Messages API. The Batch API's turnaround time makes rapid iteration painful.
## The Bottom Line
Batch processing isn't complicated, but it requires planning. You can't interactively debug a batch of 1,000 requests. You need structured input, patience, and result handling code.
But the math is compelling: a 50% discount, plus Anthropic manages throughput so you're not juggling rate limits, for async work you'd do anyway. If you're processing hundreds of documents, this should be your default approach. The cost savings alone justify the implementation time.
Start small—submit a 100-request batch as a test. Build the results handling. Then scale to thousands.