Few-shot vs zero-shot prompting: a practical guide with real examples

The fundamental difference

Let's get straight to it: zero-shot means asking Claude to do something without examples. Few-shot means giving Claude examples of what you want before asking it to do the task.

That's it. But the implications are huge for your results.

When you write "Classify this customer feedback as positive, negative, or neutral: 'The product broke after two days,'" you're doing zero-shot prompting. Claude has never seen examples from you of what positive, negative, or neutral means in your specific context.

When you show Claude three examples of feedback you've already classified, then ask it to classify new feedback the same way, you're doing few-shot prompting.

Zero-shot works when Claude already understands the task deeply. Few-shot works when you need Claude to match your specific style, criteria, or edge cases.
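To make the difference concrete, here is a minimal sketch that builds both kinds of prompt for the sentiment task above. The function names and the sample feedback are hypothetical, made up for illustration, and nothing here calls an API.

```python
def zero_shot_prompt(feedback: str) -> str:
    """Ask for a label with no examples."""
    return (
        "Classify this customer feedback as positive, negative, or neutral: "
        f"'{feedback}'"
    )


def few_shot_prompt(examples: list[tuple[str, str]], feedback: str) -> str:
    """Show already-labeled feedback first, then ask about the new item."""
    lines = ["Classify customer feedback as positive, negative, or neutral.", ""]
    for text, label in examples:
        lines += [f"Feedback: '{text}'", f"Label: {label}", ""]
    # Leave the last label blank for the model to fill in.
    lines += [f"Feedback: '{feedback}'", "Label:"]
    return "\n".join(lines)


# Hypothetical labeled examples; replace with feedback you have classified.
labeled = [
    ("Arrived a day early and works great", "positive"),
    ("Stopped charging after a week", "negative"),
    ("It does what the box says", "neutral"),
]

new_feedback = "The product broke after two days"
zero = zero_shot_prompt(new_feedback)
few = few_shot_prompt(labeled, new_feedback)
```

The only difference between the two strings is the block of labeled examples. Everything else about the request stays the same.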

When zero-shot is enough

Zero-shot prompting is your default for a reason: it's fast, requires no prep work, and Claude is remarkably capable.

Use zero-shot for:

General knowledge tasks. "What's the difference between REST and GraphQL?" Claude knows this cold. No examples needed.

Straightforward summarization. "Summarize this article in two sentences." Claude gets the concept of summarization from its training. Examples rarely improve the output in a meaningful way.

Standard writing tasks. "Write a professional email thanking a client for their business." This is within Claude's baseline competency.

Logical reasoning. "If all dogs are mammals and Buddy is a dog, is Buddy a mammal?" The reasoning pattern is clear without examples.

I use zero-shot for probably 70% of my work with Claude. It's fast, and it works. The trap is assuming it's always good enough.

When few-shot becomes essential

Few-shot shines when you need Claude to match your specific judgment calls, company voice, or complex classification rules.

Your judgment differs from the obvious one. You might want customer complaints classified as "fixable" vs "system-level issue" vs "unrealistic expectation." Claude needs to see what you consider each category.

Style and tone matter. If you're generating product descriptions, "concise" and "clever" look different across industries. Show Claude your bar with examples.

Edge cases abound. Tax classifications, medical coding, support ticket routing—these domains have gray areas. Examples help Claude navigate them the way you do.

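Consistency across a large batch. Processing 500 support tickets? Few-shot examples give every ticket the same reference points, which makes consistent classification much more likely. Zero-shot might classify ticket #247 differently than #53, even when the two are nearly identical.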
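One rough way to spot that kind of drift is to look for tickets with the same text that came back with different labels. This is a sketch under my own assumptions: the `find_conflicts` helper and the sample batch are hypothetical, and the normalization is deliberately crude.

```python
from collections import defaultdict


def find_conflicts(results: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Return tickets whose normalized text received more than one label."""
    labels_by_text = defaultdict(set)
    for ticket, label in results:
        # Lowercase and collapse whitespace so trivial differences
        # don't hide a duplicate.
        key = " ".join(ticket.lower().split())
        labels_by_text[key].add(label)
    return {
        text: labels
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


# Hypothetical batch output: (ticket text, label the model returned).
batch = [
    ("Cannot log in after password reset", "fixable"),
    ("Report export is slow", "system-level issue"),
    ("cannot log in after  password reset", "system-level issue"),
]

conflicts = find_conflicts(batch)
```

If `conflicts` is non-empty on a real batch, that is a decent signal the prompt needs examples.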

Zero-shot prompting in practice

Here's a real example. Let's say you need Claude to extract key metrics from a business report:


Extract the following from this report:

- Revenue
- Customer count
- Gross margin

Report:
[report text here]

Please provide the values in JSON format.

This works because the task is concrete. Claude knows what "revenue" means across contexts. The request is unambiguous.

But what if the report uses unconventional terminology? What if "revenue" is sometimes listed as "top-line sales" or "gross receipts"? Then zero-shot gets risky, because Claude has to guess which label maps to which field. It's still worth trying first.
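A cheap safety net for the zero-shot version is to check the JSON that comes back. The sketch below assumes the prompt asked for the keys `revenue`, `customer_count`, and `gross_margin`, and for `null` when a value is not found. Those key names and the sample reply are my assumptions, not something the prompt above guarantees.

```python
import json

# Keys we asked for; the names are an assumption about how the reply is keyed.
EXPECTED_KEYS = ("revenue", "customer_count", "gross_margin")


def parse_metrics(reply: str) -> tuple[dict, list[str]]:
    """Parse the model's JSON reply and list requested keys that are missing or null."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        # Not valid JSON at all: treat every field as missing.
        return {}, list(EXPECTED_KEYS)
    if not isinstance(data, dict):
        return {}, list(EXPECTED_KEYS)
    missing = [key for key in EXPECTED_KEYS if data.get(key) is None]
    return data, missing


# Hypothetical reply for a report that said "top-line sales" instead of "revenue".
reply = '{"revenue": null, "customer_count": 1200, "gross_margin": 0.41}'
data, missing = parse_metrics(reply)
```

If the same field keeps showing up in `missing` across reports, that is the point where a couple of examples with the odd terminology start to pay off.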

Few-shot prompting in practice

Here's where few-shot earns its keep. Imagine you're building a system to categorize support tickets by urgency. Your company has specific rules:


You will categorize support tickets into: CRITICAL, HIGH, MEDIUM, LOW

Here are examples of how I've categorized previous tickets:

Example 1:
Ticket: "System is down for all users in US region"
Category: CRITICAL
Reason: Complete service outage affecting all users

Example 2:
Ticket: "Feature X would be really nice to have someday"
Category: LOW
Reason: This is a feature request, not a problem with current functionality

Example 3:
Ticket: "Forgot my password and can't log in"
Category: HIGH
Reason: User is fully blocked, but issue affects only one account

Example 4:
Ticket: "Dashboard loads slowly sometimes, maybe every 5th time"
Category: MEDIUM
Reason: Intermittent performance issue affecting user experience

Now categorize this ticket:
Ticket: "Export to CSV works fine but takes 3 minutes for large datasets"
Category: ?

Notice what we did: we showed Claude what CRITICAL, HIGH, MEDIUM, and LOW mean to your business. A slow export might be CRITICAL in a financial firm (where traders need speed) but MEDIUM for most teams. The examples teach Claude your priorities.
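If you keep your examples as data instead of pasting them by hand, the prompt above can be assembled in a few lines. This is a sketch, not a required structure: the `build_few_shot_prompt` name and the dictionary keys are my own choices.

```python
# The four examples from the prompt above, stored as data.
EXAMPLES = [
    {"ticket": "System is down for all users in US region",
     "category": "CRITICAL",
     "reason": "Complete service outage affecting all users"},
    {"ticket": "Feature X would be really nice to have someday",
     "category": "LOW",
     "reason": "This is a feature request, not a problem with current functionality"},
    {"ticket": "Forgot my password and can't log in",
     "category": "HIGH",
     "reason": "User is fully blocked, but issue affects only one account"},
    {"ticket": "Dashboard loads slowly sometimes, maybe every 5th time",
     "category": "MEDIUM",
     "reason": "Intermittent performance issue affecting user experience"},
]


def build_few_shot_prompt(examples: list[dict], new_ticket: str) -> str:
    """Assemble instructions, numbered examples, and the new ticket into one prompt."""
    parts = [
        "You will categorize support tickets into: CRITICAL, HIGH, MEDIUM, LOW",
        "",
        "Here are examples of how I've categorized previous tickets:",
        "",
    ]
    for number, example in enumerate(examples, start=1):
        parts += [
            f"Example {number}:",
            f'Ticket: "{example["ticket"]}"',
            f"Category: {example['category']}",
            f"Reason: {example['reason']}",
            "",
        ]
    parts += ["Now categorize this ticket:", f'Ticket: "{new_ticket}"', "Category: ?"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    EXAMPLES, "Export to CSV works fine but takes 3 minutes for large datasets"
)
```

Keeping the examples in a list makes it easy to swap one out when you find a better edge case.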

The practical testing framework

Here's how I decide between zero-shot and few-shot:

Step 1: Try zero-shot. Seriously. Spend 2 minutes on a zero-shot prompt. If the output is good, you're done.

Step 2: Assess the output. Is it accurate? Does it match your criteria? If yes, zero-shot wins.

Step 3: If zero-shot misses, identify why. Is Claude misunderstanding the task? The criteria? Is it falling back on general knowledge instead of your specific criteria?

Step 4: Create few-shot examples. Pick 3-5 examples that show Claude what you actually want. Include edge cases if they're important.

Step 5: Test few-shot. Run both prompts on the same 10-15 inputs from your real workload and compare the results. That's usually enough to see whether the examples help.
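Step 5 comes down to scoring both prompts against labels you assigned by hand. Here is a minimal sketch of that comparison. The model outputs below are invented to show the mechanics, not measured results, and in practice you would collect them by running each prompt on your sample.

```python
def accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of items where the model's label matches the hand-assigned one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must be the same length")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Labels you assigned by hand to a small sample of real tickets.
expected = ["CRITICAL", "HIGH", "MEDIUM", "LOW"]

# Hypothetical outputs from each prompt on the same tickets (made up here).
zero_shot_outputs = ["CRITICAL", "MEDIUM", "MEDIUM", "MEDIUM"]
few_shot_outputs = ["CRITICAL", "HIGH", "MEDIUM", "MEDIUM"]

zero_shot_score = accuracy(zero_shot_outputs, expected)
few_shot_score = accuracy(few_shot_outputs, expected)
```

With only 10-15 items, treat the scores as a rough signal. The individual misses are usually more informative than the percentage.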

Practical tips for few-shot that actually work

Use 3-5 examples, not 10. More examples can confuse Claude instead of clarifying what you want. Three good examples beat ten mediocre ones.

Make examples realistic. If your edge cases are weird, include weird examples. If your typical input is messy, show messy examples.

Vary your examples. Don't show three CRITICAL tickets and then one LOW ticket. Cover the categories evenly and mix their order so Claude doesn't read a pattern into the sequence.

Include brief reasoning. "Reason: X" tells Claude why you categorized a ticket that way, and that reasoning carries over to new inputs.

Test with your actual data. If you're classifying support tickets, use real support tickets in your examples. Not sanitized versions.
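For the "vary your examples" tip, one deterministic alternative to a random shuffle is to interleave examples round-robin by category, so the same label never sits in one long run. The helper name and sample tickets below are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def interleave_by_category(examples: list[dict]) -> list[dict]:
    """Reorder examples round-robin by category, keeping order within each category."""
    groups = defaultdict(list)
    for example in examples:
        groups[example["category"]].append(example)
    ordered = []
    # Take one example from each category per round; shorter groups yield None.
    for round_ in zip_longest(*groups.values()):
        ordered.extend(example for example in round_ if example is not None)
    return ordered


# Hypothetical examples grouped by label, the way they often get written.
grouped = [
    {"ticket": "API returns 500 for every request", "category": "CRITICAL"},
    {"ticket": "Checkout page is unreachable", "category": "CRITICAL"},
    {"ticket": "Please add a dark mode", "category": "LOW"},
    {"ticket": "Typo on the pricing page", "category": "LOW"},
]

ordered = interleave_by_category(grouped)
order_of_labels = [example["category"] for example in ordered]
```

This fixes the ordering only. If one category has far more examples than the others, the tail of the list will still be dominated by it, so balance the counts first.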

When to upgrade to system prompts

Once you've refined your approach with few-shot, consider moving it into the system parameter of Claude's API. Instructions and examples placed there go out with every request, so you don't have to repeat them in each user message:

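import anthropic

# Reads the API key from the ANTHROPIC_API_KEY environment variable.
client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    # Instructions and few-shot examples go in the system prompt.
    system="You are a support ticket classifier. [Insert your few-shot examples here]",
    messages=[
        # Only the new ticket changes from call to call.
        {"role": "user", "content": "Categorize this ticket: [new ticket]"}
    ]
)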

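System prompts keep your instructions consistent across API calls and separate from the ticket text that changes on each request. They can also be slightly more effective than putting everything in the user message, though that's worth confirming on your own workload.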
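When you process many tickets, it helps to build the request arguments in one place so every call shares the same system prompt. The sketch below only builds the keyword arguments and makes no API call. The `build_request` helper and the sample tickets are hypothetical, and the examples string is a placeholder for your own.

```python
def build_request(few_shot_examples: str, ticket: str) -> dict:
    """Keyword arguments for client.messages.create(); the examples live in `system`."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        # Same system prompt for every ticket in the batch.
        "system": "You are a support ticket classifier.\n\n" + few_shot_examples,
        # Only the user message changes per ticket.
        "messages": [
            {"role": "user", "content": f"Categorize this ticket: {ticket}"}
        ],
    }


# Placeholder for the few-shot examples you settled on.
examples_text = "[Insert your few-shot examples here]"

# Hypothetical tickets standing in for a real batch.
tickets = ["Login page shows a blank screen", "Invoice PDF has the wrong logo"]
requests = [build_request(examples_text, ticket) for ticket in tickets]

# With the anthropic SDK installed and a client created, each request
# would be sent as: client.messages.create(**request)
```

Because the system prompt is built once from `examples_text`, changing an example updates every request in the batch.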

The bottom line

Start with zero-shot. It's fast and Claude is smart. When you notice inconsistencies or misalignment with your specific criteria, move to few-shot. Show Claude what you actually want with 3-5 real examples, then evaluate results on your actual workload.

In my experience, few-shot is needed for maybe 20-30% of tasks. The rest happily runs on zero-shot with a well-written prompt.

The mistake is overthinking this. Zero-shot isn't "beginner" and few-shot isn't "advanced"—they're tools for different jobs. Use the right tool and move on.