All posts
few-shot promptingfew-shot examples AIin-context learning

Few-Shot Prompting: How to Train AI Models In-Context

A practical guide to few-shot prompting: when to use it, how to select good examples, and how to structure in-context learning for complex AI tasks.

7 min read

Few-shot prompting is the most underused technique in production AI applications. Developers default to zero-shot — write an instruction, send it, hope the output matches expectations — and then spend time post-processing outputs that a well-placed example would have fixed in the prompt itself.

Understanding when and how to use few-shot prompting is the difference between an AI feature that works and one that works reliably.

Zero-Shot, One-Shot, Few-Shot

Zero-shot: An instruction with no examples. The model infers the expected output from the instruction alone.

Classify the following support ticket as: bug, feature_request, or question.
Ticket: "The export button doesn't work on Safari."

Works well when the task is unambiguous and the output format is standard.

One-shot: One input/output example before the task. Anchors the model to a specific format or style.

Few-shot: Two to five examples before the task. Teaches the model a pattern it cannot infer from the instruction alone.

The cost of few-shot is token count. The benefit is dramatically reduced output variance — the model has seen what "correct" looks like in your specific context.

When Few-Shot Outperforms Zero-Shot

Use few-shot when any of these are true:

Your output format is non-standard. If the correct output does not look like something in the model's training data, it cannot reliably produce it from a description alone. A single example shows it directly.

Your task involves domain-specific classification. The model does not know your category taxonomy. Three examples with your labels are worth more than a paragraph describing them.

You need style consistency. Tone, vocabulary, and sentence structure are easier to demonstrate than describe. One well-chosen example calibrates the model's style more precisely than any adjective.

Zero-shot outputs vary too much run-to-run. High variance is a signal that the model is uncertain about what correct looks like. Examples reduce that uncertainty.

Selecting Good Examples

Bad examples are worse than no examples — they teach the model the wrong pattern.

Represent the full range of inputs. If your task handles three distinct cases, each example should represent a different case. Do not pick three similar examples.

Match the difficulty of real inputs. Edge cases in your examples calibrate the model for edge cases in production. Easy examples produce a model that fails on hard inputs.

Use the exact output format you require. Every deviation in your examples — a trailing period, an extra space, a slightly different key name — will appear in the model's output. Be precise.

Keep examples short. Long examples consume tokens without proportionally improving accuracy. The model learns from the structure, not the length.

Few-Shot Template

Task: [Instruction]

Examples:
Input: [Example input 1]
Output: [Example output 1]

Input: [Example input 2]
Output: [Example output 2]

Input: [Example input 3]
Output: [Example output 3]

Input: [Actual input]
Output:

The trailing Output: with no content is the prompt. It cues the model to continue the pattern. Do not write anything after it.

A Real Example: Ticket Classification

Task: Classify support tickets into exactly one of: bug, feature_request, billing, question.
Return only the label — no explanation.

Examples:
Input: "Dashboard crashes when I click export with more than 1000 rows"
Output: bug

Input: "Can you add a dark mode option to the settings panel?"
Output: feature_request

Input: "I was charged twice this month for the Pro plan"
Output: billing

Input: "How do I invite team members to my workspace?"
Output: question

Input: "The API returns 500 when payload is over 10kb"
Output:

This prompt produces a single-word label every time. Zero-shot on the same task produces labels with explanations, inconsistent casing, and occasional novel categories.

When NOT to Use Few-Shot

The task is already unambiguous. Adding examples to Translate this sentence to French adds tokens without improving accuracy.

Your context window is constrained. Five examples that consume 800 tokens may not be worth it for a simple classification task. Measure zero-shot accuracy first.

Examples are hard to write. If you cannot write three examples that you are confident are correct, the task definition itself is unclear. Fix the task definition first.

Combining Few-Shot with Other Techniques

Few-shot compounds well with role prompting. Assign a persona first, then provide examples that demonstrate how that persona responds. The model applies both constraints simultaneously.

It also works well with chain-of-thought: include a reasoning step in each example output, and the model will reason through new inputs before giving a final answer — which reduces classification errors on ambiguous inputs.

For prompt quality scoring, few-shot examples count toward context density but not toward task atomicity. A prompt with five examples and an ambiguous instruction still scores low. The instruction and the examples must both be precise.

Promptuner detects few-shot structure in your prompt and evaluates whether your examples follow the input/output pattern consistently — flagging mismatches before you submit.


Few-shot prompting is not a workaround for a bad instruction. It is a precision tool for tasks where "correct" is easier to show than describe. Use it deliberately, select examples carefully, and measure the improvement in output consistency.

Free Prompt Optimizer

Score and refine your prompts in real-time — inside every AI tool you use.

Install Free
// More reading

From the Blog