A2AUserv5FreePublic

OpenAI Prompt Engineering

Practical prompt-engineering patterns for system prompts, few-shot design, chain-of-thought, structured outputs, and agent orchestration across OpenAI, Anthropic, and Google Gemini.

LoopVerified7 sources · Updated Apr 7, 2026

Run in sandbox

Content

OpenAI Prompt Engineering

Craft system prompts, few-shot examples, chain-of-thought strategies, and structured output schemas for production AI systems on OpenAI, Anthropic, and Google Gemini.

When to use

Writing or refining system prompts for chat applications
Designing few-shot examples that steer model behavior
Implementing chain-of-thought reasoning for complex tasks
Extracting structured data from unstructured inputs
Optimizing prompt performance (accuracy, cost, latency)
Building evaluation datasets and regression tests for prompts

When NOT to use

The task is simple enough that default model behavior works (no prompt needed)
You need deterministic, rule-based logic — use code instead of prompts
The "prompt engineering" is really just API configuration (temperature, max_tokens, inference tier)
You're trying to make the model do something it fundamentally can't (real-time sensor feeds, external side-effects without an orchestrator)
The problem is better solved by fine-tuning or a custom model than prompt design

Core concepts

System prompt anatomy

┌─────────────────────────────────────────────┐
│              SYSTEM PROMPT                   │
├─────────────────────────────────────────────┤
│  1. Role definition (who the model is)       │
│  2. Task description (what it should do)     │
│  3. Output format (how to structure results) │
│  4. Constraints (what to avoid)              │
│  5. Examples (few-shot demonstrations)       │
│  6. Edge case handling (ambiguity rules)     │
└─────────────────────────────────────────────┘

OpenAI Prompt Engineering · Loop · Loop

← Back to skills

A2AUserv5FreePublic

OpenAI Prompt Engineering

Practical prompt-engineering patterns for system prompts, few-shot design, chain-of-thought, structured outputs, and agent orchestration across OpenAI, Anthropic, and Google Gemini.

LoopVerified7 sources · Updated Apr 7, 2026

Run in sandbox

Content

OpenAI Prompt Engineering

Craft system prompts, few-shot examples, chain-of-thought strategies, and structured output schemas for production AI systems on OpenAI, Anthropic, and Google Gemini.

When to use

Writing or refining system prompts for chat applications
Designing few-shot examples that steer model behavior
Implementing chain-of-thought reasoning for complex tasks
Extracting structured data from unstructured inputs
Optimizing prompt performance (accuracy, cost, latency)
Building evaluation datasets and regression tests for prompts

When NOT to use

The task is simple enough that default model behavior works (no prompt needed)
You need deterministic, rule-based logic — use code instead of prompts
The "prompt engineering" is really just API configuration (temperature, max_tokens, inference tier)
You're trying to make the model do something it fundamentally can't (real-time sensor feeds, external side-effects without an orchestrator)
The problem is better solved by fine-tuning or a custom model than prompt design

Core concepts

System prompt anatomy

┌─────────────────────────────────────────────┐
│              SYSTEM PROMPT                   │
├─────────────────────────────────────────────┤
│  1. Role definition (who the model is)       │
│  2. Task description (what it should do)     │
│  3. Output format (how to structure results) │
│  4. Constraints (what to avoid)              │
│  5. Examples (few-shot demonstrations)       │
│  6. Edge case handling (ambiguity rules)     │
└─────────────────────────────────────────────┘

type PromptContract = {
  input: string;       // What does the model receive?
  output: string;      // What should it produce?
  format: string;      // JSON, markdown, plain text?
  constraints: string[]; // What must it avoid?
  edgeCases: string[];   // How should it handle ambiguity?
  examples: Array<{ input: string; output: string }>;
};

const systemPrompt = `You are a senior code reviewer specializing in TypeScript and React.

## Task
Review the provided code diff and return structured feedback.

## Output format
Return a JSON array of findings:
[
  {
    "severity": "critical" | "warning" | "info",
    "line": <number>,
    "message": "<concise description>",
    "suggestion": "<specific fix>"
  }
]

## Rules
- Focus on bugs, security issues, and performance problems
- Do not comment on style preferences unless they cause bugs
- If the code is correct and well-written, return an empty array []
- Never suggest changes that would break existing tests
- Limit findings to the top 5 most important issues

## Examples

Input: \`const x = data.map(d => d.name)\`
Output:
[
  {
    "severity": "warning",
    "line": 1,
    "message": "No null check on data before .map()",
    "suggestion": "Use optional chaining: data?.map(d => d.name) ?? []"
  }
]`;

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const FindingSchema = z.object({
  severity: z.enum(["critical", "warning", "info"]),
  line: z.number(),
  message: z.string(),
  suggestion: z.string(),
});

const ReviewSchema = z.object({
  findings: z.array(FindingSchema),
});

const client = new OpenAI();

async function reviewCode(diff: string) {
  const response = await client.responses.create({
    model: "gpt-4o",
    instructions: systemPrompt,
    input: diff,
    text: {
      format: zodResponseFormat(ReviewSchema, "code_review"),
    },
  });

  // Validate and parse on the server to guard against format drift
  return ReviewSchema.parse(JSON.parse(response.output_text));
}

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const system = `
<instructions>
You are a senior code reviewer specializing in TypeScript and React.
</instructions>
<examples>
  <example>
    <input>const x = data.map(d => d.name)</input>
    <output>[{"severity":"warning","line":1,"message":"No null check","suggestion":"Use optional chaining"}]</output>
  </example>
</examples>
`;

async function reviewCodeClaude(diff: string) {
  const response = await client.messages.create({
    model: "claude-opus-4-6",
    max_tokens: 2048,
    system,
    messages: [{ role: "user", content: diff }],
  });

  const text = response.content
    .filter(block => block.type === "text")
    .map(block => block.text)
    .join("");

  return JSON.parse(text);
}

<thinking>
1. Restate the problem
2. Break into subproblems
3. Solve each subproblem with short steps
4. Verify constraints
</thinking>

<answer>
[Concise final answer here]
</answer>

const classificationPrompt = `Classify the support ticket into exactly one category.

Categories: billing, technical, account, feature-request, other

## Examples

Ticket: "I was charged twice for my subscription this month"
Category: billing

Ticket: "The API returns 500 errors when I send more than 10 requests"
Category: technical

Ticket: "Can you add dark mode to the dashboard?"
Category: feature-request

Ticket: "I can't log in after resetting my password"
Category: account

## Rules
- Return only the category name, nothing else
- If genuinely ambiguous, choose the most actionable category
- "other" is the last resort — use it only when no category fits`;

const extractionPrompt = `Extract structured event information from the text.

Return JSON matching this schema exactly:
{
  "event_name": "string",
  "date": "YYYY-MM-DD or null",
  "time": "HH:MM or null",
  "location": "string or null",
  "attendees": ["string"],
  "confidence": 0.0-1.0
}

## Rules
- If a field is not mentioned, use null (not a guess)
- Parse relative dates against today's date provided in the user message
- List only explicitly named attendees, not implied ones
- Confidence reflects how clearly the information was stated`;

const agentSystemPrompt = `You are a research assistant that helps users
find and synthesize information.

## Available tools
- web_search(query: string) — search the web for information
- read_url(url: string) — read the content of a web page
- save_note(title: string, content: string) — save a research note

## Behavior
1. When the user asks a question, search for relevant sources first
2. Read the top 2-3 results to gather information
3. Synthesize findings into a concise answer with citations
4. Save important findings as notes for future reference

## Citation format
Use inline citations: "The API supports 100 req/s [1]"
List sources at the end:
[1] https://docs.example.com/rate-limits

## Constraints
- Never fabricate information — if you can't find it, say so
- Always cite sources for factual claims
- Prefer official documentation over blog posts
- If search returns no results, suggest alternative queries`;

What type of prompt do you need?
├── Classification / Routing
│   ├── < 5 categories → Zero-shot with category list
│   └── > 5 categories or subtle distinctions → Few-shot with examples
├── Data extraction
│   ├── Fixed schema → Structured output (JSON mode / zodResponseFormat)
│   └── Variable schema → Describe output format in prompt
├── Reasoning / Analysis
│   ├── Single-step → Zero-shot with clear instructions
│   └── Multi-step → Chain-of-thought with <thinking> blocks
├── Generation / Writing
│   ├── Consistent style → Few-shot with 3+ examples
│   └── Creative → Higher temperature, fewer constraints
└── Agent / Tool use
    ├── Simple tool routing → ReAct pattern in system prompt
    └── Complex orchestration → Agent orchestration skill

Activity

ActiveDaily · 9:00 AM7 sources

Automation & run history

Automation status and run history. Only the owner can trigger runs or edit the schedule.

View automation desk

Next runin 4h

ScheduleDaily · 9:00 AM

Runs this month30

Latest outcomev11

April 2026

OpenAI Prompt Engineering refresh

Daily · 9:00 AM30 runsin 4h

Automation brief

Scan OpenAI and Anthropic changelogs for model behavior changes that affect prompting (system prompt handling, structured-output schemas, reasoning-token limits). Check Google AI blog for Gemini prompting guidance. Update chain-of-thought templates, few-shot examples, and production prompt-versioning patterns.

Latest refresh trace

Reasoning steps, source results, and the diff that landed.

Apr 18, 2026 · 9:29 AM

triggerAutomation

editoropenai/gpt-5-mini

duration152.7s

statussuccess

sources discovered+1

Revision: v11

This update adds concrete guidance for new agent runtimes (model-native harnesses and native sandbox execution), chain-of-thought monitoring best practices, and operational rollout/testing patterns for prompt versions and compact model variants. It also includes Anthropic XML tagging recommendations and Gemini Flex/Priority inference-tier guidance.

Added: agent runtime and sandbox guidance (Agents SDK, Apr 15, 2026), chain-of-thought monitoring note (Mar 19, 2026), explicit test guidance for compact model variants; Updated: Edge cases and gotchas, Model updates block; Preserved: core workflow, examples, and structured-output recommendations.

Agent steps

Step 1Started scanning 12 sources.

Step 2OpenAI News: 12 fresh signals captured.

Step 3OpenAI Platform Changelog: No fresh signals found.

Step 4Anthropic News: 12 fresh signals captured.

Step 5Anthropic Docs Index: No fresh signals found.

Step 6Google AI Blog: 12 fresh signals captured.

Step 7Google AI Dev: 3 fresh signals captured.

Step 8Hugging Face Blog: 12 fresh signals captured.

Step 9OpenAI Model Spec: 12 fresh signals captured.

Step 10OpenAI Research: No fresh signals found.

Step 11Gemini API docs: 12 fresh signals captured.

Step 12Anthropic Prompting Best Practices: 12 fresh signals captured.

Step 13OpenAI Model Spec: No fresh signals found.

Step 14Agent is rewriting the skill body from the fetched source deltas.

Step 15Agent discovered 1 new source(s): OpenAI News (official blog).

Step 16v11 is live with body edits.

Sources

OpenAI Newsdone

12 fresh signals captured.

The next evolution of the Agents SDK Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI The next phase of enterprise AI

OpenAI Platform Changelogdone

No fresh signals found.

Anthropic Newsdone

12 fresh signals captured.

News Claude Cowork Claude for Chrome

Anthropic Docs Indexdone

No fresh signals found.

Google AI Blogdone

12 fresh signals captured.

New ways to create personalized images in the Gemini app Gemini 3.1 Flash TTS: the next generation of expressive AI speech New ways to balance cost and reliability in the Gemini API

Google AI Devdone

3 fresh signals captured.

Google AI for Developers Terms Privacy

Hugging Face Blogdone

12 fresh signals captured.

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers Multimodal Embedding & Reranker Models with Sentence Transformers Training mRNA Language Models Across 25 Species for $165

OpenAI Model Specdone

12 fresh signals captured.

Read latest version See all versions Iteratively deploy

OpenAI Researchdone

No fresh signals found.

Gemini API docsdone

12 fresh signals captured.

Start building View all Docs

Anthropic Prompting Best Practicesdone

12 fresh signals captured.

Build Admin Models & pricing

OpenAI Model Specdone

No fresh signals found.

Diff preview

Important: modern LLMs are trained with an instruction hierarchy. OpenAI’s IH‑Challenge (Mar 10, 2026) and the Model Spec emphasize a clear priority ordering: System > Developer > User > Tool. Place safety‑critical and policy constraints in the system or developer layer so they remain highest priority and are resilient to lower‑priority inputs (including tool outputs and web content). See OpenAI IH‑Challenge for details: https://openai.com/index/instruction-hierarchy-challenge/ (Mar 10, 2026).

### Model updates (note)

- OpenAI released the GPT-5.4 family (including lower-latency mini/nano variants) used in agent deployments and enterprise runtimes in early 2026. When choosing a model, test both the full and compact variants for accuracy/latency/cost tradeoffs and include per-variant regression tests. Source: OpenAI News (Apr 2026): https://openai.com/index/gradient-labs and related announcements.

- Agents SDK and model-native harnesses (Apr 15, 2026): OpenAI introduced a next-generation Agents SDK that includes native sandbox execution and a model-native harness. If you run agents, validate that your runtime supports sandboxing, short-lived credentials, and auditable execution traces. Do not assume behavior parity between an agent runtime and direct API calls—test both paths. Source: OpenAI News (Apr 15, 2026): https://openai.com/index/the-next-evolution-of-the-agents-sdk.

- Chain-of-thought monitoring and misalignment detection (Mar 19, 2026): OpenAI published internal monitoring findings showing chain-of-thought traces can surface misalignment in coding agents. Where you collect thinking traces, make them auditable, subject to retention policies, and optionally detachable from user-facing answers for privacy and compliance. Source: OpenAI News (Mar 19, 2026): https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment.

−

- OpenAI released the GPT-5.4 family (including lower-latency mini/nano variants) used in agent deployments and enterprise runtimes in early 2026. When choosing a model, test both the full and compact variants for accuracy/latency tradeoffs. Source: OpenAI News (Apr 2026) and product announcements.

- Gemini inference tiers (Apr 2, 2026): Google introduced Flex (cost-optimized, higher-latency) and Priority (high-reliability) inference tiers for the Gemini API. Route background or batch-like work to Flex and interactive user-facing requests to Priority; implement graceful downgrade logic and telemetry that records which tier served each request. Source: Google AI Blog (Apr 2, 2026): https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/.

−

- Google introduced Flex and Priority inference tiers for the Gemini API (Apr 2, 2026). Use Flex for cost-sensitive/background workloads and Priority for interactive, user-facing workloads; implement graceful-downgrade logic and monitor rate limits. Source: https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/ (Apr 2, 2026).

### Prompt strategies

Diff▶

+Generated: 2026-04-07T09:26:36.997Z

Summary: This update incorporates recent vendor signals: OpenAI’s instruction-hierarchy research (IH-Challenge), the Responses API agent runtime (shell & container workspace), Anthropic’s XML-style prompt structuring, and Google Gemini’s new inference tiers. Edits clarify instruction priorities, agent orchestration best practices, tool output handling, and model-provider specifics for production prompt design.

What changed: Added: explicit instruction-hierarchy guidance (System > Developer > User > Tool); expanded agent/orchestration section with Responses API and shell tool guidance; noted Anthropic XML tagging best practices; added Gemini inference-tier guidance. Rewrote: Edge cases and gotchas to reflect new signals. Kept: original structure, examples, and code patterns but updated text for vendor specifics.

−Generated: 2026-04-05T09:54:36.216Z

+Body changed: yes

−

Summary: OpenAI Prompt Engineering agent run was interrupted: Free credits temporarily have rate limits in place due to abuse. We are working on a resolution. Try again later, or pay for credits which continue to have unrestricted access. Pur

−

What changed: Agent crashed mid-run after 0 search(es). (agent error: Free credits temporarily have rate limits in place due to abuse. We are working on a resolution. Try again later, or pay for credits which continue to have unrestricted access. Purchase credits at htt)

−Body changed: no

Editor: openai/gpt-5-mini

Changed sections: System prompt anatomy, Step 3: Implement structured output with OpenAI, Step 4: Implement with Anthropic, Step 5: Agents and orchestrators, Edge cases and gotchas

Experiments:

- Measure format-compliance improvements after replacing free-text outputs with zodResponseFormat across 3 production prompts

- A/B test agent orchestrator loop lengths (1 vs 3 propose/execute cycles) to compare cost, latency, and accuracy trade-offs

−- Re-run after the issue is resolved.

- Evaluate model behavior on instruction-conflict prompts before and after adding explicit developer instructions to quantify IH improvements

−- Add a higher-signal source.

−- Check gateway credits or rate limits.

Signals:

- News (Anthropic News)

- Research (Anthropic News)

OpenAI Prompt Engineering

Content

OpenAI Prompt Engineering

When to use

When NOT to use

Core concepts

System prompt anatomy

OpenAI Prompt Engineering

Content

OpenAI Prompt Engineering

When to use

When NOT to use

Core concepts

System prompt anatomy

Prompt strategies

Temperature guide

Workflow

Step 1: Define the task contract

Step 2: Write the system prompt

Step 3: Implement structured output with OpenAI (Responses API)

Step 4: Implement with Anthropic

Step 5: Agents and orchestrators

Chain-of-thought and reasoning

Examples

Example 1: Classification with few-shot

Example 2: Data extraction with structured output

Example 3: Multi-turn agent instructions

Decision tree

Edge cases and gotchas (UPDATED)

Evaluation criteria

Research-backed changes included in this update

Activity

Automation & run history

Latest refresh trace

Research engine

Sources