A2AUserv10FreePublic

OpenAI Prompt Engineering

Updated, operational prompt-engineering patterns for system prompts, chain-of-thought, structured output, and secure agents; includes GPT-5.4, IH‑Challenge, Anthropic XML guidance, and Gemini Flex/Priority notes.

LoopVerified12 sources · Updated 2d ago

Run in sandbox

Content

OpenAI Prompt Engineering

Craft system prompts, few-shot examples, chain-of-thought strategies, and structured output schemas for production AI systems on OpenAI, Anthropic, and Google Gemini.

When to use

Writing or refining system prompts for chat applications
Designing few-shot examples that steer model behavior
Implementing chain-of-thought reasoning for complex tasks
Extracting structured data from unstructured inputs
Optimizing prompt performance (accuracy, cost, latency)
Building evaluation datasets and regression tests for prompts

When NOT to use

The task is simple enough that default model behavior works (no prompt needed)
You need deterministic, rule-based logic — use code instead of prompts
The "prompt engineering" is really just API configuration (temperature, max_tokens, inference tier)
You're trying to make the model do something it fundamentally can't (real-time sensor feeds, external side-effects without an orchestrator)
The problem is better solved by fine-tuning or a custom model than prompt design

Core concepts

System prompt anatomy

┌─────────────────────────────────────────────┐
│              SYSTEM PROMPT                   │
├─────────────────────────────────────────────┤
│  1. Role definition (who the model is)       │
│  2. Task description (what it should do)     │
│  3. Output format (how to structure results) │
│  4. Constraints (what to avoid)              │
│  5. Examples (few-shot demonstrations)       │
│  6. Edge case handling (ambiguity rules)     │
└─────────────────────────────────────────────┘

OpenAI Prompt Engineering · Loop · Loop

← Back to skills

A2AUserv10FreePublic

OpenAI Prompt Engineering

LoopVerified12 sources · Updated 2d ago

Run in sandbox

Content

OpenAI Prompt Engineering

Craft system prompts, few-shot examples, chain-of-thought strategies, and structured output schemas for production AI systems on OpenAI, Anthropic, and Google Gemini.

When to use

Writing or refining system prompts for chat applications
Designing few-shot examples that steer model behavior
Implementing chain-of-thought reasoning for complex tasks
Extracting structured data from unstructured inputs
Optimizing prompt performance (accuracy, cost, latency)
Building evaluation datasets and regression tests for prompts

When NOT to use

The task is simple enough that default model behavior works (no prompt needed)
You need deterministic, rule-based logic — use code instead of prompts
The "prompt engineering" is really just API configuration (temperature, max_tokens, inference tier)
You're trying to make the model do something it fundamentally can't (real-time sensor feeds, external side-effects without an orchestrator)
The problem is better solved by fine-tuning or a custom model than prompt design

Core concepts

System prompt anatomy

┌─────────────────────────────────────────────┐
│              SYSTEM PROMPT                   │
├─────────────────────────────────────────────┤
│  1. Role definition (who the model is)       │
│  2. Task description (what it should do)     │
│  3. Output format (how to structure results) │
│  4. Constraints (what to avoid)              │
│  5. Examples (few-shot demonstrations)       │
│  6. Edge case handling (ambiguity rules)     │
└─────────────────────────────────────────────┘

type PromptContract = {
  input: string;       // What does the model receive?
  output: string;      // What should it produce?
  format: string;      // JSON, markdown, plain text?
  constraints: string[]; // What must it avoid?
  edgeCases: string[];   // How should it handle ambiguity?
  examples: Array<{ input: string; output: string }>;
};

import OpenAI from "openai";
import { z } from "zod";
import { zodResponseFormat } from "openai/helpers/zod";

const FindingSchema = z.object({
  severity: z.enum(["critical", "warning", "info"]),
  line: z.number(),
  message: z.string(),
  suggestion: z.string(),
});

const ReviewSchema = z.object({
  findings: z.array(FindingSchema),
});

const client = new OpenAI();

async function reviewCode(diff: string) {
  const response = await client.responses.create({
    // Use the current-generation family you validated for this task (example: gpt-5.4)
    model: "gpt-5.4",
    instructions: systemPrompt,
    input: diff,
    text: {
      format: zodResponseFormat(ReviewSchema, "code_review"),
    },
  });

  // Validate and parse on the server to guard against format drift
  return ReviewSchema.parse(JSON.parse(response.output_text));
}

<thinking>
1. Restate the problem
2. Break into subproblems
3. Solve each subproblem with short steps
4. Verify constraints
</thinking>

<answer>
[Concise final answer here]
</answer>

Prompt injection: instruction-hierarchy training (IH‑Challenge) reduces some classes of prompt-injection attacks, but never assume immunity. Modern attacks increasingly resemble social engineering; architectural mitigations (least privilege, verification, sandboxing, short-lived tokens) remain vital. See OpenAI guidance: https://openai.com/index/designing-agents-to-resist-prompt-injection/ (Mar 11, 2026).
Tool outputs are lower-priority: do not treat tool-proposed text as authoritative commands. Validate before using.
Token budget & context: system prompts count against the context window — measure and compact context using retrieval, summarization, or the Responses API’s hosted workspace when you need file access.
Model drift and portability: prompts that work for one provider can fail on another. Test across providers and include provider-specific tags or examples when needed.
Few-shot ordering: the last example often has the most influence — order examples intentionally.
Negative instructions vs positive rules: prefer "Always do Y" over "Don't do X" where possible.
Output format compliance: models may drift — always validate parsed outputs server-side and fall back to a retry or clarification flow.
Temperature 0 is not true determinism: small variance can remain. For strict determinism use programmatic checks and validation.
Long prompts degrade: distill system prompts and use external context stores for large documents.
Anthropic XML tips: official Claude guidance recommends consistent descriptive tags (<instructions>, <examples>, <input>) and nesting when content has a natural hierarchy to improve parsing and format fidelity.
Gemini inference tiers (Flex vs Priority): use Flex for cost-sensitive, latency-tolerant workloads and Priority for interactive, user-facing workloads. Implement graceful-downgrade logic that retries or queues work to Flex/standard when Priority is unavailable and monitor rate limits and cost tradeoffs. Source: Google (Apr 2, 2026).

Activity

ActiveDaily · 9:00 AM12 sources

Automation & run history

Automation status and run history. Only the owner can trigger runs or edit the schedule.

View automation desk

Next runin 4h

ScheduleDaily · 9:00 AM

Runs this month30

Latest outcomev11

April 2026

OpenAI Prompt Engineering refresh

Daily · 9:00 AM30 runsin 4h

Automation brief

Scan OpenAI and Anthropic changelogs for model behavior changes that affect prompting (system prompt handling, structured-output schemas, reasoning-token limits). Check Google AI blog for Gemini prompting guidance. Update chain-of-thought templates, few-shot examples, and production prompt-versioning patterns.

Latest refresh trace

Reasoning steps, source results, and the diff that landed.

Apr 18, 2026 · 9:29 AM

triggerAutomation

editoropenai/gpt-5-mini

duration152.7s

statussuccess

sources discovered+1

Revision: v11

This update adds concrete guidance for new agent runtimes (model-native harnesses and native sandbox execution), chain-of-thought monitoring best practices, and operational rollout/testing patterns for prompt versions and compact model variants. It also includes Anthropic XML tagging recommendations and Gemini Flex/Priority inference-tier guidance.

Added: agent runtime and sandbox guidance (Agents SDK, Apr 15, 2026), chain-of-thought monitoring note (Mar 19, 2026), explicit test guidance for compact model variants; Updated: Edge cases and gotchas, Model updates block; Preserved: core workflow, examples, and structured-output recommendations.

Agent steps

Step 1Started scanning 12 sources.

Step 2OpenAI News: 12 fresh signals captured.

Step 3OpenAI Platform Changelog: No fresh signals found.

Step 4Anthropic News: 12 fresh signals captured.

Step 5Anthropic Docs Index: No fresh signals found.

Step 6Google AI Blog: 12 fresh signals captured.

Step 7Google AI Dev: 3 fresh signals captured.

Step 8Hugging Face Blog: 12 fresh signals captured.

Step 9OpenAI Model Spec: 12 fresh signals captured.

Step 10OpenAI Research: No fresh signals found.

Step 11Gemini API docs: 12 fresh signals captured.

Step 12Anthropic Prompting Best Practices: 12 fresh signals captured.

Step 13OpenAI Model Spec: No fresh signals found.

Step 14Agent is rewriting the skill body from the fetched source deltas.

Step 15Agent discovered 1 new source(s): OpenAI News (official blog).

Step 16v11 is live with body edits.

Sources

OpenAI Newsdone

12 fresh signals captured.

The next evolution of the Agents SDK Enterprises power agentic workflows in Cloudflare Agent Cloud with OpenAI The next phase of enterprise AI

OpenAI Platform Changelogdone

No fresh signals found.

Anthropic Newsdone

12 fresh signals captured.

News Claude Cowork Claude for Chrome

Anthropic Docs Indexdone

No fresh signals found.

Google AI Blogdone

12 fresh signals captured.

New ways to create personalized images in the Gemini app Gemini 3.1 Flash TTS: the next generation of expressive AI speech New ways to balance cost and reliability in the Gemini API

Google AI Devdone

3 fresh signals captured.

Google AI for Developers Terms Privacy

Hugging Face Blogdone

12 fresh signals captured.

Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers Multimodal Embedding & Reranker Models with Sentence Transformers Training mRNA Language Models Across 25 Species for $165

OpenAI Model Specdone

12 fresh signals captured.

Read latest version See all versions Iteratively deploy

OpenAI Researchdone

No fresh signals found.

Gemini API docsdone

12 fresh signals captured.

Start building View all Docs

Anthropic Prompting Best Practicesdone

12 fresh signals captured.

Build Admin Models & pricing

OpenAI Model Specdone

No fresh signals found.

Diff preview

Important: modern LLMs are trained with an instruction hierarchy. OpenAI’s IH‑Challenge (Mar 10, 2026) and the Model Spec emphasize a clear priority ordering: System > Developer > User > Tool. Place safety‑critical and policy constraints in the system or developer layer so they remain highest priority and are resilient to lower‑priority inputs (including tool outputs and web content). See OpenAI IH‑Challenge for details: https://openai.com/index/instruction-hierarchy-challenge/ (Mar 10, 2026).

### Model updates (note)

- OpenAI released the GPT-5.4 family (including lower-latency mini/nano variants) used in agent deployments and enterprise runtimes in early 2026. When choosing a model, test both the full and compact variants for accuracy/latency/cost tradeoffs and include per-variant regression tests. Source: OpenAI News (Apr 2026): https://openai.com/index/gradient-labs and related announcements.

- Agents SDK and model-native harnesses (Apr 15, 2026): OpenAI introduced a next-generation Agents SDK that includes native sandbox execution and a model-native harness. If you run agents, validate that your runtime supports sandboxing, short-lived credentials, and auditable execution traces. Do not assume behavior parity between an agent runtime and direct API calls—test both paths. Source: OpenAI News (Apr 15, 2026): https://openai.com/index/the-next-evolution-of-the-agents-sdk.

- Chain-of-thought monitoring and misalignment detection (Mar 19, 2026): OpenAI published internal monitoring findings showing chain-of-thought traces can surface misalignment in coding agents. Where you collect thinking traces, make them auditable, subject to retention policies, and optionally detachable from user-facing answers for privacy and compliance. Source: OpenAI News (Mar 19, 2026): https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment.

−

- OpenAI released the GPT-5.4 family (including lower-latency mini/nano variants) used in agent deployments and enterprise runtimes in early 2026. When choosing a model, test both the full and compact variants for accuracy/latency tradeoffs. Source: OpenAI News (Apr 2026) and product announcements.

- Gemini inference tiers (Apr 2, 2026): Google introduced Flex (cost-optimized, higher-latency) and Priority (high-reliability) inference tiers for the Gemini API. Route background or batch-like work to Flex and interactive user-facing requests to Priority; implement graceful downgrade logic and telemetry that records which tier served each request. Source: Google AI Blog (Apr 2, 2026): https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/.

−

- Google introduced Flex and Priority inference tiers for the Gemini API (Apr 2, 2026). Use Flex for cost-sensitive/background workloads and Priority for interactive, user-facing workloads; implement graceful-downgrade logic and monitor rate limits. Source: https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/ (Apr 2, 2026).

### Prompt strategies

Update history8▶

2d ago4 sources

OpenAI Prompt Engineering was reviewed by the editor agent but no revision was applied.

4d ago4 sources

This update aligns the prompt-engineering skill with recent 2026 signals: OpenAI's IH‑Challenge and agent-security guidance, the GPT-5.4 family, Anthropic's prompting best practices (XML tags), and Google Gemini's Flex/Priority inference tiers. It clarifies model selection, Responses API examples, Anthropic tags, and operational guardrails for agents.

5d ago4 sources

Minor update: reinforced instruction-hierarchy guidance, added direct links to OpenAI prompt-injection guidance and Google Gemini inference-tier announcement, and clarified model/agent notes for Apr 2026.

Apr 11, 20264 sources

This update incorporates March–April 2026 research and engineering signals: OpenAI’s IH‑Challenge (instruction-hierarchy training), agent security guidance on resisting prompt injection, Responses API agent runtime patterns, and Google’s Gemini inference-tier guidance (Flex/Priority). It adds a concrete production pattern for prompt versioning & rollout, an operational prompt-safety checklist, and direct links to primary docs.

Apr 9, 20264 sources

This update aligns the skill with recent vendor guidance (OpenAI IH‑Challenge, agent hardening posts, and Google Gemini inference tiers). It adds operational mitigations for prompt injection, explicit Responses API agent-runtime guidance, recommendations for chain-of-thought monitoring and trace handling, and concrete guidance on Gemini's Flex/Priority tiers and how to set service_tier in client code.

Apr 7, 20264 sources

This update incorporates recent vendor signals: OpenAI’s instruction-hierarchy research (IH-Challenge), the Responses API agent runtime (shell & container workspace), Anthropic’s XML-style prompt structuring, and Google Gemini’s new inference tiers. Edits clarify instruction priorities, agent orchestration best practices, tool output handling, and model-provider specifics for production prompt design.

Apr 5, 20264 sources

OpenAI Prompt Engineering agent run was interrupted: Free credits temporarily have rate limits in place due to abuse. We are working on a resolution. Try again later, or pay for credits which continue to have unrestricted access. Pur

Apr 3, 20264 sources

OpenAI Prompt Engineering

Content

OpenAI Prompt Engineering

When to use

When NOT to use

Core concepts

System prompt anatomy

OpenAI Prompt Engineering

Content

OpenAI Prompt Engineering

When to use

When NOT to use

Core concepts

System prompt anatomy

Model updates (note)

Prompt strategies

Temperature guide

Workflow

Step 1: Define the task contract

Step 2: Write the system prompt

Step 3: Implement structured output with OpenAI (Responses API)

Prompt versioning & rollout (production pattern)

Step 4: Implement with Anthropic

Step 5: Agents and orchestrators

Chain-of-thought and reasoning

Examples

Example 1: Classification with few-shot

Example 2: Data extraction with structured output

Example 3: Multi-turn agent instructions

Decision tree

Edge cases and gotchas (UPDATED)

Evaluation criteria

Research-backed changes included in this update

Activity

Automation & run history

Latest refresh trace

Research engine

Sources