Creative: brainstorming, story generation, exploration
Workflow
Step 1: Define the task contract
Before writing any prompt, answer these questions:
type PromptContract = {
input: string; // What does the model receive?
output: string; // What should it produce?
format: string; // JSON, markdown, plain text?
constraints: string[]; // What must it avoid?
edgeCases: string[]; // How should it handle ambiguity?
examples: Array<{ input: string; output: string }>;
};
Step 2: Write the system prompt
const systemPrompt = \`You are a senior code reviewer specializing in TypeScript and React.
## Task
Review the provided code diff and return structured feedback.
## Output format
Return a JSON array of findings:
[
{
"severity": "critical" | "warning" | "info",
"line": <number>,
"message": "<concise description>",
"suggestion": "<specific fix>"
}
]
## Rules
- Focus on bugs, security issues, and performance problems
- Do not comment on style preferences unless they cause bugs
- If the code is correct and well-written, return an empty array []
- Never suggest changes that would break existing tests
- Limit findings to the top 5 most important issues
## Examples
Input: \\\`const x = data.map(d => d.name)\\\`
Output:
[
{
"severity": "warning",
"line": 1,
"message": "No null check on data before .map()",
"suggestion": "Use optional chaining: data?.map(d => d.name) ?? []"
}
]\`;
Step 5: Add chain-of-thought for complex reasoning
const chainOfThoughtPrompt = \`You are solving a complex problem.
## Process
Think through this step by step:
1. **Understand**: Restate the problem in your own words
2. **Decompose**: Break it into sub-problems
3. **Solve**: Work through each sub-problem
4. **Verify**: Check your solution against the original constraints
5. **Format**: Present the final answer
## Output format
<thinking>
[Your step-by-step reasoning here — be thorough]
</thinking>
<answer>
[Your final, concise answer here]
</answer>\`;
Examples
Example 1: Classification with few-shot
const classificationPrompt = \`Classify the support ticket into exactly one category.
Categories: billing, technical, account, feature-request, other
## Examples
Ticket: "I was charged twice for my subscription this month"
Category: billing
Ticket: "The API returns 500 errors when I send more than 10 requests"
Category: technical
Ticket: "Can you add dark mode to the dashboard?"
Category: feature-request
Ticket: "I can't log in after resetting my password"
Category: account
## Rules
- Return only the category name, nothing else
- If genuinely ambiguous, choose the most actionable category
- "other" is the last resort — use it only when no category fits\`;
Example 2: Data extraction with structured output
const extractionPrompt = \`Extract structured event information from the text.
Return JSON matching this schema exactly:
{
"event_name": "string",
"date": "YYYY-MM-DD or null",
"time": "HH:MM or null",
"location": "string or null",
"attendees": ["string"],
"confidence": 0.0-1.0
}
## Rules
- If a field is not mentioned, use null (not a guess)
- Parse relative dates against today's date provided in the user message
- List only explicitly named attendees, not implied ones
- Confidence reflects how clearly the information was stated\`;
Example 3: Multi-turn agent instructions
const agentSystemPrompt = \`You are a research assistant that helps users
find and synthesize information.
## Available tools
- web_search(query: string) — search the web for information
- read_url(url: string) — read the content of a web page
- save_note(title: string, content: string) — save a research note
## Behavior
1. When the user asks a question, search for relevant sources first
2. Read the top 2-3 results to gather information
3. Synthesize findings into a concise answer with citations
4. Save important findings as notes for future reference
## Citation format
Use inline citations: "The API supports 100 req/s [1]"
List sources at the end:
[1] https://docs.example.com/rate-limits
## Constraints
- Never fabricate information — if you can't find it, say so
- Always cite sources for factual claims
- Prefer official documentation over blog posts
- If search returns no results, suggest alternative queries\`;
Decision tree
What type of prompt do you need?
├── Classification / Routing
│ ├── < 5 categories → Zero-shot with category list
│ └── > 5 categories or subtle distinctions → Few-shot with examples
├── Data extraction
│ ├── Fixed schema → Structured output (JSON mode / zodResponseFormat)
│ └── Variable schema → Describe output format in prompt
├── Reasoning / Analysis
│ ├── Single-step → Zero-shot with clear instructions
│ └── Multi-step → Chain-of-thought with <thinking> blocks
├── Generation / Writing
│ ├── Consistent style → Few-shot with 3+ examples
│ └── Creative → Higher temperature, fewer constraints
└── Agent / Tool use
├── Simple tool routing → ReAct pattern in system prompt
└── Complex orchestration → Agent orchestration skill
Edge cases and gotchas
Prompt injection: Never put untrusted user input directly in the system prompt — use the user message field
Token budget: System prompts count against context window — measure and optimize
Model drift: Prompts that work on GPT-4o may not work on Claude — test across providers
Few-shot ordering: The last example has the most influence — put your best example last
Negative instructions: "Don't do X" is weaker than "Always do Y instead of X"
Output format compliance: Models may drift from the requested format — use structured outputs or regex validation
Temperature 0 is not deterministic: It's greedy decoding, not true determinism — outputs can still vary slightly
Long prompts degrade: Performance drops as system prompts exceed ~2000 tokens — distill ruthlessly
XML tags in Claude: Anthropic models respond well to XML structure: <rules>, <examples>, <output>
JSON mode quirks: OpenAI's JSON mode requires "JSON" in the prompt; structured outputs (zodResponseFormat) don't
Evaluation criteria
Criterion
How to measure
Accuracy
% of outputs matching ground-truth labels
Format compliance
% of outputs parseable as the requested format
Consistency
Variance across 10 identical requests (temperature 0)
Cost efficiency
Tokens consumed per successful completion
Latency
Time-to-first-token and total generation time
Robustness
Accuracy on adversarial / edge-case inputs
Provider portability
Works on both OpenAI and Anthropic without rewrite
Research-backed changes
The biggest delta is Google AI for Developers. Fold the concrete changes into the operating notes, then discard the fluff.
Scan OpenAI and Anthropic changelogs for model behavior changes that affect prompting (system prompt handling, structured-output schemas, reasoning-token limits). Check Google AI blog for Gemini prompting guidance. Update chain-of-thought templates, few-shot examples, and production prompt-versioning patterns.
Latest refresh trace
Reasoning steps, source results, and the diff that landed.
Apr 18, 2026 · 9:29 AM
triggerAutomation
editoropenai/gpt-5-mini
duration152.7s
statussuccess
sources discovered+1
Revision: v11
This update adds concrete guidance for new agent runtimes (model-native harnesses and native sandbox execution), chain-of-thought monitoring best practices, and operational rollout/testing patterns for prompt versions and compact model variants. It also includes Anthropic XML tagging recommendations and Gemini Flex/Priority inference-tier guidance.
Added: agent runtime and sandbox guidance (Agents SDK, Apr 15, 2026), chain-of-thought monitoring note (Mar 19, 2026), explicit test guidance for compact model variants; Updated: Edge cases and gotchas, Model updates block; Preserved: core workflow, examples, and structured-output recommendations.
Agent steps
Step 1Started scanning 12 sources.
Step 2OpenAI News: 12 fresh signals captured.
Step 3OpenAI Platform Changelog: No fresh signals found.
Step 4Anthropic News: 12 fresh signals captured.
Step 5Anthropic Docs Index: No fresh signals found.
Step 6Google AI Blog: 12 fresh signals captured.
Step 7Google AI Dev: 3 fresh signals captured.
Step 8Hugging Face Blog: 12 fresh signals captured.
Step 9OpenAI Model Spec: 12 fresh signals captured.
Step 10OpenAI Research: No fresh signals found.
Step 11Gemini API docs: 12 fresh signals captured.
Step 12Anthropic Prompting Best Practices: 12 fresh signals captured.
Step 13OpenAI Model Spec: No fresh signals found.
Step 14Agent is rewriting the skill body from the fetched source deltas.
Step 15Agent discovered 1 new source(s): OpenAI News (official blog).
Important: modern LLMs are trained with an instruction hierarchy. OpenAI’s IH‑Challenge (Mar 10, 2026) and the Model Spec emphasize a clear priority ordering: System > Developer > User > Tool. Place safety‑critical and policy constraints in the system or developer layer so they remain highest priority and are resilient to lower‑priority inputs (including tool outputs and web content). See OpenAI IH‑Challenge for details: https://openai.com/index/instruction-hierarchy-challenge/ (Mar 10, 2026).
### Model updates (note)
+
+- OpenAI released the GPT-5.4 family (including lower-latency mini/nano variants) used in agent deployments and enterprise runtimes in early 2026. When choosing a model, test both the full and compact variants for accuracy/latency/cost tradeoffs and include per-variant regression tests. Source: OpenAI News (Apr 2026): https://openai.com/index/gradient-labs and related announcements.
+
+- Agents SDK and model-native harnesses (Apr 15, 2026): OpenAI introduced a next-generation Agents SDK that includes native sandbox execution and a model-native harness. If you run agents, validate that your runtime supports sandboxing, short-lived credentials, and auditable execution traces. Do not assume behavior parity between an agent runtime and direct API calls—test both paths. Source: OpenAI News (Apr 15, 2026): https://openai.com/index/the-next-evolution-of-the-agents-sdk.
+
+- Chain-of-thought monitoring and misalignment detection (Mar 19, 2026): OpenAI published internal monitoring findings showing chain-of-thought traces can surface misalignment in coding agents. Where you collect thinking traces, make them auditable, subject to retention policies, and optionally detachable from user-facing answers for privacy and compliance. Source: OpenAI News (Mar 19, 2026): https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment.
−- OpenAIreleasedtheGPT-5.4family(includinglower-latencymini/nanovariants)usedinagentdeploymentsandenterpriseruntimesinearly2026.Whenchoosingamodel,testboththefull and compactvariantsforaccuracy/latencytradeoffs. Source: OpenAINews (Apr 2026)andproduct announcements.
+- Gemini inference tiers (Apr 2, 2026): Google introduced Flex (cost-optimized,higher-latency)andPriority(high-reliability)inferencetiersfortheGeminiAPI.Routebackgroundorbatch-likeworktoFlexandinteractiveuser-facingrequeststoPriority;implementgracefuldowngradelogic and telemetry that recordswhichtierservedeachrequest. Source: GoogleAIBlog (Apr 2,2026):https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/.
−- Google introduced Flex and Priority inference tiers for the Gemini API (Apr 2, 2026). Use Flex for cost-sensitive/background workloads and Priority for interactive, user-facing workloads; implement graceful-downgrade logic and monitor rate limits. Source: https://blog.google/innovation-and-ai/technology/developers-tools/introducing-flex-and-priority-inference/ (Apr 2, 2026).
### Prompt strategies
Research engine
OpenAI Prompt Engineering now combines 7 tracked sources with 1 trusted upstream skill packs. Instead of waiting on a single fixed link, it tracks canonical feeds, discovers new docs from index-like surfaces, and folds those deltas into sandbox-usable guidance.
OpenAI Prompt Engineering has unusually strong source quality and broad utility, so it deserves prominent placement.
Discovery process
1. Track canonical signals
Monitor 3 feed-like sources for release notes, changelog entries, and durable upstream deltas.
2. Discover net-new docs and leads
Scan 4 discovery-oriented sources such as docs indexes and sitemaps, then rank extracted links against explicit query hints instead of trusting nav order.
3. Transplant from trusted upstreams
Fold implementation patterns from OpenAI Docs so the skill inherits a real operating model instead of boilerplate prose.
4. Keep the sandbox honest
Ship prompts, MCP recommendations, and automation language that can actually be executed in Loop's sandbox instead of abstract advice theater.
Summary: OpenAI Prompt Engineering agent run was interrupted: Free credits temporarily have rate limits in place due to abuse. We are working on a resolution. Try again later, or pay for credits which continue to have unrestricted access. Pur
What changed: Agent crashed mid-run after 0 search(es). (agent error: Free credits temporarily have rate limits in place due to abuse. We are working on a resolution. Try again later, or pay for credits which continue to have unrestricted access. Purchase credits at htt)
Body changed: no
@@ −8 +8 @@
- Add a higher-signal source.
- Check gateway credits or rate limits.
Signals:
−- Google AI for Developers (Google AI Dev)
−- Terms (Google AI Dev)
−- Privacy (Google AI Dev)
- News (Anthropic News)
+- Research (Anthropic News)
+- Economic Futures (Anthropic News)
+- Try Claude (Anthropic News)
Update history3▶
Apr 5, 20264 sources
OpenAI Prompt Engineering agent run was interrupted: Free credits temporarily have rate limits in place due to abuse. We are working on a resolution. Try again later, or pay for credits which continue to have unrestricted access. Pur
Apr 3, 20264 sources
OpenAI Prompt Engineering agent run was interrupted: Free credits temporarily have rate limits in place due to abuse. We are working on a resolution. Try again later, or pay for credits which continue to have unrestricted access. Pur
Apr 1, 20264 sources
OpenAI Prompt Engineering now tracks Google AI for Developers and 3 other fresh signals.