## Core concepts (updated April 2026)
- OpenAI Responses API + shell/container (March 11, 2026): OpenAI equips the Responses API with a shell tool and a hosted container workspace (filesystem, optional structured storage like SQLite, restricted networking). Use the hosted container for long-running or stateful steps where artifacts or file outputs are required, and prefer the shell when you need broad OS-level tooling (grep, curl, awk, compilers).
- OpenAI Agents SDK (April 15, 2026): the Agents SDK introduces a model-native harness and native sandbox execution (the announcement's example installs it with `pip install "openai-agents>=0.14.0"`). The SDK provides higher-level primitives (Runner, SandboxAgent, SandboxRunConfig, manifest entries) that reduce orchestration boilerplate for file- and tool-heavy agents. Prefer Agents SDK primitives for production agents when they match your threat model and deployment constraints.
- Anthropic Programmatic Tool Calling (PTC): Anthropic supports programmatic tool calling, where Claude can write and execute code inside a code-execution sandbox. PTC lets the model orchestrate multiple tool calls inside the sandbox (reducing round trips and tokens) while surfacing explicit `tool_use` events that your orchestrator must fulfill with matching `tool_result` responses and the original `tool_use_id`.
  - Compatibility note: PTC requires a specific code execution tool version (e.g., `code_execution_20260120`) and is only available on compatible Claude models — check the official compatibility table for the exact model list before relying on it in production.
- Vercel AI SDK (v6 Beta): Vercel AI SDK v6 introduces agent abstractions (ToolLoopAgent/Agent interface), tool-execution approval flows (human-in-the-loop), and stabilized structured output generation. Use the v6 agent abstraction when you want built-in loop control, tool approval UI, and `inputSchema`/`outputSchema` wiring between model outputs and your front-end forms.
- Cross-provider abstraction points
  - Id pairing: OpenAI uses `function_call` / `function_call_output` with call ids; Anthropic uses `tool_use` / `tool_result` with `tool_use_id`. Preserve and round-trip these ids exactly in your orchestrator.
- Schema enforcement: Use Zod or JSON Schema to validate inputs/outputs. When providers offer strict schema enforcement, enable it for critical paths but still perform server-side validation as a safety net.
- Provider-run vs client-run tools: Provider-run tools execute on provider infrastructure and can have different retention/PII policies—document these differences and choose client-run for sensitive data unless explicit contractual guarantees exist.
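The id-pairing rule above can be sketched as two small helpers (hypothetical names; the message shapes follow each provider's documented format):

```typescript
// Build the provider-specific "tool result" message for a completed call.
// OpenAI Responses API pairs function_call_output with call_id;
// Anthropic pairs tool_result with tool_use_id.
type ToolOutcome = { id: string; output: string };

function openaiResult({ id, output }: ToolOutcome) {
  return { type: 'function_call_output', call_id: id, output };
}

function anthropicResult({ id, output }: ToolOutcome) {
  return {
    role: 'user',
    content: [{ type: 'tool_result', tool_use_id: id, content: output }],
  };
}
```

Keeping this mapping in one place makes it harder for an orchestrator refactor to silently drop or rewrite an id.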
Notes and best practices:
- Use the OpenAI shell tool and hosted container workspace when your agent needs files, a filesystem, or restricted network access. The container solves common problems like where to place intermediate files and how to keep prompts compact.
- Prefer the Agents SDK higher-level primitives (Runner, SandboxAgent, manifest entries) for production agents where available — they encapsulate sandboxing and harness patterns and reduce bespoke harness code.
### Step 4: Orchestrator loop — Anthropic (practical)
- When Claude emits `tool_use` blocks, execute each client-callable tool and return `tool_result` blocks that include the original `tool_use_id` so the model can continue.
- For programmatic tool calling (code execution), expect the model to generate code that calls tools inside the sandbox. Those internal calls are surfaced as `tool_use` events to your orchestrator; fulfill them and return matching `tool_result` objects.
- Implementation checklist:
  - Check the required code execution tool version (example: `code_execution_20260120`) and the model list that supports it before deploying PTC.
  - Validate and sanitize any inputs that will be interpolated into sandbox-executed code to reduce injection risk.
  - Treat PTC as a way to reduce tokens and latency for tightly coupled multi-invocation tasks, but add extra monitoring and runtime validation because correctness shifts into the sandboxed code path.
  - Return structured errors and let the model decide whether to retry, back off, or ask for clarification.
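A minimal sketch of the fulfilment step, assuming a `handlers` map keyed by tool name (the surrounding `messages.create` call, retries, and schema validation are elided):

```typescript
type ToolUse = { type: 'tool_use'; id: string; name: string; input: unknown };
type Handler = (input: unknown) => Promise<string>;

// Execute every tool_use block and build the tool_result blocks that go back
// in the next user turn, preserving each tool_use_id exactly.
async function fulfill(blocks: ToolUse[], handlers: Record<string, Handler>) {
  const results = [];
  for (const block of blocks) { // sequential on purpose: ordering may matter
    const handler = handlers[block.name];
    const content = handler
      ? await handler(block.input)
      // Structured error instead of throwing, so the model can retry or ask.
      : JSON.stringify({ error: `unknown tool: ${block.name}` });
    results.push({ type: 'tool_result', tool_use_id: block.id, content });
  }
  return results;
}
```

Running calls sequentially is the safe default; only parallelize calls you have verified are independent.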
## Examples and patterns
- OpenAI Responses API function-calling example:

```ts
import OpenAI from 'openai';

const client = new OpenAI();
const tools = [{ type: 'function', name: 'get_weather', description: 'Get current weather', parameters: { type: 'object', properties: { location: { type: 'string' }, units: { type: 'string', enum: ['celsius','fahrenheit'], default: 'fahrenheit' } }, required: ['location'] } }];
const res = await client.responses.create({ model: 'gpt-5.4', tools, input: [{ role: 'user', content: "What's the weather in Paris?" }] });
// If the response contains a function_call, validate and execute it, then send
// back the paired function_call_output referencing the same call id.
```

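Extracting the pending calls from a Responses output can live in a small pure helper. This is a sketch: the item shape follows the documented `function_call` output items, and `pendingCalls` is a hypothetical name:

```typescript
// Pull function_call items out of a Responses API output array so each can be
// validated and executed before the paired function_call_output is sent back.
type FunctionCallItem = { type: 'function_call'; call_id: string; name: string; arguments: string };
type OutputItem = FunctionCallItem | { type: string };

function pendingCalls(output: OutputItem[]) {
  return output
    .filter((item): item is FunctionCallItem => item.type === 'function_call')
    .map((item) => ({
      callId: item.call_id,
      name: item.name,
      // Parsed but unvalidated: run it through your schema validator next.
      args: JSON.parse(item.arguments) as Record<string, unknown>,
    }));
}
```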
- Anthropic programmatic tool calling (concrete notes):
  - Programmatic tool calling lets Claude execute code in a sandbox that runs multiple tool calls locally. The code execution feature must be enabled and compatible with the model version you choose.
  - Data retention for PTC follows the feature's retention policy and is not ZDR by default; verify compliance for sensitive workloads.
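Enabling PTC amounts to declaring the code-execution tool alongside the client tools it may call. A sketch under assumptions: the version string is the one this document cites, and the `allowed_callers` opt-in field should be verified against current Anthropic documentation:

```typescript
// Assumed shape: a client tool opted into programmatic calling so sandbox
// code can invoke it; check field names against the compatibility table.
const codeExecutionTool = { type: 'code_execution_20260120', name: 'code_execution' };

const weatherClientTool = {
  name: 'get_weather',
  description: 'Get current weather',
  input_schema: {
    type: 'object',
    properties: { location: { type: 'string' } },
    required: ['location'],
  },
  allowed_callers: ['code_execution_20260120'], // callable from sandbox code
};

const ptcTools = [codeExecutionTool, weatherClientTool];
```

Tools without `allowed_callers` would remain ordinary client-run tools, so you can opt tools into PTC selectively.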
- Vercel AI SDK v6 (ToolLoopAgent example):

```ts
import { ToolLoopAgent } from 'ai';
import { weatherTool } from '@/tool/weather';

export const weatherAgent = new ToolLoopAgent({
  model: 'anthropic/claude-sonnet-4.5',
  instructions: 'You are a helpful weather assistant.',
  tools: { weather: weatherTool },
});

const result = await weatherAgent.generate({ prompt: 'What is the weather in San Francisco?' });
```
- Parallel tool execution pattern:
- Identify independent tool calls (no shared side effects or data dependencies).