Audio Transcribe
Transcribe audio using OpenAI, with optional speaker diarization when requested. Prefer the bundled CLI for deterministic, repeatable runs.
Workflow
- Collect inputs: audio file path(s), desired response format (text/json/diarized_json), optional language hint, and any known speaker references.
- Verify OPENAI_API_KEY is set. If missing, ask the user to set it locally (do not ask them to paste the key).
- Run the bundled transcribe_diarize.py CLI with sensible defaults (fast text transcription).
- Validate the output: transcription quality, speaker labels, and segment boundaries; iterate with a single targeted change if needed.
- Save outputs under output/transcribe/ when working in this repo.
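The key check in the workflow can be done without ever reading the key aloud or echoing it. The following is a minimal sketch, not part of the bundled CLI; the helper name `api_key_is_set` is hypothetical, and it only reports whether the variable is present and non-empty.

```python
import os
from typing import Mapping, Optional


def api_key_is_set(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when OPENAI_API_KEY is present and non-empty.

    The value itself is never printed or returned, so the key does not
    leak into logs or chat output. `env` defaults to the process
    environment; passing a mapping is only a convenience for testing.
    """
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

If this returns False, ask the user to export the variable in their own shell and re-run, rather than pasting the key into the conversation.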
Decision rules
- Default to gpt-4o-mini-transcribe with --response-format text for fast transcription.
- If the user wants speaker labels or diarization, use --model gpt-4o-transcribe-diarize --response-format diarized_json.
- If audio is longer than ~30 seconds, keep --chunking-strategy auto.
- Prompting is not supported for gpt-4o-transcribe-diarize.
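The decision rules above can be read as a small function from the request to a list of option flags. This is a sketch under stated assumptions: `build_option_flags` is a hypothetical helper, the 30-second threshold is the approximate figure from the rule above, and passing the flag explicitly is one way to "keep" it. How audio paths are passed to transcribe_diarize.py is not shown here, so only the option flags named above are built.

```python
from typing import List

DEFAULT_MODEL = "gpt-4o-mini-transcribe"
DIARIZE_MODEL = "gpt-4o-transcribe-diarize"
LONG_AUDIO_SECONDS = 30.0  # approximate threshold ("~30 seconds")


def build_option_flags(want_speakers: bool, duration_seconds: float) -> List[str]:
    """Translate the decision rules into CLI option flags (hypothetical helper)."""
    if want_speakers:
        # Speaker labels requested: diarization model and its JSON format.
        flags = ["--model", DIARIZE_MODEL, "--response-format", "diarized_json"]
    else:
        # Default: fast plain-text transcription.
        flags = ["--model", DEFAULT_MODEL, "--response-format", "text"]
    if duration_seconds > LONG_AUDIO_SECONDS:
        # Longer audio: keep automatic chunking.
        flags += ["--chunking-strategy", "auto"]
    return flags
```

The helper deliberately builds no prompt option, since prompting is not supported for gpt-4o-transcribe-diarize.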
Output conventions
- Use output/transcribe/<job-id>/ for evaluation runs.
- Use --out-dir for multiple files to avoid overwriting.
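As an illustration of these conventions, the sketch below derives a per-job directory and adds --out-dir only when several files are transcribed. The helper names `job_output_dir` and `out_dir_flags` and the job id `run-001` are hypothetical; only the output/transcribe/ layout and the --out-dir flag come from the conventions above.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

OUTPUT_ROOT = PurePosixPath("output/transcribe")


def job_output_dir(job_id: str) -> PurePosixPath:
    """Return output/transcribe/<job-id> for an evaluation run."""
    if not job_id or "/" in job_id:
        raise ValueError("job_id must be a non-empty single path component")
    return OUTPUT_ROOT / job_id


def out_dir_flags(audio_paths: Sequence[str], out_dir: PurePosixPath) -> List[str]:
    """Add --out-dir when there are multiple inputs, to avoid overwriting."""
    if len(audio_paths) > 1:
        return ["--out-dir", str(out_dir)]
    return []
```

For example, `out_dir_flags(["a.wav", "b.wav"], job_output_dir("run-001"))` yields the flag pair pointing at output/transcribe/run-001, while a single input yields no extra flags.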
Dependencies (install if missing)
Prefer uv for dependency management.
