OpenAI Launches GPT-Realtime-2 - First Voice Model With GPT-5-Class Reasoning
Duration: 0:56
Summary Report
OpenAI launches GPT-Realtime-2 in the API, the first voice model with GPT-5-class reasoning, alongside Realtime-Translate for live multilingual speech and Realtime-Whisper for streaming transcription.
- 01. GPT-Realtime-2 is OpenAI's first voice model with GPT-5-class reasoning, built for live conversational agents.
- 02. It carries a 128k token context window, calls tools mid-sentence, and handles interruptions cleanly.
- 03. Realtime-Translate converts speech from over 70 input languages into 13 output languages in real time.
- 04. Realtime-Whisper is a streaming speech-to-text model that captions speech as it happens.
- 05. Pricing is $32 per million audio input tokens and $64 per million output for Realtime-2; Translate is $0.034/min, Whisper $0.017/min.
OpenAI has launched GPT-Realtime-2 through its API, its first voice model with GPT-5-class reasoning. The model takes audio input, reasons over it, and generates a spoken response in a single pass, which OpenAI says enables more natural conversation than earlier voice systems.
The release includes two companion models that expand the platform's capabilities. GPT-Realtime-Translate provides live speech translation across more than 70 input languages into 13 output languages, maintaining real-time pace with speakers. GPT-Realtime-Whisper offers streaming transcription services, generating captions as users speak.
GPT-Realtime-2 features a 128,000-token context window and can execute tool calls mid-conversation whilst handling interruptions smoothly. The model maintains conversational flow even whilst processing complex reasoning tasks. OpenAI has positioned customer service as the primary early application, though the company also targets education, media, events, and creator platforms.
Pricing for GPT-Realtime-2 is set at $32 per million audio input tokens and $64 per million output tokens. The translation service costs 3.4 cents per minute ($0.034), whilst the Whisper transcription model costs 1.7 cents per minute ($0.017). OpenAI states these models represent a shift from basic call-and-response systems towards voice interfaces capable of performing substantive work.
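To make the pricing concrete, here is a minimal cost-estimation sketch in Python using the rates quoted in the summary list ($32 and $64 per million tokens, $0.034 and $0.017 per minute). The function names are hypothetical and this is not an OpenAI billing API. The token counts are caller-supplied inputs, because the report does not say how many audio tokens a minute of speech consumes.

```python
from decimal import Decimal

# Rates in USD as quoted in the report.
REALTIME2_INPUT_PER_MILLION = Decimal("32")   # per million audio input tokens
REALTIME2_OUTPUT_PER_MILLION = Decimal("64")  # per million output tokens
TRANSLATE_PER_MINUTE = Decimal("0.034")
WHISPER_PER_MINUTE = Decimal("0.017")

ONE_MILLION = Decimal(1_000_000)


def realtime2_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate GPT-Realtime-2 cost from token counts (hypothetical helper)."""
    input_cost = Decimal(input_tokens) * REALTIME2_INPUT_PER_MILLION
    output_cost = Decimal(output_tokens) * REALTIME2_OUTPUT_PER_MILLION
    return (input_cost + output_cost) / ONE_MILLION


def per_minute_cost(minutes: int, rate_per_minute: Decimal) -> Decimal:
    """Estimate cost for a per-minute service such as Translate or Whisper."""
    return Decimal(minutes) * rate_per_minute


if __name__ == "__main__":
    # Token counts below are illustrative inputs, not measured values.
    print("Realtime-2:", realtime2_cost(500_000, 250_000))
    print("Translate, 60 min:", per_minute_cost(60, TRANSLATE_PER_MINUTE))
    print("Whisper, 60 min:", per_minute_cost(60, WHISPER_PER_MINUTE))
```

Decimal arithmetic is used instead of floats so that small per-minute rates do not pick up rounding noise when multiplied over long sessions.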