diff --git a/docs/realtime/guide.md b/docs/realtime/guide.md new file mode 100644 index 000000000..9ea2525cf --- /dev/null +++ b/docs/realtime/guide.md @@ -0,0 +1,143 @@ +# Guide + +This guide provides an in-depth look at building voice-enabled AI agents using the OpenAI Agents SDK's realtime capabilities. + +!!! warning "Beta feature" +Realtime agents are in beta. Expect some breaking changes as we improve the implementation. + +## Overview + +Realtime agents allow for conversational flows, processing audio and text inputs in real time and responding with realtime audio. They maintain persistent connections with OpenAI's Realtime API, enabling natural voice conversations with low latency and the ability to handle interruptions gracefully. + +## Architecture + +### Core Components + +The realtime system consists of several key components: + +- **RealtimeAgent**: An agent, configured with instructions, tools and handoffs. +- **RealtimeRunner**: Manages configuration. You can call `runner.run()` to get a session. +- **RealtimeSession**: A single interaction session. You typically create one each time a user starts a conversation, and keep it alive until the conversation is done. +- **RealtimeModel**: The underlying model interface (typically OpenAI's WebSocket implementation) + +### Session flow + +A typical realtime session follows this flow: + +1. **Create your RealtimeAgent(s)** with instructions, tools and handoffs. +2. **Set up a RealtimeRunner** with the agent and configuration options +3. **Start the session** using `await runner.run()` which returns a RealtimeSession. +4. **Send audio or text messages** to the session using `send_audio()` or `send_message()` +5. **Listen for events** by iterating over the session - events include audio output, transcripts, tool calls, handoffs, and errors +6. 
**Handle interruptions** when users speak over the agent, which automatically stops current audio generation + +The session maintains the conversation history and manages the persistent connection with the realtime model. + +## Agent configuration + +RealtimeAgent works similarly to the regular Agent class with some key differences. For full API details, see the [`RealtimeAgent`][agents.realtime.agent.RealtimeAgent] API reference. + +Key differences from regular agents: + +- Model choice is configured at the session level, not the agent level. +- No structured output support (`output_type` is not supported). +- Voice can be configured per agent but cannot be changed after the first agent speaks. +- All other features like tools, handoffs, and instructions work the same way. + +## Session configuration + +### Model settings + +The session configuration allows you to control the underlying realtime model behavior. You can configure the model name (such as `gpt-4o-realtime-preview`), voice selection (alloy, echo, fable, onyx, nova, shimmer), and supported modalities (text and/or audio). Audio formats can be set for both input and output, with PCM16 being the default. + +### Audio configuration + +Audio settings control how the session handles voice input and output. You can configure input audio transcription using models like Whisper, set language preferences, and provide transcription prompts to improve accuracy for domain-specific terms. Turn detection settings control when the agent should start and stop responding, with options for voice activity detection thresholds, silence duration, and padding around detected speech. 
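As a rough sketch of the model and audio settings described above, the configuration can be assembled as a plain dictionary. The keys mirror those listed in the quickstart (`model_name`, `voice`, `modalities`, `input_audio_format`, `output_audio_format`, `input_audio_transcription`, `turn_detection`). The helper name `build_realtime_config`, the chosen values, and the placement of the audio format keys under `model_settings` are illustrative assumptions, not part of the SDK.

```python
from typing import Any


def build_realtime_config(
    voice: str = "alloy",
    silence_duration_ms: int = 200,
) -> dict[str, Any]:
    """Hypothetical helper: returns a config dict shaped like the quickstart's."""
    return {
        "model_settings": {
            "model_name": "gpt-4o-realtime-preview",
            "voice": voice,
            "modalities": ["text", "audio"],
            # Assumed placement: the quickstart lists these under "Audio settings".
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "whisper-1"},
            "turn_detection": {
                "type": "server_vad",
                "threshold": 0.5,
                "prefix_padding_ms": 300,
                "silence_duration_ms": silence_duration_ms,
            },
        }
    }


config = build_realtime_config(voice="echo")
```

A dictionary like this would be passed as `config=` when constructing the `RealtimeRunner`, as the quickstart does.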
+ +## Tools and Functions + +### Adding Tools + +Just like regular agents, realtime agents support function tools that execute during conversations: + +```python +from agents import function_tool +from agents.realtime import RealtimeAgent + +@function_tool +def get_weather(city: str) -> str: + """Get current weather for a city.""" + # Your weather API logic here + return f"The weather in {city} is sunny, 72°F" + +@function_tool +def book_appointment(date: str, time: str, service: str) -> str: + """Book an appointment.""" + # Your booking logic here + return f"Appointment booked for {service} on {date} at {time}" + +agent = RealtimeAgent( + name="Assistant", + instructions="You can help with weather and appointments.", + tools=[get_weather, book_appointment], +) +``` + +## Handoffs + +### Creating Handoffs + +Handoffs allow transferring conversations between specialized agents. + +```python +from agents.realtime import RealtimeAgent, realtime_handoff + +# Specialized agents +billing_agent = RealtimeAgent( + name="Billing Support", + instructions="You specialize in billing and payment issues.", +) + +technical_agent = RealtimeAgent( + name="Technical Support", + instructions="You handle technical troubleshooting.", +) + +# Main agent with handoffs +main_agent = RealtimeAgent( + name="Customer Service", + instructions="You are the main customer service agent. Hand off to specialists when needed.", + handoffs=[ + realtime_handoff(billing_agent, tool_description="Transfer to billing support"), + realtime_handoff(technical_agent, tool_description="Transfer to technical support"), + ] +) +``` + +## Event handling + +The session streams events that you can listen to by iterating over the session object. Events include audio output chunks, transcription results, tool execution start and end, agent handoffs, and errors. 
Key events to handle include: + +- **audio**: Raw audio data from the agent's response +- **audio_end**: Agent finished speaking +- **audio_interrupted**: User interrupted the agent +- **tool_start/tool_end**: Tool execution lifecycle +- **handoff**: Agent handoff occurred +- **error**: Error occurred during processing + +For complete event details, see [`RealtimeSessionEvent`][agents.realtime.events.RealtimeSessionEvent]. + +## Guardrails + +Only output guardrails are supported for realtime agents. These guardrails are debounced and run periodically (not on every word) to avoid performance issues during real-time generation. The default debounce length is 100 characters, but this is configurable. + +When a guardrail is triggered, it generates a `guardrail_tripped` event and can interrupt the agent's current response. The debounce behavior helps balance safety with real-time performance requirements. Unlike text agents, realtime agents do **not** raise an Exception when guardrails are tripped. + +## Audio processing + +Send audio to the session using [`session.send_audio(audio_bytes)`][agents.realtime.session.RealtimeSession.send_audio] or send text using [`session.send_message()`][agents.realtime.session.RealtimeSession.send_message]. + +For audio output, listen for `audio` events and play the audio data through your preferred audio library. Make sure to listen for `audio_interrupted` events to stop playback immediately and clear any queued audio when the user interrupts the agent. + +## Examples + +For complete working examples, check out the [examples/realtime directory](https://github.com/openai/openai-agents-python/tree/main/examples/realtime) which includes demos with and without UI components. 
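The interruption handling described under "Audio processing" in the guide can be sketched without any audio library. The `PlaybackQueue` class below is a hypothetical stand-in for a real player: it buffers chunks from `audio` events and drops everything queued when an `audio_interrupted` event arrives. Events are modeled here as a type string plus raw bytes purely for illustration; the SDK's actual event objects are described in the `RealtimeSessionEvent` reference.

```python
from collections import deque
from typing import Optional


class PlaybackQueue:
    """Hypothetical buffer standing in for a real audio player."""

    def __init__(self) -> None:
        self._chunks: deque = deque()

    @property
    def pending(self) -> int:
        """Number of chunks waiting to be played."""
        return len(self._chunks)

    def handle_event(self, event_type: str, data: bytes = b"") -> None:
        if event_type == "audio":
            # Queue the chunk for playback.
            self._chunks.append(data)
        elif event_type == "audio_interrupted":
            # The user spoke over the agent: discard everything not yet played.
            self._chunks.clear()

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None


player = PlaybackQueue()
player.handle_event("audio", b"\x00\x01")
player.handle_event("audio", b"\x02\x03")
player.handle_event("audio_interrupted")
```

In a real application the loop that drains `next_chunk()` would also need to stop whatever chunk is currently playing, which depends on the audio library in use.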
diff --git a/docs/realtime/quickstart.md b/docs/realtime/quickstart.md new file mode 100644 index 000000000..2cee550ea --- /dev/null +++ b/docs/realtime/quickstart.md @@ -0,0 +1,175 @@ +# Quickstart + +Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent. + +!!! warning "Beta feature" +Realtime agents are in beta. Expect some breaking changes as we improve the implementation. + +## Prerequisites + +- Python 3.9 or higher +- OpenAI API key +- Basic familiarity with the OpenAI Agents SDK + +## Installation + +If you haven't already, install the OpenAI Agents SDK: + +```bash +pip install openai-agents +``` + +## Creating your first realtime agent + +### 1. Import required components + +```python +import asyncio +from agents.realtime import RealtimeAgent, RealtimeRunner +``` + +### 2. Create a realtime agent + +```python +agent = RealtimeAgent( + name="Assistant", + instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.", +) +``` + +### 3. Set up the runner + +```python +runner = RealtimeRunner( + starting_agent=agent, + config={ + "model_settings": { + "model_name": "gpt-4o-realtime-preview", + "voice": "alloy", + "modalities": ["text", "audio"], + } + } +) +``` + +### 4. Start a session + +```python +async def main(): + # Start the realtime session + session = await runner.run() + + async with session: + # Send a text message to start the conversation + await session.send_message("Hello! 
How are you today?") + + # The agent will stream back audio in real-time (not shown in this example) + # Listen for events from the session + async for event in session: + if event.type == "response.audio_transcript.done": + print(f"Assistant: {event.transcript}") + elif event.type == "conversation.item.input_audio_transcription.completed": + print(f"User: {event.transcript}") + +# Run the session +asyncio.run(main()) +``` + +## Complete example + +Here's a complete working example: + +```python +import asyncio +from agents.realtime import RealtimeAgent, RealtimeRunner + +async def main(): + # Create the agent + agent = RealtimeAgent( + name="Assistant", + instructions="You are a helpful voice assistant. Keep responses brief and conversational.", + ) + + # Set up the runner with configuration + runner = RealtimeRunner( + starting_agent=agent, + config={ + "model_settings": { + "model_name": "gpt-4o-realtime-preview", + "voice": "alloy", + "modalities": ["text", "audio"], + "input_audio_transcription": { + "model": "whisper-1" + }, + "turn_detection": { + "type": "server_vad", + "threshold": 0.5, + "prefix_padding_ms": 300, + "silence_duration_ms": 200 + } + } + } + ) + + # Start the session + session = await runner.run() + + async with session: + print("Session started! 
The agent will stream audio responses in real-time.") + + # Process events + async for event in session: + if event.type == "response.audio_transcript.done": + print(f"Assistant: {event.transcript}") + elif event.type == "conversation.item.input_audio_transcription.completed": + print(f"User: {event.transcript}") + elif event.type == "error": + print(f"Error: {event.error}") + break + +if __name__ == "__main__": + asyncio.run(main()) +``` + +## Configuration options + +### Model settings + +- `model_name`: Choose from available realtime models (e.g., `gpt-4o-realtime-preview`) +- `voice`: Select voice (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`) +- `modalities`: Enable text and/or audio (`["text", "audio"]`) + +### Audio settings + +- `input_audio_format`: Format for input audio (`pcm16`, `g711_ulaw`, `g711_alaw`) +- `output_audio_format`: Format for output audio +- `input_audio_transcription`: Transcription configuration + +### Turn detection + +- `type`: Detection method (`server_vad`, `semantic_vad`) +- `threshold`: Voice activity threshold (0.0-1.0) +- `silence_duration_ms`: Silence duration to detect turn end +- `prefix_padding_ms`: Audio padding before speech + +## Next steps + +- [Learn more about realtime agents](guide.md) +- Check out working examples in the [examples/realtime](https://github.com/openai/openai-agents-python/tree/main/examples/realtime) folder +- Add tools to your agent +- Implement handoffs between agents +- Set up guardrails for safety + +## Authentication + +Make sure your OpenAI API key is set in your environment: + +```bash +export OPENAI_API_KEY="your-api-key-here" +``` + +Or pass it directly when creating the session: + +```python +session = await runner.run(model_config={"api_key": "your-api-key"}) +``` diff --git a/docs/ref/realtime/agent.md b/docs/ref/realtime/agent.md new file mode 100644 index 000000000..d90833920 --- /dev/null +++ b/docs/ref/realtime/agent.md @@ -0,0 +1,3 @@ +# `RealtimeAgent` + +::: 
agents.realtime.agent.RealtimeAgent \ No newline at end of file diff --git a/docs/ref/realtime/config.md b/docs/ref/realtime/config.md new file mode 100644 index 000000000..3e50f47ad --- /dev/null +++ b/docs/ref/realtime/config.md @@ -0,0 +1,41 @@ +# Realtime Configuration + +## Run Configuration + +::: agents.realtime.config.RealtimeRunConfig + +## Model Settings + +::: agents.realtime.config.RealtimeSessionModelSettings + +## Audio Configuration + +::: agents.realtime.config.RealtimeInputAudioTranscriptionConfig +::: agents.realtime.config.RealtimeTurnDetectionConfig + +## Guardrails Settings + +::: agents.realtime.config.RealtimeGuardrailsSettings + +## Model Configuration + +::: agents.realtime.model.RealtimeModelConfig + +## Tracing Configuration + +::: agents.realtime.config.RealtimeModelTracingConfig + +## User Input Types + +::: agents.realtime.config.RealtimeUserInput +::: agents.realtime.config.RealtimeUserInputText +::: agents.realtime.config.RealtimeUserInputMessage + +## Client Messages + +::: agents.realtime.config.RealtimeClientMessage + +## Type Aliases + +::: agents.realtime.config.RealtimeModelName +::: agents.realtime.config.RealtimeAudioFormat \ No newline at end of file diff --git a/docs/ref/realtime/events.md b/docs/ref/realtime/events.md new file mode 100644 index 000000000..137d9a643 --- /dev/null +++ b/docs/ref/realtime/events.md @@ -0,0 +1,36 @@ +# Realtime Events + +## Session Events + +::: agents.realtime.events.RealtimeSessionEvent + +## Event Types + +### Agent Events +::: agents.realtime.events.RealtimeAgentStartEvent +::: agents.realtime.events.RealtimeAgentEndEvent + +### Audio Events +::: agents.realtime.events.RealtimeAudio +::: agents.realtime.events.RealtimeAudioEnd +::: agents.realtime.events.RealtimeAudioInterrupted + +### Tool Events +::: agents.realtime.events.RealtimeToolStart +::: agents.realtime.events.RealtimeToolEnd + +### Handoff Events +::: agents.realtime.events.RealtimeHandoffEvent + +### Guardrail Events +::: 
agents.realtime.events.RealtimeGuardrailTripped + +### History Events +::: agents.realtime.events.RealtimeHistoryAdded +::: agents.realtime.events.RealtimeHistoryUpdated + +### Error Events +::: agents.realtime.events.RealtimeError + +### Raw Model Events +::: agents.realtime.events.RealtimeRawModelEvent \ No newline at end of file diff --git a/docs/ref/realtime/runner.md b/docs/ref/realtime/runner.md new file mode 100644 index 000000000..b2d26bba5 --- /dev/null +++ b/docs/ref/realtime/runner.md @@ -0,0 +1,3 @@ +# `RealtimeRunner` + +::: agents.realtime.runner.RealtimeRunner \ No newline at end of file diff --git a/docs/ref/realtime/session.md b/docs/ref/realtime/session.md new file mode 100644 index 000000000..52ad0b09e --- /dev/null +++ b/docs/ref/realtime/session.md @@ -0,0 +1,3 @@ +# `RealtimeSession` + +::: agents.realtime.session.RealtimeSession \ No newline at end of file diff --git a/examples/realtime/app/README.md b/examples/realtime/app/README.md new file mode 100644 index 000000000..3a7176707 --- /dev/null +++ b/examples/realtime/app/README.md @@ -0,0 +1,40 @@ +# Realtime Demo App + +A web-based realtime voice assistant demo with a FastAPI backend and HTML/JS frontend. + +## Installation + +Install the required dependencies: + +```bash +uv add fastapi uvicorn websockets +``` + +## Usage + +Start the application with a single command: + +```bash +cd examples/realtime/app && uv run python server.py +``` + +Then open your browser to: http://localhost:8000 + +## How to Use + +1. Click **Connect** to establish a realtime session +2. Audio capture starts automatically - just speak naturally +3. Click the **Mic On/Off** button to mute/unmute your microphone +4. Watch the conversation unfold in the left pane +5. Monitor raw events in the right pane (click to expand/collapse) +6. 
Click **Disconnect** when done + +## Architecture + +- **Backend**: FastAPI server with WebSocket connections for real-time communication +- **Session Management**: Each connection gets a unique session with the OpenAI Realtime API +- **Audio Processing**: 24kHz mono audio capture and playback +- **Event Handling**: Full event stream processing with transcript generation +- **Frontend**: Vanilla JavaScript with clean, responsive CSS + +The demo showcases the core patterns for building realtime voice applications with the OpenAI Agents SDK. diff --git a/examples/realtime/app/server.py b/examples/realtime/app/server.py new file mode 100644 index 000000000..db2cd7bda --- /dev/null +++ b/examples/realtime/app/server.py @@ -0,0 +1,172 @@ +import asyncio +import base64 +import json +import logging +import struct +from contextlib import asynccontextmanager +from typing import Any, assert_never + +from fastapi import FastAPI, WebSocket, WebSocketDisconnect +from fastapi.responses import FileResponse +from fastapi.staticfiles import StaticFiles + +from agents import function_tool +from agents.realtime import RealtimeAgent, RealtimeRunner, RealtimeSession, RealtimeSessionEvent + +logging.basicConfig(level=logging.INFO) +logger = logging.getLogger(__name__) + + +@function_tool +def get_weather(city: str) -> str: + """Get the weather in a city.""" + return f"The weather in {city} is sunny." + + +@function_tool +def get_secret_number() -> int: + """Returns the secret number, if the user asks for it.""" + return 71 + + +haiku_agent = RealtimeAgent( + name="Haiku Agent", + instructions="You are a haiku poet. You must respond ONLY in traditional haiku format (5-7-5 syllables). Every response should be a proper haiku about the topic. 
Do not break character.", + tools=[], +) + +agent = RealtimeAgent( + name="Assistant", + instructions="If the user wants poetry or haikus, you can hand them off to the haiku agent via the transfer_to_haiku_agent tool.", + tools=[get_weather, get_secret_number], + handoffs=[haiku_agent], +) + + +class RealtimeWebSocketManager: + def __init__(self): + self.active_sessions: dict[str, RealtimeSession] = {} + self.session_contexts: dict[str, Any] = {} + self.websockets: dict[str, WebSocket] = {} + + async def connect(self, websocket: WebSocket, session_id: str): + await websocket.accept() + self.websockets[session_id] = websocket + + runner = RealtimeRunner(agent) + session_context = await runner.run() + session = await session_context.__aenter__() + self.active_sessions[session_id] = session + self.session_contexts[session_id] = session_context + + # Start event processing task + asyncio.create_task(self._process_events(session_id)) + + async def disconnect(self, session_id: str): + if session_id in self.session_contexts: + await self.session_contexts[session_id].__aexit__(None, None, None) + del self.session_contexts[session_id] + if session_id in self.active_sessions: + del self.active_sessions[session_id] + if session_id in self.websockets: + del self.websockets[session_id] + + async def send_audio(self, session_id: str, audio_bytes: bytes): + if session_id in self.active_sessions: + await self.active_sessions[session_id].send_audio(audio_bytes) + + async def _process_events(self, session_id: str): + try: + session = self.active_sessions[session_id] + websocket = self.websockets[session_id] + + async for event in session: + event_data = await self._serialize_event(event) + await websocket.send_text(json.dumps(event_data)) + except Exception as e: + logger.error(f"Error processing events for session {session_id}: {e}") + + async def _serialize_event(self, event: RealtimeSessionEvent) -> dict[str, Any]: + base_event: dict[str, Any] = { + "type": event.type, + } + + if 
event.type == "agent_start": + base_event["agent"] = event.agent.name + elif event.type == "agent_end": + base_event["agent"] = event.agent.name + elif event.type == "handoff": + base_event["from"] = event.from_agent.name + base_event["to"] = event.to_agent.name + elif event.type == "tool_start": + base_event["tool"] = event.tool.name + elif event.type == "tool_end": + base_event["tool"] = event.tool.name + base_event["output"] = str(event.output) + elif event.type == "audio": + base_event["audio"] = base64.b64encode(event.audio.data).decode("utf-8") + elif event.type == "audio_interrupted": + pass + elif event.type == "audio_end": + pass + elif event.type == "history_updated": + base_event["history"] = [item.model_dump(mode="json") for item in event.history] + elif event.type == "history_added": + pass + elif event.type == "guardrail_tripped": + base_event["guardrail_results"] = [ + {"name": result.guardrail.name} for result in event.guardrail_results + ] + elif event.type == "raw_model_event": + base_event["raw_model_event"] = { + "type": event.data.type, + } + elif event.type == "error": + base_event["error"] = str(event.error) if hasattr(event, "error") else "Unknown error" + else: + assert_never(event) + + return base_event + + +manager = RealtimeWebSocketManager() + + +@asynccontextmanager +async def lifespan(app: FastAPI): + yield + + +app = FastAPI(lifespan=lifespan) + + +@app.websocket("/ws/{session_id}") +async def websocket_endpoint(websocket: WebSocket, session_id: str): + await manager.connect(websocket, session_id) + try: + while True: + data = await websocket.receive_text() + message = json.loads(data) + + if message["type"] == "audio": + # Convert int16 array to bytes + int16_data = message["data"] + audio_bytes = struct.pack(f"{len(int16_data)}h", *int16_data) + await manager.send_audio(session_id, audio_bytes) + + except WebSocketDisconnect: + await manager.disconnect(session_id) + + +app.mount("/", StaticFiles(directory="static", html=True), 
name="static") + + +@app.get("/") +async def read_index(): + return FileResponse("static/index.html") + + +if __name__ == "__main__": + import uvicorn + + uvicorn.run(app, host="0.0.0.0", port=8000) diff --git a/examples/realtime/app/static/app.js b/examples/realtime/app/static/app.js new file mode 100644 index 000000000..3ec8fcc99 --- /dev/null +++ b/examples/realtime/app/static/app.js @@ -0,0 +1,467 @@ +class RealtimeDemo { + constructor() { + this.ws = null; + this.isConnected = false; + this.isMuted = false; + this.isCapturing = false; + this.audioContext = null; + this.processor = null; + this.stream = null; + this.sessionId = this.generateSessionId(); + + // Audio playback queue + this.audioQueue = []; + this.isPlayingAudio = false; + this.playbackAudioContext = null; + this.currentAudioSource = null; + + this.initializeElements(); + this.setupEventListeners(); + } + + initializeElements() { + this.connectBtn = document.getElementById('connectBtn'); + this.muteBtn = document.getElementById('muteBtn'); + this.status = document.getElementById('status'); + this.messagesContent = document.getElementById('messagesContent'); + this.eventsContent = document.getElementById('eventsContent'); + this.toolsContent = document.getElementById('toolsContent'); + } + + setupEventListeners() { + this.connectBtn.addEventListener('click', () => { + if (this.isConnected) { + this.disconnect(); + } else { + this.connect(); + } + }); + + this.muteBtn.addEventListener('click', () => { + this.toggleMute(); + }); + } + + generateSessionId() { + return 'session_' + Math.random().toString(36).substr(2, 9); + } + + async connect() { + try { + this.ws = new WebSocket(`ws://localhost:8000/ws/${this.sessionId}`); + + this.ws.onopen = () => { + this.isConnected = true; + this.updateConnectionUI(); + this.startContinuousCapture(); + }; + + this.ws.onmessage = (event) => { + const data = JSON.parse(event.data); + this.handleRealtimeEvent(data); + }; + + this.ws.onclose = () => { + 
this.isConnected = false; + this.updateConnectionUI(); + }; + + this.ws.onerror = (error) => { + console.error('WebSocket error:', error); + }; + + } catch (error) { + console.error('Failed to connect:', error); + } + } + + disconnect() { + if (this.ws) { + this.ws.close(); + } + this.stopContinuousCapture(); + } + + updateConnectionUI() { + if (this.isConnected) { + this.connectBtn.textContent = 'Disconnect'; + this.connectBtn.className = 'connect-btn connected'; + this.status.textContent = 'Connected'; + this.status.className = 'status connected'; + this.muteBtn.disabled = false; + } else { + this.connectBtn.textContent = 'Connect'; + this.connectBtn.className = 'connect-btn disconnected'; + this.status.textContent = 'Disconnected'; + this.status.className = 'status disconnected'; + this.muteBtn.disabled = true; + } + } + + toggleMute() { + this.isMuted = !this.isMuted; + this.updateMuteUI(); + } + + updateMuteUI() { + if (this.isMuted) { + this.muteBtn.textContent = '🔇 Mic Off'; + this.muteBtn.className = 'mute-btn muted'; + } else { + this.muteBtn.textContent = '🎤 Mic On'; + this.muteBtn.className = 'mute-btn unmuted'; + if (this.isCapturing) { + this.muteBtn.classList.add('active'); + } + } + } + + async startContinuousCapture() { + if (!this.isConnected || this.isCapturing) return; + + // Check if getUserMedia is available + if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) { + throw new Error('getUserMedia not available. 
Please use HTTPS or localhost.'); + } + + try { + this.stream = await navigator.mediaDevices.getUserMedia({ + audio: { + sampleRate: 24000, + channelCount: 1, + echoCancellation: true, + noiseSuppression: true + } + }); + + this.audioContext = new AudioContext({ sampleRate: 24000 }); + const source = this.audioContext.createMediaStreamSource(this.stream); + + // Create a script processor to capture audio data + this.processor = this.audioContext.createScriptProcessor(4096, 1, 1); + source.connect(this.processor); + this.processor.connect(this.audioContext.destination); + + this.processor.onaudioprocess = (event) => { + if (!this.isMuted && this.ws && this.ws.readyState === WebSocket.OPEN) { + const inputBuffer = event.inputBuffer.getChannelData(0); + const int16Buffer = new Int16Array(inputBuffer.length); + + // Convert float32 to int16 + for (let i = 0; i < inputBuffer.length; i++) { + int16Buffer[i] = Math.max(-32768, Math.min(32767, inputBuffer[i] * 32768)); + } + + this.ws.send(JSON.stringify({ + type: 'audio', + data: Array.from(int16Buffer) + })); + } + }; + + this.isCapturing = true; + this.updateMuteUI(); + + } catch (error) { + console.error('Failed to start audio capture:', error); + } + } + + stopContinuousCapture() { + if (!this.isCapturing) return; + + this.isCapturing = false; + + if (this.processor) { + this.processor.disconnect(); + this.processor = null; + } + + if (this.audioContext) { + this.audioContext.close(); + this.audioContext = null; + } + + if (this.stream) { + this.stream.getTracks().forEach(track => track.stop()); + this.stream = null; + } + + this.updateMuteUI(); + } + + handleRealtimeEvent(event) { + // Add to raw events pane + this.addRawEvent(event); + + // Add to tools panel if it's a tool or handoff event + if (event.type === 'tool_start' || event.type === 'tool_end' || event.type === 'handoff') { + this.addToolEvent(event); + } + + // Handle specific event types + switch (event.type) { + case 'audio': + 
this.playAudio(event.audio); + break; + case 'audio_interrupted': + this.stopAudioPlayback(); + break; + case 'history_updated': + this.updateMessagesFromHistory(event.history); + break; + } + } + + + updateMessagesFromHistory(history) { + console.log('updateMessagesFromHistory called with:', history); + + // Clear all existing messages + this.messagesContent.innerHTML = ''; + + // Add messages from history + if (history && Array.isArray(history)) { + console.log('Processing history array with', history.length, 'items'); + history.forEach((item, index) => { + console.log(`History item ${index}:`, item); + if (item.type === 'message') { + const role = item.role; + let content = ''; + + console.log(`Message item - role: ${role}, content:`, item.content); + + if (item.content && Array.isArray(item.content)) { + // Extract text from content array + item.content.forEach(contentPart => { + console.log('Content part:', contentPart); + if (contentPart.type === 'text' && contentPart.text) { + content += contentPart.text; + } else if (contentPart.type === 'input_text' && contentPart.text) { + content += contentPart.text; + } else if (contentPart.type === 'input_audio' && contentPart.transcript) { + content += contentPart.transcript; + } else if (contentPart.type === 'audio' && contentPart.transcript) { + content += contentPart.transcript; + } + }); + } + + console.log(`Final content for ${role}:`, content); + + if (content.trim()) { + this.addMessage(role, content.trim()); + console.log(`Added message: ${role} - ${content.trim()}`); + } + } else { + console.log(`Skipping non-message item of type: ${item.type}`); + } + }); + } else { + console.log('History is not an array or is null/undefined'); + } + + this.scrollToBottom(); + } + + addMessage(type, content) { + const messageDiv = document.createElement('div'); + messageDiv.className = `message ${type}`; + + const bubbleDiv = document.createElement('div'); + bubbleDiv.className = 'message-bubble'; + bubbleDiv.textContent = 
content; + + messageDiv.appendChild(bubbleDiv); + this.messagesContent.appendChild(messageDiv); + this.scrollToBottom(); + + return messageDiv; + } + + addRawEvent(event) { + const eventDiv = document.createElement('div'); + eventDiv.className = 'event'; + + const headerDiv = document.createElement('div'); + headerDiv.className = 'event-header'; + headerDiv.innerHTML = ` + ${event.type} + ▼ + `; + + const contentDiv = document.createElement('div'); + contentDiv.className = 'event-content collapsed'; + contentDiv.textContent = JSON.stringify(event, null, 2); + + headerDiv.addEventListener('click', () => { + const isCollapsed = contentDiv.classList.contains('collapsed'); + contentDiv.classList.toggle('collapsed'); + headerDiv.querySelector('span:last-child').textContent = isCollapsed ? '▲' : '▼'; + }); + + eventDiv.appendChild(headerDiv); + eventDiv.appendChild(contentDiv); + this.eventsContent.appendChild(eventDiv); + + // Auto-scroll events pane + this.eventsContent.scrollTop = this.eventsContent.scrollHeight; + } + + addToolEvent(event) { + const eventDiv = document.createElement('div'); + eventDiv.className = 'event'; + + let title = ''; + let description = ''; + let eventClass = ''; + + if (event.type === 'handoff') { + title = `🔄 Handoff`; + description = `From ${event.from} to ${event.to}`; + eventClass = 'handoff'; + } else if (event.type === 'tool_start') { + title = `🔧 Tool Started`; + description = `Running ${event.tool}`; + eventClass = 'tool'; + } else if (event.type === 'tool_end') { + title = `✅ Tool Completed`; + description = `${event.tool}: ${event.output || 'No output'}`; + eventClass = 'tool'; + } + + eventDiv.innerHTML = ` +
+
+
${title}
+
${description}
+
+ ${new Date().toLocaleTimeString()} +
+ `; + + this.toolsContent.appendChild(eventDiv); + + // Auto-scroll tools pane + this.toolsContent.scrollTop = this.toolsContent.scrollHeight; + } + + async playAudio(audioBase64) { + try { + if (!audioBase64 || audioBase64.length === 0) { + console.warn('Received empty audio data, skipping playback'); + return; + } + + // Add to queue + this.audioQueue.push(audioBase64); + + // Start processing queue if not already playing + if (!this.isPlayingAudio) { + this.processAudioQueue(); + } + + } catch (error) { + console.error('Failed to play audio:', error); + } + } + + async processAudioQueue() { + if (this.isPlayingAudio || this.audioQueue.length === 0) { + return; + } + + this.isPlayingAudio = true; + + // Initialize audio context if needed + if (!this.playbackAudioContext) { + this.playbackAudioContext = new AudioContext({ sampleRate: 24000 }); + } + + while (this.audioQueue.length > 0) { + const audioBase64 = this.audioQueue.shift(); + await this.playAudioChunk(audioBase64); + } + + this.isPlayingAudio = false; + } + + async playAudioChunk(audioBase64) { + return new Promise((resolve, reject) => { + try { + // Decode base64 to ArrayBuffer + const binaryString = atob(audioBase64); + const bytes = new Uint8Array(binaryString.length); + for (let i = 0; i < binaryString.length; i++) { + bytes[i] = binaryString.charCodeAt(i); + } + + const int16Array = new Int16Array(bytes.buffer); + + if (int16Array.length === 0) { + console.warn('Audio chunk has no samples, skipping'); + resolve(); + return; + } + + const float32Array = new Float32Array(int16Array.length); + + // Convert int16 to float32 + for (let i = 0; i < int16Array.length; i++) { + float32Array[i] = int16Array[i] / 32768.0; + } + + const audioBuffer = this.playbackAudioContext.createBuffer(1, float32Array.length, 24000); + audioBuffer.getChannelData(0).set(float32Array); + + const source = this.playbackAudioContext.createBufferSource(); + source.buffer = audioBuffer; + 
source.connect(this.playbackAudioContext.destination); + + // Store reference to current source + this.currentAudioSource = source; + + source.onended = () => { + this.currentAudioSource = null; + resolve(); + }; + source.start(); + + } catch (error) { + console.error('Failed to play audio chunk:', error); + reject(error); + } + }); + } + + stopAudioPlayback() { + console.log('Stopping audio playback due to interruption'); + + // Stop current audio source if playing + if (this.currentAudioSource) { + try { + this.currentAudioSource.stop(); + this.currentAudioSource = null; + } catch (error) { + console.error('Error stopping audio source:', error); + } + } + + // Clear the audio queue + this.audioQueue = []; + + // Reset playback state + this.isPlayingAudio = false; + + console.log('Audio playback stopped and queue cleared'); + } + + scrollToBottom() { + this.messagesContent.scrollTop = this.messagesContent.scrollHeight; + } +} + +// Initialize the demo when the page loads +document.addEventListener('DOMContentLoaded', () => { + new RealtimeDemo(); +}); \ No newline at end of file diff --git a/examples/realtime/app/static/index.html b/examples/realtime/app/static/index.html new file mode 100644 index 000000000..fbd0de46d --- /dev/null +++ b/examples/realtime/app/static/index.html @@ -0,0 +1,295 @@ + + + + + + Realtime Demo + + + +
+ Realtime Demo + Conversation + Disconnected + Event stream + Tools & Handoffs +
+ + + + \ No newline at end of file diff --git a/examples/realtime/cli/demo.py b/examples/realtime/cli/demo.py new file mode 100644 index 000000000..be610b43e --- /dev/null +++ b/examples/realtime/cli/demo.py @@ -0,0 +1,253 @@ +import asyncio +import queue +import sys +import threading +from typing import Any + +import numpy as np +import sounddevice as sd + +from agents import function_tool +from agents.realtime import RealtimeAgent, RealtimeRunner, RealtimeSession, RealtimeSessionEvent + +# Audio configuration +CHUNK_LENGTH_S = 0.05 # 50ms +SAMPLE_RATE = 24000 +FORMAT = np.int16 +CHANNELS = 1 + +# Set up logging for OpenAI agents SDK +# logging.basicConfig( +# level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s" +# ) +# logger.logger.setLevel(logging.ERROR) + + +@function_tool +def get_weather(city: str) -> str: + """Get the weather in a city.""" + return f"The weather in {city} is sunny." + + +agent = RealtimeAgent( + name="Assistant", + instructions="You always greet the user with 'Top of the morning to you'.", + tools=[get_weather], +) + + +def _truncate_str(s: str, max_length: int) -> str: + if len(s) > max_length: + return s[:max_length] + "..." 
+ return s + + +class NoUIDemo: + def __init__(self) -> None: + self.session: RealtimeSession | None = None + self.audio_stream: sd.InputStream | None = None + self.audio_player: sd.OutputStream | None = None + self.recording = False + + # Audio output state for callback system + self.output_queue: queue.Queue[Any] = queue.Queue(maxsize=10) # Buffer more chunks + self.interrupt_event = threading.Event() + self.current_audio_chunk: np.ndarray | None = None # type: ignore + self.chunk_position = 0 + + def _output_callback(self, outdata, frames: int, time, status) -> None: + """Callback for audio output - handles continuous audio stream from server.""" + if status: + print(f"Output callback status: {status}") + + # Check if we should clear the queue due to interrupt + if self.interrupt_event.is_set(): + # Clear the queue and current chunk state + while not self.output_queue.empty(): + try: + self.output_queue.get_nowait() + except queue.Empty: + break + self.current_audio_chunk = None + self.chunk_position = 0 + self.interrupt_event.clear() + outdata.fill(0) + return + + # Fill output buffer from queue and current chunk + outdata.fill(0) # Start with silence + samples_filled = 0 + + while samples_filled < len(outdata): + # If we don't have a current chunk, try to get one from queue + if self.current_audio_chunk is None: + try: + self.current_audio_chunk = self.output_queue.get_nowait() + self.chunk_position = 0 + except queue.Empty: + # No more audio data available - this causes choppiness + # Uncomment next line to debug underruns: + # print(f"Audio underrun: {samples_filled}/{len(outdata)} samples filled") + break + + # Copy data from current chunk to output buffer + remaining_output = len(outdata) - samples_filled + remaining_chunk = len(self.current_audio_chunk) - self.chunk_position + samples_to_copy = min(remaining_output, remaining_chunk) + + if samples_to_copy > 0: + chunk_data = self.current_audio_chunk[ + self.chunk_position : self.chunk_position + 
samples_to_copy + ] + # More efficient: direct assignment for mono audio instead of reshape + outdata[samples_filled : samples_filled + samples_to_copy, 0] = chunk_data + samples_filled += samples_to_copy + self.chunk_position += samples_to_copy + + # If we've used up the entire chunk, reset for next iteration + if self.chunk_position >= len(self.current_audio_chunk): + self.current_audio_chunk = None + self.chunk_position = 0 + + async def run(self) -> None: + print("Connecting, may take a few seconds...") + + # Initialize audio player with callback + chunk_size = int(SAMPLE_RATE * CHUNK_LENGTH_S) + self.audio_player = sd.OutputStream( + channels=CHANNELS, + samplerate=SAMPLE_RATE, + dtype=FORMAT, + callback=self._output_callback, + blocksize=chunk_size, # Match our chunk timing for better alignment + ) + self.audio_player.start() + + try: + runner = RealtimeRunner(agent) + async with await runner.run() as session: + self.session = session + print("Connected. Starting audio recording...") + + # Start audio recording + await self.start_audio_recording() + print("Audio recording started. 
You can start speaking - expect lots of logs!") + + # Process session events + async for event in session: + await self._on_event(event) + + finally: + # Clean up audio player + if self.audio_player and self.audio_player.active: + self.audio_player.stop() + if self.audio_player: + self.audio_player.close() + + print("Session ended") + + async def start_audio_recording(self) -> None: + """Start recording audio from the microphone.""" + # Set up audio input stream + self.audio_stream = sd.InputStream( + channels=CHANNELS, + samplerate=SAMPLE_RATE, + dtype=FORMAT, + ) + + self.audio_stream.start() + self.recording = True + + # Start audio capture task + asyncio.create_task(self.capture_audio()) + + async def capture_audio(self) -> None: + """Capture audio from the microphone and send to the session.""" + if not self.audio_stream or not self.session: + return + + # Buffer size in samples + read_size = int(SAMPLE_RATE * CHUNK_LENGTH_S) + + try: + while self.recording: + # Check if there's enough data to read + if self.audio_stream.read_available < read_size: + await asyncio.sleep(0.01) + continue + + # Read audio data + data, _ = self.audio_stream.read(read_size) + + # Convert numpy array to bytes + audio_bytes = data.tobytes() + + # Send audio to session + await self.session.send_audio(audio_bytes) + + # Yield control back to event loop + await asyncio.sleep(0) + + except Exception as e: + print(f"Audio capture error: {e}") + finally: + if self.audio_stream and self.audio_stream.active: + self.audio_stream.stop() + if self.audio_stream: + self.audio_stream.close() + + async def _on_event(self, event: RealtimeSessionEvent) -> None: + """Handle session events.""" + try: + if event.type == "agent_start": + print(f"Agent started: {event.agent.name}") + elif event.type == "agent_end": + print(f"Agent ended: {event.agent.name}") + elif event.type == "handoff": + print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}") + elif event.type == "tool_start": + 
print(f"Tool started: {event.tool.name}") + elif event.type == "tool_end": + print(f"Tool ended: {event.tool.name}; output: {event.output}") + elif event.type == "audio_end": + print("Audio ended") + elif event.type == "audio": + # Enqueue audio for callback-based playback + np_audio = np.frombuffer(event.audio.data, dtype=np.int16) + try: + self.output_queue.put_nowait(np_audio) + except queue.Full: + # Queue is full - only drop if we have significant backlog + # This prevents aggressive dropping that could cause choppiness + if self.output_queue.qsize() > 8: # Keep some buffer + try: + self.output_queue.get_nowait() + self.output_queue.put_nowait(np_audio) + except queue.Empty: + pass + # If queue isn't too full, just skip this chunk to avoid blocking + elif event.type == "audio_interrupted": + print("Audio interrupted") + # Signal the output callback to clear its queue and state + self.interrupt_event.set() + elif event.type == "error": + print(f"Error: {event.error}") + elif event.type == "history_updated": + pass # Skip these frequent events + elif event.type == "history_added": + pass # Skip these frequent events + elif event.type == "raw_model_event": + print(f"Raw model event: {_truncate_str(str(event.data), 50)}") + else: + print(f"Unknown event type: {event.type}") + except Exception as e: + print(f"Error processing event: {_truncate_str(str(e), 50)}") + + +if __name__ == "__main__": + demo = NoUIDemo() + try: + asyncio.run(demo.run()) + except KeyboardInterrupt: + print("\nExiting...") + sys.exit(0) diff --git a/examples/realtime/ui.py b/examples/realtime/cli/ui.py similarity index 100% rename from examples/realtime/ui.py rename to examples/realtime/cli/ui.py diff --git a/examples/realtime/demo.py b/examples/realtime/demo.py deleted file mode 100644 index 3db051963..000000000 --- a/examples/realtime/demo.py +++ /dev/null @@ -1,115 +0,0 @@ -import asyncio -import os -import sys -from typing import TYPE_CHECKING - -import numpy as np - -from 
agents.realtime import RealtimeSession - -# Add the current directory to path so we can import ui -sys.path.append(os.path.dirname(os.path.abspath(__file__))) - -from agents import function_tool -from agents.realtime import RealtimeAgent, RealtimeRunner, RealtimeSessionEvent - -if TYPE_CHECKING: - from .ui import AppUI -else: - # Try both import styles - try: - # Try relative import first (when used as a package) - from .ui import AppUI - except ImportError: - # Fall back to direct import (when run as a script) - from ui import AppUI - - -@function_tool -def get_weather(city: str) -> str: - """Get the weather in a city.""" - return f"The weather in {city} is sunny." - - -agent = RealtimeAgent( - name="Assistant", - instructions="You always greet the user with 'Top of the morning to you'.", - tools=[get_weather], -) - - -def _truncate_str(s: str, max_length: int) -> str: - if len(s) > max_length: - return s[:max_length] + "..." - return s - - -class Example: - def __init__(self) -> None: - self.ui = AppUI() - self.ui.connected = asyncio.Event() - self.ui.last_audio_item_id = None - # Set the audio callback - self.ui.set_audio_callback(self.on_audio_recorded) - - self.session: RealtimeSession | None = None - - async def run(self) -> None: - # Start UI in a separate task instead of waiting for it to complete - ui_task = asyncio.create_task(self.ui.run_async()) - - # Set up session immediately without waiting for UI to finish - runner = RealtimeRunner(agent) - async with await runner.run() as session: - self.session = session - self.ui.set_is_connected(True) - async for event in session: - await self._on_event(event) - print("done") - - # Wait for UI task to complete when session ends - await ui_task - - async def on_audio_recorded(self, audio_bytes: bytes) -> None: - # Send the audio to the session - assert self.session is not None - await self.session.send_audio(audio_bytes) - - async def _on_event(self, event: RealtimeSessionEvent) -> None: - try: - if event.type == 
"agent_start": - self.ui.add_transcript(f"Agent started: {event.agent.name}") - elif event.type == "agent_end": - self.ui.add_transcript(f"Agent ended: {event.agent.name}") - elif event.type == "handoff": - self.ui.add_transcript( - f"Handoff from {event.from_agent.name} to {event.to_agent.name}" - ) - elif event.type == "tool_start": - self.ui.add_transcript(f"Tool started: {event.tool.name}") - elif event.type == "tool_end": - self.ui.add_transcript(f"Tool ended: {event.tool.name}; output: {event.output}") - elif event.type == "audio_end": - self.ui.add_transcript("Audio ended") - elif event.type == "audio": - np_audio = np.frombuffer(event.audio.data, dtype=np.int16) - self.ui.play_audio(np_audio) - elif event.type == "audio_interrupted": - self.ui.add_transcript("Audio interrupted") - elif event.type == "error": - pass - elif event.type == "history_updated": - pass - elif event.type == "history_added": - pass - elif event.type == "raw_model_event": - self.ui.log_message(f"Raw model event: {_truncate_str(str(event.data), 50)}") - else: - self.ui.log_message(f"Unknown event type: {event.type}") - except Exception as e: - self.ui.log_message(f"Error processing event: {_truncate_str(str(e), 50)}") - - -if __name__ == "__main__": - example = Example() - asyncio.run(example.run()) diff --git a/mkdocs.yml b/mkdocs.yml index 19529bf30..9e7f7aeec 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -78,6 +78,9 @@ plugins: - voice/quickstart.md - voice/pipeline.md - voice/tracing.md + - Realtime agents: + - realtime/quickstart.md + - realtime/guide.md - API Reference: - Agents: - ref/index.md @@ -115,6 +118,12 @@ plugins: - ref/tracing/setup.md - ref/tracing/span_data.md - ref/tracing/util.md + - Realtime: + - ref/realtime/agent.md + - ref/realtime/runner.md + - ref/realtime/session.md + - ref/realtime/events.md + - ref/realtime/config.md - Voice: - ref/voice/pipeline.md - ref/voice/workflow.md @@ -163,6 +172,9 @@ plugins: - voice/quickstart.md - voice/pipeline.md - 
voice/tracing.md + - リアルタイムエージェント: + - realtime/quickstart.md + - realtime/guide.md extra: # Remove material generation message in footer diff --git a/pyproject.toml b/pyproject.toml index 0f9b70852..b72ccd594 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,19 +1,19 @@ [project] name = "openai-agents" -version = "0.2.0" +version = "0.2.1" description = "OpenAI Agents SDK" readme = "README.md" requires-python = ">=3.9" license = "MIT" authors = [{ name = "OpenAI", email = "support@openai.com" }] dependencies = [ - "openai>=1.93.1, <2", + "openai>=1.96.1, <2", "pydantic>=2.10, <3", "griffe>=1.5.6, <2", "typing-extensions>=4.12.2, <5", "requests>=2.0, <3", "types-requests>=2.0, <3", - "mcp>=1.9.4, <2; python_version >= '3.10'", + "mcp>=1.11.0, <2; python_version >= '3.10'", ] classifiers = [ "Typing :: Typed", diff --git a/src/agents/agent.py b/src/agents/agent.py index 9c107a81b..b67a12c0d 100644 --- a/src/agents/agent.py +++ b/src/agents/agent.py @@ -158,7 +158,7 @@ class Agent(AgentBase, Generic[TContext]): usable with OpenAI models, using the Responses API. """ - handoffs: list[Agent[Any] | Handoff[TContext]] = field(default_factory=list) + handoffs: list[Agent[Any] | Handoff[TContext, Any]] = field(default_factory=list) """Handoffs are sub-agents that the agent can delegate to. You can provide a list of handoffs, and the agent can choose to delegate to them if relevant. Allows for separation of concerns and modularity. diff --git a/src/agents/agent_output.py b/src/agents/agent_output.py index ee14956e3..fc58643bd 100644 --- a/src/agents/agent_output.py +++ b/src/agents/agent_output.py @@ -115,8 +115,8 @@ def __init__(self, output_type: type[Any], strict_json_schema: bool = True): except UserError as e: raise UserError( "Strict JSON schema is enabled, but the output type is not valid. 
" - "Either make the output type strict, or pass output_schema_strict=False to " - "your Agent()" + "Either make the output type strict, " + "or wrap your type with AgentOutputSchema(your_type, strict_json_schema=False)" ) from e def is_plain_text(self) -> bool: diff --git a/src/agents/guardrail.py b/src/agents/guardrail.py index f8a272b53..2bb0f014e 100644 --- a/src/agents/guardrail.py +++ b/src/agents/guardrail.py @@ -244,7 +244,7 @@ def decorator( return InputGuardrail( guardrail_function=f, # If not set, guardrail name uses the function’s name by default. - name=name if name else f.__name__ + name=name if name else f.__name__, ) if func is not None: diff --git a/src/agents/handoffs.py b/src/agents/handoffs.py index cb2752e4f..1ad8831f0 100644 --- a/src/agents/handoffs.py +++ b/src/agents/handoffs.py @@ -18,12 +18,15 @@ from .util._types import MaybeAwaitable if TYPE_CHECKING: - from .agent import Agent + from .agent import Agent, AgentBase # The handoff input type is the type of data passed when the agent is called via a handoff. THandoffInput = TypeVar("THandoffInput", default=Any) +# The agent type that the handoff returns +TAgent = TypeVar("TAgent", bound="AgentBase[Any]", default="Agent[Any]") + OnHandoffWithInput = Callable[[RunContextWrapper[Any], THandoffInput], Any] OnHandoffWithoutInput = Callable[[RunContextWrapper[Any]], Any] @@ -52,7 +55,7 @@ class HandoffInputData: @dataclass -class Handoff(Generic[TContext]): +class Handoff(Generic[TContext, TAgent]): """A handoff is when an agent delegates a task to another agent. For example, in a customer support scenario you might have a "triage agent" that determines which agent should handle the user's request, and sub-agents that specialize in different @@ -69,7 +72,7 @@ class Handoff(Generic[TContext]): """The JSON schema for the handoff input. Can be empty if the handoff does not take an input. 
""" - on_invoke_handoff: Callable[[RunContextWrapper[Any], str], Awaitable[Agent[TContext]]] + on_invoke_handoff: Callable[[RunContextWrapper[Any], str], Awaitable[TAgent]] """The function that invokes the handoff. The parameters passed are: 1. The handoff run context 2. The arguments from the LLM, as a JSON string. Empty string if input_json_schema is empty. @@ -100,20 +103,22 @@ class Handoff(Generic[TContext]): True, as it increases the likelihood of correct JSON input. """ - is_enabled: bool | Callable[[RunContextWrapper[Any], Agent[Any]], MaybeAwaitable[bool]] = True + is_enabled: bool | Callable[[RunContextWrapper[Any], AgentBase[Any]], MaybeAwaitable[bool]] = ( + True + ) """Whether the handoff is enabled. Either a bool or a Callable that takes the run context and agent and returns whether the handoff is enabled. You can use this to dynamically enable/disable a handoff based on your context/state.""" - def get_transfer_message(self, agent: Agent[Any]) -> str: + def get_transfer_message(self, agent: AgentBase[Any]) -> str: return json.dumps({"assistant": agent.name}) @classmethod - def default_tool_name(cls, agent: Agent[Any]) -> str: + def default_tool_name(cls, agent: AgentBase[Any]) -> str: return _transforms.transform_string_function_style(f"transfer_to_{agent.name}") @classmethod - def default_tool_description(cls, agent: Agent[Any]) -> str: + def default_tool_description(cls, agent: AgentBase[Any]) -> str: return ( f"Handoff to the {agent.name} agent to handle the request. " f"{agent.handoff_description or ''}" @@ -128,7 +133,7 @@ def handoff( tool_description_override: str | None = None, input_filter: Callable[[HandoffInputData], HandoffInputData] | None = None, is_enabled: bool | Callable[[RunContextWrapper[Any], Agent[Any]], MaybeAwaitable[bool]] = True, -) -> Handoff[TContext]: ... +) -> Handoff[TContext, Agent[TContext]]: ... 
@overload @@ -141,7 +146,7 @@ def handoff( tool_name_override: str | None = None, input_filter: Callable[[HandoffInputData], HandoffInputData] | None = None, is_enabled: bool | Callable[[RunContextWrapper[Any], Agent[Any]], MaybeAwaitable[bool]] = True, -) -> Handoff[TContext]: ... +) -> Handoff[TContext, Agent[TContext]]: ... @overload @@ -153,7 +158,7 @@ def handoff( tool_name_override: str | None = None, input_filter: Callable[[HandoffInputData], HandoffInputData] | None = None, is_enabled: bool | Callable[[RunContextWrapper[Any], Agent[Any]], MaybeAwaitable[bool]] = True, -) -> Handoff[TContext]: ... +) -> Handoff[TContext, Agent[TContext]]: ... def handoff( @@ -163,8 +168,9 @@ def handoff( on_handoff: OnHandoffWithInput[THandoffInput] | OnHandoffWithoutInput | None = None, input_type: type[THandoffInput] | None = None, input_filter: Callable[[HandoffInputData], HandoffInputData] | None = None, - is_enabled: bool | Callable[[RunContextWrapper[Any], Agent[Any]], MaybeAwaitable[bool]] = True, -) -> Handoff[TContext]: + is_enabled: bool + | Callable[[RunContextWrapper[Any], Agent[TContext]], MaybeAwaitable[bool]] = True, +) -> Handoff[TContext, Agent[TContext]]: """Create a handoff from an agent. 
Args: @@ -202,7 +208,7 @@ def handoff( async def _invoke_handoff( ctx: RunContextWrapper[Any], input_json: str | None = None - ) -> Agent[Any]: + ) -> Agent[TContext]: if input_type is not None and type_adapter is not None: if input_json is None: _error_tracing.attach_error_to_current_span( @@ -239,6 +245,18 @@ async def _invoke_handoff( # If there is a need, we can make this configurable in the future input_json_schema = ensure_strict_json_schema(input_json_schema) + async def _is_enabled(ctx: RunContextWrapper[Any], agent_base: AgentBase[Any]) -> bool: + from .agent import Agent + + assert callable(is_enabled), "is_enabled must be non-null here" + assert isinstance(agent_base, Agent), "Can't handoff to a non-Agent" + result = is_enabled(ctx, agent_base) + + if inspect.isawaitable(result): + return await result + + return result + return Handoff( tool_name=tool_name, tool_description=tool_description, @@ -246,5 +264,5 @@ async def _invoke_handoff( on_invoke_handoff=_invoke_handoff, input_filter=input_filter, agent_name=agent.name, - is_enabled=is_enabled, + is_enabled=_is_enabled if callable(is_enabled) else is_enabled, ) diff --git a/src/agents/mcp/server.py b/src/agents/mcp/server.py index 91a9274fc..66332549c 100644 --- a/src/agents/mcp/server.py +++ b/src/agents/mcp/server.py @@ -28,6 +28,17 @@ class MCPServer(abc.ABC): """Base class for Model Context Protocol servers.""" + def __init__(self, use_structured_content: bool = False): + """ + Args: + use_structured_content: Whether to use `tool_result.structured_content` when calling an + MCP tool. Defaults to False for backwards compatibility - most MCP servers still + include the structured content in the `tool_result.content`, and using it by + default will cause duplicate content. You can set this to True if you know the + server will not duplicate the structured content in the `tool_result.content`.
+ """ + self.use_structured_content = use_structured_content + @abc.abstractmethod async def connect(self): """Connect to the server. For example, this might mean spawning a subprocess or @@ -86,6 +97,7 @@ def __init__( cache_tools_list: bool, client_session_timeout_seconds: float | None, tool_filter: ToolFilter = None, + use_structured_content: bool = False, ): """ Args: @@ -98,7 +110,13 @@ def __init__( client_session_timeout_seconds: the read timeout passed to the MCP ClientSession. tool_filter: The tool filter to use for filtering tools. + use_structured_content: Whether to use `tool_result.structured_content` when calling an + MCP tool. Defaults to False for backwards compatibility - most MCP servers still + include the structured content in the `tool_result.content`, and using it by + default will cause duplicate content. You can set this to True if you know the + server will not duplicate the structured content in the `tool_result.content`. """ + super().__init__(use_structured_content=use_structured_content) self.session: ClientSession | None = None self.exit_stack: AsyncExitStack = AsyncExitStack() self._cleanup_lock: asyncio.Lock = asyncio.Lock() @@ -346,6 +364,7 @@ def __init__( name: str | None = None, client_session_timeout_seconds: float | None = 5, tool_filter: ToolFilter = None, + use_structured_content: bool = False, ): """Create a new MCP server based on the stdio transport. @@ -364,11 +383,17 @@ def __init__( command. client_session_timeout_seconds: the read timeout passed to the MCP ClientSession. tool_filter: The tool filter to use for filtering tools. + use_structured_content: Whether to use `tool_result.structured_content` when calling an + MCP tool. Defaults to False for backwards compatibility - most MCP servers still + include the structured content in the `tool_result.content`, and using it by + default will cause duplicate content. 
You can set this to True if you know the + server will not duplicate the structured content in the `tool_result.content`. """ super().__init__( cache_tools_list, client_session_timeout_seconds, tool_filter, + use_structured_content, ) self.params = StdioServerParameters( @@ -429,6 +454,7 @@ def __init__( name: str | None = None, client_session_timeout_seconds: float | None = 5, tool_filter: ToolFilter = None, + use_structured_content: bool = False, ): """Create a new MCP server based on the HTTP with SSE transport. @@ -449,11 +475,17 @@ def __init__( client_session_timeout_seconds: the read timeout passed to the MCP ClientSession. tool_filter: The tool filter to use for filtering tools. + use_structured_content: Whether to use `tool_result.structured_content` when calling an + MCP tool. Defaults to False for backwards compatibility - most MCP servers still + include the structured content in the `tool_result.content`, and using it by + default will cause duplicate content. You can set this to True if you know the + server will not duplicate the structured content in the `tool_result.content`. """ super().__init__( cache_tools_list, client_session_timeout_seconds, tool_filter, + use_structured_content, ) self.params = params @@ -514,6 +546,7 @@ def __init__( name: str | None = None, client_session_timeout_seconds: float | None = 5, tool_filter: ToolFilter = None, + use_structured_content: bool = False, ): """Create a new MCP server based on the Streamable HTTP transport. @@ -535,11 +568,17 @@ def __init__( client_session_timeout_seconds: the read timeout passed to the MCP ClientSession. tool_filter: The tool filter to use for filtering tools. + use_structured_content: Whether to use `tool_result.structured_content` when calling an + MCP tool. Defaults to False for backwards compatibility - most MCP servers still + include the structured content in the `tool_result.content`, and using it by + default will cause duplicate content. 
You can set this to True if you know the + server will not duplicate the structured content in the `tool_result.content`. """ super().__init__( cache_tools_list, client_session_timeout_seconds, tool_filter, + use_structured_content, ) self.params = params diff --git a/src/agents/mcp/util.py b/src/agents/mcp/util.py index 18cf4440a..6b2b4679f 100644 --- a/src/agents/mcp/util.py +++ b/src/agents/mcp/util.py @@ -198,11 +198,19 @@ async def invoke_mcp_tool( # string. We'll try to convert. if len(result.content) == 1: tool_output = result.content[0].model_dump_json() + # Append structured content if it exists and we're using it. + if server.use_structured_content and result.structuredContent: + tool_output = f"{tool_output}\n{json.dumps(result.structuredContent)}" elif len(result.content) > 1: - tool_output = json.dumps([item.model_dump(mode="json") for item in result.content]) + tool_results = [item.model_dump(mode="json") for item in result.content] + if server.use_structured_content and result.structuredContent: + tool_results.append(result.structuredContent) + tool_output = json.dumps(tool_results) + elif server.use_structured_content and result.structuredContent: + tool_output = json.dumps(result.structuredContent) else: - logger.error(f"Errored MCP tool result: {result}") - tool_output = "Error running tool." 
+ # Empty content is a valid result (e.g., "no results found") + tool_output = "[]" current_span = get_current_span() if current_span: diff --git a/src/agents/models/chatcmpl_converter.py b/src/agents/models/chatcmpl_converter.py index d3c71c24e..351dc5db7 100644 --- a/src/agents/models/chatcmpl_converter.py +++ b/src/agents/models/chatcmpl_converter.py @@ -484,7 +484,7 @@ def tool_to_openai(cls, tool: Tool) -> ChatCompletionToolParam: ) @classmethod - def convert_handoff_tool(cls, handoff: Handoff[Any]) -> ChatCompletionToolParam: + def convert_handoff_tool(cls, handoff: Handoff[Any, Any]) -> ChatCompletionToolParam: return { "type": "function", "function": { diff --git a/src/agents/models/chatcmpl_stream_handler.py b/src/agents/models/chatcmpl_stream_handler.py index 83fa32abc..6133af344 100644 --- a/src/agents/models/chatcmpl_stream_handler.py +++ b/src/agents/models/chatcmpl_stream_handler.py @@ -53,6 +53,9 @@ class StreamingState: refusal_content_index_and_output: tuple[int, ResponseOutputRefusal] | None = None reasoning_content_index_and_output: tuple[int, ResponseReasoningItem] | None = None function_calls: dict[int, ResponseFunctionToolCall] = field(default_factory=dict) + # Fields for real-time function call streaming + function_call_streaming: dict[int, bool] = field(default_factory=dict) + function_call_output_idx: dict[int, int] = field(default_factory=dict) class SequenceNumber: @@ -255,9 +258,7 @@ async def handle_stream( # Accumulate the refusal string in the output part state.refusal_content_index_and_output[1].refusal += delta.refusal - # Handle tool calls - # Because we don't know the name of the function until the end of the stream, we'll - # save everything and yield events at the end + # Handle tool calls with real-time streaming support if delta.tool_calls: for tc_delta in delta.tool_calls: if tc_delta.index not in state.function_calls: @@ -268,15 +269,76 @@ async def handle_stream( type="function_call", call_id="", ) + 
state.function_call_streaming[tc_delta.index] = False + tc_function = tc_delta.function + # Accumulate arguments as they come in state.function_calls[tc_delta.index].arguments += ( tc_function.arguments if tc_function else "" ) or "" - state.function_calls[tc_delta.index].name += ( - tc_function.name if tc_function else "" - ) or "" - state.function_calls[tc_delta.index].call_id = tc_delta.id or "" + + # Set function name directly (it's correct from the first function call chunk) + if tc_function and tc_function.name: + state.function_calls[tc_delta.index].name = tc_function.name + + if tc_delta.id: + state.function_calls[tc_delta.index].call_id = tc_delta.id + + function_call = state.function_calls[tc_delta.index] + + # Start streaming as soon as we have function name and call_id + if (not state.function_call_streaming[tc_delta.index] and + function_call.name and + function_call.call_id): + + # Calculate the output index for this function call + function_call_starting_index = 0 + if state.reasoning_content_index_and_output: + function_call_starting_index += 1 + if state.text_content_index_and_output: + function_call_starting_index += 1 + if state.refusal_content_index_and_output: + function_call_starting_index += 1 + + # Add offset for already started function calls + function_call_starting_index += sum( + 1 for streaming in state.function_call_streaming.values() if streaming + ) + + # Mark this function call as streaming and store its output index + state.function_call_streaming[tc_delta.index] = True + state.function_call_output_idx[ + tc_delta.index + ] = function_call_starting_index + + # Send initial function call added event + yield ResponseOutputItemAddedEvent( + item=ResponseFunctionToolCall( + id=FAKE_RESPONSES_ID, + call_id=function_call.call_id, + arguments="", # Start with empty arguments + name=function_call.name, + type="function_call", + ), + output_index=function_call_starting_index, + type="response.output_item.added", + 
sequence_number=sequence_number.get_and_increment(), + ) + + # Stream arguments if we've started streaming this function call + if (state.function_call_streaming.get(tc_delta.index, False) and + tc_function and + tc_function.arguments): + + output_index = state.function_call_output_idx[tc_delta.index] + yield ResponseFunctionCallArgumentsDeltaEvent( + delta=tc_function.arguments, + item_id=FAKE_RESPONSES_ID, + output_index=output_index, + type="response.function_call_arguments.delta", + sequence_number=sequence_number.get_and_increment(), + ) if state.reasoning_content_index_and_output: yield ResponseReasoningSummaryPartDoneEvent( @@ -327,42 +389,71 @@ async def handle_stream( sequence_number=sequence_number.get_and_increment(), ) - # Actually send events for the function calls - for function_call in state.function_calls.values(): - # First, a ResponseOutputItemAdded for the function call - yield ResponseOutputItemAddedEvent( - item=ResponseFunctionToolCall( - id=FAKE_RESPONSES_ID, - call_id=function_call.call_id, - arguments=function_call.arguments, - name=function_call.name, - type="function_call", - ), - output_index=function_call_starting_index, - type="response.output_item.added", - sequence_number=sequence_number.get_and_increment(), - ) - # Then, yield the args - yield ResponseFunctionCallArgumentsDeltaEvent( - delta=function_call.arguments, - item_id=FAKE_RESPONSES_ID, - output_index=function_call_starting_index, - type="response.function_call_arguments.delta", - sequence_number=sequence_number.get_and_increment(), - ) - # Finally, the ResponseOutputItemDone - yield ResponseOutputItemDoneEvent( - item=ResponseFunctionToolCall( - id=FAKE_RESPONSES_ID, - call_id=function_call.call_id, - arguments=function_call.arguments, - name=function_call.name, - type="function_call", - ), - output_index=function_call_starting_index, - type="response.output_item.done", - sequence_number=sequence_number.get_and_increment(), - ) + # Send completion events for function calls 
+ for index, function_call in state.function_calls.items(): + if state.function_call_streaming.get(index, False): + # Function call was streamed, just send the completion event + output_index = state.function_call_output_idx[index] + yield ResponseOutputItemDoneEvent( + item=ResponseFunctionToolCall( + id=FAKE_RESPONSES_ID, + call_id=function_call.call_id, + arguments=function_call.arguments, + name=function_call.name, + type="function_call", + ), + output_index=output_index, + type="response.output_item.done", + sequence_number=sequence_number.get_and_increment(), + ) + else: + # Function call was not streamed (fallback to old behavior) + # This handles edge cases where function name never arrived + fallback_starting_index = 0 + if state.reasoning_content_index_and_output: + fallback_starting_index += 1 + if state.text_content_index_and_output: + fallback_starting_index += 1 + if state.refusal_content_index_and_output: + fallback_starting_index += 1 + + # Add offset for already started function calls + fallback_starting_index += sum( + 1 for streaming in state.function_call_streaming.values() if streaming + ) + + # Send all events at once (backward compatibility) + yield ResponseOutputItemAddedEvent( + item=ResponseFunctionToolCall( + id=FAKE_RESPONSES_ID, + call_id=function_call.call_id, + arguments=function_call.arguments, + name=function_call.name, + type="function_call", + ), + output_index=fallback_starting_index, + type="response.output_item.added", + sequence_number=sequence_number.get_and_increment(), + ) + yield ResponseFunctionCallArgumentsDeltaEvent( + delta=function_call.arguments, + item_id=FAKE_RESPONSES_ID, + output_index=fallback_starting_index, + type="response.function_call_arguments.delta", + sequence_number=sequence_number.get_and_increment(), + ) + yield ResponseOutputItemDoneEvent( + item=ResponseFunctionToolCall( + id=FAKE_RESPONSES_ID, + call_id=function_call.call_id, + arguments=function_call.arguments, + name=function_call.name, + 
type="function_call", + ), + output_index=fallback_starting_index, + type="response.output_item.done", + sequence_number=sequence_number.get_and_increment(), + ) # Finally, send the Response completed event outputs: list[ResponseOutputItem] = [] diff --git a/src/agents/models/openai_responses.py b/src/agents/models/openai_responses.py index 76c67903c..f6da60b08 100644 --- a/src/agents/models/openai_responses.py +++ b/src/agents/models/openai_responses.py @@ -370,7 +370,7 @@ def get_response_format( def convert_tools( cls, tools: list[Tool], - handoffs: list[Handoff[Any]], + handoffs: list[Handoff[Any, Any]], ) -> ConvertedTools: converted_tools: list[ToolParam] = [] includes: list[ResponseIncludable] = [] diff --git a/src/agents/realtime/__init__.py b/src/agents/realtime/__init__.py index 0e3e12f75..49c131389 100644 --- a/src/agents/realtime/__init__.py +++ b/src/agents/realtime/__init__.py @@ -30,6 +30,7 @@ RealtimeToolEnd, RealtimeToolStart, ) +from .handoffs import realtime_handoff from .items import ( AssistantMessageItem, AssistantText, @@ -92,6 +93,8 @@ "RealtimeAgentHooks", "RealtimeRunHooks", "RealtimeRunner", + # Handoffs + "realtime_handoff", # Config "RealtimeAudioFormat", "RealtimeClientMessage", diff --git a/src/agents/realtime/agent.py b/src/agents/realtime/agent.py index 9bbed8cb4..30e80a95b 100644 --- a/src/agents/realtime/agent.py +++ b/src/agents/realtime/agent.py @@ -3,10 +3,11 @@ import dataclasses import inspect from collections.abc import Awaitable -from dataclasses import dataclass +from dataclasses import dataclass, field from typing import Any, Callable, Generic, cast from ..agent import AgentBase +from ..handoffs import Handoff from ..lifecycle import AgentHooksBase, RunHooksBase from ..logger import logger from ..run_context import RunContextWrapper, TContext @@ -53,6 +54,14 @@ class RealtimeAgent(AgentBase, Generic[TContext]): return a string. 
""" + handoffs: list[RealtimeAgent[Any] | Handoff[TContext, RealtimeAgent[Any]]] = field( + default_factory=list + ) + """Handoffs are sub-agents that the agent can delegate to. You can provide a list of handoffs, + and the agent can choose to delegate to them if relevant. Allows for separation of concerns and + modularity. + """ + hooks: RealtimeAgentHooks | None = None """A class that receives callbacks on various lifecycle events for this agent. """ diff --git a/src/agents/realtime/config.py b/src/agents/realtime/config.py index 7f874cfb0..f8a203589 100644 --- a/src/agents/realtime/config.py +++ b/src/agents/realtime/config.py @@ -9,6 +9,7 @@ from typing_extensions import NotRequired, TypeAlias, TypedDict from ..guardrail import OutputGuardrail +from ..handoffs import Handoff from ..model_settings import ToolChoice from ..tool import Tool @@ -27,52 +28,95 @@ RealtimeAudioFormat: TypeAlias = Union[Literal["pcm16", "g711_ulaw", "g711_alaw"], str] +"""The audio format for realtime audio streams.""" class RealtimeClientMessage(TypedDict): """A raw message to be sent to the model.""" type: str # explicitly required + """The type of the message.""" + other_data: NotRequired[dict[str, Any]] """Merged into the message body.""" class RealtimeInputAudioTranscriptionConfig(TypedDict): + """Configuration for audio transcription in realtime sessions.""" + language: NotRequired[str] + """The language code for transcription.""" + model: NotRequired[Literal["gpt-4o-transcribe", "gpt-4o-mini-transcribe", "whisper-1"] | str] + """The transcription model to use.""" + prompt: NotRequired[str] + """An optional prompt to guide transcription.""" class RealtimeTurnDetectionConfig(TypedDict): """Turn detection config. 
Allows extra vendor keys if needed.""" type: NotRequired[Literal["semantic_vad", "server_vad"]] + """The type of voice activity detection to use.""" + create_response: NotRequired[bool] + """Whether to create a response when a turn is detected.""" + eagerness: NotRequired[Literal["auto", "low", "medium", "high"]] + """How eagerly to detect turn boundaries.""" + interrupt_response: NotRequired[bool] + """Whether to allow interrupting the assistant's response.""" + prefix_padding_ms: NotRequired[int] + """Padding time in milliseconds before turn detection.""" + silence_duration_ms: NotRequired[int] + """Duration of silence in milliseconds to trigger turn detection.""" + threshold: NotRequired[float] + """The threshold for voice activity detection.""" class RealtimeSessionModelSettings(TypedDict): """Model settings for a realtime model session.""" model_name: NotRequired[RealtimeModelName] + """The name of the realtime model to use.""" instructions: NotRequired[str] + """System instructions for the model.""" + modalities: NotRequired[list[Literal["text", "audio"]]] + """The modalities the model should support.""" + voice: NotRequired[str] + """The voice to use for audio output.""" input_audio_format: NotRequired[RealtimeAudioFormat] + """The format for input audio streams.""" + output_audio_format: NotRequired[RealtimeAudioFormat] + """The format for output audio streams.""" + input_audio_transcription: NotRequired[RealtimeInputAudioTranscriptionConfig] + """Configuration for transcribing input audio.""" + turn_detection: NotRequired[RealtimeTurnDetectionConfig] + """Configuration for detecting conversation turns.""" tool_choice: NotRequired[ToolChoice] + """How the model should choose which tools to call.""" + tools: NotRequired[list[Tool]] + """List of tools available to the model.""" + + handoffs: NotRequired[list[Handoff]] + """List of handoff configurations.""" tracing: NotRequired[RealtimeModelTracingConfig | None] + """Configuration for request tracing.""" 
class RealtimeGuardrailsSettings(TypedDict): @@ -100,7 +144,10 @@ class RealtimeModelTracingConfig(TypedDict): class RealtimeRunConfig(TypedDict): + """Configuration for running a realtime agent session.""" + model_settings: NotRequired[RealtimeSessionModelSettings] + """Settings for the realtime model session.""" output_guardrails: NotRequired[list[OutputGuardrail[Any]]] """List of output guardrails to run on the agent's responses.""" @@ -115,14 +162,27 @@ class RealtimeRunConfig(TypedDict): class RealtimeUserInputText(TypedDict): + """A text input from the user.""" + type: Literal["input_text"] + """The type identifier for text input.""" + text: str + """The text content from the user.""" class RealtimeUserInputMessage(TypedDict): + """A message input from the user.""" + type: Literal["message"] + """The type identifier for message inputs.""" + role: Literal["user"] + """The role identifier for user messages.""" + content: list[RealtimeUserInputText] + """List of text content items in the message.""" RealtimeUserInput: TypeAlias = Union[str, RealtimeUserInputMessage] +"""User input that can be a string or structured message.""" diff --git a/src/agents/realtime/handoffs.py b/src/agents/realtime/handoffs.py new file mode 100644 index 000000000..a3e5151f6 --- /dev/null +++ b/src/agents/realtime/handoffs.py @@ -0,0 +1,165 @@ +from __future__ import annotations + +import inspect +from typing import TYPE_CHECKING, Any, Callable, cast, overload + +from pydantic import TypeAdapter +from typing_extensions import TypeVar + +from ..exceptions import ModelBehaviorError, UserError +from ..handoffs import Handoff +from ..run_context import RunContextWrapper, TContext +from ..strict_schema import ensure_strict_json_schema +from ..tracing.spans import SpanError +from ..util import _error_tracing, _json +from ..util._types import MaybeAwaitable + +if TYPE_CHECKING: + from ..agent import AgentBase + from . 
import RealtimeAgent + + +# The handoff input type is the type of data passed when the agent is called via a handoff. +THandoffInput = TypeVar("THandoffInput", default=Any) + +OnHandoffWithInput = Callable[[RunContextWrapper[Any], THandoffInput], Any] +OnHandoffWithoutInput = Callable[[RunContextWrapper[Any]], Any] + + +@overload +def realtime_handoff( + agent: RealtimeAgent[TContext], + *, + tool_name_override: str | None = None, + tool_description_override: str | None = None, + is_enabled: bool + | Callable[[RunContextWrapper[Any], RealtimeAgent[Any]], MaybeAwaitable[bool]] = True, +) -> Handoff[TContext, RealtimeAgent[TContext]]: ... + + +@overload +def realtime_handoff( + agent: RealtimeAgent[TContext], + *, + on_handoff: OnHandoffWithInput[THandoffInput], + input_type: type[THandoffInput], + tool_description_override: str | None = None, + tool_name_override: str | None = None, + is_enabled: bool + | Callable[[RunContextWrapper[Any], RealtimeAgent[Any]], MaybeAwaitable[bool]] = True, +) -> Handoff[TContext, RealtimeAgent[TContext]]: ... + + +@overload +def realtime_handoff( + agent: RealtimeAgent[TContext], + *, + on_handoff: OnHandoffWithoutInput, + tool_description_override: str | None = None, + tool_name_override: str | None = None, + is_enabled: bool + | Callable[[RunContextWrapper[Any], RealtimeAgent[Any]], MaybeAwaitable[bool]] = True, +) -> Handoff[TContext, RealtimeAgent[TContext]]: ... + + +def realtime_handoff( + agent: RealtimeAgent[TContext], + tool_name_override: str | None = None, + tool_description_override: str | None = None, + on_handoff: OnHandoffWithInput[THandoffInput] | OnHandoffWithoutInput | None = None, + input_type: type[THandoffInput] | None = None, + is_enabled: bool + | Callable[[RunContextWrapper[Any], RealtimeAgent[Any]], MaybeAwaitable[bool]] = True, +) -> Handoff[TContext, RealtimeAgent[TContext]]: + """Create a handoff from a RealtimeAgent. 
+ + Args: + agent: The RealtimeAgent to hand off to. + tool_name_override: Optional override for the name of the tool that represents the handoff. + tool_description_override: Optional override for the description of the tool that + represents the handoff. + on_handoff: A function that runs when the handoff is invoked. + input_type: The type of the input to the handoff. If provided, the input will be validated + against this type. Only relevant if you pass a function that takes an input. + is_enabled: Whether the handoff is enabled. Can be a bool or a callable that takes the run + context and agent and returns whether the handoff is enabled. Disabled handoffs are + hidden from the LLM at runtime. + + Note: input_filter is not supported for RealtimeAgent handoffs. + """ + assert on_handoff is not None or input_type is None, ( + "input_type can only be provided together with on_handoff" + ) + type_adapter: TypeAdapter[Any] | None + if input_type is not None: + assert callable(on_handoff), "on_handoff must be callable" + sig = inspect.signature(on_handoff) + if len(sig.parameters) != 2: + raise UserError("on_handoff must take two arguments: context and input") + + type_adapter = TypeAdapter(input_type) + input_json_schema = type_adapter.json_schema() + else: + type_adapter = None + input_json_schema = {} + if on_handoff is not None: + sig = inspect.signature(on_handoff) + if len(sig.parameters) != 1: + raise UserError("on_handoff must take one argument: context") + + async def _invoke_handoff( + ctx: RunContextWrapper[Any], input_json: str | None = None + ) -> RealtimeAgent[TContext]: + if input_type is not None and type_adapter is not None: + if input_json is None: + _error_tracing.attach_error_to_current_span( + SpanError( + message="Handoff function expected non-null input, but got None", + data={"details": "input_json is None"}, + ) + ) + raise ModelBehaviorError("Handoff function expected
non-null input, but got None") + + validated_input = _json.validate_json( + json_str=input_json, + type_adapter=type_adapter, + partial=False, + ) + input_func = cast(OnHandoffWithInput[THandoffInput], on_handoff) + if inspect.iscoroutinefunction(input_func): + await input_func(ctx, validated_input) + else: + input_func(ctx, validated_input) + elif on_handoff is not None: + no_input_func = cast(OnHandoffWithoutInput, on_handoff) + if inspect.iscoroutinefunction(no_input_func): + await no_input_func(ctx) + else: + no_input_func(ctx) + + return agent + + tool_name = tool_name_override or Handoff.default_tool_name(agent) + tool_description = tool_description_override or Handoff.default_tool_description(agent) + + # Always ensure the input JSON schema is in strict mode + # If there is a need, we can make this configurable in the future + input_json_schema = ensure_strict_json_schema(input_json_schema) + + async def _is_enabled(ctx: RunContextWrapper[Any], agent_base: AgentBase[Any]) -> bool: + assert callable(is_enabled), "is_enabled must be non-null here" + assert isinstance(agent_base, RealtimeAgent), "Can't handoff to a non-RealtimeAgent" + result = is_enabled(ctx, agent_base) + if inspect.isawaitable(result): + return await result + return result + + return Handoff( + tool_name=tool_name, + tool_description=tool_description, + input_json_schema=input_json_schema, + on_invoke_handoff=_invoke_handoff, + input_filter=None, # Not supported for RealtimeAgent handoffs + agent_name=agent.name, + is_enabled=_is_enabled if callable(is_enabled) else is_enabled, + ) diff --git a/src/agents/realtime/items.py b/src/agents/realtime/items.py index a835e7a88..f8a288145 100644 --- a/src/agents/realtime/items.py +++ b/src/agents/realtime/items.py @@ -6,59 +6,127 @@ class InputText(BaseModel): + """Text input content for realtime messages.""" + type: Literal["input_text"] = "input_text" + """The type identifier for text input.""" + text: str | None = None + """The text content.""" # 
Allow extra data model_config = ConfigDict(extra="allow") class InputAudio(BaseModel): + """Audio input content for realtime messages.""" + type: Literal["input_audio"] = "input_audio" + """The type identifier for audio input.""" + audio: str | None = None + """The base64-encoded audio data.""" + transcript: str | None = None + """The transcript of the audio, if available.""" # Allow extra data model_config = ConfigDict(extra="allow") class AssistantText(BaseModel): + """Text content from the assistant in realtime responses.""" + type: Literal["text"] = "text" + """The type identifier for text content.""" + text: str | None = None + """The text content from the assistant.""" + + # Allow extra data + model_config = ConfigDict(extra="allow") + + +class AssistantAudio(BaseModel): + """Audio content from the assistant in realtime responses.""" + + type: Literal["audio"] = "audio" + """The type identifier for audio content.""" + + audio: str | None = None + """The base64-encoded audio data from the assistant.""" + + transcript: str | None = None + """The transcript of the audio response.""" # Allow extra data model_config = ConfigDict(extra="allow") class SystemMessageItem(BaseModel): + """A system message item in realtime conversations.""" + item_id: str + """Unique identifier for this message item.""" + previous_item_id: str | None = None + """ID of the previous item in the conversation.""" + type: Literal["message"] = "message" + """The type identifier for message items.""" + role: Literal["system"] = "system" + """The role identifier for system messages.""" + content: list[InputText] + """List of text content for the system message.""" # Allow extra data model_config = ConfigDict(extra="allow") class UserMessageItem(BaseModel): + """A user message item in realtime conversations.""" + item_id: str + """Unique identifier for this message item.""" + previous_item_id: str | None = None + """ID of the previous item in the conversation.""" + type: Literal["message"] = 
"message" + """The type identifier for message items.""" + role: Literal["user"] = "user" + """The role identifier for user messages.""" + content: list[Annotated[InputText | InputAudio, Field(discriminator="type")]] + """List of content items, can be text or audio.""" # Allow extra data model_config = ConfigDict(extra="allow") class AssistantMessageItem(BaseModel): + """An assistant message item in realtime conversations.""" + item_id: str + """Unique identifier for this message item.""" + previous_item_id: str | None = None + """ID of the previous item in the conversation.""" + type: Literal["message"] = "message" + """The type identifier for message items.""" + role: Literal["assistant"] = "assistant" + """The role identifier for assistant messages.""" + status: Literal["in_progress", "completed", "incomplete"] | None = None - content: list[AssistantText] + """The status of the assistant's response.""" + + content: list[Annotated[AssistantText | AssistantAudio, Field(discriminator="type")]] + """List of content items from the assistant, can be text or audio.""" # Allow extra data model_config = ConfigDict(extra="allow") @@ -68,24 +136,49 @@ class AssistantMessageItem(BaseModel): Union[SystemMessageItem, UserMessageItem, AssistantMessageItem], Field(discriminator="role"), ] +"""A message item that can be from system, user, or assistant.""" class RealtimeToolCallItem(BaseModel): + """A tool call item in realtime conversations.""" + item_id: str + """Unique identifier for this tool call item.""" + previous_item_id: str | None = None + """ID of the previous item in the conversation.""" + + call_id: str | None + """The call ID for this tool invocation.""" + type: Literal["function_call"] = "function_call" + """The type identifier for function call items.""" + status: Literal["in_progress", "completed"] + """The status of the tool call execution.""" + arguments: str + """The JSON string arguments passed to the tool.""" + name: str + """The name of the tool being 
called.""" + output: str | None = None + """The output result from the tool execution.""" # Allow extra data model_config = ConfigDict(extra="allow") RealtimeItem = Union[RealtimeMessageItem, RealtimeToolCallItem] +"""A realtime item that can be a message or tool call.""" class RealtimeResponse(BaseModel): + """A response from the realtime model.""" + id: str + """Unique identifier for this response.""" + output: list[RealtimeMessageItem] + """List of message items in the response.""" diff --git a/src/agents/realtime/openai_realtime.py b/src/agents/realtime/openai_realtime.py index 1c4a4de3c..e8a4749e7 100644 --- a/src/agents/realtime/openai_realtime.py +++ b/src/agents/realtime/openai_realtime.py @@ -10,19 +10,53 @@ import pydantic import websockets -from openai.types.beta.realtime.conversation_item import ConversationItem +from openai.types.beta.realtime.conversation_item import ( + ConversationItem, + ConversationItem as OpenAIConversationItem, +) +from openai.types.beta.realtime.conversation_item_content import ( + ConversationItemContent as OpenAIConversationItemContent, +) +from openai.types.beta.realtime.conversation_item_create_event import ( + ConversationItemCreateEvent as OpenAIConversationItemCreateEvent, +) +from openai.types.beta.realtime.conversation_item_retrieve_event import ( + ConversationItemRetrieveEvent as OpenAIConversationItemRetrieveEvent, +) +from openai.types.beta.realtime.conversation_item_truncate_event import ( + ConversationItemTruncateEvent as OpenAIConversationItemTruncateEvent, +) +from openai.types.beta.realtime.input_audio_buffer_append_event import ( + InputAudioBufferAppendEvent as OpenAIInputAudioBufferAppendEvent, +) +from openai.types.beta.realtime.input_audio_buffer_commit_event import ( + InputAudioBufferCommitEvent as OpenAIInputAudioBufferCommitEvent, +) +from openai.types.beta.realtime.realtime_client_event import ( + RealtimeClientEvent as OpenAIRealtimeClientEvent, +) from 
openai.types.beta.realtime.realtime_server_event import ( RealtimeServerEvent as OpenAIRealtimeServerEvent, ) from openai.types.beta.realtime.response_audio_delta_event import ResponseAudioDeltaEvent +from openai.types.beta.realtime.response_cancel_event import ( + ResponseCancelEvent as OpenAIResponseCancelEvent, +) +from openai.types.beta.realtime.response_create_event import ( + ResponseCreateEvent as OpenAIResponseCreateEvent, +) from openai.types.beta.realtime.session_update_event import ( Session as OpenAISessionObject, SessionTool as OpenAISessionTool, + SessionTracing as OpenAISessionTracing, + SessionTracingTracingConfiguration as OpenAISessionTracingConfiguration, + SessionUpdateEvent as OpenAISessionUpdateEvent, ) from pydantic import TypeAdapter from typing_extensions import assert_never from websockets.asyncio.client import ClientConnection +from agents.handoffs import Handoff from agents.tool import FunctionTool, Tool from agents.util._types import MaybeAwaitable @@ -135,12 +169,11 @@ async def _send_tracing_config( ) -> None: """Update tracing configuration via session.update event.""" if tracing_config is not None: + converted_tracing_config = _ConversionHelper.convert_tracing_config(tracing_config) await self._send_raw_message( - RealtimeModelSendRawMessage( - message={ - "type": "session.update", - "other_data": {"session": {"tracing": tracing_config}}, - } + OpenAISessionUpdateEvent( + session=OpenAISessionObject(tracing=converted_tracing_config), + type="session.update", ) ) @@ -199,7 +232,11 @@ async def _listen_for_messages(self): async def send_event(self, event: RealtimeModelSendEvent) -> None: """Send an event to the model.""" if isinstance(event, RealtimeModelSendRawMessage): - await self._send_raw_message(event) + converted = _ConversionHelper.try_convert_raw_message(event) + if converted is not None: + await self._send_raw_message(converted) + else: + logger.error(f"Failed to convert raw message: {event}") elif isinstance(event, 
RealtimeModelSendUserInput): await self._send_user_input(event) elif isinstance(event, RealtimeModelSendAudio): @@ -214,77 +251,33 @@ async def send_event(self, event: RealtimeModelSendEvent) -> None: assert_never(event) raise ValueError(f"Unknown event type: {type(event)}") - async def _send_raw_message(self, event: RealtimeModelSendRawMessage) -> None: + async def _send_raw_message(self, event: OpenAIRealtimeClientEvent) -> None: """Send a raw message to the model.""" assert self._websocket is not None, "Not connected" - converted_event = { - "type": event.message["type"], - } - - converted_event.update(event.message.get("other_data", {})) - - await self._websocket.send(json.dumps(converted_event)) + await self._websocket.send(event.model_dump_json(exclude_none=True, exclude_unset=True)) async def _send_user_input(self, event: RealtimeModelSendUserInput) -> None: - message = ( - event.user_input - if isinstance(event.user_input, dict) - else { - "type": "message", - "role": "user", - "content": [{"type": "input_text", "text": event.user_input}], - } - ) - other_data = { - "item": message, - } - - await self._send_raw_message( - RealtimeModelSendRawMessage( - message={"type": "conversation.item.create", "other_data": other_data} - ) - ) - await self._send_raw_message( - RealtimeModelSendRawMessage(message={"type": "response.create"}) - ) + converted = _ConversionHelper.convert_user_input_to_item_create(event) + await self._send_raw_message(converted) + await self._send_raw_message(OpenAIResponseCreateEvent(type="response.create")) async def _send_audio(self, event: RealtimeModelSendAudio) -> None: - base64_audio = base64.b64encode(event.audio).decode("utf-8") - await self._send_raw_message( - RealtimeModelSendRawMessage( - message={ - "type": "input_audio_buffer.append", - "other_data": { - "audio": base64_audio, - }, - } - ) - ) + converted = _ConversionHelper.convert_audio_to_input_audio_buffer_append(event) + await self._send_raw_message(converted) if 
event.commit: await self._send_raw_message( - RealtimeModelSendRawMessage(message={"type": "input_audio_buffer.commit"}) + OpenAIInputAudioBufferCommitEvent(type="input_audio_buffer.commit") ) async def _send_tool_output(self, event: RealtimeModelSendToolOutput) -> None: - await self._send_raw_message( - RealtimeModelSendRawMessage( - message={ - "type": "conversation.item.create", - "other_data": { - "item": { - "type": "function_call_output", - "output": event.output, - "call_id": event.tool_call.id, - }, - }, - } - ) - ) + converted = _ConversionHelper.convert_tool_output(event) + await self._send_raw_message(converted) tool_item = RealtimeToolCallItem( item_id=event.tool_call.id or "", previous_item_id=event.tool_call.previous_item_id, + call_id=event.tool_call.call_id, type="function_call", status="completed", arguments=event.tool_call.arguments, @@ -294,9 +287,7 @@ async def _send_tool_output(self, event: RealtimeModelSendToolOutput) -> None: await self._emit_event(RealtimeModelItemUpdatedEvent(item=tool_item)) if event.start_response: - await self._send_raw_message( - RealtimeModelSendRawMessage(message={"type": "response.create"}) - ) + await self._send_raw_message(OpenAIResponseCreateEvent(type="response.create")) async def _send_interrupt(self, event: RealtimeModelSendInterrupt) -> None: if not self._current_item_id or not self._audio_start_time: @@ -307,18 +298,12 @@ async def _send_interrupt(self, event: RealtimeModelSendInterrupt) -> None: elapsed_time_ms = (datetime.now() - self._audio_start_time).total_seconds() * 1000 if elapsed_time_ms > 0 and elapsed_time_ms < self._audio_length_ms: await self._emit_event(RealtimeModelAudioInterruptedEvent()) - await self._send_raw_message( - RealtimeModelSendRawMessage( - message={ - "type": "conversation.item.truncate", - "other_data": { - "item_id": self._current_item_id, - "content_index": self._current_audio_content_index, - "audio_end_ms": elapsed_time_ms, - }, - } - ) + converted = 
_ConversionHelper.convert_interrupt( + self._current_item_id, + self._current_audio_content_index or 0, + int(elapsed_time_ms), ) + await self._send_raw_message(converted) self._current_item_id = None self._audio_start_time = None @@ -354,6 +339,7 @@ async def _handle_output_item(self, item: ConversationItem) -> None: tool_call = RealtimeToolCallItem( item_id=item.id or "", previous_item_id=None, + call_id=item.call_id, type="function_call", # We use the same item for tool call and output, so it will be completed by the # output being added @@ -365,7 +351,7 @@ async def _handle_output_item(self, item: ConversationItem) -> None: await self._emit_event(RealtimeModelItemUpdatedEvent(item=tool_call)) await self._emit_event( RealtimeModelToolCallEvent( - call_id=item.id or "", + call_id=item.call_id or "", name=item.name or "", arguments=item.arguments or "", id=item.id or "", @@ -378,7 +364,9 @@ async def _handle_output_item(self, item: ConversationItem) -> None: "item_id": item.id or "", "type": item.type, "role": item.role, - "content": item.content, + "content": ( + [content.model_dump() for content in item.content] if item.content else [] + ), "status": "in_progress", } ) @@ -404,9 +392,7 @@ async def close(self) -> None: async def _cancel_response(self) -> None: if self._ongoing_response: - await self._send_raw_message( - RealtimeModelSendRawMessage(message={"type": "response.cancel"}) - ) + await self._send_raw_message(OpenAIResponseCancelEvent(type="response.cancel")) self._ongoing_response = False async def _handle_ws_event(self, event: dict[str, Any]): @@ -466,16 +452,13 @@ async def _handle_ws_event(self, event: dict[str, Any]): parsed.type == "conversation.item.input_audio_transcription.completed" or parsed.type == "conversation.item.truncated" ): - await self._send_raw_message( - RealtimeModelSendRawMessage( - message={ - "type": "conversation.item.retrieve", - "other_data": { - "item_id": self._current_item_id, - }, - } + if self._current_item_id: + await 
self._send_raw_message( + OpenAIConversationItemRetrieveEvent( + type="conversation.item.retrieve", + item_id=self._current_item_id, + ) ) - ) if parsed.type == "conversation.item.input_audio_transcription.completed": await self._emit_event( RealtimeModelInputAudioTranscriptionCompletedEvent( @@ -504,14 +487,7 @@ async def _handle_ws_event(self, event: dict[str, Any]): async def _update_session_config(self, model_settings: RealtimeSessionModelSettings) -> None: session_config = self._get_session_config(model_settings) await self._send_raw_message( - RealtimeModelSendRawMessage( - message={ - "type": "session.update", - "other_data": { - "session": session_config.model_dump(exclude_unset=True, exclude_none=True) - }, - } - ) + OpenAISessionUpdateEvent(session=session_config, type="session.update") ) def _get_session_config( @@ -546,10 +522,14 @@ def _get_session_config( "tool_choice", DEFAULT_MODEL_SETTINGS.get("tool_choice"), # type: ignore ), - tools=self._tools_to_session_tools(model_settings.get("tools", [])), + tools=self._tools_to_session_tools( + tools=model_settings.get("tools", []), handoffs=model_settings.get("handoffs", []) + ), ) - def _tools_to_session_tools(self, tools: list[Tool]) -> list[OpenAISessionTool]: + def _tools_to_session_tools( + self, tools: list[Tool], handoffs: list[Handoff] + ) -> list[OpenAISessionTool]: converted_tools: list[OpenAISessionTool] = [] for tool in tools: if not isinstance(tool, FunctionTool): @@ -562,6 +542,17 @@ def _tools_to_session_tools(self, tools: list[Tool]) -> list[OpenAISessionTool]: type="function", ) ) + + for handoff in handoffs: + converted_tools.append( + OpenAISessionTool( + name=handoff.tool_name, + description=handoff.tool_description, + parameters=handoff.input_json_schema, + type="function", + ) + ) + return converted_tools @@ -582,3 +573,98 @@ def conversation_item_to_realtime_message_item( "status": "in_progress", }, ) + + @classmethod + def try_convert_raw_message( + cls, message: 
RealtimeModelSendRawMessage + ) -> OpenAIRealtimeClientEvent | None: + try: + data = {} + data["type"] = message.message["type"] + data.update(message.message.get("other_data", {})) + return TypeAdapter(OpenAIRealtimeClientEvent).validate_python(data) + except Exception: + return None + + @classmethod + def convert_tracing_config( + cls, tracing_config: RealtimeModelTracingConfig | Literal["auto"] | None + ) -> OpenAISessionTracing | None: + if tracing_config is None: + return None + elif tracing_config == "auto": + return "auto" + return OpenAISessionTracingConfiguration( + group_id=tracing_config.get("group_id"), + metadata=tracing_config.get("metadata"), + workflow_name=tracing_config.get("workflow_name"), + ) + + @classmethod + def convert_user_input_to_conversation_item( + cls, event: RealtimeModelSendUserInput + ) -> OpenAIConversationItem: + user_input = event.user_input + + if isinstance(user_input, dict): + return OpenAIConversationItem( + type="message", + role="user", + content=[ + OpenAIConversationItemContent( + type="input_text", + text=item.get("text"), + ) + for item in user_input.get("content", []) + ], + ) + else: + return OpenAIConversationItem( + type="message", + role="user", + content=[OpenAIConversationItemContent(type="input_text", text=user_input)], + ) + + @classmethod + def convert_user_input_to_item_create( + cls, event: RealtimeModelSendUserInput + ) -> OpenAIRealtimeClientEvent: + return OpenAIConversationItemCreateEvent( + type="conversation.item.create", + item=cls.convert_user_input_to_conversation_item(event), + ) + + @classmethod + def convert_audio_to_input_audio_buffer_append( + cls, event: RealtimeModelSendAudio + ) -> OpenAIRealtimeClientEvent: + base64_audio = base64.b64encode(event.audio).decode("utf-8") + return OpenAIInputAudioBufferAppendEvent( + type="input_audio_buffer.append", + audio=base64_audio, + ) + + @classmethod + def convert_tool_output(cls, event: RealtimeModelSendToolOutput) -> OpenAIRealtimeClientEvent: + 
return OpenAIConversationItemCreateEvent( + type="conversation.item.create", + item=OpenAIConversationItem( + type="function_call_output", + output=event.output, + call_id=event.tool_call.call_id, + ), + ) + + @classmethod + def convert_interrupt( + cls, + current_item_id: str, + current_audio_content_index: int, + elapsed_time_ms: int, + ) -> OpenAIRealtimeClientEvent: + return OpenAIConversationItemTruncateEvent( + type="conversation.item.truncate", + item_id=current_item_id, + content_index=current_audio_content_index, + audio_end_ms=elapsed_time_ms, + ) diff --git a/src/agents/realtime/session.py b/src/agents/realtime/session.py index 07791c8d8..6df35b438 100644 --- a/src/agents/realtime/session.py +++ b/src/agents/realtime/session.py @@ -1,6 +1,7 @@ from __future__ import annotations import asyncio +import inspect from collections.abc import AsyncIterator from typing import Any, cast @@ -31,6 +32,7 @@ RealtimeToolEnd, RealtimeToolStart, ) +from .handoffs import realtime_handoff from .items import InputAudio, InputText, RealtimeItem from .model import RealtimeModel, RealtimeModelConfig, RealtimeModelListener from .model_events import ( @@ -255,9 +257,12 @@ async def _put_event(self, event: RealtimeSessionEvent) -> None: async def _handle_tool_call(self, event: RealtimeModelToolCallEvent) -> None: """Handle a tool call event.""" - all_tools = await self._current_agent.get_all_tools(self._context_wrapper) - function_map = {tool.name: tool for tool in all_tools if isinstance(tool, FunctionTool)} - handoff_map = {tool.name: tool for tool in all_tools if isinstance(tool, Handoff)} + tools, handoffs = await asyncio.gather( + self._current_agent.get_all_tools(self._context_wrapper), + self._get_handoffs(self._current_agent, self._context_wrapper), + ) + function_map = {tool.name: tool for tool in tools if isinstance(tool, FunctionTool)} + handoff_map = {handoff.tool_name: handoff for handoff in handoffs} if event.name in function_map: await self._put_event( @@ -303,7 
+308,9 @@ async def _handle_tool_call(self, event: RealtimeModelToolCallEvent) -> None: # Execute the handoff to get the new agent result = await handoff.on_invoke_handoff(self._context_wrapper, event.arguments) if not isinstance(result, RealtimeAgent): - raise UserError(f"Handoff {handoff.name} returned invalid result: {type(result)}") + raise UserError( + f"Handoff {handoff.tool_name} returned invalid result: {type(result)}" + ) # Store previous agent for event previous_agent = self._current_agent @@ -492,11 +499,37 @@ async def _get__updated_model_settings( self, new_agent: RealtimeAgent ) -> RealtimeSessionModelSettings: updated_settings: RealtimeSessionModelSettings = {} - instructions, tools = await asyncio.gather( + instructions, tools, handoffs = await asyncio.gather( new_agent.get_system_prompt(self._context_wrapper), new_agent.get_all_tools(self._context_wrapper), + self._get_handoffs(new_agent, self._context_wrapper), ) updated_settings["instructions"] = instructions or "" updated_settings["tools"] = tools or [] + updated_settings["handoffs"] = handoffs or [] return updated_settings + + @classmethod + async def _get_handoffs( + cls, agent: RealtimeAgent[Any], context_wrapper: RunContextWrapper[Any] + ) -> list[Handoff[Any, RealtimeAgent[Any]]]: + handoffs: list[Handoff[Any, RealtimeAgent[Any]]] = [] + for handoff_item in agent.handoffs: + if isinstance(handoff_item, Handoff): + handoffs.append(handoff_item) + elif isinstance(handoff_item, RealtimeAgent): + handoffs.append(realtime_handoff(handoff_item)) + + async def _check_handoff_enabled(handoff_obj: Handoff[Any, RealtimeAgent[Any]]) -> bool: + attr = handoff_obj.is_enabled + if isinstance(attr, bool): + return attr + res = attr(context_wrapper, agent) + if inspect.isawaitable(res): + return await res + return res + + results = await asyncio.gather(*(_check_handoff_enabled(h) for h in handoffs)) + enabled = [h for h, ok in zip(handoffs, results) if ok] + return enabled diff --git a/tests/mcp/helpers.py 
b/tests/mcp/helpers.py index 31d43c228..dec713bf6 100644 --- a/tests/mcp/helpers.py +++ b/tests/mcp/helpers.py @@ -4,7 +4,14 @@ from typing import Any from mcp import Tool as MCPTool -from mcp.types import CallToolResult, GetPromptResult, ListPromptsResult, PromptMessage, TextContent +from mcp.types import ( + CallToolResult, + Content, + GetPromptResult, + ListPromptsResult, + PromptMessage, + TextContent, +) from agents.mcp import MCPServer from agents.mcp.server import _MCPServerWithClientSession @@ -61,11 +68,13 @@ def __init__( tool_filter: ToolFilter = None, server_name: str = "fake_mcp_server", ): + super().__init__(use_structured_content=False) self.tools: list[MCPTool] = tools or [] self.tool_calls: list[str] = [] self.tool_results: list[str] = [] self.tool_filter = tool_filter self._server_name = server_name + self._custom_content: list[Content] | None = None def add_tool(self, name: str, input_schema: dict[str, Any]): self.tools.append(MCPTool(name=name, inputSchema=input_schema)) @@ -90,6 +99,11 @@ async def list_tools(self, run_context=None, agent=None): async def call_tool(self, tool_name: str, arguments: dict[str, Any] | None) -> CallToolResult: self.tool_calls.append(tool_name) self.tool_results.append(f"result_{tool_name}_{json.dumps(arguments)}") + + # Allow testing custom content scenarios + if self._custom_content is not None: + return CallToolResult(content=self._custom_content) + return CallToolResult( content=[TextContent(text=self.tool_results[-1], type="text")], ) diff --git a/tests/mcp/test_mcp_tracing.py b/tests/mcp/test_mcp_tracing.py index 54575dcb5..33dfa5ea1 100644 --- a/tests/mcp/test_mcp_tracing.py +++ b/tests/mcp/test_mcp_tracing.py @@ -62,7 +62,7 @@ async def test_mcp_tracing(): "data": { "name": "test_tool_1", "input": "", - "output": '{"type":"text","text":"result_test_tool_1_{}","annotations":null}', # noqa: E501 + "output": '{"type":"text","text":"result_test_tool_1_{}","annotations":null,"meta":null}', # noqa: E501 
"mcp_data": {"server": "fake_mcp_server"}, }, }, @@ -133,7 +133,7 @@ async def test_mcp_tracing(): "data": { "name": "test_tool_2", "input": "", - "output": '{"type":"text","text":"result_test_tool_2_{}","annotations":null}', # noqa: E501 + "output": '{"type":"text","text":"result_test_tool_2_{}","annotations":null,"meta":null}', # noqa: E501 "mcp_data": {"server": "fake_mcp_server"}, }, }, @@ -197,7 +197,7 @@ async def test_mcp_tracing(): "data": { "name": "test_tool_3", "input": "", - "output": '{"type":"text","text":"result_test_tool_3_{}","annotations":null}', # noqa: E501 + "output": '{"type":"text","text":"result_test_tool_3_{}","annotations":null,"meta":null}', # noqa: E501 "mcp_data": {"server": "fake_mcp_server"}, }, }, diff --git a/tests/mcp/test_mcp_util.py b/tests/mcp/test_mcp_util.py index 3230e63dd..af63665f8 100644 --- a/tests/mcp/test_mcp_util.py +++ b/tests/mcp/test_mcp_util.py @@ -234,6 +234,60 @@ async def test_agent_convert_schemas_false(): assert baz_tool.strict_json_schema is False, "Shouldn't be converted unless specified" +@pytest.mark.asyncio +async def test_mcp_fastmcp_behavior_verification(): + """Test that verifies the exact FastMCP _convert_to_content behavior we observed. + + Based on our testing, FastMCP's _convert_to_content function behaves as follows: + - None β†’ content=[] β†’ MCPUtil returns "[]" + - [] β†’ content=[] β†’ MCPUtil returns "[]" + - {} β†’ content=[TextContent(text="{}")] β†’ MCPUtil returns full JSON + - [{}] β†’ content=[TextContent(text="{}")] β†’ MCPUtil returns full JSON (flattened) + - [[]] β†’ content=[] β†’ MCPUtil returns "[]" (recursive empty) + """ + + from mcp.types import TextContent + + server = FakeMCPServer() + server.add_tool("test_tool", {}) + + ctx = RunContextWrapper(context=None) + tool = MCPTool(name="test_tool", inputSchema={}) + + # Case 1: None -> "[]". 
+ server._custom_content = [] + result = await MCPUtil.invoke_mcp_tool(server, tool, ctx, "") + assert result == "[]", f"None should return '[]', got {result}" + + # Case 2: [] -> "[]". + server._custom_content = [] + result = await MCPUtil.invoke_mcp_tool(server, tool, ctx, "") + assert result == "[]", f"[] should return '[]', got {result}" + + # Case 3: {} -> {"type":"text","text":"{}","annotations":null,"meta":null}. + server._custom_content = [TextContent(text="{}", type="text")] + result = await MCPUtil.invoke_mcp_tool(server, tool, ctx, "") + expected = '{"type":"text","text":"{}","annotations":null,"meta":null}' + assert result == expected, f"{{}} should return {expected}, got {result}" + + # Case 4: [{}] -> {"type":"text","text":"{}","annotations":null,"meta":null}. + server._custom_content = [TextContent(text="{}", type="text")] + result = await MCPUtil.invoke_mcp_tool(server, tool, ctx, "") + expected = '{"type":"text","text":"{}","annotations":null,"meta":null}' + assert result == expected, f"[{{}}] should return {expected}, got {result}" + + # Case 5: [[]] -> "[]". + server._custom_content = [] + result = await MCPUtil.invoke_mcp_tool(server, tool, ctx, "") + assert result == "[]", f"[[]] should return '[]', got {result}" + + # Case 6: String values work normally. 
+ server._custom_content = [TextContent(text="hello", type="text")] + result = await MCPUtil.invoke_mcp_tool(server, tool, ctx, "") + expected = '{"type":"text","text":"hello","annotations":null,"meta":null}' + assert result == expected, f"String should return {expected}, got {result}" + + @pytest.mark.asyncio async def test_agent_convert_schemas_unset(): """Test that leaving convert_schemas_to_strict unset (defaulting to False) leaves tool schemas diff --git a/tests/models/test_litellm_chatcompletions_stream.py b/tests/models/test_litellm_chatcompletions_stream.py index cd342e444..bd38f8759 100644 --- a/tests/models/test_litellm_chatcompletions_stream.py +++ b/tests/models/test_litellm_chatcompletions_stream.py @@ -214,17 +214,18 @@ async def test_stream_response_yields_events_for_tool_call(monkeypatch) -> None: the model is streaming a function/tool call instead of plain text. The function call will be split across two chunks. """ - # Simulate a single tool call whose ID stays constant and function name/args built over chunks. + # Simulate a single tool call with complete function name in first chunk + # and arguments split across chunks (reflecting real API behavior) tool_call_delta1 = ChoiceDeltaToolCall( index=0, id="tool-id", - function=ChoiceDeltaToolCallFunction(name="my_", arguments="arg1"), + function=ChoiceDeltaToolCallFunction(name="my_func", arguments="arg1"), type="function", ) tool_call_delta2 = ChoiceDeltaToolCall( index=0, id="tool-id", - function=ChoiceDeltaToolCallFunction(name="func", arguments="arg2"), + function=ChoiceDeltaToolCallFunction(name=None, arguments="arg2"), type="function", ) chunk1 = ChatCompletionChunk( @@ -284,18 +285,131 @@ async def patched_fetch_response(self, *args, **kwargs): # The added item should be a ResponseFunctionToolCall. added_fn = output_events[1].item assert isinstance(added_fn, ResponseFunctionToolCall) - assert added_fn.name == "my_func" # Name should be concatenation of both chunks. 
- assert added_fn.arguments == "arg1arg2" + assert added_fn.name == "my_func" # Name should be complete from first chunk + assert added_fn.arguments == "" # Arguments start empty assert output_events[2].type == "response.function_call_arguments.delta" - assert output_events[2].delta == "arg1arg2" - assert output_events[3].type == "response.output_item.done" - assert output_events[4].type == "response.completed" - assert output_events[2].delta == "arg1arg2" - assert output_events[3].type == "response.output_item.done" - assert output_events[4].type == "response.completed" - assert added_fn.name == "my_func" # Name should be concatenation of both chunks. - assert added_fn.arguments == "arg1arg2" - assert output_events[2].type == "response.function_call_arguments.delta" - assert output_events[2].delta == "arg1arg2" - assert output_events[3].type == "response.output_item.done" - assert output_events[4].type == "response.completed" + assert output_events[2].delta == "arg1" # First argument chunk + assert output_events[3].type == "response.function_call_arguments.delta" + assert output_events[3].delta == "arg2" # Second argument chunk + assert output_events[4].type == "response.output_item.done" + assert output_events[5].type == "response.completed" + # Final function call should have complete arguments + final_fn = output_events[4].item + assert isinstance(final_fn, ResponseFunctionToolCall) + assert final_fn.name == "my_func" + assert final_fn.arguments == "arg1arg2" + + +@pytest.mark.allow_call_model_methods +@pytest.mark.asyncio +async def test_stream_response_yields_real_time_function_call_arguments(monkeypatch) -> None: + """ + Validate that LiteLLM `stream_response` also emits function call arguments in real-time + as they are received, ensuring consistent behavior across model providers. 
+ """ + # Simulate realistic chunks: name first, then arguments incrementally + tool_call_delta1 = ChoiceDeltaToolCall( + index=0, + id="litellm-call-456", + function=ChoiceDeltaToolCallFunction(name="generate_code", arguments=""), + type="function", + ) + tool_call_delta2 = ChoiceDeltaToolCall( + index=0, + function=ChoiceDeltaToolCallFunction(arguments='{"language": "'), + type="function", + ) + tool_call_delta3 = ChoiceDeltaToolCall( + index=0, + function=ChoiceDeltaToolCallFunction(arguments='python", "task": "'), + type="function", + ) + tool_call_delta4 = ChoiceDeltaToolCall( + index=0, + function=ChoiceDeltaToolCallFunction(arguments='hello world"}'), + type="function", + ) + + chunk1 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta1]))], + ) + chunk2 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta2]))], + ) + chunk3 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta3]))], + ) + chunk4 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta4]))], + usage=CompletionUsage(completion_tokens=1, prompt_tokens=1, total_tokens=2), + ) + + async def fake_stream() -> AsyncIterator[ChatCompletionChunk]: + for c in (chunk1, chunk2, chunk3, chunk4): + yield c + + async def patched_fetch_response(self, *args, **kwargs): + resp = Response( + id="resp-id", + created_at=0, + model="fake-model", + object="response", + output=[], + tool_choice="none", + tools=[], + parallel_tool_calls=False, + ) + return resp, fake_stream() + + monkeypatch.setattr(LitellmModel, "_fetch_response", 
patched_fetch_response) + model = LitellmProvider().get_model("gpt-4") + output_events = [] + async for event in model.stream_response( + system_instructions=None, + input="", + model_settings=ModelSettings(), + tools=[], + output_schema=None, + handoffs=[], + tracing=ModelTracing.DISABLED, + previous_response_id=None, + prompt=None, + ): + output_events.append(event) + + # Extract events by type + function_args_delta_events = [ + e for e in output_events if e.type == "response.function_call_arguments.delta" + ] + output_item_added_events = [e for e in output_events if e.type == "response.output_item.added"] + + # Verify we got real-time streaming (3 argument delta events) + assert len(function_args_delta_events) == 3 + assert len(output_item_added_events) == 1 + + # Verify the deltas were streamed correctly + expected_deltas = ['{"language": "', 'python", "task": "', 'hello world"}'] + for i, delta_event in enumerate(function_args_delta_events): + assert delta_event.delta == expected_deltas[i] + + # Verify function call metadata + added_event = output_item_added_events[0] + assert isinstance(added_event.item, ResponseFunctionToolCall) + assert added_event.item.name == "generate_code" + assert added_event.item.call_id == "litellm-call-456" diff --git a/tests/realtime/test_conversion_helpers.py b/tests/realtime/test_conversion_helpers.py new file mode 100644 index 000000000..859813edd --- /dev/null +++ b/tests/realtime/test_conversion_helpers.py @@ -0,0 +1,375 @@ +import base64 +from unittest.mock import Mock + +from openai.types.beta.realtime.conversation_item import ConversationItem +from openai.types.beta.realtime.conversation_item_create_event import ConversationItemCreateEvent +from openai.types.beta.realtime.conversation_item_truncate_event import ( + ConversationItemTruncateEvent, +) +from openai.types.beta.realtime.input_audio_buffer_append_event import InputAudioBufferAppendEvent +from openai.types.beta.realtime.session_update_event import ( + 
SessionTracingTracingConfiguration, +) + +from agents.realtime.config import RealtimeModelTracingConfig +from agents.realtime.model_inputs import ( + RealtimeModelSendAudio, + RealtimeModelSendRawMessage, + RealtimeModelSendToolOutput, + RealtimeModelSendUserInput, + RealtimeModelUserInputMessage, +) +from agents.realtime.openai_realtime import _ConversionHelper + + +class TestConversionHelperTryConvertRawMessage: + """Test suite for _ConversionHelper.try_convert_raw_message method.""" + + def test_try_convert_raw_message_valid_session_update(self): + """Test converting a valid session.update raw message.""" + raw_message = RealtimeModelSendRawMessage( + message={ + "type": "session.update", + "other_data": { + "session": { + "modalities": ["text", "audio"], + "voice": "ash", + } + }, + } + ) + + result = _ConversionHelper.try_convert_raw_message(raw_message) + + assert result is not None + assert result.type == "session.update" + + def test_try_convert_raw_message_valid_response_create(self): + """Test converting a valid response.create raw message.""" + raw_message = RealtimeModelSendRawMessage( + message={ + "type": "response.create", + "other_data": {}, + } + ) + + result = _ConversionHelper.try_convert_raw_message(raw_message) + + assert result is not None + assert result.type == "response.create" + + def test_try_convert_raw_message_invalid_type(self): + """Test converting an invalid message type returns None.""" + raw_message = RealtimeModelSendRawMessage( + message={ + "type": "invalid.message.type", + "other_data": {}, + } + ) + + result = _ConversionHelper.try_convert_raw_message(raw_message) + + assert result is None + + def test_try_convert_raw_message_malformed_data(self): + """Test converting malformed message data returns None.""" + raw_message = RealtimeModelSendRawMessage( + message={ + "type": "session.update", + "other_data": { + "session": "invalid_session_data" # Should be dict + }, + } + ) + + result = 
_ConversionHelper.try_convert_raw_message(raw_message) + + assert result is None + + def test_try_convert_raw_message_missing_type(self): + """Test converting message without type returns None.""" + raw_message = RealtimeModelSendRawMessage( + message={ + "type": "missing.type.test", + "other_data": {"some": "data"}, + } + ) + + result = _ConversionHelper.try_convert_raw_message(raw_message) + + assert result is None + + +class TestConversionHelperTracingConfig: + """Test suite for _ConversionHelper.convert_tracing_config method.""" + + def test_convert_tracing_config_none(self): + """Test converting None tracing config.""" + result = _ConversionHelper.convert_tracing_config(None) + assert result is None + + def test_convert_tracing_config_auto(self): + """Test converting 'auto' tracing config.""" + result = _ConversionHelper.convert_tracing_config("auto") + assert result == "auto" + + def test_convert_tracing_config_dict_full(self): + """Test converting full tracing config dict.""" + tracing_config: RealtimeModelTracingConfig = { + "group_id": "test-group", + "metadata": {"env": "test"}, + "workflow_name": "test-workflow", + } + + result = _ConversionHelper.convert_tracing_config(tracing_config) + + assert isinstance(result, SessionTracingTracingConfiguration) + assert result.group_id == "test-group" + assert result.metadata == {"env": "test"} + assert result.workflow_name == "test-workflow" + + def test_convert_tracing_config_dict_partial(self): + """Test converting partial tracing config dict.""" + tracing_config: RealtimeModelTracingConfig = { + "group_id": "test-group", + } + + result = _ConversionHelper.convert_tracing_config(tracing_config) + + assert isinstance(result, SessionTracingTracingConfiguration) + assert result.group_id == "test-group" + assert result.metadata is None + assert result.workflow_name is None + + def test_convert_tracing_config_empty_dict(self): + """Test converting empty tracing config dict.""" + tracing_config: 
RealtimeModelTracingConfig = {} + + result = _ConversionHelper.convert_tracing_config(tracing_config) + + assert isinstance(result, SessionTracingTracingConfiguration) + assert result.group_id is None + assert result.metadata is None + assert result.workflow_name is None + + +class TestConversionHelperUserInput: + """Test suite for _ConversionHelper user input conversion methods.""" + + def test_convert_user_input_to_conversation_item_string(self): + """Test converting string user input to conversation item.""" + event = RealtimeModelSendUserInput(user_input="Hello, world!") + + result = _ConversionHelper.convert_user_input_to_conversation_item(event) + + assert isinstance(result, ConversationItem) + assert result.type == "message" + assert result.role == "user" + assert result.content is not None + assert len(result.content) == 1 + assert result.content[0].type == "input_text" + assert result.content[0].text == "Hello, world!" + + def test_convert_user_input_to_conversation_item_dict(self): + """Test converting dict user input to conversation item.""" + user_input_dict: RealtimeModelUserInputMessage = { + "type": "message", + "role": "user", + "content": [ + {"type": "input_text", "text": "Hello"}, + {"type": "input_text", "text": "World"}, + ], + } + event = RealtimeModelSendUserInput(user_input=user_input_dict) + + result = _ConversionHelper.convert_user_input_to_conversation_item(event) + + assert isinstance(result, ConversationItem) + assert result.type == "message" + assert result.role == "user" + assert result.content is not None + assert len(result.content) == 2 + assert result.content[0].type == "input_text" + assert result.content[0].text == "Hello" + assert result.content[1].type == "input_text" + assert result.content[1].text == "World" + + def test_convert_user_input_to_conversation_item_dict_empty_content(self): + """Test converting dict user input with empty content.""" + user_input_dict: RealtimeModelUserInputMessage = { + "type": "message", + 
"role": "user", + "content": [], + } + event = RealtimeModelSendUserInput(user_input=user_input_dict) + + result = _ConversionHelper.convert_user_input_to_conversation_item(event) + + assert isinstance(result, ConversationItem) + assert result.type == "message" + assert result.role == "user" + assert result.content is not None + assert len(result.content) == 0 + + def test_convert_user_input_to_item_create(self): + """Test converting user input to item create event.""" + event = RealtimeModelSendUserInput(user_input="Test message") + + result = _ConversionHelper.convert_user_input_to_item_create(event) + + assert isinstance(result, ConversationItemCreateEvent) + assert result.type == "conversation.item.create" + assert isinstance(result.item, ConversationItem) + assert result.item.type == "message" + assert result.item.role == "user" + + +class TestConversionHelperAudio: + """Test suite for _ConversionHelper.convert_audio_to_input_audio_buffer_append.""" + + def test_convert_audio_to_input_audio_buffer_append(self): + """Test converting audio data to input audio buffer append event.""" + audio_data = b"test audio data" + event = RealtimeModelSendAudio(audio=audio_data, commit=False) + + result = _ConversionHelper.convert_audio_to_input_audio_buffer_append(event) + + assert isinstance(result, InputAudioBufferAppendEvent) + assert result.type == "input_audio_buffer.append" + + # Verify base64 encoding + expected_b64 = base64.b64encode(audio_data).decode("utf-8") + assert result.audio == expected_b64 + + def test_convert_audio_to_input_audio_buffer_append_empty(self): + """Test converting empty audio data.""" + audio_data = b"" + event = RealtimeModelSendAudio(audio=audio_data, commit=True) + + result = _ConversionHelper.convert_audio_to_input_audio_buffer_append(event) + + assert isinstance(result, InputAudioBufferAppendEvent) + assert result.type == "input_audio_buffer.append" + assert result.audio == "" + + def 
test_convert_audio_to_input_audio_buffer_append_large_data(self): + """Test converting large audio data.""" + audio_data = b"x" * 10000 # Large audio buffer + event = RealtimeModelSendAudio(audio=audio_data, commit=False) + + result = _ConversionHelper.convert_audio_to_input_audio_buffer_append(event) + + assert isinstance(result, InputAudioBufferAppendEvent) + assert result.type == "input_audio_buffer.append" + + # Verify it can be decoded back + decoded = base64.b64decode(result.audio) + assert decoded == audio_data + + +class TestConversionHelperToolOutput: + """Test suite for _ConversionHelper.convert_tool_output method.""" + + def test_convert_tool_output(self): + """Test converting tool output to conversation item create event.""" + mock_tool_call = Mock() + mock_tool_call.call_id = "call_123" + + event = RealtimeModelSendToolOutput( + tool_call=mock_tool_call, + output="Function executed successfully", + start_response=False, + ) + + result = _ConversionHelper.convert_tool_output(event) + + assert isinstance(result, ConversationItemCreateEvent) + assert result.type == "conversation.item.create" + assert isinstance(result.item, ConversationItem) + assert result.item.type == "function_call_output" + assert result.item.output == "Function executed successfully" + assert result.item.call_id == "call_123" + + def test_convert_tool_output_no_call_id(self): + """Test converting tool output with None call_id.""" + mock_tool_call = Mock() + mock_tool_call.call_id = None + + event = RealtimeModelSendToolOutput( + tool_call=mock_tool_call, + output="Output without call ID", + start_response=False, + ) + + result = _ConversionHelper.convert_tool_output(event) + + assert isinstance(result, ConversationItemCreateEvent) + assert result.type == "conversation.item.create" + assert result.item.call_id is None + + def test_convert_tool_output_empty_output(self): + """Test converting tool output with empty output.""" + mock_tool_call = Mock() + mock_tool_call.call_id = 
"call_456" + + event = RealtimeModelSendToolOutput( + tool_call=mock_tool_call, + output="", + start_response=True, + ) + + result = _ConversionHelper.convert_tool_output(event) + + assert isinstance(result, ConversationItemCreateEvent) + assert result.item.output == "" + assert result.item.call_id == "call_456" + + +class TestConversionHelperInterrupt: + """Test suite for _ConversionHelper.convert_interrupt method.""" + + def test_convert_interrupt(self): + """Test converting interrupt parameters to conversation item truncate event.""" + current_item_id = "item_789" + current_audio_content_index = 2 + elapsed_time_ms = 1500 + + result = _ConversionHelper.convert_interrupt( + current_item_id, current_audio_content_index, elapsed_time_ms + ) + + assert isinstance(result, ConversationItemTruncateEvent) + assert result.type == "conversation.item.truncate" + assert result.item_id == "item_789" + assert result.content_index == 2 + assert result.audio_end_ms == 1500 + + def test_convert_interrupt_zero_time(self): + """Test converting interrupt with zero elapsed time.""" + result = _ConversionHelper.convert_interrupt("item_1", 0, 0) + + assert isinstance(result, ConversationItemTruncateEvent) + assert result.type == "conversation.item.truncate" + assert result.item_id == "item_1" + assert result.content_index == 0 + assert result.audio_end_ms == 0 + + def test_convert_interrupt_large_values(self): + """Test converting interrupt with large values.""" + result = _ConversionHelper.convert_interrupt("item_xyz", 99, 999999) + + assert isinstance(result, ConversationItemTruncateEvent) + assert result.type == "conversation.item.truncate" + assert result.item_id == "item_xyz" + assert result.content_index == 99 + assert result.audio_end_ms == 999999 + + def test_convert_interrupt_empty_item_id(self): + """Test converting interrupt with empty item ID.""" + result = _ConversionHelper.convert_interrupt("", 1, 100) + + assert isinstance(result, ConversationItemTruncateEvent) + assert 
result.type == "conversation.item.truncate" + assert result.item_id == "" + assert result.content_index == 1 + assert result.audio_end_ms == 100 diff --git a/tests/realtime/test_openai_realtime.py b/tests/realtime/test_openai_realtime.py index 9ecc433ca..5cb0eb0fa 100644 --- a/tests/realtime/test_openai_realtime.py +++ b/tests/realtime/test_openai_realtime.py @@ -292,6 +292,7 @@ async def test_handle_tool_call_event_success(self, model): "output_index": 0, "item": { "id": "call_123", + "call_id": "call_123", "type": "function_call", "status": "completed", "name": "get_weather", diff --git a/tests/realtime/test_realtime_handoffs.py b/tests/realtime/test_realtime_handoffs.py new file mode 100644 index 000000000..07385fe20 --- /dev/null +++ b/tests/realtime/test_realtime_handoffs.py @@ -0,0 +1,96 @@ +"""Tests for realtime handoff functionality.""" + +from unittest.mock import Mock + +import pytest + +from agents import Agent +from agents.realtime import RealtimeAgent, realtime_handoff + + +def test_realtime_handoff_creation(): + """Test basic realtime handoff creation.""" + realtime_agent = RealtimeAgent(name="test_agent") + handoff_obj = realtime_handoff(realtime_agent) + + assert handoff_obj.agent_name == "test_agent" + assert handoff_obj.tool_name == "transfer_to_test_agent" + assert handoff_obj.input_filter is None # Should not support input filters + assert handoff_obj.is_enabled is True + + +def test_realtime_handoff_with_custom_params(): + """Test realtime handoff with custom parameters.""" + realtime_agent = RealtimeAgent( + name="helper_agent", + handoff_description="Helps with general tasks", + ) + + handoff_obj = realtime_handoff( + realtime_agent, + tool_name_override="custom_handoff", + tool_description_override="Custom handoff description", + is_enabled=False, + ) + + assert handoff_obj.agent_name == "helper_agent" + assert handoff_obj.tool_name == "custom_handoff" + assert handoff_obj.tool_description == "Custom handoff description" + assert 
handoff_obj.is_enabled is False + + +@pytest.mark.asyncio +async def test_realtime_handoff_execution(): + """Test that realtime handoff returns the correct agent.""" + realtime_agent = RealtimeAgent(name="target_agent") + handoff_obj = realtime_handoff(realtime_agent) + + # Mock context + mock_context = Mock() + + # Execute handoff + result = await handoff_obj.on_invoke_handoff(mock_context, "") + + assert result is realtime_agent + assert isinstance(result, RealtimeAgent) + + +def test_realtime_handoff_with_on_handoff_callback(): + """Test realtime handoff with custom on_handoff callback.""" + realtime_agent = RealtimeAgent(name="callback_agent") + callback_called = [] + + def on_handoff_callback(ctx): + callback_called.append(True) + + handoff_obj = realtime_handoff( + realtime_agent, + on_handoff=on_handoff_callback, + ) + + assert handoff_obj.agent_name == "callback_agent" + + +def test_regular_agent_handoff_still_works(): + """Test that regular Agent handoffs still work with the new generic types.""" + from agents import handoff + + regular_agent = Agent(name="regular_agent") + handoff_obj = handoff(regular_agent) + + assert handoff_obj.agent_name == "regular_agent" + assert handoff_obj.tool_name == "transfer_to_regular_agent" + # Regular agent handoffs should support input filters + assert hasattr(handoff_obj, "input_filter") + + +def test_type_annotations_work(): + """Test that type annotations work correctly.""" + from agents.handoffs import Handoff + from agents.realtime.handoffs import realtime_handoff + + realtime_agent = RealtimeAgent(name="typed_agent") + handoff_obj = realtime_handoff(realtime_agent) + + # This should be typed as Handoff[Any, RealtimeAgent[Any]] + assert isinstance(handoff_obj, Handoff) diff --git a/tests/realtime/test_session.py b/tests/realtime/test_session.py index 4cc0dae6b..7c1eb53ff 100644 --- a/tests/realtime/test_session.py +++ b/tests/realtime/test_session.py @@ -1,5 +1,5 @@ from typing import cast -from unittest.mock import 
AsyncMock, Mock +from unittest.mock import AsyncMock, Mock, PropertyMock import pytest @@ -101,6 +101,8 @@ async def close(self): def mock_agent(): agent = Mock(spec=RealtimeAgent) agent.get_all_tools = AsyncMock(return_value=[]) + + type(agent).handoffs = PropertyMock(return_value=[]) return agent @@ -293,7 +295,7 @@ async def test_item_updated_event_updates_existing_item(self, mock_model, mock_a # Check that item was updated assert len(session._history) == 1 updated_item = cast(AssistantMessageItem, session._history[0]) - assert updated_item.content[0].text == "Updated" + assert updated_item.content[0].text == "Updated" # type: ignore # Should have 2 events: raw + history updated (not added) assert session._event_queue.qsize() == 2 @@ -524,7 +526,7 @@ def test_update_existing_item_by_id(self): # Item should be updated result_item = cast(AssistantMessageItem, new_history[0]) assert result_item.item_id == "item_1" - assert result_item.content[0].text == "Updated" + assert result_item.content[0].text == "Updated" # type: ignore def test_update_existing_item_preserves_order(self): """Test that updating existing item preserves its position in history""" @@ -557,13 +559,13 @@ def test_update_existing_item_preserves_order(self): # Middle item should be updated updated_result = cast(AssistantMessageItem, new_history[1]) - assert updated_result.content[0].text == "Updated Second" + assert updated_result.content[0].text == "Updated Second" # type: ignore # Other items should be unchanged item1_result = cast(AssistantMessageItem, new_history[0]) item3_result = cast(AssistantMessageItem, new_history[2]) - assert item1_result.content[0].text == "First" - assert item3_result.content[0].text == "Third" + assert item1_result.content[0].text == "First" # type: ignore + assert item3_result.content[0].text == "Third" # type: ignore def test_insert_new_item_after_previous_item(self): """Test inserting new item after specified previous_item_id""" @@ -598,7 +600,7 @@ def 
test_insert_new_item_after_previous_item(self): # Content should be correct item2_result = cast(AssistantMessageItem, new_history[1]) - assert item2_result.content[0].text == "Second" + assert item2_result.content[0].text == "Second" # type: ignore def test_insert_new_item_after_nonexistent_previous_item(self): """Test that item with nonexistent previous_item_id gets added to end""" @@ -701,7 +703,7 @@ def test_complex_insertion_scenario(self): assert len(history) == 4 assert [item.item_id for item in history] == ["A", "B", "D", "C"] itemB_result = cast(AssistantMessageItem, history[1]) - assert itemB_result.content[0].text == "Updated B" + assert itemB_result.content[0].text == "Updated B" # type: ignore # Test 3: Tool call execution flow (_handle_tool_call method) @@ -794,30 +796,26 @@ async def test_function_tool_with_multiple_tools_available(self, mock_model, moc assert sent_output == "result_two" @pytest.mark.asyncio - async def test_handoff_tool_handling(self, mock_model, mock_agent, mock_handoff): - """Test that handoff tools are properly handled""" - from unittest.mock import AsyncMock - - from agents.realtime.agent import RealtimeAgent - - # Create a mock new agent to be returned by handoff - mock_new_agent = Mock(spec=RealtimeAgent) - mock_new_agent.name = "new_agent" - mock_new_agent.instructions = "New agent instructions" - mock_new_agent.get_all_tools = AsyncMock(return_value=[]) - mock_new_agent.get_system_prompt = AsyncMock(return_value="New agent system prompt") - - # Set up handoff to return the new agent - mock_handoff.on_invoke_handoff = AsyncMock(return_value=mock_new_agent) - mock_handoff.name = "test_handoff" + async def test_handoff_tool_handling(self, mock_model): + first_agent = RealtimeAgent( + name="first_agent", + instructions="first_agent_instructions", + tools=[], + handoffs=[], + ) + second_agent = RealtimeAgent( + name="second_agent", + instructions="second_agent_instructions", + tools=[], + handoffs=[], + ) - # Set up agent to 
return handoff tool - mock_agent.get_all_tools.return_value = [mock_handoff] + first_agent.handoffs = [second_agent] - session = RealtimeSession(mock_model, mock_agent, None) + session = RealtimeSession(mock_model, first_agent, None) tool_call_event = RealtimeModelToolCallEvent( - name="test_handoff", call_id="call_789", arguments="{}" + name=Handoff.default_tool_name(second_agent), call_id="call_789", arguments="{}" ) await session._handle_tool_call(tool_call_event) @@ -829,7 +827,7 @@ async def test_handoff_tool_handling(self, mock_model, mock_agent, mock_handoff) assert session._event_queue.qsize() >= 1 # Verify agent was updated - assert session._current_agent == mock_new_agent + assert session._current_agent == second_agent @pytest.mark.asyncio async def test_unknown_tool_handling(self, mock_model, mock_agent, mock_function_tool): diff --git a/tests/realtime/test_tracing.py b/tests/realtime/test_tracing.py index 4cff46c49..85da63897 100644 --- a/tests/realtime/test_tracing.py +++ b/tests/realtime/test_tracing.py @@ -99,22 +99,18 @@ async def async_websocket(*args, **kwargs): await model._handle_ws_event(session_created_event) # Should send session.update with tracing config - from agents.realtime.model_inputs import RealtimeModelSendRawMessage - - expected_event = RealtimeModelSendRawMessage( - message={ - "type": "session.update", - "other_data": { - "session": { - "tracing": { - "workflow_name": "test_workflow", - "group_id": "group_123", - } - } - }, - } + from openai.types.beta.realtime.session_update_event import ( + SessionTracingTracingConfiguration, + SessionUpdateEvent, ) - mock_send_raw_message.assert_called_once_with(expected_event) + + mock_send_raw_message.assert_called_once() + call_args = mock_send_raw_message.call_args[0][0] + assert isinstance(call_args, SessionUpdateEvent) + assert call_args.type == "session.update" + assert isinstance(call_args.session.tracing, SessionTracingTracingConfiguration) + assert 
call_args.session.tracing.workflow_name == "test_workflow" + assert call_args.session.tracing.group_id == "group_123" @pytest.mark.asyncio async def test_send_tracing_config_auto_mode(self, model, mock_websocket): @@ -144,15 +140,13 @@ async def async_websocket(*args, **kwargs): await model._handle_ws_event(session_created_event) # Should send session.update with "auto" - from agents.realtime.model_inputs import RealtimeModelSendRawMessage + from openai.types.beta.realtime.session_update_event import SessionUpdateEvent - expected_event = RealtimeModelSendRawMessage( - message={ - "type": "session.update", - "other_data": {"session": {"tracing": "auto"}}, - } - ) - mock_send_raw_message.assert_called_once_with(expected_event) + mock_send_raw_message.assert_called_once() + call_args = mock_send_raw_message.call_args[0][0] + assert isinstance(call_args, SessionUpdateEvent) + assert call_args.type == "session.update" + assert call_args.session.tracing == "auto" @pytest.mark.asyncio async def test_tracing_config_none_skips_session_update(self, model, mock_websocket): @@ -209,22 +203,18 @@ async def async_websocket(*args, **kwargs): await model._handle_ws_event(session_created_event) # Should send session.update with complete tracing config including metadata - from agents.realtime.model_inputs import RealtimeModelSendRawMessage - - expected_event = RealtimeModelSendRawMessage( - message={ - "type": "session.update", - "other_data": { - "session": { - "tracing": { - "workflow_name": "complex_workflow", - "metadata": complex_metadata, - } - } - }, - } + from openai.types.beta.realtime.session_update_event import ( + SessionTracingTracingConfiguration, + SessionUpdateEvent, ) - mock_send_raw_message.assert_called_once_with(expected_event) + + mock_send_raw_message.assert_called_once() + call_args = mock_send_raw_message.call_args[0][0] + assert isinstance(call_args, SessionUpdateEvent) + assert call_args.type == "session.update" + assert 
isinstance(call_args.session.tracing, SessionTracingTracingConfiguration) + assert call_args.session.tracing.workflow_name == "complex_workflow" + assert call_args.session.tracing.metadata == complex_metadata @pytest.mark.asyncio async def test_tracing_disabled_prevents_tracing(self, mock_websocket): diff --git a/tests/test_handoff_tool.py b/tests/test_handoff_tool.py index 0f7fc2166..291f0a4f5 100644 --- a/tests/test_handoff_tool.py +++ b/tests/test_handoff_tool.py @@ -1,3 +1,4 @@ +import inspect import json from typing import Any @@ -318,6 +319,8 @@ def always_enabled(ctx: RunContextWrapper[Any], agent: Agent[Any]) -> bool: handoff_callable_enabled = handoff(agent, is_enabled=always_enabled) assert callable(handoff_callable_enabled.is_enabled) result = handoff_callable_enabled.is_enabled(RunContextWrapper(agent), agent) + assert inspect.isawaitable(result) + result = await result assert result is True # Test callable that returns False @@ -327,6 +330,8 @@ def always_disabled(ctx: RunContextWrapper[Any], agent: Agent[Any]) -> bool: handoff_callable_disabled = handoff(agent, is_enabled=always_disabled) assert callable(handoff_callable_disabled.is_enabled) result = handoff_callable_disabled.is_enabled(RunContextWrapper(agent), agent) + assert inspect.isawaitable(result) + result = await result assert result is False # Test async callable diff --git a/tests/test_openai_chatcompletions_stream.py b/tests/test_openai_chatcompletions_stream.py index 49e7bc2f4..cbb3c5dae 100644 --- a/tests/test_openai_chatcompletions_stream.py +++ b/tests/test_openai_chatcompletions_stream.py @@ -214,17 +214,18 @@ async def test_stream_response_yields_events_for_tool_call(monkeypatch) -> None: the model is streaming a function/tool call instead of plain text. The function call will be split across two chunks. """ - # Simulate a single tool call whose ID stays constant and function name/args built over chunks. 
+ # Simulate a single tool call with complete function name in first chunk + # and arguments split across chunks (reflecting real OpenAI API behavior) tool_call_delta1 = ChoiceDeltaToolCall( index=0, id="tool-id", - function=ChoiceDeltaToolCallFunction(name="my_", arguments="arg1"), + function=ChoiceDeltaToolCallFunction(name="my_func", arguments="arg1"), type="function", ) tool_call_delta2 = ChoiceDeltaToolCall( index=0, id="tool-id", - function=ChoiceDeltaToolCallFunction(name="func", arguments="arg2"), + function=ChoiceDeltaToolCallFunction(name=None, arguments="arg2"), type="function", ) chunk1 = ChatCompletionChunk( @@ -284,18 +285,154 @@ async def patched_fetch_response(self, *args, **kwargs): # The added item should be a ResponseFunctionToolCall. added_fn = output_events[1].item assert isinstance(added_fn, ResponseFunctionToolCall) - assert added_fn.name == "my_func" # Name should be concatenation of both chunks. - assert added_fn.arguments == "arg1arg2" + assert added_fn.name == "my_func" # Name should be complete from first chunk + assert added_fn.arguments == "" # Arguments start empty assert output_events[2].type == "response.function_call_arguments.delta" - assert output_events[2].delta == "arg1arg2" - assert output_events[3].type == "response.output_item.done" - assert output_events[4].type == "response.completed" - assert output_events[2].delta == "arg1arg2" - assert output_events[3].type == "response.output_item.done" - assert output_events[4].type == "response.completed" - assert added_fn.name == "my_func" # Name should be concatenation of both chunks. 
- assert added_fn.arguments == "arg1arg2" - assert output_events[2].type == "response.function_call_arguments.delta" - assert output_events[2].delta == "arg1arg2" - assert output_events[3].type == "response.output_item.done" - assert output_events[4].type == "response.completed" + assert output_events[2].delta == "arg1" # First argument chunk + assert output_events[3].type == "response.function_call_arguments.delta" + assert output_events[3].delta == "arg2" # Second argument chunk + assert output_events[4].type == "response.output_item.done" + assert output_events[5].type == "response.completed" + # Final function call should have complete arguments + final_fn = output_events[4].item + assert isinstance(final_fn, ResponseFunctionToolCall) + assert final_fn.name == "my_func" + assert final_fn.arguments == "arg1arg2" + + +@pytest.mark.allow_call_model_methods +@pytest.mark.asyncio +async def test_stream_response_yields_real_time_function_call_arguments(monkeypatch) -> None: + """ + Validate that `stream_response` emits function call arguments in real-time as they + are received, not just at the end. This test simulates the real OpenAI API behavior + where function name comes first, then arguments are streamed incrementally. 
+ """ + # Simulate realistic OpenAI API chunks: name first, then arguments incrementally + tool_call_delta1 = ChoiceDeltaToolCall( + index=0, + id="tool-call-123", + function=ChoiceDeltaToolCallFunction(name="write_file", arguments=""), + type="function", + ) + tool_call_delta2 = ChoiceDeltaToolCall( + index=0, + function=ChoiceDeltaToolCallFunction(arguments='{"filename": "'), + type="function", + ) + tool_call_delta3 = ChoiceDeltaToolCall( + index=0, + function=ChoiceDeltaToolCallFunction(arguments='test.py", "content": "'), + type="function", + ) + tool_call_delta4 = ChoiceDeltaToolCall( + index=0, + function=ChoiceDeltaToolCallFunction(arguments='print(hello)"}'), + type="function", + ) + + chunk1 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta1]))], + ) + chunk2 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta2]))], + ) + chunk3 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta3]))], + ) + chunk4 = ChatCompletionChunk( + id="chunk-id", + created=1, + model="fake", + object="chat.completion.chunk", + choices=[Choice(index=0, delta=ChoiceDelta(tool_calls=[tool_call_delta4]))], + usage=CompletionUsage(completion_tokens=1, prompt_tokens=1, total_tokens=2), + ) + + async def fake_stream() -> AsyncIterator[ChatCompletionChunk]: + for c in (chunk1, chunk2, chunk3, chunk4): + yield c + + async def patched_fetch_response(self, *args, **kwargs): + resp = Response( + id="resp-id", + created_at=0, + model="fake-model", + object="response", + output=[], + tool_choice="none", + tools=[], + parallel_tool_calls=False, + ) + return resp, fake_stream() + + monkeypatch.setattr(OpenAIChatCompletionsModel, 
"_fetch_response", patched_fetch_response) + model = OpenAIProvider(use_responses=False).get_model("gpt-4") + output_events = [] + async for event in model.stream_response( + system_instructions=None, + input="", + model_settings=ModelSettings(), + tools=[], + output_schema=None, + handoffs=[], + tracing=ModelTracing.DISABLED, + previous_response_id=None, + prompt=None, + ): + output_events.append(event) + + # Extract events by type + created_events = [e for e in output_events if e.type == "response.created"] + output_item_added_events = [e for e in output_events if e.type == "response.output_item.added"] + function_args_delta_events = [ + e for e in output_events if e.type == "response.function_call_arguments.delta" + ] + output_item_done_events = [e for e in output_events if e.type == "response.output_item.done"] + completed_events = [e for e in output_events if e.type == "response.completed"] + + # Verify event structure + assert len(created_events) == 1 + assert len(output_item_added_events) == 1 + assert len(function_args_delta_events) == 3 # Three incremental argument chunks + assert len(output_item_done_events) == 1 + assert len(completed_events) == 1 + + # Verify the function call started as soon as we had name and ID + added_event = output_item_added_events[0] + assert isinstance(added_event.item, ResponseFunctionToolCall) + assert added_event.item.name == "write_file" + assert added_event.item.call_id == "tool-call-123" + assert added_event.item.arguments == "" # Should be empty at start + + # Verify real-time argument streaming + expected_deltas = ['{"filename": "', 'test.py", "content": "', 'print(hello)"}'] + for i, delta_event in enumerate(function_args_delta_events): + assert delta_event.delta == expected_deltas[i] + assert delta_event.item_id == "__fake_id__" # FAKE_RESPONSES_ID + assert delta_event.output_index == 0 + + # Verify completion event has full arguments + done_event = output_item_done_events[0] + assert isinstance(done_event.item, 
ResponseFunctionToolCall) + assert done_event.item.name == "write_file" + assert done_event.item.arguments == '{"filename": "test.py", "content": "print(hello)"}' + + # Verify final response + completed_event = completed_events[0] + function_call_output = completed_event.response.output[0] + assert isinstance(function_call_output, ResponseFunctionToolCall) + assert function_call_output.name == "write_file" + assert function_call_output.arguments == '{"filename": "test.py", "content": "print(hello)"}' diff --git a/uv.lock b/uv.lock index 7d0621d88..2e14dd804 100644 --- a/uv.lock +++ b/uv.lock @@ -1047,22 +1047,24 @@ wheels = [ [[package]] name = "mcp" -version = "1.9.4" +version = "1.11.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio", marker = "python_full_version >= '3.10'" }, { name = "httpx", marker = "python_full_version >= '3.10'" }, { name = "httpx-sse", marker = "python_full_version >= '3.10'" }, + { name = "jsonschema", marker = "python_full_version >= '3.10'" }, { name = "pydantic", marker = "python_full_version >= '3.10'" }, { name = "pydantic-settings", marker = "python_full_version >= '3.10'" }, { name = "python-multipart", marker = "python_full_version >= '3.10'" }, + { name = "pywin32", marker = "python_full_version >= '3.10' and sys_platform == 'win32'" }, { name = "sse-starlette", marker = "python_full_version >= '3.10'" }, { name = "starlette", marker = "python_full_version >= '3.10'" }, { name = "uvicorn", marker = "python_full_version >= '3.10' and sys_platform != 'emscripten'" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/06/f2/dc2450e566eeccf92d89a00c3e813234ad58e2ba1e31d11467a09ac4f3b9/mcp-1.9.4.tar.gz", hash = "sha256:cfb0bcd1a9535b42edaef89947b9e18a8feb49362e1cc059d6e7fc636f2cb09f", size = 333294, upload-time = "2025-06-12T08:20:30.158Z" } +sdist = { url = "https://files.pythonhosted.org/packages/3a/f5/9506eb5578d5bbe9819ee8ba3198d0ad0e2fbe3bab8b257e4131ceb7dfb6/mcp-1.11.0.tar.gz", hash 
= "sha256:49a213df56bb9472ff83b3132a4825f5c8f5b120a90246f08b0dac6bedac44c8", size = 406907, upload-time = "2025-07-10T16:41:09.388Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/97/fc/80e655c955137393c443842ffcc4feccab5b12fa7cb8de9ced90f90e6998/mcp-1.9.4-py3-none-any.whl", hash = "sha256:7fcf36b62936adb8e63f89346bccca1268eeca9bf6dfb562ee10b1dfbda9dac0", size = 130232, upload-time = "2025-06-12T08:20:28.551Z" }, + { url = "https://files.pythonhosted.org/packages/92/9c/c9ca79f9c512e4113a5d07043013110bb3369fc7770040c61378c7fbcf70/mcp-1.11.0-py3-none-any.whl", hash = "sha256:58deac37f7483e4b338524b98bc949b7c2b7c33d978f5fafab5bde041c5e2595", size = 155880, upload-time = "2025-07-10T16:41:07.935Z" }, ] [[package]] @@ -1461,7 +1463,7 @@ wheels = [ [[package]] name = "openai" -version = "1.93.1" +version = "1.96.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anyio" }, @@ -1473,14 +1475,14 @@ dependencies = [ { name = "tqdm" }, { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/5e/a8/e4427729da048cb33bda15e70f09f7520bdf3577bafc546b135ecb36af7d/openai-1.93.1.tar.gz", hash = "sha256:11eb8932965d0f79ecc4cb38a60a0c4cef4bcd5fcf08b99fc9a399fa5f1e50ab", size = 487124, upload-time = "2025-07-07T16:40:38.389Z" } +sdist = { url = "https://files.pythonhosted.org/packages/2f/b5/18fd5e1b6b6c7dca52d60307b3637f9e9e3206a8041a9c8028985dbc6260/openai-1.96.1.tar.gz", hash = "sha256:6d505b5cc550e036bfa3fe99d6cff565b11491d12378d4c353f92ef72b0a408a", size = 489065, upload-time = "2025-07-15T21:39:37.215Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/64/4f/875e5af1fb4e5ed4ea9e4a88f482d9ca2e48932105605b6c516e9a14de25/openai-1.93.1-py3-none-any.whl", hash = "sha256:a2c2946c4f21346d4902311a7440381fd8a33466ee7ca688133d1cad29a9357c", size = 755081, upload-time = "2025-07-07T16:40:36.585Z" }, + { url = 
"https://files.pythonhosted.org/packages/4f/57/325bbdbdc27b47309be35cb4e0eb8980b0c1bc997194c797c3691d88ae41/openai-1.96.1-py3-none-any.whl", hash = "sha256:0afaab2019bae8e145e7a1baf6953167084f019dd15042c65edd117398c1eb1c", size = 757454, upload-time = "2025-07-15T21:39:34.517Z" }, ] [[package]] name = "openai-agents" -version = "0.2.0" +version = "0.2.1" source = { editable = "." } dependencies = [ { name = "griffe" }, @@ -1537,9 +1539,9 @@ requires-dist = [ { name = "graphviz", marker = "extra == 'viz'", specifier = ">=0.17" }, { name = "griffe", specifier = ">=1.5.6,<2" }, { name = "litellm", marker = "extra == 'litellm'", specifier = ">=1.67.4.post1,<2" }, - { name = "mcp", marker = "python_full_version >= '3.10'", specifier = ">=1.9.4,<2" }, + { name = "mcp", marker = "python_full_version >= '3.10'", specifier = ">=1.11.0,<2" }, { name = "numpy", marker = "python_full_version >= '3.10' and extra == 'voice'", specifier = ">=2.2.0,<3" }, - { name = "openai", specifier = ">=1.93.1,<2" }, + { name = "openai", specifier = ">=1.96.1,<2" }, { name = "pydantic", specifier = ">=2.10,<3" }, { name = "requests", specifier = ">=2.0,<3" }, { name = "types-requests", specifier = ">=2.0,<3" }, @@ -2111,6 +2113,31 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/fc/b8/ff33610932e0ee81ae7f1269c890f697d56ff74b9f5b2ee5d9b7fa2c5355/python_xlib-0.33-py2.py3-none-any.whl", hash = "sha256:c3534038d42e0df2f1392a1b30a15a4ff5fdc2b86cfa94f072bf11b10a164398", size = 182185, upload-time = "2022-12-25T18:52:58.662Z" }, ] +[[package]] +name = "pywin32" +version = "311" +source = { registry = "https://pypi.org/simple" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/7b/40/44efbb0dfbd33aca6a6483191dae0716070ed99e2ecb0c53683f400a0b4f/pywin32-311-cp310-cp310-win32.whl", hash = "sha256:d03ff496d2a0cd4a5893504789d4a15399133fe82517455e78bad62efbb7f0a3", size = 8760432, upload-time = "2025-07-14T20:13:05.9Z" }, + { url = 
"https://files.pythonhosted.org/packages/5e/bf/360243b1e953bd254a82f12653974be395ba880e7ec23e3731d9f73921cc/pywin32-311-cp310-cp310-win_amd64.whl", hash = "sha256:797c2772017851984b97180b0bebe4b620bb86328e8a884bb626156295a63b3b", size = 9590103, upload-time = "2025-07-14T20:13:07.698Z" }, + { url = "https://files.pythonhosted.org/packages/57/38/d290720e6f138086fb3d5ffe0b6caa019a791dd57866940c82e4eeaf2012/pywin32-311-cp310-cp310-win_arm64.whl", hash = "sha256:0502d1facf1fed4839a9a51ccbcc63d952cf318f78ffc00a7e78528ac27d7a2b", size = 8778557, upload-time = "2025-07-14T20:13:11.11Z" }, + { url = "https://files.pythonhosted.org/packages/7c/af/449a6a91e5d6db51420875c54f6aff7c97a86a3b13a0b4f1a5c13b988de3/pywin32-311-cp311-cp311-win32.whl", hash = "sha256:184eb5e436dea364dcd3d2316d577d625c0351bf237c4e9a5fabbcfa5a58b151", size = 8697031, upload-time = "2025-07-14T20:13:13.266Z" }, + { url = "https://files.pythonhosted.org/packages/51/8f/9bb81dd5bb77d22243d33c8397f09377056d5c687aa6d4042bea7fbf8364/pywin32-311-cp311-cp311-win_amd64.whl", hash = "sha256:3ce80b34b22b17ccbd937a6e78e7225d80c52f5ab9940fe0506a1a16f3dab503", size = 9508308, upload-time = "2025-07-14T20:13:15.147Z" }, + { url = "https://files.pythonhosted.org/packages/44/7b/9c2ab54f74a138c491aba1b1cd0795ba61f144c711daea84a88b63dc0f6c/pywin32-311-cp311-cp311-win_arm64.whl", hash = "sha256:a733f1388e1a842abb67ffa8e7aad0e70ac519e09b0f6a784e65a136ec7cefd2", size = 8703930, upload-time = "2025-07-14T20:13:16.945Z" }, + { url = "https://files.pythonhosted.org/packages/e7/ab/01ea1943d4eba0f850c3c61e78e8dd59757ff815ff3ccd0a84de5f541f42/pywin32-311-cp312-cp312-win32.whl", hash = "sha256:750ec6e621af2b948540032557b10a2d43b0cee2ae9758c54154d711cc852d31", size = 8706543, upload-time = "2025-07-14T20:13:20.765Z" }, + { url = "https://files.pythonhosted.org/packages/d1/a8/a0e8d07d4d051ec7502cd58b291ec98dcc0c3fff027caad0470b72cfcc2f/pywin32-311-cp312-cp312-win_amd64.whl", hash = 
"sha256:b8c095edad5c211ff31c05223658e71bf7116daa0ecf3ad85f3201ea3190d067", size = 9495040, upload-time = "2025-07-14T20:13:22.543Z" }, + { url = "https://files.pythonhosted.org/packages/ba/3a/2ae996277b4b50f17d61f0603efd8253cb2d79cc7ae159468007b586396d/pywin32-311-cp312-cp312-win_arm64.whl", hash = "sha256:e286f46a9a39c4a18b319c28f59b61de793654af2f395c102b4f819e584b5852", size = 8710102, upload-time = "2025-07-14T20:13:24.682Z" }, + { url = "https://files.pythonhosted.org/packages/a5/be/3fd5de0979fcb3994bfee0d65ed8ca9506a8a1260651b86174f6a86f52b3/pywin32-311-cp313-cp313-win32.whl", hash = "sha256:f95ba5a847cba10dd8c4d8fefa9f2a6cf283b8b88ed6178fa8a6c1ab16054d0d", size = 8705700, upload-time = "2025-07-14T20:13:26.471Z" }, + { url = "https://files.pythonhosted.org/packages/e3/28/e0a1909523c6890208295a29e05c2adb2126364e289826c0a8bc7297bd5c/pywin32-311-cp313-cp313-win_amd64.whl", hash = "sha256:718a38f7e5b058e76aee1c56ddd06908116d35147e133427e59a3983f703a20d", size = 9494700, upload-time = "2025-07-14T20:13:28.243Z" }, + { url = "https://files.pythonhosted.org/packages/04/bf/90339ac0f55726dce7d794e6d79a18a91265bdf3aa70b6b9ca52f35e022a/pywin32-311-cp313-cp313-win_arm64.whl", hash = "sha256:7b4075d959648406202d92a2310cb990fea19b535c7f4a78d3f5e10b926eeb8a", size = 8709318, upload-time = "2025-07-14T20:13:30.348Z" }, + { url = "https://files.pythonhosted.org/packages/c9/31/097f2e132c4f16d99a22bfb777e0fd88bd8e1c634304e102f313af69ace5/pywin32-311-cp314-cp314-win32.whl", hash = "sha256:b7a2c10b93f8986666d0c803ee19b5990885872a7de910fc460f9b0c2fbf92ee", size = 8840714, upload-time = "2025-07-14T20:13:32.449Z" }, + { url = "https://files.pythonhosted.org/packages/90/4b/07c77d8ba0e01349358082713400435347df8426208171ce297da32c313d/pywin32-311-cp314-cp314-win_amd64.whl", hash = "sha256:3aca44c046bd2ed8c90de9cb8427f581c479e594e99b5c0bb19b29c10fd6cb87", size = 9656800, upload-time = "2025-07-14T20:13:34.312Z" }, + { url = 
"https://files.pythonhosted.org/packages/c0/d2/21af5c535501a7233e734b8af901574572da66fcc254cb35d0609c9080dd/pywin32-311-cp314-cp314-win_arm64.whl", hash = "sha256:a508e2d9025764a8270f93111a970e1d0fbfc33f4153b388bb649b7eec4f9b42", size = 8932540, upload-time = "2025-07-14T20:13:36.379Z" }, + { url = "https://files.pythonhosted.org/packages/59/42/b86689aac0cdaee7ae1c58d464b0ff04ca909c19bb6502d4973cdd9f9544/pywin32-311-cp39-cp39-win32.whl", hash = "sha256:aba8f82d551a942cb20d4a83413ccbac30790b50efb89a75e4f586ac0bb8056b", size = 8760837, upload-time = "2025-07-14T20:12:59.59Z" }, + { url = "https://files.pythonhosted.org/packages/9f/8a/1403d0353f8c5a2f0829d2b1c4becbf9da2f0a4d040886404fc4a5431e4d/pywin32-311-cp39-cp39-win_amd64.whl", hash = "sha256:e0c4cfb0621281fe40387df582097fd796e80430597cb9944f0ae70447bacd91", size = 9590187, upload-time = "2025-07-14T20:13:01.419Z" }, + { url = "https://files.pythonhosted.org/packages/60/22/e0e8d802f124772cec9c75430b01a212f86f9de7546bda715e54140d5aeb/pywin32-311-cp39-cp39-win_arm64.whl", hash = "sha256:62ea666235135fee79bb154e695f3ff67370afefd71bd7fea7512fc70ef31e3d", size = 8778162, upload-time = "2025-07-14T20:13:03.544Z" }, +] + [[package]] name = "pyyaml" version = "6.0.2"