Describe the bug
When streaming mode is enabled, events with `partial` set to `True` are not stored in the session events list. That list is later used to build the message sequence for future calls to the LLM, so the LLM sees an incomplete chat history, which can severely degrade the quality of its responses.
In my particular use case, I'm instructing the agent to always notify the user before performing any tool calls. The symptom of this issue is that, with streaming enabled, the agent stops notifying the user after the first tool call, because the LLM never sees any of its prior notifications in the chat history. This doesn't happen when streaming is disabled, since in that scenario the agent's responses are not missing from the chat history.
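To illustrate the mechanism, here is a toy simulation (with a hypothetical event shape, not the real ADK types) of what happens if partial events are dropped before storage: the chat history rebuilt for the next LLM call carries no trace of the streamed notification.

```python
# Toy simulation (hypothetical event dicts, not real ADK types): if partial
# events are dropped before storage, the rebuilt chat history passed to the
# LLM has no trace of the streamed notification.
events = [
    {"author": "user", "text": "Please summarize X", "partial": None},
    {"author": "agent", "text": "I'm about to call a tool...", "partial": True},
    {"author": "agent", "text": "<tool call + response>", "partial": False},
]

# Suspected storage behavior: anything flagged partial never reaches the session.
stored = [e for e in events if not e["partial"]]

# History rebuilt for the next LLM call -- the notification is missing.
history = [e["text"] for e in stored]
print(history)  # ['Please summarize X', '<tool call + response>']
```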
To Reproduce
This is the minimal script that I'm using to replicate the issue.
```python
"""
Example script demonstrating a bug in google-adk where streamed responses
are not stored in session history.
"""
import asyncio
import os

from google.adk.agents.llm_agent import LlmAgent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types


def get_bullet_points(topic: str) -> str:
    """
    Returns bullet points about a given topic (can be "streaming", "llms" or "default").

    Args:
        topic: The topic to get bullet points for

    Returns:
        A string with bullet points about the topic
    """
    bullet_points = {
        "streaming": """
        • Real-time user experience - users see responses as they're generated
        • Lower latency perception - first content appears quickly, reducing wait time
        • Interactive interruption - allows stopping generation before completion
        """,
        "llms": """
        • Powerful language understanding capabilities across diverse domains
        • Ability to generate human-like text for various applications
        • Contextual reasoning to maintain coherence in long interactions
        """,
        "default": """
        • First generic bullet point about the topic
        • Second important fact about the requested subject
        • Third key insight related to the query
        """,
    }
    return bullet_points.get(topic.lower(), bullet_points["default"])


async def demonstrate_streaming_history_bug():
    """
    Demonstrate the bug where partial responses from streaming
    aren't stored in session history when using SSE streaming.
    """
    # Instantiate the LLM
    os.environ["AWS_REGION_NAME"] = "eu-west-1"
    llm = LiteLlm(
        model="bedrock/converse/eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
        temperature=0.0,
        max_tokens=8192,
    )

    # Create session
    app_name = "test_app"
    session_id = "test_session"
    user_id = "test_user"
    session_service = InMemorySessionService()
    session_service.create_session(app_name=app_name, user_id=user_id, session_id=session_id)

    # Create a runner and agent
    agent = LlmAgent(model=llm, name="agent", tools=[get_bullet_points])
    runner = Runner(agent=agent, app_name=app_name, session_service=session_service)

    # First message - with SSE streaming
    print("\n=== First interaction with SSE streaming ===")
    run_config = RunConfig(streaming_mode=StreamingMode.SSE)
    message = types.Content(
        role="user",
        parts=[types.Part(text="Tell me about the benefits of streaming in LLMs in 3 short bullet points")],
    )

    # Run with streaming enabled
    print("Receiving streamed response:\n")
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=message, run_config=run_config
    ):
        if event.content and event.content.parts:
            if event.content.parts[0].text:
                print(event.content.parts[0].text, end="", flush=True)
            if event.get_function_calls():
                print(f"\033[34mtool call: {event.get_function_calls()[0].name}\033[0m")
            if event.get_function_responses():
                print(f"\033[32mtool response: {event.get_function_responses()[0].name}\033[0m")

    # Get the session after streaming
    session = session_service.get_session(app_name=runner.app_name, user_id=user_id, session_id=session_id)

    # Examine session history
    print("\n\n=== Session Events After Streaming ===")
    for i, event in enumerate(session.events):
        print(f"Event {i}: author={event.author}, partial={event.partial}")
        if event.content and event.content.parts:
            if event.content.parts[0].text:
                print(f"  - Content: {event.content.parts[0].text[:50]}...")
            if event.get_function_calls():
                print(f"  - Tool call: {event.get_function_calls()[0].name}")
            if event.get_function_responses():
                print(f"  - Tool response: {event.get_function_responses()[0].name}...")


if __name__ == "__main__":
    asyncio.run(demonstrate_streaming_history_bug())
```
Expected behavior
Currently, the script above produces the following output:
```
=== First interaction with SSE streaming ===
Receiving streamed response:

Certainly! To provide you with accurate and concise information about the benefits of streaming in LLMs, I'll use the get_bullet_points function to retrieve relevant information. Let me do that for you.tool call: get_bullet_points
tool response: get_bullet_points

Certainly! Here are 3 short bullet points about the benefits of streaming in LLMs:
• Real-time user experience - users see responses as they're generated
• Lower latency perception - first content appears quickly, reducing wait time
• Interactive interruption - allows stopping generation before completion
These points highlight the key advantages of using streaming in Large Language Models, focusing on improved user experience, perceived speed, and interactivity.

Certainly! Here are 3 short bullet points about the benefits of streaming in LLMs:
• Real-time user experience - users see responses as they're generated
• Lower latency perception - first content appears quickly, reducing wait time
• Interactive interruption - allows stopping generation before completion
These points highlight the key advantages of using streaming in Large Language Models, focusing on improved user experience, perceived speed, and interactivity.

=== Session Events After Streaming ===
Event 0: author=user, partial=None
  - Content: Tell me about the benefits of streaming in LLMs in...
Event 1: author=agent, partial=False
  - Tool call: get_bullet_points
Event 2: author=agent, partial=None
  - Tool response: get_bullet_points...
Event 3: author=agent, partial=False
  - Content:
Certainly! Here are 3 short bullet points about ...
```
One can see that the first agent response ("Certainly! To provide you with accurate and concise information about the benefits of streaming in LLMs (...)") is missing from the session events logged after running the agent.
Desktop (please complete the following information):
- OS: Windows 11
- Python version (`python -V`): 3.12.10
- ADK version (`pip show google-adk`): 0.4.0
Additional Context
The issue is in lines 198-199 of the runners module.
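The check at those lines presumably resembles the following sketch (a hypothetical simplification written for illustration, not the actual google-adk source): an event is appended to the session only when it is not partial, so text that only ever arrives in partial chunks is never persisted.

```python
# Hypothetical sketch of the suspected check (not the actual google-adk
# source): only non-partial events are persisted, so streamed text is
# never written back to the session.
def maybe_append_event(session_events: list, event: dict) -> None:
    if not event.get("partial"):  # partial chunks are silently skipped
        session_events.append(event)

history: list = []
maybe_append_event(history, {"author": "agent", "text": "notifying user...", "partial": True})
maybe_append_event(history, {"author": "agent", "text": "final answer", "partial": False})
print([e["text"] for e in history])  # ['final answer']
```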