Describe the bug
When streaming mode is enabled, events with `partial` set to `True` are not stored in the session events list. That list is later used to build the message sequence for future calls to the LLM, so the LLM sees an incomplete chat history, which can severely degrade the quality of its responses.
In my particular use case, I'm instructing the agent to always notify the user before performing any tool calls. The symptom of this issue is that, with streaming enabled, the agent stops notifying the user after the first tool call, because the LLM never sees any of its prior notifications in the chat history. This doesn't happen when streaming is disabled, since in that scenario the agent's responses are not missing from the chat history.
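To illustrate the mechanism, here is a toy simulation (with a hypothetical event shape, not the real ADK types) of what happens if partial events are dropped before storage: the chat history rebuilt for the next LLM call carries no trace of the streamed notification.

```python
# Toy simulation (hypothetical event dicts, not real ADK types): if partial
# events are dropped before storage, the rebuilt chat history passed to the
# LLM has no trace of the streamed notification.
events = [
    {"author": "user", "text": "Please summarize X", "partial": None},
    {"author": "agent", "text": "I'm about to call a tool...", "partial": True},
    {"author": "agent", "text": "<tool call + response>", "partial": False},
]

# Suspected storage behavior: anything flagged partial never reaches the session.
stored = [e for e in events if not e["partial"]]

# History rebuilt for the next LLM call -- the notification is missing.
history = [e["text"] for e in stored]
print(history)  # ['Please summarize X', '<tool call + response>']
```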
To Reproduce
This is the minimal script that I'm using to replicate the issue.
```python
"""
Example script demonstrating a bug in google-adk where streamed responses
are not stored in session history.
"""
import asyncio
import os

from google.adk.agents.llm_agent import LlmAgent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types


def get_bullet_points(topic: str) -> str:
    """
    Returns bullet points about a given topic (can be "streaming", "llms" or "default").

    Args:
        topic: The topic to get bullet points for

    Returns:
        A string with bullet points about the topic
    """
    bullet_points = {
        "streaming": """
        • Real-time user experience - users see responses as they're generated
        • Lower latency perception - first content appears quickly, reducing wait time
        • Interactive interruption - allows stopping generation before completion
        """,
        "llms": """
        • Powerful language understanding capabilities across diverse domains
        • Ability to generate human-like text for various applications
        • Contextual reasoning to maintain coherence in long interactions
        """,
        "default": """
        • First generic bullet point about the topic
        • Second important fact about the requested subject
        • Third key insight related to the query
        """,
    }
    return bullet_points.get(topic.lower(), bullet_points["default"])


async def demonstrate_streaming_history_bug():
    """
    Demonstrate the bug where partial responses from streaming
    aren't stored in session history when using SSE streaming.
    """
    # Instantiate the LLM
    os.environ["AWS_REGION_NAME"] = "eu-west-1"
    llm = LiteLlm(
        model="bedrock/converse/eu.anthropic.claude-3-5-sonnet-20240620-v1:0",
        temperature=0.0,
        max_tokens=8192,
    )

    # Create session
    app_name = "test_app"
    session_id = "test_session"
    user_id = "test_user"
    session_service = InMemorySessionService()
    session_service.create_session(app_name=app_name, user_id=user_id, session_id=session_id)

    # Create a runner and agent
    agent = LlmAgent(model=llm, name="agent", tools=[get_bullet_points])
    runner = Runner(agent=agent, app_name=app_name, session_service=session_service)

    # First message - with SSE streaming
    print("\n=== First interaction with SSE streaming ===")
    run_config = RunConfig(streaming_mode=StreamingMode.SSE)
    message = types.Content(
        role="user",
        parts=[types.Part(text="Tell me about the benefits of streaming in LLMs in 3 short bullet points")],
    )

    # Run with streaming enabled
    print("Receiving streamed response:\n")
    async for event in runner.run_async(
        user_id=user_id, session_id=session_id, new_message=message, run_config=run_config
    ):
        if event.content and event.content.parts:
            if event.content.parts[0].text:
                print(event.content.parts[0].text, end="", flush=True)
            if event.get_function_calls():
                print(f"\033[34mtool call: {event.get_function_calls()[0].name}\033[0m")
            if event.get_function_responses():
                print(f"\033[32mtool response: {event.get_function_responses()[0].name}\033[0m")

    # Get the session after streaming
    session = session_service.get_session(app_name=runner.app_name, user_id=user_id, session_id=session_id)

    # Examine session history
    print("\n\n=== Session Events After Streaming ===")
    for i, event in enumerate(session.events):
        print(f"Event {i}: author={event.author}, partial={event.partial}")
        if event.content and event.content.parts:
            if event.content.parts[0].text:
                print(f"  - Content: {event.content.parts[0].text[:50]}...")
            if event.get_function_calls():
                print(f"  - Tool call: {event.get_function_calls()[0].name}")
            if event.get_function_responses():
                print(f"  - Tool response: {event.get_function_responses()[0].name}...")


if __name__ == "__main__":
    asyncio.run(demonstrate_streaming_history_bug())
```
Expected behavior
Currently, the script above produces the following output:
```
=== First interaction with SSE streaming ===
Receiving streamed response:

Certainly! To provide you with accurate and concise information about the benefits of streaming in LLMs, I'll use the get_bullet_points function to retrieve relevant information. Let me do that for you.tool call: get_bullet_points
tool response: get_bullet_points

Certainly! Here are 3 short bullet points about the benefits of streaming in LLMs:
• Real-time user experience - users see responses as they're generated
• Lower latency perception - first content appears quickly, reducing wait time
• Interactive interruption - allows stopping generation before completion
These points highlight the key advantages of using streaming in Large Language Models, focusing on improved user experience, perceived speed, and interactivity.

Certainly! Here are 3 short bullet points about the benefits of streaming in LLMs:
• Real-time user experience - users see responses as they're generated
• Lower latency perception - first content appears quickly, reducing wait time
• Interactive interruption - allows stopping generation before completion
These points highlight the key advantages of using streaming in Large Language Models, focusing on improved user experience, perceived speed, and interactivity.

=== Session Events After Streaming ===
Event 0: author=user, partial=None
  - Content: Tell me about the benefits of streaming in LLMs in...
Event 1: author=agent, partial=False
  - Tool call: get_bullet_points
Event 2: author=agent, partial=None
  - Tool response: get_bullet_points...
Event 3: author=agent, partial=False
  - Content:
Certainly! Here are 3 short bullet points about ...
```
One can see that the first agent response ("Certainly! To provide you with accurate and concise information about the benefits of streaming in LLMs (...)") is missing from the session events logged after running the agent.
Desktop (please complete the following information):
- OS: Windows 11
- Python version (`python -V`): 3.12.10
- ADK version (`pip show google-adk`): 0.4.0
Additional Context
The issue is in lines 198-199 of the runners module.
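The check at those lines presumably resembles the following sketch (a hypothetical simplification written for illustration, not the actual google-adk source): an event is appended to the session only when it is not partial, so text that only ever arrives in partial chunks is never persisted.

```python
# Hypothetical sketch of the suspected check (not the actual google-adk
# source): only non-partial events are persisted, so streamed text is
# never written back to the session.
def maybe_append_event(session_events: list, event: dict) -> None:
    if not event.get("partial"):  # partial chunks are silently skipped
        session_events.append(event)

history: list = []
maybe_append_event(history, {"author": "agent", "text": "notifying user...", "partial": True})
maybe_append_event(history, {"author": "agent", "text": "final answer", "partial": False})
print([e["text"] for e in history])  # ['final answer']
```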