LlamaIndex学习3:流式输出

LlamaIndex流式输出

在前两篇文章中,学习了LlamaIndex的基本模型对话和会话状态管理。本篇将聚焦于流式输出(Streaming Output)的实现与应用。流式输出是提升用户体验、实现实时交互的关键能力,尤其适用于对话型AI和长文本生成场景。

1. 流式输出的概念与优势

传统的LLM调用方式通常是一次性返回完整结果,而流式输出则允许模型边生成边输出内容。这种方式带来了多方面的优势:

  • 用户体验提升:显著降低感知延迟,用户能"看到"AI思考的过程
  • 稳定性增强:支持长文本生成,降低请求超时或一次性占用大量内存的风险
  • 交互性增强:便于前端实现打字机效果、实时反馈等交互体验
  • 资源利用优化:可以在内容生成的同时进行处理,减少总体等待时间

LlamaIndex对流式输出提供了良好的支持,底层兼容OpenAI、OpenRouter等主流API的流式接口。
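
在进入Agent工作流之前,可以先用LLM对象自带的流式接口直观感受一下流式输出。下面是一个最小示意(假设llm已按后文第3节的方式初始化,接口以当前安装的LlamaIndex版本为准):

# 最小示意:直接调用LLM对象的流式补全接口
# stream_complete会逐步产出响应对象,其delta属性是本次新增的文本片段
for chunk in llm.stream_complete("用一句话介绍流式输出"):
    print(chunk.delta or "", end="", flush=True)
print()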

2. 流式输出API解析

LlamaIndex的流式输出主要通过以下几个核心组件实现:

2.1 核心事件类型

from llama_index.core.agent.workflow import (
    AgentInput,       # 输入事件:包含用户输入和系统信息
    AgentOutput,      # 输出事件:包含模型完整响应和工具调用
    ToolCallResult,   # 工具调用结果事件:包含工具的执行结果
    AgentStream,      # 流式输出事件:包含增量文本片段
)

这些事件类型代表着流式输出过程中的不同阶段:

  • AgentInput:代表输入到模型的内容
  • AgentOutput:代表模型生成的完整响应
  • ToolCallResult:代表工具调用的结果
  • AgentStream:代表流式输出片段,其中event.delta属性包含每次传输的文本片段

2.2 异步生成器机制

流式输出本质上是一个异步生成器,使用Python的async for语法逐步获取内容:

handler = workflow.run(user_msg="你的问题")
async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        print(event.delta, end="", flush=True)  # 实时显示增量内容

3. 完整流式输出实现

下面是一个完整的流式输出示例,展示了如何处理不同类型的事件:

from dotenv import load_dotenv
load_dotenv()

from llama_index.llms.openai_like import OpenAILike
from llama_index.core.agent.workflow import AgentWorkflow
import os
import sys
from llama_index.core.agent.workflow import (
    AgentInput,
    AgentOutput,
    ToolCallResult,
    AgentStream,
)

# 导入自定义日志模块
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from utils.logger import get_logger

# 获取日志实例
logger = get_logger()

# 初始化LLM - 这里使用OpenAILike接口调用国内模型
llm = OpenAILike(
    model="Qwen/Qwen2.5-7B-Instruct",
    api_key=os.getenv("SILICONFLOW_API_KEY"),
    api_base=os.getenv("SILICONFLOW_BASE_URL"),
    is_chat_model=True  # 重要参数:声明为chat模型(走chat接口),确保返回完整内容
)

logger.info("初始化LLM模型完成")

# 创建工作流
workflow = AgentWorkflow.from_tools_or_functions(
    tools_or_functions=[],  # 可以添加工具函数
    llm=llm,
    system_prompt="你是一个智能助手,请热情地回答用户的问题",
    verbose=False
)

logger.info("初始化AgentWorkflow完成")

async def main():
    try:
        logger.info("开始执行用户查询")
        handler = workflow.run(user_msg="怎么把pcm转为MP3?")
        
        # 用于累积流式输出的内容
        accumulated_output = ""
        
        # 处理流式输出事件
        async for event in handler.stream_events():
            # 处理不同类型的事件
            if isinstance(event, AgentStream):
                # 将delta内容追加到累积变量
                if event.delta:
                    accumulated_output += event.delta
                    # 实时在控制台展示当前字符(打字机效果)
                    print(event.delta, end="", flush=True)
                    
            elif isinstance(event, AgentInput):
                # 记录输入信息
                logger.debug("Agent输入: %s", event.input)
                logger.debug("Agent名称: %s", event.current_agent_name)
                
            elif isinstance(event, AgentOutput):
                # 记录完整输出和工具调用
                logger.debug("Agent输出: %s", event.response)
                logger.debug("工具调用: %s", event.tool_calls)
                
            elif isinstance(event, ToolCallResult):
                # 记录工具调用结果
                logger.debug("调用工具: %s", event.tool_name)
                logger.debug("工具参数: %s", event.tool_kwargs)
                logger.debug("工具输出: %s", event.tool_output)
        
        # 获取完整响应
        final_output = str(await handler)
        print('\n\n完整累积输出:', accumulated_output)
        print('最终输出:', final_output)
        logger.info("查询执行完成")
        
    except Exception as e:
        logger.error(f"执行过程中发生错误: {str(e)}")
        raise

if __name__ == "__main__":
    import asyncio
    logger.info("程序启动")
    try:
        asyncio.run(main())
        logger.info("程序正常结束")
    except Exception as e:
        logger.critical(f"程序执行失败: {str(e)}")
        raise

3.1 流程解析

  1. 初始化阶段

    • 加载环境变量配置API密钥
    • 初始化LLM(使用OpenAILike接口调用模型)
    • 设置工作流和系统提示
  2. 流式处理阶段

    • 创建accumulated_output变量累积输出内容
    • 通过async for遍历stream_events()生成的事件
    • 根据事件类型进行不同处理(AgentStream/AgentInput/AgentOutput/ToolCallResult)
    • 对于AgentStream事件,将delta内容追加到累积变量
    • 同时使用print(event.delta, end="", flush=True)实现实时打字机效果
  3. 结果获取阶段

    • 使用await handler获取最终完整响应
    • 输出累积的内容和最终结果以便比对
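
如果只关心文本结果,也可以把上述三个阶段封装成一个可复用的辅助函数(下面是一个示意写法,函数名stream_query为本文自拟):

async def stream_query(workflow, user_msg: str) -> str:
    """执行一次查询:实时打印增量内容,并返回最终完整响应文本。"""
    handler = workflow.run(user_msg=user_msg)
    async for event in handler.stream_events():
        if isinstance(event, AgentStream) and event.delta:
            print(event.delta, end="", flush=True)
    # 流结束后,await handler即可拿到最终完整响应
    return str(await handler)

调用方式与前文一致,例如:final = await stream_query(workflow, "怎么把pcm转为MP3?")。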

4. 从日志看流式输出的实现原理

通过分析日志文件,我们可以看到流式输出的详细过程:

截取部分日志:

[2025-04-18 15:58:53,981] [INFO] [logger.py:117] - 初始化LLM模型完成
[2025-04-18 15:58:53,981] [INFO] [logger.py:117] - 初始化AgentWorkflow完成
[2025-04-18 15:58:53,981] [INFO] [logger.py:117] - 程序启动
[2025-04-18 15:58:53,982] [INFO] [logger.py:117] - 开始执行用户查询
[2025-04-18 15:58:55,203] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:58:55,205] [DEBUG] [logger.py:106] - input=[ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='You are designed to help with a variety of tasks, from answering questions to providing summaries to other types of analyses.\n\n## Tools\n\nYou have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.\nThis may require breaking the task into subtasks and using different tools to complete each subtask.\n\nYou have access to the following tools:\n\n\n\n## Output Format\n\nPlease answer in the same language as the question and use the following format:\n\n```\nThought: The current language of the user is: (user\'s language). I need to use a tool to help me answer the question.\nAction: tool name (one of ) if using a tool.\nAction Input: the input to the tool, in a JSON format representing the kwargs (e.g. {"input": "hello world", "num_beams": 5})\n```\n\nPlease ALWAYS start with a Thought.\n\nNEVER surround your response with markdown code markers. You may use code markers within your response if you need to.\n\nPlease use a valid JSON format for the Action Input. Do NOT do this {\'input\': \'hello world\', \'num_beams\': 5}.\n\nIf this format is used, the tool will respond in the following format:\n\n```\nObservation: tool response\n```\n\nYou should keep repeating the above format till you have enough information to answer the question without using any more tools. At that point, you MUST respond in one of the following two formats:\n\n```\nThought: I can answer without using any more tools. I\'ll use the user\'s language to answer\nAnswer: [your answer here (In the same language as the user\'s question)]\n```\n\n```\nThought: I cannot answer the question with the provided tools.\nAnswer: [your answer here (In the same language as the user\'s question)]\n```\n\n## Current Conversation\n\nBelow is the current conversation consisting of interleaving human and assistant messages.\n')]), ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='你每次都要调用工具嘛?可不可以不用工具回答什么是怎么把pcm转为MP3?')])] current_agent_name='Agent'
[2025-04-18 15:58:55,205] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:58:55,205] [DEBUG] [logger.py:106] - --------------这里开始是AgentInput--------------
[2025-04-18 15:58:55,205] [DEBUG] [logger.py:106] - Agent input:  [ChatMessage(role=<MessageRole.SYSTEM: 'system'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='You are designed to help with a variety of tasks, from answering questions to providing summaries to other types of analyses.\n\n## Tools\n\nYou have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.\nThis may require breaking the task into subtasks and using different tools to complete each subtask.\n\nYou have access to the following tools:\n\n\n\n## Output Format\n\nPlease answer in the same language as the question and use the following format:\n\n```\nThought: The current language of the user is: (user\'s language). I need to use a tool to help me answer the question.\nAction: tool name (one of ) if using a tool.\nAction Input: the input to the tool, in a JSON format representing the kwargs (e.g. {"input": "hello world", "num_beams": 5})\n```\n\nPlease ALWAYS start with a Thought.\n\nNEVER surround your response with markdown code markers. You may use code markers within your response if you need to.\n\nPlease use a valid JSON format for the Action Input. Do NOT do this {\'input\': \'hello world\', \'num_beams\': 5}.\n\nIf this format is used, the tool will respond in the following format:\n\n```\nObservation: tool response\n```\n\nYou should keep repeating the above format till you have enough information to answer the question without using any more tools. At that point, you MUST respond in one of the following two formats:\n\n```\nThought: I can answer without using any more tools. I\'ll use the user\'s language to answer\nAnswer: [your answer here (In the same language as the user\'s question)]\n```\n\n```\nThought: I cannot answer the question with the provided tools.\nAnswer: [your answer here (In the same language as the user\'s question)]\n```\n\n## Current Conversation\n\nBelow is the current conversation consisting of interleaving human and assistant messages.\n')]), ChatMessage(role=<MessageRole.USER: 'user'>, additional_kwargs={}, blocks=[TextBlock(block_type='text', text='你每次都要调用工具嘛?可不可以不用工具回答什么是怎么把pcm转为MP3?')])]
[2025-04-18 15:58:55,205] [DEBUG] [logger.py:106] - Agent name: Agent
[2025-04-18 15:58:55,205] [DEBUG] [logger.py:106] - --------------这里结束AgentInput-------------- 

[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - delta='' response='' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '', 'function_call': None, 'refusal': None, 'role': 'assistant', 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 0, 'prompt_tokens': 433, 'total_tokens': 433, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - 
[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - delta='Thought' response='Thought' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': 'Thought', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 1, 'prompt_tokens': 433, 'total_tokens': 434, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,526] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,527] [DEBUG] [logger.py:106] - Thought
[2025-04-18 15:59:01,527] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,527] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,527] [DEBUG] [logger.py:106] - delta=':' response='Thought:' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': ':', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 2, 'prompt_tokens': 433, 'total_tokens': 435, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - :
[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - delta=' 我' response='Thought: 我' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': ' 我', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 4, 'prompt_tokens': 433, 'total_tokens': 437, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,528] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,529] [DEBUG] [logger.py:106] -  我
[2025-04-18 15:59:01,529] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,597] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,597] [DEBUG] [logger.py:106] - delta='需要' response='Thought: 我需要' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '需要', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 5, 'prompt_tokens': 433, 'total_tokens': 438, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,597] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,597] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,597] [DEBUG] [logger.py:106] - 需要
[2025-04-18 15:59:01,597] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,644] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,644] [DEBUG] [logger.py:106] - delta='使用' response='Thought: 我需要使用' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '使用', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 6, 'prompt_tokens': 433, 'total_tokens': 439, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,644] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,644] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,644] [DEBUG] [logger.py:106] - 使用
[2025-04-18 15:59:01,644] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,674] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,674] [DEBUG] [logger.py:106] - delta='工具' response='Thought: 我需要使用工具' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '工具', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 7, 'prompt_tokens': 433, 'total_tokens': 440, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,674] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,674] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,675] [DEBUG] [logger.py:106] - 工具
[2025-04-18 15:59:01,675] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,809] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,809] [DEBUG] [logger.py:106] - delta='来' response='Thought: 我需要使用工具来' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '来', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 8, 'prompt_tokens': 433, 'total_tokens': 441, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,809] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,809] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,809] [DEBUG] [logger.py:106] - 来
[2025-04-18 15:59:01,809] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,920] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,920] [DEBUG] [logger.py:106] - delta='获取' response='Thought: 我需要使用工具来获取' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '获取', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 9, 'prompt_tokens': 433, 'total_tokens': 442, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,920] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,921] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,921] [DEBUG] [logger.py:106] - 获取
[2025-04-18 15:59:01,921] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:01,961] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:01,961] [DEBUG] [logger.py:106] - delta='关于' response='Thought: 我需要使用工具来获取关于' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '关于', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 10, 'prompt_tokens': 433, 'total_tokens': 443, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:01,961] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:01,961] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:01,961] [DEBUG] [logger.py:106] - 关于
[2025-04-18 15:59:01,961] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,008] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,008] [DEBUG] [logger.py:106] - delta='如何' response='Thought: 我需要使用工具来获取关于如何' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '如何', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 11, 'prompt_tokens': 433, 'total_tokens': 444, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,008] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,008] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,008] [DEBUG] [logger.py:106] - 如何
[2025-04-18 15:59:02,008] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,045] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,045] [DEBUG] [logger.py:106] - delta='将' response='Thought: 我需要使用工具来获取关于如何将' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '将', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 12, 'prompt_tokens': 433, 'total_tokens': 445, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,045] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,045] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,045] [DEBUG] [logger.py:106] - 将
[2025-04-18 15:59:02,045] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,100] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,100] [DEBUG] [logger.py:106] - delta='pcm' response='Thought: 我需要使用工具来获取关于如何将pcm' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': 'pcm', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 13, 'prompt_tokens': 433, 'total_tokens': 446, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,100] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,100] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,100] [DEBUG] [logger.py:106] - pcm
[2025-04-18 15:59:02,100] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,135] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,135] [DEBUG] [logger.py:106] - delta='转换' response='Thought: 我需要使用工具来获取关于如何将pcm转换' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '转换', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 14, 'prompt_tokens': 433, 'total_tokens': 447, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,135] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,135] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,135] [DEBUG] [logger.py:106] - 转换
[2025-04-18 15:59:02,135] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,180] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,180] [DEBUG] [logger.py:106] - delta='为' response='Thought: 我需要使用工具来获取关于如何将pcm转换为' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '为', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 15, 'prompt_tokens': 433, 'total_tokens': 448, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,180] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,180] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,180] [DEBUG] [logger.py:106] - 为
[2025-04-18 15:59:02,181] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,221] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,221] [DEBUG] [logger.py:106] - delta='MP' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': 'MP', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 16, 'prompt_tokens': 433, 'total_tokens': 449, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,221] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,221] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,221] [DEBUG] [logger.py:106] - MP
[2025-04-18 15:59:02,221] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,270] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,270] [DEBUG] [logger.py:106] - delta='3' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '3', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 17, 'prompt_tokens': 433, 'total_tokens': 450, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,270] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,270] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,270] [DEBUG] [logger.py:106] - 3
[2025-04-18 15:59:02,270] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,308] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,308] [DEBUG] [logger.py:106] - delta='的信息' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '的信息', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 18, 'prompt_tokens': 433, 'total_tokens': 451, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,308] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,308] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,308] [DEBUG] [logger.py:106] - 的信息
[2025-04-18 15:59:02,308] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,357] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,357] [DEBUG] [logger.py:106] - delta=',' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': ',', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 19, 'prompt_tokens': 433, 'total_tokens': 452, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,357] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,357] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,357] [DEBUG] [logger.py:106] - ,
[2025-04-18 15:59:02,358] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,391] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,391] [DEBUG] [logger.py:106] - delta='然后' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,然后' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '然后', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 20, 'prompt_tokens': 433, 'total_tokens': 453, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,392] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,392] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,392] [DEBUG] [logger.py:106] - 然后
[2025-04-18 15:59:02,392] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,432] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,432] [DEBUG] [logger.py:106] - delta='用' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,然后用' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '用', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 21, 'prompt_tokens': 433, 'total_tokens': 454, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,432] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,432] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,432] [DEBUG] [logger.py:106] - 用
[2025-04-18 15:59:02,432] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,470] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,470] [DEBUG] [logger.py:106] - delta='中文' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,然后用中文' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '中文', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 22, 'prompt_tokens': 433, 'total_tokens': 455, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,470] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,471] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,471] [DEBUG] [logger.py:106] - 中文
[2025-04-18 15:59:02,471] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,512] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,512] [DEBUG] [logger.py:106] - delta='回答' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,然后用中文回答' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '回答', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 23, 'prompt_tokens': 433, 'total_tokens': 456, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,512] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,513] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,513] [DEBUG] [logger.py:106] - 回答
[2025-04-18 15:59:02,513] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,561] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,561] [DEBUG] [logger.py:106] - delta='用户' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,然后用中文回答用户' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '用户', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 24, 'prompt_tokens': 433, 'total_tokens': 457, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,561] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,561] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,561] [DEBUG] [logger.py:106] - 用户
[2025-04-18 15:59:02,561] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,600] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,600] [DEBUG] [logger.py:106] - delta='的问题' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,然后用中文回答用户的问题' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '的问题', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 25, 'prompt_tokens': 433, 'total_tokens': 458, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,600] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,600] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,600] [DEBUG] [logger.py:106] - 的问题
[2025-04-18 15:59:02,600] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

[2025-04-18 15:59:02,632] [DEBUG] [logger.py:106] - --------------这里开始是event--------------
[2025-04-18 15:59:02,632] [DEBUG] [logger.py:106] - delta='。\n' response='Thought: 我需要使用工具来获取关于如何将pcm转换为MP3的信息,然后用中文回答用户的问题。\n' current_agent_name='Agent' tool_calls=[] raw={'id': '019647e8ea426cb39f4fc709d4fb250b', 'choices': [{'delta': {'content': '。\n', 'function_call': None, 'refusal': None, 'role': None, 'tool_calls': None, 'reasoning_content': None}, 'finish_reason': None, 'index': 0, 'logprobs': None}], 'created': 1744963168, 'model': 'Qwen/Qwen2.5-7B-Instruct', 'object': 'chat.completion.chunk', 'service_tier': None, 'system_fingerprint': '', 'usage': {'completion_tokens': 26, 'prompt_tokens': 433, 'total_tokens': 459, 'completion_tokens_details': None, 'prompt_tokens_details': None}}
[2025-04-18 15:59:02,632] [DEBUG] [logger.py:106] - --------------这里结束event-------------- 

[2025-04-18 15:59:02,632] [DEBUG] [logger.py:106] - --------------这里开始是AgentStream--------------
[2025-04-18 15:59:02,633] [DEBUG] [logger.py:106] - 。

[2025-04-18 15:59:02,633] [DEBUG] [logger.py:106] - --------------这里结束AgentStream-------------- 

从日志可以清晰看出:

  1. 逐token传输:模型响应被拆分成很小的片段逐个发送,有时是单个字符,有时是一个词(如日志中的"需要"、"的信息")
  2. 增量更新:每个AgentStream事件包含一个小的增量片段(delta)
  3. 即时展示:接收到的片段立即展示,无需等待全部生成完成
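
顺带一提,每个chunk的raw字段是底层API返回的原始数据。如果所用后端像上面日志一样在usage中携带token统计,可以在流式处理的同时顺便记录用量(示意写法,raw的具体结构取决于后端,使用前应做好判空;handler同前文由workflow.run(...)得到):

last_usage = None

async for event in handler.stream_events():
    if isinstance(event, AgentStream):
        if event.delta:
            print(event.delta, end="", flush=True)
        # 这里假设raw是带有usage字段的字典(与上面日志一致)
        raw = event.raw if isinstance(event.raw, dict) else {}
        if raw.get("usage"):
            last_usage = raw["usage"]

print("\n最近一次chunk中的token用量:", last_usage)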

5. 常见问题与解决方案

5.1 输出被截断问题

问题:使用某些模型API时,流式输出内容被截断或不完整。

解决方案

  • 对于OpenAILike接口,必须设置is_chat_model=True
  • 代码示例:
    llm = OpenAILike(
        model="Qwen/Qwen2.5-7B-Instruct",
        api_key=os.getenv("SILICONFLOW_API_KEY"),
        api_base=os.getenv("SILICONFLOW_BASE_URL"),
        is_chat_model=True  # 关键参数:声明为chat模型(走chat接口),避免内容被截断
    )
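  • 此外,如果设置is_chat_model=True之后输出仍然偏短,也可以检查与生成长度相关的参数,例如max_tokens(以下为示意写法,合理取值取决于所用模型与后端):
    llm = OpenAILike(
        model="Qwen/Qwen2.5-7B-Instruct",
        api_key=os.getenv("SILICONFLOW_API_KEY"),
        api_base=os.getenv("SILICONFLOW_BASE_URL"),
        is_chat_model=True,
        max_tokens=2048  # 示意值:该值过小会导致输出被提前截断
    )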

5.2 如何确保获取完整输出

问题:在流式过程中可能漏掉某些片段,导致最终拼接的文本不完整。

解决方案

  • 使用累积变量收集所有delta片段
  • 还可以使用await handler获取最终完整响应进行比对
  • 代码示例:
    accumulated_output = ""
    async for event in handler.stream_events():
        if isinstance(event, AgentStream) and event.delta:
            accumulated_output += event.delta
            
    # 获取最终完整响应进行比对
    final_output = str(await handler)
    print(f"累积内容长度: {len(accumulated_output)}")
    print(f"最终响应长度: {len(final_output)}")

6. 流式输出的应用场景

6.1 Web应用中的实时显示

使用WebSocket将流式输出实时推送到前端,实现打字机效果:

# 后端代码
async def stream_to_websocket(websocket, query):
    handler = workflow.run(user_msg=query)
    
    async for event in handler.stream_events():
        if isinstance(event, AgentStream) and event.delta:
            # 将片段实时发送到WebSocket
            await websocket.send_text(event.delta)

// 前端代码
const socket = new WebSocket('ws://example.com/stream');
let outputElement = document.getElementById('output');

socket.onmessage = function(event) {
    // 添加新内容到输出元素,实现打字机效果
    outputElement.textContent += event.data;
};
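
上面的stream_to_websocket只是一个与具体Web框架无关的示意。如果使用FastAPI(这里仅作举例),可以大致这样挂载一个WebSocket端点:

# 以FastAPI为例的WebSocket端点(示意代码,workflow复用前文创建的实例)
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/stream")
async def stream_endpoint(websocket: WebSocket):
    await websocket.accept()
    query = await websocket.receive_text()  # 接收前端发来的问题
    await stream_to_websocket(websocket, query)
    await websocket.close()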

6.2 结合状态管理的实时对话

将流式输出与会话状态管理结合,实现带有记忆的实时对话体验:

from llama_index.core.workflow import Context

async def stateful_streaming_chat(query, ctx):
    # 注意:异步生成器不能return值,因此上下文由调用方创建并持有
    # 多轮对话复用同一个ctx,即可共享会话状态
    handler = workflow.run(user_msg=query, ctx=ctx)
    
    async for event in handler.stream_events():
        if isinstance(event, AgentStream) and event.delta:
            yield event.delta  # 逐片段返回增量文本
    
    # 等待本次运行结束,本轮的对话记忆会保留在ctx中,供下一轮复用
    await handler

使用示例:

ctx = Context(workflow)  # 创建并持有共享上下文

# 第一轮对话
async for token in stateful_streaming_chat("你好,我叫张三", ctx):
    print(token, end="", flush=True)
print()

# 第二轮对话(复用同一个上下文,模型可以记住上一轮的信息)
async for token in stateful_streaming_chat("我叫什么名字?", ctx):
    print(token, end="", flush=True)

7. 最佳实践与建议

  1. 异步处理:流式输出本质上是异步操作,确保正确使用async/await语法

    async def process_stream():
        async for event in handler.stream_events():
            ...  # 在这里处理各类事件
    
    # 在脚本入口处启动异步函数
    asyncio.run(process_stream())
  2. 异常处理:添加完善的异常捕获机制,防止流式过程中断

    try:
        async for event in handler.stream_events():
            ...  # 处理事件
    except Exception as e:
        logger.error(f"流式处理出错: {str(e)}")
        # 在此做好兜底处理:例如向用户返回已累积的部分内容
  3. 模型选择:不同模型的流式输出性能有差异,测试多个模型以选择最适合的

    # 测试不同模型的流式输出延迟和质量
    models = [
        "Qwen/Qwen2.5-7B-Instruct",
        "deepseek/deepseek-chat-v3-0324" 
    ]
    
    for model_name in models:
        ...  # 分别测试各模型的流式输出延迟与质量,完整示意见本节末尾的代码
  4. 增量处理:在收到增量内容时可以立即开始处理,不必等待全部生成完成

    async for event in handler.stream_events():
        if isinstance(event, AgentStream) and event.delta:
            # 实时处理每个片段,如情感分析、关键词提取等
            process_incremental_content(event.delta)
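
针对第3点,下面给出一个测量"首个增量片段延迟"(首字延迟)的示意代码。这只是一个草稿:假设各模型都能通过同一个OpenAILike兼容接口访问,比较指标也可以换成总耗时、吞吐量等:

import time

async def first_token_latency(model_name: str, question: str) -> float:
    """返回从发起请求到收到第一个非空delta的耗时(秒)。"""
    llm = OpenAILike(
        model=model_name,
        api_key=os.getenv("SILICONFLOW_API_KEY"),
        api_base=os.getenv("SILICONFLOW_BASE_URL"),
        is_chat_model=True,
    )
    workflow = AgentWorkflow.from_tools_or_functions(
        tools_or_functions=[],
        llm=llm,
        system_prompt="你是一个智能助手,请热情地回答用户的问题",
    )
    start = time.monotonic()
    latency = None
    handler = workflow.run(user_msg=question)
    async for event in handler.stream_events():
        if latency is None and isinstance(event, AgentStream) and event.delta:
            latency = time.monotonic() - start
    await handler  # 等待本次运行完整结束
    return latency if latency is not None else float("inf")

# 使用示例(在async函数中):
# for name in models:
#     print(name, f"{await first_token_latency(name, '怎么把pcm转为MP3?'):.2f}s")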

以上就是LlamaIndex流式输出的基础能力与上手步骤。