# MineNASAI - PoC Validation Plan

**Created**: 2025-02-04
**Goal**: Validate the feasibility of three key technologies
**Estimated time**: 2-3 days

---

## Validation Goals

### Why a PoC?

Before investing significant development time, we need to validate the feasibility of the following key technologies:

1. **Claude Code CLI integration** (highest priority)
   - Risk: subprocess management and output parsing are complex
   - Impact: this is the core feature; if it is not feasible, the design must be reworked

2. **Smart routing algorithm**
   - Risk: insufficient routing accuracy
   - Impact: affects user experience and resource utilization

3. **MCP Server loading**
   - Risk: unfamiliarity with the MCP protocol; complex inter-process communication
   - Impact: the foundation for tool extensibility

---

## PoC 1: Claude Code CLI Integration

### Goal
Verify that the Claude Code CLI can be invoked from a Python subprocess and that its output can be parsed correctly.

### What to verify
- [ ] Subprocess startup and management
- [ ] Real-time output stream capture
- [ ] ANSI escape sequence handling
- [ ] Interactive input handling
- [ ] Timeouts and resource limits
- [ ] Error handling and recovery
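
For the timeout and recovery items in the checklist above, one possible approach is to start the child in its own session so that a timeout can kill the whole process group, including any helpers the CLI spawns. The following is a minimal sketch under that assumption, not part of the original plan; `run_with_timeout` is a hypothetical helper name, and the code is POSIX-only. It does not cover memory or CPU limits.

```python
"""Sketch: enforce a wall-clock timeout and clean up the child's process group."""
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd in a new session; return (exit_code, output, timed_out)."""
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so a single pipe is drained
        text=True,
        start_new_session=True,    # child leads a new process group (pgid == pid)
    )
    try:
        output, _ = process.communicate(timeout=timeout_s)
        return process.returncode, output, False
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole group so grandchildren do not outlive the timeout
            os.killpg(process.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the timeout and the kill
        output, _ = process.communicate()  # reap the child, collect partial output
        return process.returncode, output, True
```

A caller would use it as `code, output, timed_out = run_with_timeout(["claude", "..."], 30)`; on timeout the exit code is the negative signal number reported by `subprocess`.
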

### Implementation steps

#### Step 1: Environment setup

```bash
# 1. Make sure the Claude Code CLI is installed
# If not, see: https://docs.anthropic.com/claude/docs/claude-code

# 2. Verify that the CLI is available
claude --version

# 3. Test a basic invocation
claude "print hello world in python"

# 4. Create the PoC working directory
mkdir -p poc/claude_cli_test
cd poc/claude_cli_test
```

#### Step 2: Basic subprocess call (poc_1_basic.py)

```python
"""PoC 1.1: Basic Claude CLI subprocess call."""
import subprocess
import sys


def test_basic_call():
    """Test a basic CLI invocation."""
    print("=== PoC 1.1: Basic CLI Call ===\n")

    # Simple command.
    # Note: stdout is a pipe here, not a TTY. If the CLI waits for interactive
    # input, check `claude --help` for a non-interactive (print) mode.
    cmd = ["claude", "print hello world in python"]

    print(f"Running command: {' '.join(cmd)}")
    print("-" * 60)

    try:
        # Use subprocess.run and capture both streams
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=30,
        )

        print("STDOUT:")
        print(result.stdout)
        print("\nSTDERR:")
        print(result.stderr)
        print(f"\nExit code: {result.returncode}")

        if result.returncode == 0:
            print("\n✅ Basic call succeeded!")
            return True
        else:
            print("\n❌ Command failed")
            return False

    except subprocess.TimeoutExpired:
        print("\n❌ Command timed out")
        return False
    except Exception as e:
        print(f"\n❌ Exception: {e}")
        return False


if __name__ == "__main__":
    success = test_basic_call()
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] The script can invoke the Claude CLI successfully
- [ ] The complete output can be captured
- [ ] The timeout mechanism works

**Run the test**:
```bash
python poc_1_basic.py
```

---

#### Step 3: Real-time output stream capture (poc_1_streaming.py)

```python
"""PoC 1.2: Real-time output streaming."""
import subprocess
import sys


def test_streaming_output():
    """Test real-time capture of the output stream."""
    print("=== PoC 1.2: Streaming Output ===\n")

    cmd = ["claude", "count to 10 with 1 second delays in python"]

    print(f"Running command: {' '.join(cmd)}")
    print("-" * 60)

    try:
        # Use Popen for streaming output.
        # stderr is merged into stdout so that an unread stderr pipe cannot
        # fill up and block the child; reading both pipes separately would
        # need select or a reader thread.
        process = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,  # line-buffered
        )

        print("Live output:")

        # Read lines as they arrive; the loop ends at EOF, i.e. when the
        # child closes its stdout (normally when it exits)
        for line in process.stdout:
            print(f"[OUT] {line.rstrip()}")

        # Reap the process so that returncode is set
        process.wait()

        print(f"\nExit code: {process.returncode}")

        if process.returncode == 0:
            print("\n✅ Streaming output capture succeeded!")
            return True
        else:
            print("\n❌ Process failed")
            return False

    except Exception as e:
        print(f"\n❌ Exception: {e}")
        return False


if __name__ == "__main__":
    success = test_streaming_output()
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] Output is captured in real time (not only after the process ends)
- [ ] No output is lost
- [ ] Process termination is detected correctly

**Run the test**:
```bash
python poc_1_streaming.py
```

---

#### Step 4: PTY pseudo-terminal handling (poc_1_pty.py)

```python
"""PoC 1.3: PTY pseudo-terminal for interactive handling."""
import os
import pty
import select
import subprocess
import sys
import traceback


def test_pty_terminal():
    """Test PTY pseudo-terminal handling."""
    print("=== PoC 1.3: PTY Terminal ===\n")

    cmd = ["claude", "create a simple python calculator"]

    print(f"Running command: {' '.join(cmd)}")
    print("-" * 60)

    try:
        # Create the PTY
        master, slave = pty.openpty()

        # Start the process with the slave end as its terminal
        process = subprocess.Popen(
            cmd,
            stdin=slave,
            stdout=slave,
            stderr=slave,
            close_fds=True,
        )

        os.close(slave)  # the child has inherited it; close the parent's copy

        print("Live output (PTY):")
        output_buffer = []

        # Read output
        while True:
            # Wait up to 0.1 s for data so that the read never blocks
            r, _, _ = select.select([master], [], [], 0.1)

            if master in r:
                try:
                    data = os.read(master, 1024)
                except OSError:
                    break  # on Linux, EIO means the child closed the slave end
                if not data:
                    break  # EOF
                text = data.decode("utf-8", errors="replace")
                print(text, end="")
                output_buffer.append(text)
            elif process.poll() is not None:
                # The process has exited and no output is pending
                break

        os.close(master)
        process.wait()  # make sure returncode is set on every exit path

        print(f"\n\nExit code: {process.returncode}")
        print(f"Total output length: {len(''.join(output_buffer))} characters")

        if process.returncode == 0:
            print("\n✅ PTY terminal handling succeeded!")
            return True
        else:
            print("\n❌ Process failed")
            return False

    except Exception as e:
        print(f"\n❌ Exception: {e}")
        traceback.print_exc()
        return False


if __name__ == "__main__":
    success = test_pty_terminal()
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] The PTY is created correctly
- [ ] Interactive output can be handled
- [ ] ANSI escape sequences are displayed correctly
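
Output read from a PTY usually carries colour and cursor-control codes. If the captured text is later parsed rather than displayed, those codes need to be removed first. The sketch below shows one way to do that with a regular expression; `strip_ansi` is a hypothetical helper, and the pattern covers the common CSI, OSC, and two-character escape forms rather than every terminal sequence.

```python
"""Sketch: strip ANSI escape sequences from captured terminal output."""
import re

_ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"              # CSI: ESC [ params, intermediates, final byte
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"   # OSC: ESC ] ... terminated by BEL or ESC \
    r"|\x1b[@-Z\\-_]"                       # remaining two-character escapes, e.g. ESC M
)


def strip_ansi(text: str) -> str:
    """Return text with the escape sequences matched above removed."""
    return _ANSI_RE.sub("", text)
```

Because a sequence can be split across two `os.read` calls, it is safer to apply `strip_ansi` to the joined output than to each chunk.
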

**Run the test**:
```bash
python poc_1_pty.py
```

---

#### Step 5: Full integration example (poc_1_full.py)

```python
"""PoC 1.4: Full Claude CLI integration."""
import asyncio
import os
import pty
import select
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CLIResult:
    """CLI execution result."""
    exit_code: int
    output: str
    error: Optional[str] = None
    duration_ms: int = 0


class ClaudeCLI:
    """Claude CLI wrapper."""

    def __init__(self, timeout: int = 300):
        """Initialize the CLI wrapper."""
        self.timeout = timeout

    async def execute(self, prompt: str, workspace: str = ".") -> CLIResult:
        """
        Execute a Claude CLI command.

        Args:
            prompt: Task prompt
            workspace: Working directory

        Returns:
            CLIResult with output
        """
        # The PTY read loop blocks, so run it in a worker thread to keep the
        # event loop responsive (asyncio.to_thread requires Python 3.9+)
        return await asyncio.to_thread(self._run, prompt, workspace)

    def _run(self, prompt: str, workspace: str) -> CLIResult:
        """Blocking implementation: run the CLI on a PTY and collect its output."""
        start_time = time.monotonic()
        deadline = start_time + self.timeout
        cmd = ["claude", prompt]
        chunks = []  # raw bytes; decoded once at the end
        master = None
        process = None

        def collected() -> str:
            return b"".join(chunks).decode("utf-8", errors="replace")

        def elapsed_ms() -> int:
            return int((time.monotonic() - start_time) * 1000)

        try:
            # Create the PTY
            master, slave = pty.openpty()

            # Start the process
            try:
                process = subprocess.Popen(
                    cmd,
                    stdin=slave,
                    stdout=slave,
                    stderr=slave,
                    cwd=workspace,
                    close_fds=True,
                )
            finally:
                os.close(slave)

            # Read output (with timeout)
            while True:
                # Check the timeout
                if time.monotonic() > deadline:
                    raise TimeoutError(f"Command timeout after {self.timeout}s")

                # Non-blocking read
                r, _, _ = select.select([master], [], [], 0.1)

                if master in r:
                    try:
                        data = os.read(master, 4096)
                    except OSError:
                        break  # on Linux, EIO means the child closed the slave end
                    if not data:
                        break  # EOF
                    chunks.append(data)
                elif process.poll() is not None:
                    # The process has exited and no output is pending
                    break

            # Reap the process; TimeoutExpired is handled below like a timeout
            exit_code = process.wait(timeout=max(0.1, deadline - time.monotonic()))

            return CLIResult(
                exit_code=exit_code,
                output=collected(),
                duration_ms=elapsed_ms(),
            )

        except Exception as e:
            # Covers timeouts as well as startup failures; keep partial output
            if process is not None and process.poll() is None:
                process.kill()
                process.wait()
            return CLIResult(
                exit_code=-1,
                output=collected(),
                error=str(e),
                duration_ms=elapsed_ms(),
            )
        finally:
            if master is not None:
                os.close(master)


async def test_full_integration():
    """Test the full integration."""
    print("=== PoC 1.4: Full Integration ===\n")

    cli = ClaudeCLI(timeout=60)

    # Test case 1: simple task
    print("Test 1: simple Python script")
    print("-" * 60)

    result = await cli.execute("create a hello world python script")

    print(f"Output:\n{result.output}")
    print(f"\nExit code: {result.exit_code}")
    print(f"Duration: {result.duration_ms}ms")

    if result.exit_code == 0:
        print("✅ Test 1 passed")
    else:
        print(f"❌ Test 1 failed: {result.error}")
        return False

    # Test case 2: a task that needs multiple steps
    print("\n\nTest 2: multi-step task")
    print("-" * 60)

    result = await cli.execute(
        "create a simple REST API with FastAPI that has a hello endpoint"
    )

    print(f"Output:\n{result.output[:500]}...")  # show only the first 500 characters
    print(f"\nExit code: {result.exit_code}")
    print(f"Duration: {result.duration_ms}ms")

    if result.exit_code == 0:
        print("✅ Test 2 passed")
    else:
        print(f"❌ Test 2 failed: {result.error}")
        return False

    print("\n" + "=" * 60)
    print("✅ Full integration validated!")
    print("=" * 60)

    return True


if __name__ == "__main__":
    success = asyncio.run(test_full_integration())
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] The wrapper class works correctly
- [ ] Asynchronous invocation is supported
- [ ] Timeout control is effective
- [ ] Error handling is complete
- [ ] Both simple and complex tasks can be handled

**Run the test**:
```bash
python poc_1_full.py
```

---

### PoC 1 Summary

After completing the five steps above, fill in the following checklist:

**Validation results**:
- [ ] Basic invocation is feasible
- [ ] Streaming output capture is feasible
- [ ] PTY terminal handling is feasible
- [ ] Full integration validated
- [ ] Performance is acceptable (most tasks < 1 minute)

**Issues found**:
1.
2.
3.

**Improvements needed**:
1.
2.
3.

**Conclusion**:
- [ ] ✅ Feasible; the PTY + asyncio approach is recommended
- [ ] ⚠️ Partially feasible; the design needs adjustment
- [ ] ❌ Not feasible; an alternative approach is needed

---

## PoC 2: Smart Routing Algorithm

### Goal
Verify that the routing algorithm can reasonably classify tasks into three modes: fast, medium, and deep.

### What to verify
- [ ] Accuracy of the heuristic rules
- [ ] Feasibility of LLM-based routing
- [ ] User directive overrides
- [ ] Routing performance (response time)
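
The plan lists user directive overrides as something to verify but does not define what a directive looks like. As an illustration only, the sketch below assumes a hypothetical syntax in which a message may start with `/fast`, `/medium`, or `/deep` followed by whitespace; both the syntax and the `extract_override` helper are assumptions, not part of the original design.

```python
"""Sketch: let an explicit user directive override automatic routing."""
import re
from typing import Optional, Tuple

# Hypothetical directive syntax: a leading "/fast", "/medium" or "/deep"
_DIRECTIVE = re.compile(r"^\s*/(fast|medium|deep)\b\s*", re.IGNORECASE)


def extract_override(message: str) -> Tuple[Optional[str], str]:
    """Return (forced_mode, remaining_message); forced_mode is None if absent."""
    match = _DIRECTIVE.match(message)
    if match is None:
        return None, message
    return match.group(1).lower(), message[match.end():]
```

A router would call `extract_override` first and only fall through to the heuristic or LLM evaluation when the returned mode is `None`.
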

### Implementation steps

#### Step 1: Heuristic rule test (poc_2_heuristic.py)

```python
"""PoC 2.1: Heuristic routing rules."""
import sys
from dataclasses import dataclass
from typing import Literal


@dataclass
class RoutingDecision:
    """Routing decision."""
    mode: Literal["fast", "medium", "deep"]
    reason: str
    confidence: float  # 0.0 - 1.0


class HeuristicRouter:
    """Heuristic-based router."""

    def __init__(self):
        """Initialize rules (the keywords target Chinese user messages)."""
        self.rules = {
            # status, query, search, "what is", ASCII and full-width question marks
            "fast_keywords": ["状态", "查询", "搜索", "是什么", "?", "?"],
            # implement, develop, write, refactor, optimize, design, create
            "deep_keywords": ["实现", "开发", "编写", "重构", "优化", "设计", "创建"],
            # modify, update, add, delete
            "medium_keywords": ["修改", "更新", "添加", "删除"],
        }

    def evaluate(self, message: str) -> RoutingDecision:
        """Evaluate routing decision."""
        message_lower = message.lower()
        length = len(message)

        # Rule 1: short message + query keyword -> fast
        if length < 50 and any(kw in message_lower for kw in self.rules["fast_keywords"]):
            return RoutingDecision(
                mode="fast",
                reason="short message containing a query keyword",
                confidence=0.9,
            )

        # Rule 2: contains a development keyword -> deep
        deep_count = sum(1 for kw in self.rules["deep_keywords"] if kw in message_lower)
        if deep_count >= 1:
            return RoutingDecision(
                mode="deep",
                reason=f"contains {deep_count} development keyword(s)",
                confidence=0.7 + min(deep_count * 0.1, 0.2),
            )

        # Rule 3: medium length + modification keyword -> medium
        if 50 <= length <= 200 and any(kw in message_lower for kw in self.rules["medium_keywords"]):
            return RoutingDecision(
                mode="medium",
                reason="medium-length message containing a modification keyword",
                confidence=0.6,
            )

        # Rule 4: long message -> medium (possibly deep)
        if length > 200:
            return RoutingDecision(
                mode="medium",
                reason="long message, may need multi-step handling",
                confidence=0.5,
            )

        # Default: medium
        return RoutingDecision(
            mode="medium",
            reason="default medium complexity",
            confidence=0.4,
        )


def test_heuristic_router():
    """Test the heuristic router."""
    print("=== PoC 2.1: Heuristic Router ===\n")

    router = HeuristicRouter()

    # Test cases: (message, expected mode)
    test_cases = [
        ("NAS状态?", "fast"),  # "NAS status?"
        ("搜索最新的Python教程", "fast"),  # "Search for the latest Python tutorials"
        ("实现一个Web服务", "deep"),  # "Implement a web service"
        ("重构这个模块", "deep"),  # "Refactor this module"
        ("修改配置文件中的端口", "medium"),  # "Change the port in the config file"
        ("添加一个新的API端点", "medium"),  # "Add a new API endpoint"
        # A long request to analyze a system's architecture and suggest optimizations
        ("这是一段很长的描述,我需要你帮我分析一下这个系统的架构,然后给出优化建议,包括性能、安全性和可维护性等多个方面", "medium"),
    ]

    correct = 0
    total = len(test_cases)

    print(f"Number of test cases: {total}\n")

    for i, (message, expected) in enumerate(test_cases, 1):
        decision = router.evaluate(message)
        is_correct = decision.mode == expected
        correct += is_correct

        status = "✅" if is_correct else "❌"
        print(f"{status} Case {i}:")
        print(f"  Message: {message}")
        print(f"  Expected: {expected}")
        print(f"  Actual: {decision.mode} (confidence: {decision.confidence:.2f})")
        print(f"  Reason: {decision.reason}")
        print()

    accuracy = correct / total
    print("=" * 60)
    print(f"Accuracy: {accuracy:.1%} ({correct}/{total})")
    print("=" * 60)

    if accuracy >= 0.7:
        print("\n✅ Heuristic rule accuracy is acceptable (>= 70%)")
        return True
    else:
        print(f"\n⚠️ Accuracy is low ({accuracy:.1%}); consider tuning the rules or using an LLM")
        return False


if __name__ == "__main__":
    success = test_heuristic_router()
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] Accuracy >= 70%
- [ ] Confidence scores are reasonable
- [ ] Response time < 10ms
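
The heuristic test script reports accuracy but does not measure latency, so the response-time criterion needs a separate check. One way to collect the numbers is sketched below; `measure_latency_ms` is a hypothetical helper, and the choice of mean, 95th percentile, and maximum as summary statistics is an assumption rather than something the plan prescribes.

```python
"""Sketch: measure per-call routing latency in milliseconds."""
import statistics
import time
from typing import Callable, Dict, Sequence


def measure_latency_ms(
    fn: Callable[[str], object],
    messages: Sequence[str],
    repeats: int = 100,
) -> Dict[str, float]:
    """Call fn on every message `repeats` times and summarize the latencies."""
    samples = []
    for _ in range(repeats):
        for message in messages:
            start = time.perf_counter()
            fn(message)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = int(0.95 * (len(samples) - 1))
    return {
        "mean": statistics.fmean(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }
```

For the router above, `measure_latency_ms(router.evaluate, [m for m, _ in test_cases])` would give figures to compare against the 10 ms target.
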

**Run the test**:
```bash
python poc_2_heuristic.py
```

---

#### Step 2: LLM routing test (poc_2_llm.py)

```python
"""PoC 2.2: LLM-based routing."""
import asyncio
import json
import os
import sys

from anthropic import AsyncAnthropic


class LLMRouter:
    """LLM-based router using Claude Haiku."""

    def __init__(self, api_key: str):
        """Initialize with API key."""
        self.client = AsyncAnthropic(api_key=api_key)
        self.model = "claude-3-5-haiku-20241022"

    async def evaluate(self, message: str) -> dict:
        """Evaluate using Claude Haiku."""
        system_prompt = """
        You are a smart routing assistant. Analyze the user's message and judge the task complexity:

        - **fast**: a simple query that needs no complex tools (<1000 tokens)
          e.g. "NAS状态?", "搜索xxx", "是什么"

        - **medium**: a moderate task that needs a few tool calls (<5000 tokens)
          e.g. "修改配置", "添加功能", "更新文档"

        - **deep**: a complex task that needs programming or multi-step handling (>5000 tokens)
          e.g. "实现xxx", "重构xxx", "设计xxx"

        Return JSON in this format:
        {
            "mode": "fast|medium|deep",
            "reason": "why this mode was chosen",
            "confidence": 0.0-1.0
        }
        """

        try:
            response = await self.client.messages.create(
                model=self.model,
                max_tokens=200,
                system=system_prompt,
                messages=[
                    {"role": "user", "content": f"Analyze this task: {message}"}
                ]
            )

            # Parse the response
            text = response.content[0].text

            # Try to extract the JSON object from the reply
            if "{" in text and "}" in text:
                json_str = text[text.find("{"):text.rfind("}")+1]
                result = json.loads(json_str)
                return result
            else:
                # No JSON found: fall back to a default decision
                return {"mode": "medium", "reason": "failed to parse LLM reply", "confidence": 0.5}

        except Exception as e:
            # API errors and malformed JSON both end up here
            print(f"LLM routing failed: {e}")
            return {"mode": "medium", "reason": f"error: {e}", "confidence": 0.0}


async def test_llm_router():
    """Test the LLM router."""
    print("=== PoC 2.2: LLM Router ===\n")

    # Check the API key
    api_key = os.getenv("ANTHROPIC_API_KEY")
    if not api_key:
        print("⚠️ ANTHROPIC_API_KEY is not set; skipping the LLM test")
        return True

    router = LLMRouter(api_key)

    # Test cases: (message, expected mode)
    test_cases = [
        ("NAS状态?", "fast"),  # "NAS status?"
        ("实现一个Web服务", "deep"),  # "Implement a web service"
        ("修改配置文件中的端口", "medium"),  # "Change the port in the config file"
    ]

    print(f"Number of test cases: {len(test_cases)}\n")

    for i, (message, expected) in enumerate(test_cases, 1):
        print(f"Case {i}: {message}")
        result = await router.evaluate(message)

        is_correct = result["mode"] == expected
        status = "✅" if is_correct else "❌"

        print(f"  {status} Expected: {expected}, actual: {result['mode']}")
        print(f"  Reason: {result.get('reason', 'N/A')}")
        print(f"  Confidence: {result.get('confidence', 0):.2f}")
        print()

    print("✅ LLM routing is feasible (incurs API costs)")
    return True


if __name__ == "__main__":
    success = asyncio.run(test_llm_router())
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] The LLM classifies tasks correctly
- [ ] Response time is acceptable (< 2 seconds)
- [ ] API cost is within budget

**Run the test**:
```bash
export ANTHROPIC_API_KEY="your-api-key"
python poc_2_llm.py
```

---

### PoC 2 Summary

**Validation results**:
- [ ] Heuristic rule accuracy: ____%
- [ ] LLM routing accuracy: ____%
- [ ] Average LLM response time: ___ ms
- [ ] Cost per LLM routing call: $_____

**Conclusion**:
- [ ] ✅ Heuristic rules are sufficient; start with rules only
- [ ] ✅ Heuristic + LLM hybrid (call the LLM when confidence is low)
- [ ] ⚠️ LLM only (higher cost and latency)
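
The hybrid option in the conclusion list can be sketched as follows: run the cheap heuristic first and consult the LLM only when the heuristic's confidence falls below a threshold. This is an illustration under stated assumptions, not the plan's implementation: `route_hybrid` and the `0.6` threshold are hypothetical, and both routers are assumed to be wrapped so that they return a dict with `mode`, `reason`, and `confidence` keys.

```python
"""Sketch: hybrid routing that falls back to an LLM only on low confidence."""
import asyncio
from typing import Any, Awaitable, Callable, Dict

Decision = Dict[str, Any]  # {"mode": ..., "reason": ..., "confidence": ...}


async def route_hybrid(
    message: str,
    heuristic: Callable[[str], Decision],
    llm: Callable[[str], Awaitable[Decision]],
    threshold: float = 0.6,  # hypothetical cut-off, to be tuned from PoC data
) -> Decision:
    """Use the heuristic first; ask the LLM only when the heuristic is unsure."""
    decision = heuristic(message)
    if decision["confidence"] >= threshold:
        return decision
    try:
        return await llm(message)
    except Exception:
        # LLM unavailable: keep the heuristic answer rather than failing
        return decision
```

With this shape, the share of messages that reach the LLM (and therefore cost and latency) is controlled by a single threshold that the PoC measurements can calibrate.
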

---

## PoC 3: MCP Server Loading

### Goal
Verify that MCP Servers can be loaded and invoked dynamically.

### What to verify
- [ ] MCP Server discovery and startup
- [ ] MCP protocol communication (stdio/sse)
- [ ] Tool invocation and result parsing
- [ ] Process lifecycle management
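
Over the stdio transport, MCP exchanges JSON-RPC 2.0 messages as one JSON object per line. The sketch below factors that framing into two small helpers so that requests get unique ids and unrelated messages (such as server notifications) are skipped while waiting for a reply. `send_request` and `read_response` are hypothetical helper names, and the code is an illustration rather than a complete client.

```python
"""Sketch: newline-delimited JSON-RPC 2.0 helpers for an MCP stdio transport."""
import itertools
import json
from typing import Any, Dict, Optional, TextIO

_ids = itertools.count(1)  # process-wide request id generator


def send_request(stream: TextIO, method: str, params: Optional[Dict[str, Any]] = None) -> int:
    """Write one request as a single JSON line and return its id."""
    request_id = next(_ids)
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params or {}}
    stream.write(json.dumps(message) + "\n")
    stream.flush()
    return request_id


def read_response(stream: TextIO, request_id: int) -> Dict[str, Any]:
    """Read lines until the response with the matching id arrives."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        # A response carries our id and no "method"; skip everything else
        if message.get("id") == request_id and "method" not in message:
            return message
    raise EOFError(f"stream closed before response {request_id} arrived")
```

With a `subprocess.Popen` started in text mode, these would be called as `rid = send_request(process.stdin, "tools/list")` followed by `read_response(process.stdout, rid)`.
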

### Implementation steps

#### Step 1: Basic MCP connection (poc_3_basic.py)

```python
"""PoC 3.1: Basic MCP server connection."""
import asyncio
import json
import subprocess
import sys
import traceback


async def test_mcp_connection():
    """Test connecting to an MCP Server."""
    print("=== PoC 3.1: MCP Connection ===\n")

    # Use the official filesystem MCP server
    cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

    print(f"Starting MCP Server: {' '.join(cmd)}")
    print("-" * 60)

    process = None
    try:
        # Start the process
        process = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )

        # Send the initialize request
        initialize_request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                # MCP protocol versions are date strings; the server replies
                # with the version it will actually use
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {
                    "name": "poc-test",
                    "version": "0.1.0"
                }
            }
        }

        print("Sending initialize request...")
        process.stdin.write(json.dumps(initialize_request) + "\n")
        process.stdin.flush()

        # Read the response (blocks until the server writes one line)
        response_line = process.stdout.readline()
        response = json.loads(response_line)

        print(f"Response received: {json.dumps(response, indent=2)}")

        if "result" in response:
            print("\n✅ MCP connection succeeded!")
            print(f"Server capabilities: {response['result'].get('capabilities', {})}")
            return True
        else:
            print(f"\n❌ Initialization failed: {response.get('error')}")
            return False

    except Exception as e:
        print(f"\n❌ Exception: {e}")
        traceback.print_exc()
        return False
    finally:
        # Cleanup: stop the server on every exit path
        if process is not None:
            process.terminate()
            process.wait(timeout=5)


if __name__ == "__main__":
    success = asyncio.run(test_mcp_connection())
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] The MCP Server starts
- [ ] The initialize handshake succeeds
- [ ] The response is parsed correctly

**Run the test**:
```bash
python poc_3_basic.py
```

---

#### Step 2: Tool invocation test (poc_3_tools.py)

```python
"""PoC 3.2: MCP tool calling."""
import asyncio
import json
import subprocess
import sys
import traceback


async def test_mcp_tools():
    """Test MCP tool invocation."""
    print("=== PoC 3.2: MCP Tools ===\n")

    cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]

    process = None
    try:
        process = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )

        # Initialize
        initialize_request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",  # MCP versions are date strings
                "capabilities": {},
                "clientInfo": {"name": "poc-test", "version": "0.1.0"}
            }
        }

        process.stdin.write(json.dumps(initialize_request) + "\n")
        process.stdin.flush()
        response = json.loads(process.stdout.readline())
        print("✓ Initialized")

        # The MCP lifecycle expects the client to confirm initialization
        # before it sends further requests
        initialized_notification = {"jsonrpc": "2.0", "method": "notifications/initialized"}
        process.stdin.write(json.dumps(initialized_notification) + "\n")
        process.stdin.flush()

        # List tools
        list_tools_request = {
            "jsonrpc": "2.0",
            "id": 2,
            "method": "tools/list",
            "params": {}
        }

        print("\nListing available tools...")
        process.stdin.write(json.dumps(list_tools_request) + "\n")
        process.stdin.flush()
        response = json.loads(process.stdout.readline())

        if "result" in response:
            tools = response["result"].get("tools", [])
            print(f"✓ Found {len(tools)} tools:")
            for tool in tools:
                print(f"  - {tool['name']}: {tool.get('description', 'N/A')}")

            # Try calling one tool (read_file)
            if tools:
                print("\nTesting a tool call...")

                # Create a test file
                test_file = "/tmp/mcp_test.txt"
                with open(test_file, "w") as f:
                    f.write("Hello MCP!")

                call_tool_request = {
                    "jsonrpc": "2.0",
                    "id": 3,
                    "method": "tools/call",
                    "params": {
                        "name": "read_file",
                        "arguments": {"path": test_file}
                    }
                }

                process.stdin.write(json.dumps(call_tool_request) + "\n")
                process.stdin.flush()
                response = json.loads(process.stdout.readline())

                if "result" in response:
                    print(f"✓ Tool call succeeded: {response['result']}")
                    print("\n✅ MCP tool calling validated!")
                    return True
                else:
                    print(f"❌ Tool call failed: {response.get('error')}")

        return False

    except Exception as e:
        print(f"\n❌ Exception: {e}")
        traceback.print_exc()
        return False
    finally:
        # Stop the server on every exit path
        if process is not None:
            process.terminate()


if __name__ == "__main__":
    success = asyncio.run(test_mcp_tools())
    sys.exit(0 if success else 1)
```

**Acceptance criteria**:
- [ ] Tools can be listed
- [ ] Tools can be called
- [ ] Results are parsed correctly

**Run the test**:
```bash
python poc_3_tools.py
```

---

### PoC 3 Summary

**Validation results**:
- [ ] The MCP Server starts
- [ ] Protocol communication works
- [ ] Tool calls succeed
- [ ] Process management is under control

**Issues found**:
1.
2.

**Conclusion**:
- [ ] ✅ MCP integration is feasible
- [ ] ⚠️ Issues that need to be resolved:
- [ ] ❌ Not feasible; an alternative approach is needed

---

## Overall Conclusion

After completing all PoCs, fill in the overall assessment:

### PoC Validation Summary

| PoC | Status | Conclusion | Notes |
|-----|--------|------------|-------|
| Claude CLI integration | ⏸️ | - | - |
| Smart routing algorithm | ⏸️ | - | - |
| MCP Server loading | ⏸️ | - | - |

### Risk Assessment Update

Validation results for the original risks:

1. **Claude CLI integration is complex** (original risk level: high)
   - PoC result: ____
   - New risk level: ____
   - Recommendation: ____

2. **Smart routing performs poorly** (original risk level: medium)
   - PoC result: ____
   - New risk level: ____
   - Recommendation: ____

3. **MCP Server is unstable** (original risk level: medium)
   - PoC result: ____
   - New risk level: ____
   - Recommendation: ____

### Recommended Next Steps

Based on the PoC results:

- [ ] ✅ All validations passed; Phase 0 development can begin
- [ ] ⚠️ Partially passed; adjust the design before development
- [ ] ❌ A key technology is not feasible; re-planning is needed

Specific actions:
1.
2.
3.

---

**End of document**