# MineNASAI - PoC Validation Plan

**Created**: 2025-02-04

**Goal**: Validate the feasibility of three key technologies

**Estimated time**: 2-3 days

---
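## Validation Goals
### Why a PoC Is Needed
Before investing significant development time, we need to validate the feasibility of the following key technologies:
1. **Claude Code CLI integration** (highest priority)
   - Risk: subprocess management and output parsing are complex
   - Impact: this is the core feature; if it is not feasible, the design must be reworked
2. **Intelligent routing algorithm**
   - Risk: insufficient routing accuracy
   - Impact: affects user experience and resource utilization
3. **MCP Server loading**
   - Risk: unfamiliarity with the MCP protocol; complex inter-process communication
   - Impact: the foundation for tool extensibility

---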
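## PoC 1: Claude Code CLI Integration
### Goal
Verify that Claude Code CLI can be invoked from a Python subprocess and that its output can be parsed correctly.
### Validation Items
- [ ] Subprocess startup and management
- [ ] Real-time output stream capture
- [ ] ANSI escape sequence handling
- [ ] Interactive input handling
- [ ] Timeouts and resource limits
- [ ] Error handling and recovery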
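
The ANSI escape sequence item above can be explored with a small filter that strips terminal control sequences from captured output before it is parsed. The sketch below is one possible approach and not part of the original plan: `ANSI_ESCAPE_RE` and `strip_ansi` are hypothetical names, and the pattern only covers CSI, OSC, and two-character escapes, which may not be everything Claude Code CLI emits.

```python
import re

# Hypothetical helper (assumption, not from the plan): matches the most
# common terminal control sequences found in PTY output.
ANSI_ESCAPE_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"               # CSI: ESC [ params intermediates final
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"    # OSC: ESC ] ... terminated by BEL or ESC \
    r"|\x1b[@-Z\\-_]"                        # two-character escapes such as ESC M
)


def strip_ansi(text: str) -> str:
    """Return text with the matched escape sequences removed."""
    return ANSI_ESCAPE_RE.sub("", text)


if __name__ == "__main__":
    sample = "\x1b[1;32mdone\x1b[0m \x1b]0;window title\x07ready"
    print(repr(strip_ansi(sample)))
```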
### Implementation Steps
#### Step 1: Environment setup
```bash
# 1. Make sure Claude Code CLI is installed
#    If not, see: https://docs.anthropic.com/claude/docs/claude-code
# 2. Verify that the CLI is available
claude --version
# 3. Test a basic call
claude "print hello world in python"
# 4. Create the PoC working directory
mkdir -p poc/claude_cli_test
cd poc/claude_cli_test
```
#### Step 2: Basic subprocess call (poc_1_basic.py)
```python
"""PoC 1.1: Basic Claude CLI subprocess call."""
import subprocess
import sys


def test_basic_call():
    """Test a basic CLI call."""
    print("=== PoC 1.1: Basic CLI Call ===\n")
    # Simple command. `claude "<prompt>"` starts an interactive session;
    # if this hangs with captured (non-TTY) output, print mode
    # (`claude -p "<prompt>"`) is likely the better fit.
    cmd = ["claude", "print hello world in python"]
    print(f"Running command: {' '.join(cmd)}")
    print("-" * 60)
    try:
        # Use subprocess.run
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=30,
        )
        print("STDOUT:")
        print(result.stdout)
        print("\nSTDERR:")
        print(result.stderr)
        print(f"\nExit code: {result.returncode}")
        if result.returncode == 0:
            print("\n✅ Basic call succeeded!")
            return True
        else:
            print("\n❌ Command failed")
            return False
    except subprocess.TimeoutExpired:
        print("\n❌ Command timed out")
        return False
    except Exception as e:
        # Includes FileNotFoundError when `claude` is not on PATH
        print(f"\n❌ Exception: {e}")
        return False


if __name__ == "__main__":
    success = test_basic_call()
    sys.exit(0 if success else 1)
```
**Acceptance criteria**:
- [ ] The script can successfully call Claude CLI
- [ ] The complete output can be captured
- [ ] The timeout mechanism works

**Run the test**:
```bash
python poc_1_basic.py
```
---
#### Step 3: Real-time output stream capture (poc_1_streaming.py)
```python
"""PoC 1.2: Real-time output streaming."""
import subprocess
import sys


def test_streaming_output():
    """Test real-time output stream capture."""
    print("=== PoC 1.2: Streaming Output ===\n")
    cmd = ["claude", "count to 10 with 1 second delays in python"]
    print(f"Running command: {' '.join(cmd)}")
    print("-" * 60)
    try:
        # Use Popen for streaming output.
        # stderr is merged into stdout: reading only one of two separate
        # pipes can deadlock once the unread pipe's buffer fills up.
        process = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1,  # line-buffered
        )
        print("Live output:")
        # Read line by line until EOF so that nothing is lost between
        # the last read and process exit.
        for line in process.stdout:
            print(f"[OUT] {line.rstrip()}")
        # EOF reached; wait for the exit code
        process.wait()
        print(f"\nExit code: {process.returncode}")
        if process.returncode == 0:
            print("\n✅ Streaming output captured successfully!")
            return True
        else:
            print("\n❌ Process failed")
            return False
    except Exception as e:
        print(f"\n❌ Exception: {e}")
        return False


if __name__ == "__main__":
    success = test_streaming_output()
    sys.exit(0 if success else 1)
```
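**Acceptance criteria**:
- [ ] Output is captured in real time (not only after the process ends)
- [ ] No output is lost
- [ ] Process termination is detected correctly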
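
Since the plan leans toward an asyncio-based design, the same three criteria can also be checked with asyncio's own subprocess API. The following is a minimal sketch under stated assumptions: `stream_lines` is a hypothetical helper name, the demo uses a stand-in Python child process instead of the Claude CLI, and it needs Python 3.9+ for the built-in generic type hints.

```python
import asyncio
import sys


async def stream_lines(cmd: list[str]) -> tuple[int, list[str]]:
    """Run cmd, print each output line as it arrives, return (exit_code, lines)."""
    process = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # merge stderr so one reader drains both
    )
    assert process.stdout is not None
    lines = []
    while True:
        raw = await process.stdout.readline()
        if not raw:  # b"" means EOF: the child closed its output
            break
        line = raw.decode("utf-8", errors="replace").rstrip("\r\n")
        print(f"[OUT] {line}")
        lines.append(line)
    exit_code = await process.wait()
    return exit_code, lines


if __name__ == "__main__":
    # Stand-in child process so the sketch runs without the Claude CLI installed
    demo_cmd = [sys.executable, "-u", "-c", "for i in range(3): print('tick', i)"]
    code, captured = asyncio.run(stream_lines(demo_cmd))
    print(f"exit code: {code}, lines captured: {len(captured)}")
```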

**Run the test**:
```bash
python poc_1_streaming.py
```
---
#### Step 4: PTY pseudo-terminal handling (poc_1_pty.py)
```python
"""PoC 1.3: PTY pseudo-terminal for interactive handling (Unix only)."""
import os
import pty
import select
import subprocess
import sys
import traceback


def test_pty_terminal():
    """Test PTY pseudo-terminal handling."""
    print("=== PoC 1.3: PTY Terminal ===\n")
    cmd = ["claude", "create a simple python calculator"]
    print(f"Running command: {' '.join(cmd)}")
    print("-" * 60)
    master = None
    try:
        # Create the PTY
        master, slave = pty.openpty()
        try:
            # Start the process
            process = subprocess.Popen(
                cmd,
                stdin=slave,
                stdout=slave,
                stderr=slave,
                close_fds=True,
            )
        finally:
            os.close(slave)  # the child inherited it; close the parent's copy
        print("Live output (PTY):")
        output_chunks = []
        # Read output
        while True:
            # Use select so the read never blocks indefinitely
            r, _, _ = select.select([master], [], [], 0.1)
            if master in r:
                try:
                    data = os.read(master, 1024)
                except OSError:
                    break  # on Linux, EIO means the child closed the PTY
                if not data:
                    break
                print(data.decode("utf-8", errors="replace"), end="")
                output_chunks.append(data)
            elif process.poll() is not None:
                # The process has exited and no data is pending
                break
        process.wait()
        output = b"".join(output_chunks).decode("utf-8", errors="replace")
        print(f"\n\nExit code: {process.returncode}")
        print(f"Total output length: {len(output)} characters")
        if process.returncode == 0:
            print("\n✅ PTY terminal handling succeeded!")
            return True
        else:
            print("\n❌ Process failed")
            return False
    except Exception as e:
        print(f"\n❌ Exception: {e}")
        traceback.print_exc()
        return False
    finally:
        if master is not None:
            os.close(master)


if __name__ == "__main__":
    success = test_pty_terminal()
    sys.exit(0 if success else 1)
```
**Acceptance criteria**:
- [ ] The PTY is created correctly
- [ ] Interactive output can be handled
- [ ] ANSI escape sequences are displayed correctly

**Run the test**:
```bash
python poc_1_pty.py
```
---
#### Step 5: Full integration example (poc_1_full.py)
```python
"""PoC 1.4: Full Claude CLI integration (Unix only, Python 3.9+)."""
import asyncio
import os
import pty
import select
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class CLIResult:
    """CLI execution result."""
    exit_code: int
    output: str
    error: Optional[str] = None
    duration_ms: int = 0


class ClaudeCLI:
    """Claude CLI wrapper."""

    def __init__(self, timeout: int = 300):
        """Initialize CLI wrapper."""
        self.timeout = timeout

    async def execute(self, prompt: str, workspace: str = ".") -> CLIResult:
        """
        Execute a Claude CLI command.

        Args:
            prompt: Task prompt
            workspace: Working directory
        Returns:
            CLIResult with output
        """
        # The PTY read loop blocks, so run it in a worker thread
        # to keep the event loop responsive.
        return await asyncio.to_thread(self._execute_blocking, prompt, workspace)

    def _execute_blocking(self, prompt: str, workspace: str) -> CLIResult:
        """Run the CLI under a PTY and collect its output."""
        start_time = time.monotonic()
        cmd = ["claude", prompt]
        output_chunks = []
        master = None

        def elapsed_ms() -> int:
            return int((time.monotonic() - start_time) * 1000)

        def collected() -> str:
            return b"".join(output_chunks).decode("utf-8", errors="replace")

        try:
            # Create the PTY
            master, slave = pty.openpty()
            try:
                # Start the process
                process = subprocess.Popen(
                    cmd,
                    stdin=slave,
                    stdout=slave,
                    stderr=slave,
                    cwd=workspace,
                    close_fds=True,
                )
            finally:
                os.close(slave)
            # Read output (with timeout)
            deadline = start_time + self.timeout
            while True:
                # Check for timeout
                if time.monotonic() > deadline:
                    process.kill()
                    process.wait()  # reap the killed child
                    raise TimeoutError(f"Command timeout after {self.timeout}s")
                # Read pending data before checking for exit so nothing is lost
                r, _, _ = select.select([master], [], [], 0.1)
                if master in r:
                    try:
                        data = os.read(master, 4096)
                    except OSError:
                        break  # on Linux, EIO means the child closed the PTY
                    if not data:
                        break
                    output_chunks.append(data)
                elif process.poll() is not None:
                    break
            process.wait()
            return CLIResult(
                exit_code=process.returncode,
                output=collected(),
                duration_ms=elapsed_ms(),
            )
        except Exception as e:
            # Timeouts and launch failures both end up here; keep partial output
            return CLIResult(
                exit_code=-1,
                output=collected(),
                error=str(e),
                duration_ms=elapsed_ms(),
            )
        finally:
            if master is not None:
                os.close(master)


async def test_full_integration():
    """Test the full integration."""
    print("=== PoC 1.4: Full Integration ===\n")
    cli = ClaudeCLI(timeout=60)
    # Test case 1: simple task
    print("Test 1: simple Python script")
    print("-" * 60)
    result = await cli.execute("create a hello world python script")
    print(f"Output:\n{result.output}")
    print(f"\nExit code: {result.exit_code}")
    print(f"Duration: {result.duration_ms}ms")
    if result.exit_code == 0:
        print("✅ Test 1 passed")
    else:
        print(f"❌ Test 1 failed: {result.error}")
        return False
    # Test case 2: a task that needs multiple steps
    print("\n\nTest 2: multi-step task")
    print("-" * 60)
    result = await cli.execute(
        "create a simple REST API with FastAPI that has a hello endpoint"
    )
    print(f"Output:\n{result.output[:500]}...")  # show only the first 500 characters
    print(f"\nExit code: {result.exit_code}")
    print(f"Duration: {result.duration_ms}ms")
    if result.exit_code == 0:
        print("✅ Test 2 passed")
    else:
        print(f"❌ Test 2 failed: {result.error}")
        return False
    print("\n" + "=" * 60)
    print("✅ Full integration validated!")
    print("=" * 60)
    return True


if __name__ == "__main__":
    success = asyncio.run(test_full_integration())
    sys.exit(0 if success else 1)
```
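**Acceptance criteria**:
- [ ] The wrapper class works correctly
- [ ] Asynchronous calls are supported
- [ ] Timeout control is effective
- [ ] Error handling is thorough
- [ ] Both simple and complex tasks can be handled

**Run the test**:
```bash
python poc_1_full.py
```
---
### PoC 1 Summary
After completing the five steps above, fill in the following checklist:

**Validation results**:
- [ ] Basic call is feasible
- [ ] Streaming output capture is feasible
- [ ] PTY terminal handling is feasible
- [ ] Full integration validated
- [ ] Performance is acceptable (most tasks < 1 minute)

**Issues found**:
1.
2.
3.

**Improvements needed**:
1.
2.
3.

**Conclusion**:
- [ ] ✅ Feasible; recommend the PTY + asyncio approach
- [ ] ⚠️ Partially feasible; the design needs adjusting
- [ ] ❌ Not feasible; an alternative approach is needed

---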
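## PoC 2: Intelligent Routing Algorithm
### Goal
Verify that the routing algorithm can reasonably classify tasks into three modes: fast, medium, and deep.
### Validation Items
- [ ] Accuracy of heuristic rules
- [ ] Feasibility of LLM-based routing
- [ ] User instruction override
- [ ] Routing performance (response time)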
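
The plan lists user instruction override as a validation item but does not define its syntax. As one way to prototype it, the sketch below assumes a purely hypothetical convention in which a message starting with `/fast`, `/medium`, or `/deep` forces that mode; `parse_mode_override` is likewise an illustrative name, not an existing function.

```python
from typing import Optional, Tuple

MODES = ("fast", "medium", "deep")


def parse_mode_override(message: str) -> Tuple[Optional[str], str]:
    """Return (forced_mode, remaining_message).

    forced_mode is None when the message carries no override prefix.
    The "/mode" prefix syntax is an assumption made for this sketch.
    """
    stripped = message.lstrip()
    for mode in MODES:
        prefix = f"/{mode}"
        # Require an exact match or a following space so "/fastest" is not an override
        if stripped == prefix or stripped.startswith(prefix + " "):
            return mode, stripped[len(prefix):].strip()
    return None, message


if __name__ == "__main__":
    for text in ["/deep 修改配置文件中的端口", "NAS状态?", "/fastest route"]:
        print(text, "->", parse_mode_override(text))
```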
### Implementation Steps
#### Step 1: Heuristic rule test (poc_2_heuristic.py)
```python
"""PoC 2.1: Heuristic routing rules."""
import sys
from dataclasses import dataclass
from typing import Literal


@dataclass
class RoutingDecision:
    """Routing decision."""
    mode: Literal["fast", "medium", "deep"]
    reason: str
    confidence: float  # 0.0 - 1.0


class HeuristicRouter:
    """Heuristic-based router."""

    def __init__(self):
        """Initialize rules."""
        # Keywords are Chinese because the router targets Chinese messages.
        self.rules = {
            # status, query, search, "what is", ASCII and full-width question marks
            "fast_keywords": ["状态", "查询", "搜索", "是什么", "?", "？"],
            # implement, develop, write, refactor, optimize, design, create
            "deep_keywords": ["实现", "开发", "编写", "重构", "优化", "设计", "创建"],
            # modify, update, add, delete
            "medium_keywords": ["修改", "更新", "添加", "删除"],
        }

    def evaluate(self, message: str) -> RoutingDecision:
        """Evaluate routing decision."""
        message_lower = message.lower()
        length = len(message)
        # Rule 1: short message + query keyword -> fast
        if length < 50 and any(kw in message_lower for kw in self.rules["fast_keywords"]):
            return RoutingDecision(
                mode="fast",
                reason="Short message containing a query keyword",
                confidence=0.9,
            )
        # Rule 2: contains development keywords -> deep
        deep_count = sum(1 for kw in self.rules["deep_keywords"] if kw in message_lower)
        if deep_count >= 1:
            return RoutingDecision(
                mode="deep",
                reason=f"Contains {deep_count} development keyword(s)",
                confidence=0.7 + min(deep_count * 0.1, 0.2),
            )
        # Rule 3: medium length + modification keyword -> medium
        if 50 <= length <= 200 and any(kw in message_lower for kw in self.rules["medium_keywords"]):
            return RoutingDecision(
                mode="medium",
                reason="Medium length containing a modification keyword",
                confidence=0.6,
            )
        # Rule 4: long message -> medium (possibly deep)
        if length > 200:
            return RoutingDecision(
                mode="medium",
                reason="Long message; may need multi-step processing",
                confidence=0.5,
            )
        # Default: medium
        return RoutingDecision(
            mode="medium",
            reason="Default medium complexity",
            confidence=0.4,
        )


def test_heuristic_router():
    """Test the heuristic router."""
    print("=== PoC 2.1: Heuristic Router ===\n")
    router = HeuristicRouter()
    # Test cases: (message, expected mode)
    test_cases = [
        ("NAS状态?", "fast"),
        ("搜索最新的Python教程", "fast"),
        ("实现一个Web服务", "deep"),
        ("重构这个模块", "deep"),
        ("修改配置文件中的端口", "medium"),
        ("添加一个新的API端点", "medium"),
        ("这是一段很长的描述,我需要你帮我分析一下这个系统的架构,然后给出优化建议,包括性能、安全性和可维护性等多个方面", "medium"),
    ]
    correct = 0
    total = len(test_cases)
    print(f"Number of test cases: {total}\n")
    for i, (message, expected) in enumerate(test_cases, 1):
        decision = router.evaluate(message)
        is_correct = decision.mode == expected
        correct += is_correct
        status = "✅" if is_correct else "❌"
        print(f"{status} Case {i}:")
        print(f"  Message: {message}")
        print(f"  Expected: {expected}")
        print(f"  Actual: {decision.mode} (confidence: {decision.confidence:.2f})")
        print(f"  Reason: {decision.reason}")
        print()
    accuracy = correct / total
    print("=" * 60)
    print(f"Accuracy: {accuracy:.1%} ({correct}/{total})")
    print("=" * 60)
    if accuracy >= 0.7:
        print("\n✅ Heuristic rule accuracy is acceptable (>= 70%)")
        return True
    else:
        print(f"\n⚠️ Accuracy is low ({accuracy:.1%}); consider tuning the rules or using an LLM")
        return False


if __name__ == "__main__":
    success = test_heuristic_router()
    sys.exit(0 if success else 1)
```
**Acceptance criteria**:
- [ ] Accuracy >= 70%
- [ ] Confidence scores are reasonable
- [ ] Response time < 10 ms
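
The test script above reports accuracy but does not time the router, so the 10 ms criterion needs its own measurement. A minimal timing harness could look like the sketch below; `measure_latency_ms` and `dummy_route` are hypothetical names, and the stand-in router exists only so the example runs on its own (the idea would be to pass `HeuristicRouter().evaluate` instead).

```python
import statistics
import time
from typing import Callable, Iterable


def measure_latency_ms(
    route: Callable[[str], object],
    messages: Iterable[str],
    repeats: int = 100,
) -> dict:
    """Call route(message) repeatedly and return latency statistics in milliseconds."""
    samples = []
    for message in messages:
        for _ in range(repeats):
            start = time.perf_counter()
            route(message)
            samples.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(samples),
        "max_ms": max(samples),
        "calls": len(samples),
    }


if __name__ == "__main__":
    # Stand-in router (assumption); replace with HeuristicRouter().evaluate
    def dummy_route(message: str) -> str:
        return "deep" if "实现" in message else "medium"

    stats = measure_latency_ms(dummy_route, ["NAS状态?", "实现一个Web服务"])
    print(f"mean: {stats['mean_ms']:.4f} ms, max: {stats['max_ms']:.4f} ms")
    print("within 10 ms budget" if stats["max_ms"] < 10 else "over 10 ms budget")
```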

**Run the test**:
```bash
python poc_2_heuristic.py
```
---
#### Step 2: LLM routing test (poc_2_llm.py)
```python
"""PoC 2.2: LLM-based routing."""
import asyncio
import json
import os
import sys

from anthropic import AsyncAnthropic


class LLMRouter:
    """LLM-based router using Claude Haiku."""

    def __init__(self, api_key: str):
        """Initialize with API key."""
        self.client = AsyncAnthropic(api_key=api_key)
        self.model = "claude-3-5-haiku-20241022"

    async def evaluate(self, message: str) -> dict:
        """Evaluate using Claude Haiku."""
        system_prompt = """
You are an intelligent routing assistant. Analyze the user message and judge the task complexity:
- **fast**: simple query, no complex tools needed (<1000 tokens)
  Examples: "NAS状态?", "搜索xxx", "是什么"
- **medium**: medium task, needs a few tool calls (<5000 tokens)
  Examples: "修改配置", "添加功能", "更新文档"
- **deep**: complex task, needs programming or multi-step processing (>5000 tokens)
  Examples: "实现xxx", "重构xxx", "设计xxx"
Return JSON in this format:
{
  "mode": "fast|medium|deep",
  "reason": "reason for the decision",
  "confidence": 0.0-1.0
}
"""
        try:
            response = await self.client.messages.create(
                model=self.model,
                max_tokens=200,
                system=system_prompt,
                messages=[
                    {"role": "user", "content": f"Analyze this task: {message}"}
                ],
            )
            # Parse the response
            text = response.content[0].text
            # Try to extract the JSON object
            if "{" in text and "}" in text:
                json_str = text[text.find("{"):text.rfind("}") + 1]
                return json.loads(json_str)
            # No JSON found: fall back to a default decision
            return {"mode": "medium", "reason": "Failed to parse LLM output", "confidence": 0.5}
        except Exception as e:
            # Covers API errors as well as malformed JSON
            print(f"LLM routing failed: {e}")
            return {"mode": "medium", "reason": f"Error: {e}", "confidence": 0.0}


async def test_llm_router():
    """Test the LLM router."""
    print("=== PoC 2.2: LLM Router ===\n")
    # Check the API key
    api_key = os.getenv("ANTHROPIC_API_KEY")
    if not api_key:
        print("⚠️ ANTHROPIC_API_KEY is not set; skipping the LLM test")
        return True
    router = LLMRouter(api_key)
    # Test cases: (message, expected mode)
    test_cases = [
        ("NAS状态?", "fast"),
        ("实现一个Web服务", "deep"),
        ("修改配置文件中的端口", "medium"),
    ]
    print(f"Number of test cases: {len(test_cases)}\n")
    for i, (message, expected) in enumerate(test_cases, 1):
        print(f"Case {i}: {message}")
        result = await router.evaluate(message)
        mode = result.get("mode")
        status = "✅" if mode == expected else "❌"
        print(f"  {status} Expected: {expected}, actual: {mode}")
        print(f"  Reason: {result.get('reason', 'N/A')}")
        print(f"  Confidence: {result.get('confidence', 0):.2f}")
        print()
    print("✅ LLM routing is feasible (incurs API costs)")
    return True


if __name__ == "__main__":
    success = asyncio.run(test_llm_router())
    sys.exit(0 if success else 1)
```
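**Acceptance criteria**:
- [ ] The LLM classifies tasks correctly
- [ ] Response time is acceptable (< 2 seconds)
- [ ] API cost is within budget

**Run the test**:
```bash
export ANTHROPIC_API_KEY="your-api-key"
python poc_2_llm.py
```
---
### PoC 2 Summary
**Validation results**:
- [ ] Heuristic rule accuracy: ____%
- [ ] LLM routing accuracy: ____%
- [ ] Average LLM response time: ___ ms
- [ ] Cost per LLM routing call: $_____

**Conclusion**:
- [ ] ✅ Heuristic rules are sufficient; start with rules
- [ ] ✅ Hybrid heuristic + LLM: call the LLM when confidence is low
- [ ] ⚠️ LLM only: higher cost and latency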
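
To make the hybrid option concrete, the sketch below shows one way the two routers could be combined: the heuristic answers first, and the LLM is consulted only when the heuristic confidence falls under a threshold. The function name `route_hybrid`, the dict-shaped decisions, the `source` field, and the 0.6 threshold are all assumptions chosen for illustration; the stub routers stand in for `HeuristicRouter` and `LLMRouter` so the example runs without an API key.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

Decision = Dict[str, Any]  # {"mode": str, "reason": str, "confidence": float}


async def route_hybrid(
    message: str,
    heuristic: Callable[[str], Decision],
    llm: Callable[[str], Awaitable[Decision]],
    threshold: float = 0.6,  # assumed cut-off; tune it from PoC results
) -> Decision:
    """Return the heuristic decision unless its confidence is below threshold."""
    decision = heuristic(message)
    if decision["confidence"] >= threshold:
        return {**decision, "source": "heuristic"}
    try:
        llm_decision = await llm(message)
    except Exception as exc:
        # Keep the cheap answer if the LLM call fails
        return {**decision, "source": "heuristic", "reason": f"LLM call failed: {exc}"}
    return {**llm_decision, "source": "llm"}


if __name__ == "__main__":
    def stub_heuristic(message: str) -> Decision:
        confident = "?" in message
        return {"mode": "fast" if confident else "medium",
                "reason": "stub rule", "confidence": 0.9 if confident else 0.4}

    async def stub_llm(message: str) -> Decision:
        return {"mode": "deep", "reason": "stub LLM", "confidence": 0.8}

    for text in ["NAS状态?", "帮我看看这个项目"]:
        print(text, "->", asyncio.run(route_hybrid(text, stub_heuristic, stub_llm)))
```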
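
---
## PoC 3: MCP Server Loading
### Goal
Verify that MCP Servers can be loaded and invoked dynamically.
### Validation Items
- [ ] MCP Server discovery and startup
- [ ] MCP protocol communication (stdio/sse)
- [ ] Tool calls and result parsing
- [ ] Process lifecycle management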
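
For the process lifecycle item, a common pattern is to ask the server to exit and escalate only if it does not. The sketch below illustrates that pattern with the standard `subprocess` API; `stop_process` and the 5-second grace period are assumptions for illustration, and the demo uses a sleeping Python child in place of a real MCP Server.

```python
import subprocess
import sys


def stop_process(process: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Terminate a child process, escalating to kill if it does not exit in time."""
    if process.poll() is None:
        process.terminate()  # SIGTERM on Unix
        try:
            process.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            process.kill()  # SIGKILL on Unix
            process.wait()
    # Close the parent's pipe ends so file descriptors are not leaked
    for stream in (process.stdin, process.stdout, process.stderr):
        if stream is not None:
            try:
                stream.close()
            except OSError:
                pass  # e.g. broken pipe while flushing buffered input
    return process.returncode


if __name__ == "__main__":
    # Stand-in for an MCP Server: a child that would otherwise run for a minute
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    print("exit code after stop:", stop_process(child, grace_seconds=2.0))
```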
### Implementation Steps
#### Step 1: Basic MCP connection (poc_3_basic.py)
```python
"""PoC 3.1: Basic MCP server connection."""
import asyncio
import json
import subprocess
import sys
import traceback


async def test_mcp_connection():
    """Test the MCP Server connection."""
    print("=== PoC 3.1: MCP Connection ===\n")
    # Use the official filesystem MCP server
    cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    print(f"Starting MCP Server: {' '.join(cmd)}")
    print("-" * 60)
    process = None
    try:
        # Start the process
        process = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        # Send the initialize request
        initialize_request = {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                # MCP protocol versions are date strings
                "protocolVersion": "2024-11-05",
                "capabilities": {},
                "clientInfo": {
                    "name": "poc-test",
                    "version": "0.1.0"
                }
            }
        }
        print("Sending initialize request...")
        process.stdin.write(json.dumps(initialize_request) + "\n")
        process.stdin.flush()
        # Read the response (one JSON message per line)
        response_line = process.stdout.readline()
        response = json.loads(response_line)
        print(f"Received response: {json.dumps(response, indent=2)}")
        if "result" in response:
            print("\n✅ MCP connection succeeded!")
            print(f"Server capabilities: {response['result'].get('capabilities', {})}")
            return True
        else:
            print(f"\n❌ Initialization failed: {response.get('error')}")
            return False
    except Exception as e:
        print(f"\n❌ Exception: {e}")
        traceback.print_exc()
        return False
    finally:
        # Cleanup: always stop the server, on success and on failure
        if process is not None:
            process.terminate()
            try:
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                process.kill()


if __name__ == "__main__":
    success = asyncio.run(test_mcp_connection())
    sys.exit(0 if success else 1)
```
**Acceptance criteria**:
- [ ] The MCP Server can start
- [ ] The initialize handshake succeeds
- [ ] The response is parsed correctly
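
The script above treats the first line on stdout as the initialize response. A slightly more defensive reader, sketched below, skips blank or non-JSON lines and notifications until it finds the response whose `id` matches the request. `read_response` and its `max_lines` guard are hypothetical, and the demo feeds it an in-memory stream rather than a live server.

```python
import io
import json
from typing import Any, Dict, TextIO


def read_response(stream: TextIO, request_id: int, max_lines: int = 100) -> Dict[str, Any]:
    """Read newline-delimited JSON-RPC messages until the response for request_id."""
    for _ in range(max_lines):
        line = stream.readline()
        if not line:
            raise EOFError("server closed its stdout before responding")
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip stray non-JSON output
        if isinstance(message, dict) and message.get("id") == request_id:
            if "error" in message:
                raise RuntimeError(f"JSON-RPC error: {message['error']}")
            return message.get("result", {})
        # Anything else (notifications, other ids) is ignored in this sketch
    raise RuntimeError(f"no response for id {request_id} within {max_lines} lines")


if __name__ == "__main__":
    fake_stdout = io.StringIO(
        "some startup banner\n"
        '{"jsonrpc": "2.0", "method": "notifications/message", "params": {}}\n'
        '{"jsonrpc": "2.0", "id": 1, "result": {"capabilities": {"tools": {}}}}\n'
    )
    print(read_response(fake_stdout, request_id=1))
```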

**Run the test**:
```bash
python poc_3_basic.py
```
---
#### Step 2: Tool call test (poc_3_tools.py)
```python
"""PoC 3.2: MCP tool calling."""
import asyncio
import json
import subprocess
import sys
import traceback


def send(process, message):
    """Write one JSON-RPC message as a single line."""
    process.stdin.write(json.dumps(message) + "\n")
    process.stdin.flush()


def request(process, message):
    """Send a request and read the next line as its response."""
    send(process, message)
    return json.loads(process.stdout.readline())


async def test_mcp_tools():
    """Test MCP tool calls."""
    print("=== PoC 3.2: MCP Tools ===\n")
    cmd = ["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
    process = None
    try:
        process = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        # Initialize
        response = request(process, {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",  # MCP versions are date strings
                "capabilities": {},
                "clientInfo": {"name": "poc-test", "version": "0.1.0"}
            }
        })
        if "result" not in response:
            print(f"❌ Initialization failed: {response.get('error')}")
            return False
        # The handshake ends with an "initialized" notification (no id, no reply)
        send(process, {"jsonrpc": "2.0", "method": "notifications/initialized"})
        print("✓ Initialized")
        # List tools
        print("\nListing available tools...")
        response = request(process, {
            "jsonrpc": "2.0",
            "id": 2,
            "method": "tools/list",
            "params": {}
        })
        tools = response.get("result", {}).get("tools", [])
        if not tools:
            print(f"❌ No tools returned: {response.get('error')}")
            return False
        print(f"✓ Found {len(tools)} tools:")
        for tool in tools:
            print(f"  - {tool['name']}: {tool.get('description', 'N/A')}")
        # Test calling one tool (read_file)
        print("\nTesting a tool call...")
        # Create the test file
        test_file = "/tmp/mcp_test.txt"
        with open(test_file, "w") as f:
            f.write("Hello MCP!")
        response = request(process, {
            "jsonrpc": "2.0",
            "id": 3,
            "method": "tools/call",
            "params": {
                "name": "read_file",
                "arguments": {"path": test_file}
            }
        })
        if "result" in response:
            print(f"✓ Tool call succeeded: {response['result']}")
            print("\n✅ MCP tool calls validated!")
            return True
        print(f"❌ Tool call failed: {response.get('error')}")
        return False
    except Exception as e:
        print(f"\n❌ Exception: {e}")
        traceback.print_exc()
        return False
    finally:
        # Always stop the server, whichever path was taken
        if process is not None:
            process.terminate()
            try:
                process.wait(timeout=5)
            except subprocess.TimeoutExpired:
                process.kill()


if __name__ == "__main__":
    success = asyncio.run(test_mcp_tools())
    sys.exit(0 if success else 1)
```
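**Acceptance criteria**:
- [ ] Tools can be listed
- [ ] Tools can be called
- [ ] Results are parsed correctly

**Run the test**:
```bash
python poc_3_tools.py
```
---
### PoC 3 Summary
**Validation results**:
- [ ] The MCP Server can start
- [ ] Protocol communication works
- [ ] Tool calls succeed
- [ ] Process management is under control

**Issues found**:
1.
2.

**Conclusion**:
- [ ] ✅ MCP integration is feasible
- [ ] ⚠️ Issues that need to be resolved:
- [ ] ❌ Not feasible; an alternative approach is needed

---
## Overall Conclusion
After completing all PoCs, fill in the overall assessment:
### PoC Validation Summary
| PoC | Status | Conclusion | Notes |
|-----|--------|------------|-------|
| Claude CLI integration | ⏸️ | - | - |
| Intelligent routing algorithm | ⏸️ | - | - |
| MCP Server loading | ⏸️ | - | - |
### Risk Assessment Update
Validation results for the original risks:
1. **Claude CLI integration is complex** (original risk level: high)
   - PoC result: ____
   - New risk level: ____
   - Recommendation: ____
2. **Intelligent routing performs poorly** (original risk level: medium)
   - PoC result: ____
   - New risk level: ____
   - Recommendation: ____
3. **MCP Server is unstable** (original risk level: medium)
   - PoC result: ____
   - New risk level: ____
   - Recommendation: ____
### Recommended Next Steps
Based on the PoC results:
- [ ] ✅ All validations passed; formal Phase 0 development can begin
- [ ] ⚠️ Partially passed; adjust the design before development
- [ ] ❌ A key technology is not feasible; re-planning is required

Specific actions:
1.
2.
3.

---
**End of document**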