feat: implement core features for CutThenThink phase P0

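Project initialization
- Create the full project structure (src/, data/, docs/, examples/, tests/)
- Configure dependencies in requirements.txt
- Create .gitignore

P0 base framework
- Database model: Record model with 6 category types
- Configuration management: YAML config covering AI/OCR/cloud storage/UI settings
- OCR module: local recognition with PaddleOCR, extensible to cloud OCR
- AI module: supports OpenAI/Claude/Tongyi/Ollama, 6 categories
- Storage module: full CRUD, search, statistics, import/export
- Main window framework: sidebar navigation, off-white color scheme
- Image handling: screenshot/clipboard/file picker/image preview
- Pipeline integration: OCR→AI→storage chained together, Markdown rendering, copy to clipboard
- Category browsing: card grid view, category filter, search, detail view

Tech stack
- PyQt6 + SQLAlchemy + PaddleOCR + OpenAI/Claude SDK
- 47 Python files in total, 4000+ lines of code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>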
2026-02-11 18:21:31 +08:00
commit c4a77f8aa4
79 changed files with 19412 additions and 0 deletions
--- a/docs/ocr_module.md
+++ b/docs/ocr_module.md
@@ -0,0 +1,327 @@
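+# OCR Module Documentation
+
+## Overview
+
+The OCR module provides text recognition. It runs PaddleOCR locally and can be extended with a cloud OCR API.
+
+## Directory Structure
+
+```
+src/core/ocr.py            # OCR module main file
+examples/ocr_example.py    # Usage example
+tests/test_ocr.py          # Test script
+```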
+
+## Core Components
+
+### 1. Data Models
+
+#### OCRResult
+Recognition result for a single line of text.
+
+```python
+@dataclass
+class OCRResult:
+    text: str              # Recognized text
+    confidence: float      # Confidence score (0-1)
+    bbox: List[List[float]]  # Text box corners [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
+    line_index: int        # Line index
+```
+
+#### OCRBatchResult
+Batch recognition result.
+
+```python
+@dataclass
+class OCRBatchResult:
+    results: List[OCRResult]  # All per-line results
+    full_text: str            # Full text
+    total_confidence: float   # Average confidence
+    success: bool             # Whether recognition succeeded
+    error_message: Optional[str]  # Error message
+```
+
+#### OCRLanguage
+Supported languages.
+
+```python
+class OCRLanguage(str, Enum):
+    CHINESE = "ch"               # Chinese
+    ENGLISH = "en"               # English
+    MIXED = "chinese_chinese"    # Mixed Chinese and English
+```
+
+### 2. OCR Engines
+
+#### BaseOCREngine (abstract base class)
+Base class for all OCR engines.
+
+```python
+class BaseOCREngine(ABC):
+    @abstractmethod
+    def recognize(self, image, preprocess: bool = True) -> OCRBatchResult:
+        """Recognize text in an image."""
+```
+
+#### PaddleOCREngine
+Local recognition engine backed by PaddleOCR.
+
+```python
+# Create the engine
+config = {
+    'lang': 'ch',      # Language
+    'use_gpu': False,  # Whether to use the GPU
+    'show_log': False  # Whether to show logs
+}
+engine = PaddleOCREngine(config)
+
+# Recognize
+result = engine.recognize(image_path, preprocess=False)
+```
+
+**Configuration parameters:**
+- `lang`: language (ch/en/chinese_chinese)
+- `use_gpu`: whether to use GPU acceleration
+- `show_log`: whether to show PaddleOCR logs
+
+#### CloudOCREngine
+Cloud OCR adapter (reserved interface, not yet implemented).
+
+```python
+# Configuration (must be adapted to the specific API)
+config = {
+    'api_endpoint': 'https://api.example.com/ocr',
+    'api_key': 'your_key',
+    'provider': 'custom',
+    'timeout': 30
+}
+engine = CloudOCREngine(config)
+```
+
+### 3. Image Preprocessor
+
+#### ImagePreprocessor
+Provides image enhancement and preprocessing.
+
+```python
+# Standalone use
+processor = ImagePreprocessor()
+image = processor.load_image("image.png")
+
+# Resize
+resized = processor.resize_image(image, max_width=2000)
+
+# Enhance contrast
+contrasted = processor.enhance_contrast(image, factor=1.5)
+
+# Enhance sharpness
+sharpened = processor.enhance_sharpness(image, factor=1.5)
+
+# Denoise
+denoised = processor.denoise(image)
+
+# Binarize
+binary = processor.binarize(image, threshold=127)
+
+# Combined preprocessing
+processed = processor.preprocess(
+    image,
+    resize=True,
+    enhance_contrast=True,
+    enhance_sharpness=True,
+    denoise=False,
+    binarize=False
+)
+```
+
+### 4. Factory Class
+
+#### OCRFactory
+Creates the engine that matches the requested mode.
+
+```python
+# Create a local engine
+local_engine = OCRFactory.create_engine("local", {'lang': 'ch'})
+
+# Create a cloud engine
+cloud_engine = OCRFactory.create_engine("cloud", {'api_endpoint': '...'})
+```
+
+## Quick Start
+
+### Install Dependencies
+
+```bash
+pip install paddleocr paddlepaddle
+```
+
+### Basic Usage
+
+```python
+from src.core.ocr import recognize_text
+
+# Quick recognition
+result = recognize_text(
+    image="path/to/image.png",
+    mode="local",
+    lang="ch",
+    use_gpu=False,
+    preprocess=False
+)
+
+if result.success:
+    print(f"Recognized text: {result.full_text}")
+    print(f"Average confidence: {result.total_confidence:.2f}")
+```
+
+### Recognition with Preprocessing
+
+```python
+result = recognize_text(
+    image="path/to/image.png",
+    mode="local",
+    lang="ch",
+    preprocess=True  # Enable preprocessing
+)
+```
+
+### Batch Processing
+
+```python
+from src.core.ocr import PaddleOCREngine
+
+engine = PaddleOCREngine({'lang': 'ch'})
+
+for image_path in image_list:  # image_list: an iterable of image paths
+    result = engine.recognize(image_path)
+    print(f"{image_path}: {result.full_text[:50]}...")
+```
+
+### Custom Preprocessing
+
+```python
+from src.core.ocr import preprocess_image, recognize_text
+from PIL import Image
+
+# Preprocess the image
+processed = preprocess_image(
+    "input.png",
+    resize=True,
+    enhance_contrast=True,
+    enhance_sharpness=True
+)
+
+# Recognize the preprocessed image
+result = recognize_text(processed, mode="local", lang="ch")
+```
+
+## Testing
+
+Run the test script:
+
+```bash
+# Basic test
+python tests/test_ocr.py --image /path/to/image.png
+
+# Specify the language
+python tests/test_ocr.py --image /path/to/image.png --lang en
+
+# Use the GPU
+python tests/test_ocr.py --image /path/to/image.png --gpu
+
+# Test preprocessing only
+python tests/test_ocr.py --image /path/to/image.png --preprocess-only
+```
+
+## Supported Input Formats
+
+- **File path**: a string path
+- **PIL Image**: a PIL.Image.Image object
+- **NumPy array**: a numpy.ndarray
+
+```python
+# All three forms are accepted (assumes `from PIL import Image` and `import numpy`)
+result1 = recognize_text("/path/to/image.png")
+result2 = recognize_text(Image.open("/path/to/image.png"))
+result3 = recognize_text(numpy.array(Image.open("/path/to/image.png")))
+```
+
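+## Performance Tips
+
+1. **GPU acceleration**: if an NVIDIA GPU is available, set `use_gpu=True`
+2. **Image size**: images are automatically resized to a reasonable size (max_width=2000)
+3. **Preprocessing**: enabling preprocessing on low-quality images can improve accuracy
+4. **Batch processing**: reuse one engine instance across multiple images
+
+## FAQ
+
+### Q: How can I improve recognition accuracy?
+A:
+1. Enable preprocessing for low-quality images (`preprocess=True`)
+2. Make sure the image resolution is high enough
+3. Choose the correct language parameter
+4. Try different combinations of preprocessing steps
+
+### Q: How do I handle mixed Chinese and English text?
+A:
+```python
+result = recognize_text(image, lang="chinese_chinese")
+```
+
+### Q: How do I get the coordinates of each line?
+A:
+```python
+for line_result in result.results:
+    print(f"Text: {line_result.text}")
+    print(f"Coordinates: {line_result.bbox}")
+```
+
+### Q: How do I use cloud OCR?
+A: CloudOCREngine is a reserved interface. You need to implement the `_send_request` method for the specific cloud service API.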
+
+## Extending Cloud OCR
+
+To add a cloud OCR backend, subclass `CloudOCREngine` and implement the `_send_request` method:
+
+```python
+class CustomCloudOCREngine(CloudOCREngine):
+    def _send_request(self, image_data: bytes) -> Dict[str, Any]:
+        # Send the API request
+        # Return the standard format: {"text": "...", "confidence": 0.95}
+        pass
+
+    def recognize(self, image, preprocess=False) -> OCRBatchResult:
+        # Implement the provider-specific logic
+        pass
+```
+
+## API Reference
+
+### recognize_text()
+Convenience function for recognition.
+
+```python
+def recognize_text(
+    image,              # Image (path, PIL Image, or numpy array)
+    mode: str = "local",      # OCR mode
+    lang: str = "ch",          # Language
+    use_gpu: bool = False,     # Whether to use the GPU
+    preprocess: bool = False,  # Whether to preprocess
+    **kwargs
+) -> OCRBatchResult
+```
+
+### preprocess_image()
+Convenience function for preprocessing.
+
+```python
+def preprocess_image(
+    image_path: str,
+    output_path: Optional[str] = None,
+    resize: bool = True,
+    enhance_contrast: bool = True,
+    enhance_sharpness: bool = True,
+    denoise: bool = False,
+    binarize: bool = False
+) -> Image.Image
+```
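
The documented data models suggest how per-line results roll up into an `OCRBatchResult`. The following is a minimal, standard-library-only sketch of that aggregation, not the code in `src/core/ocr.py`. The helpers `build_batch_result` and `bbox_to_rect` are hypothetical names introduced here for illustration. Joining lines with a newline, treating an empty result as a failure, and the `None` default on `error_message` are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class OCRResult:
    text: str
    confidence: float
    bbox: List[List[float]]
    line_index: int


@dataclass
class OCRBatchResult:
    results: List[OCRResult]
    full_text: str
    total_confidence: float
    success: bool
    error_message: Optional[str] = None  # default is an assumption


def build_batch_result(lines: List[OCRResult]) -> OCRBatchResult:
    """Hypothetical helper: aggregate per-line results into one batch result."""
    if not lines:
        # Assumption: no detected text is reported as a failure.
        return OCRBatchResult([], "", 0.0, False, "no text detected")
    ordered = sorted(lines, key=lambda r: r.line_index)
    # Assumption: the full text is the lines joined by newlines, in line order.
    full_text = "\n".join(r.text for r in ordered)
    # total_confidence is documented as the average confidence.
    mean_conf = sum(r.confidence for r in ordered) / len(ordered)
    return OCRBatchResult(ordered, full_text, mean_conf, True)


def bbox_to_rect(bbox: List[List[float]]) -> Tuple[float, float, float, float]:
    """Hypothetical helper: four-corner box -> (left, top, right, bottom)."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return (min(xs), min(ys), max(xs), max(ys))


lines = [
    OCRResult("world", 0.8, [[0, 20], [50, 20], [50, 30], [0, 30]], 1),
    OCRResult("hello", 0.9, [[0, 0], [50, 2], [50, 12], [0, 10]], 0),
]
batch = build_batch_result(lines)
print(batch.full_text)
print(f"{batch.total_confidence:.2f}")
print(bbox_to_rect(batch.results[0].bbox))
```

Converting the four-corner `bbox` to an axis-aligned rectangle is convenient for cropping or highlighting a line in a UI, at the cost of losing any rotation in the original quadrilateral.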