# OCR Module Documentation

## Overview

The OCR module provides text recognition. It supports local recognition with PaddleOCR and can be extended with cloud OCR APIs.

## Directory Structure

```
src/core/ocr.py          # OCR module main file
examples/ocr_example.py  # Usage example
tests/test_ocr.py        # Test script
```

## Core Components

### 1. Data Models

#### OCRResult

Recognition result for a single line of text.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class OCRResult:
    text: str                # Recognized text
    confidence: float        # Confidence score (0-1)
    bbox: List[List[float]]  # Text box corners [[x1,y1], [x2,y2], [x3,y3], [x4,y4]]
    line_index: int          # Line index
```

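Because `bbox` stores four corner points rather than a rectangle, downstream code that crops or draws usually wants an axis-aligned box. The sketch below is illustrative only: it redefines a stand-in `OCRResult` with the documented fields so that it runs on its own, and `bbox_to_rect` is a hypothetical helper, not part of the module.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OCRResult:
    """Stand-in mirroring the documented fields (not imported from the module)."""
    text: str
    confidence: float
    bbox: List[List[float]]
    line_index: int


def bbox_to_rect(bbox: List[List[float]]) -> Tuple[float, float, float, float]:
    """Return (left, top, right, bottom) enclosing the four corner points."""
    xs = [point[0] for point in bbox]
    ys = [point[1] for point in bbox]
    return min(xs), min(ys), max(xs), max(ys)


line = OCRResult(
    text="Hello",
    confidence=0.98,
    bbox=[[10.0, 20.0], [110.0, 22.0], [109.0, 52.0], [9.0, 50.0]],
    line_index=0,
)
rect = bbox_to_rect(line.bbox)
print(rect)  # (9.0, 20.0, 110.0, 52.0)
```

Taking the min and max of each coordinate keeps the helper correct for slightly rotated text boxes, at the cost of including a little background around the text.
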
#### OCRBatchResult

Batch recognition result.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class OCRBatchResult:
    results: List[OCRResult]      # All recognition results
    full_text: str                # Full text
    total_confidence: float       # Average confidence
    success: bool                 # Whether recognition succeeded
    error_message: Optional[str]  # Error message
```

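To make the relationship between the per-line results and the aggregate fields concrete, here is a minimal sketch of how such a result could be assembled. It rests on assumptions that the module may not share: `full_text` is taken to be the line texts joined by newlines, `total_confidence` the arithmetic mean, and an empty input is reported as a failure. The classes are stand-ins and `build_batch_result` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OCRResult:  # stand-in with the documented fields
    text: str
    confidence: float
    bbox: List[List[float]]
    line_index: int


@dataclass
class OCRBatchResult:  # stand-in with the documented fields
    results: List[OCRResult]
    full_text: str
    total_confidence: float
    success: bool
    error_message: Optional[str] = None


def build_batch_result(lines: List[OCRResult]) -> OCRBatchResult:
    """Assumed aggregation: newline-joined text and mean confidence."""
    if not lines:
        # Assumption: no detected text is reported as a failure.
        return OCRBatchResult([], "", 0.0, False, "no text detected")
    full_text = "\n".join(line.text for line in lines)
    mean_conf = sum(line.confidence for line in lines) / len(lines)
    return OCRBatchResult(lines, full_text, mean_conf, True)


box = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
batch = build_batch_result([
    OCRResult("first line", 0.9, box, 0),
    OCRResult("second line", 0.7, box, 1),
])
print(batch.full_text)
print(round(batch.total_confidence, 2))  # 0.8
```
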
#### OCRLanguage

Supported languages.

```python
from enum import Enum

class OCRLanguage(str, Enum):
    CHINESE = "ch"             # Chinese
    ENGLISH = "en"             # English
    MIXED = "chinese_chinese"  # Mixed Chinese and English
```

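Since `OCRLanguage` mixes in `str`, its members compare equal to plain strings and can be built from them. The sketch below copies the enum as documented and adds `parse_language`, a hypothetical helper (not part of the module) that turns an invalid code into a readable error.

```python
from enum import Enum


class OCRLanguage(str, Enum):  # copy of the documented enum
    CHINESE = "ch"
    ENGLISH = "en"
    MIXED = "chinese_chinese"


def parse_language(value: str) -> OCRLanguage:
    """Hypothetical helper: map a raw string to a member, with a clear error."""
    try:
        return OCRLanguage(value)
    except ValueError:
        allowed = ", ".join(member.value for member in OCRLanguage)
        raise ValueError(
            f"unsupported language {value!r}; expected one of: {allowed}"
        ) from None


lang = parse_language("en")
print(lang is OCRLanguage.ENGLISH)  # True
print(lang == "en")                 # True, because members are also str
print(lang.value)                   # en
```
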
### 2. OCR Engines

#### BaseOCREngine (abstract base class)

Base class for all OCR engines.

```python
from abc import ABC, abstractmethod

class BaseOCREngine(ABC):
    @abstractmethod
    def recognize(self, image, preprocess: bool = True) -> OCRBatchResult:
        """Recognize the text in an image."""
```

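One practical use of the abstract base class is a test double that satisfies the same interface without loading any OCR model. The sketch below is an illustration under assumptions: `BaseOCREngine` and `OCRBatchResult` are simplified stand-ins defined inline, and `FixedTextEngine` is a hypothetical class that does not exist in the module.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OCRBatchResult:  # simplified stand-in
    results: List = field(default_factory=list)
    full_text: str = ""
    total_confidence: float = 0.0
    success: bool = True
    error_message: Optional[str] = None


class BaseOCREngine(ABC):  # stand-in matching the documented signature
    @abstractmethod
    def recognize(self, image, preprocess: bool = True) -> OCRBatchResult:
        """Recognize the text in an image."""


class FixedTextEngine(BaseOCREngine):
    """Hypothetical test double that ignores the image and returns canned text."""

    def __init__(self, text: str):
        self.text = text

    def recognize(self, image, preprocess: bool = True) -> OCRBatchResult:
        return OCRBatchResult(full_text=self.text, total_confidence=1.0)


engine = FixedTextEngine("canned output")
print(engine.recognize("ignored.png").full_text)  # canned output

try:
    BaseOCREngine()  # abstract classes cannot be instantiated
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```
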
#### PaddleOCREngine

Local recognition engine backed by PaddleOCR.

```python
from src.core.ocr import PaddleOCREngine

# Create the engine
config = {
    'lang': 'ch',       # Language
    'use_gpu': False,   # Whether to use the GPU
    'show_log': False   # Whether to show logs
}
engine = PaddleOCREngine(config)

# Recognize
image_path = "path/to/image.png"
result = engine.recognize(image_path, preprocess=False)
```

**Configuration parameters:**

- `lang`: language (`ch`/`en`/`chinese_chinese`)
- `use_gpu`: whether to use GPU acceleration
- `show_log`: whether to show PaddleOCR logs

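The engine has to turn PaddleOCR's raw output into the `OCRResult` fields described earlier. The following is a rough sketch of that mapping, assuming the PaddleOCR 2.x-style per-line format `[box, (text, score)]`; other PaddleOCR versions may return a different structure, and the module's real conversion code is not shown in this document. `parse_paddle_lines` is a hypothetical name, and the sample data is hand-written rather than produced by PaddleOCR.

```python
from typing import Any, Dict, List


def parse_paddle_lines(raw_lines: List[Any]) -> List[Dict[str, Any]]:
    """Convert [[box, (text, score)], ...] into dicts with the OCRResult fields."""
    parsed = []
    for index, (box, (text, score)) in enumerate(raw_lines):
        parsed.append({
            "text": text,
            "confidence": float(score),
            "bbox": [[float(x), float(y)] for x, y in box],
            "line_index": index,
        })
    return parsed


# Hand-written sample in the assumed format; no PaddleOCR call is made here.
sample = [
    [[[12, 8], [96, 8], [96, 30], [12, 30]], ("Invoice", 0.97)],
    [[[12, 40], [150, 40], [150, 62], [12, 62]], ("Total: 42.00", 0.91)],
]
for line in parse_paddle_lines(sample):
    print(line["line_index"], line["text"], line["confidence"])
```
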
#### CloudOCREngine

Cloud OCR adapter (placeholder interface).

```python
from src.core.ocr import CloudOCREngine

# Configuration (must be adapted to the specific API)
config = {
    'api_endpoint': 'https://api.example.com/ocr',
    'api_key': 'your_key',
    'provider': 'custom',
    'timeout': 30
}
engine = CloudOCREngine(config)
```

### 3. Image Preprocessor

#### ImagePreprocessor

Provides image enhancement and preprocessing.

```python
from src.core.ocr import ImagePreprocessor

# Standalone use
processor = ImagePreprocessor()
image = processor.load_image("image.png")

# Resize
resized = processor.resize_image(image, max_width=2000)

# Enhance contrast
contrasted = processor.enhance_contrast(image, factor=1.5)

# Enhance sharpness
sharpened = processor.enhance_sharpness(image, factor=1.5)

# Denoise
denoised = processor.denoise(image)

# Binarize
binary = processor.binarize(image, threshold=127)

# Combined preprocessing
processed = processor.preprocess(
    image,
    resize=True,
    enhance_contrast=True,
    enhance_sharpness=True,
    denoise=False,
    binarize=False
)
```

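The arithmetic behind two of these steps can be shown without any imaging library. This is a sketch under assumptions, not the module's implementation: it supposes that `resize_image` only shrinks images and preserves the aspect ratio, and that `binarize` maps values above the threshold to white. Both function names below are hypothetical.

```python
from typing import List, Tuple


def fit_to_max_width(width: int, height: int, max_width: int = 2000) -> Tuple[int, int]:
    """Scale (width, height) down so width <= max_width, keeping the aspect ratio."""
    if width <= max_width:
        return width, height  # assumption: never upscale
    scale = max_width / width
    return max_width, max(1, round(height * scale))


def binarize_row(pixels: List[int], threshold: int = 127) -> List[int]:
    """Map 8-bit grayscale values to 0 (black) or 255 (white)."""
    return [255 if value > threshold else 0 for value in pixels]


print(fit_to_max_width(4000, 3000))      # (2000, 1500)
print(fit_to_max_width(1200, 800))       # (1200, 800)
print(binarize_row([0, 127, 128, 255]))  # [0, 0, 255, 255]
```
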
### 4. Factory Class

#### OCRFactory

Creates the engine that matches the requested mode.

```python
from src.core.ocr import OCRFactory

# Create a local engine
local_engine = OCRFactory.create_engine("local", {'lang': 'ch'})

# Create a cloud engine
cloud_engine = OCRFactory.create_engine("cloud", {'api_endpoint': '...'})
```

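A factory of this kind is often a lookup from mode name to engine class. The sketch below shows one possible shape, a registry dictionary, purely as an illustration; how `OCRFactory` is actually implemented is not described here. `EngineFactory`, `LocalEngine`, and `CloudEngine` are hypothetical placeholder names.

```python
from typing import Any, Callable, Dict, Optional


class LocalEngine:  # placeholder for PaddleOCREngine
    def __init__(self, config: Dict[str, Any]):
        self.config = config


class CloudEngine:  # placeholder for CloudOCREngine
    def __init__(self, config: Dict[str, Any]):
        self.config = config


class EngineFactory:
    """Hypothetical registry-based factory keyed by mode name."""

    _registry: Dict[str, Callable[[Dict[str, Any]], Any]] = {
        "local": LocalEngine,
        "cloud": CloudEngine,
    }

    @classmethod
    def create_engine(cls, mode: str, config: Optional[Dict[str, Any]] = None):
        try:
            engine_cls = cls._registry[mode]
        except KeyError:
            raise ValueError(f"unknown OCR mode: {mode!r}") from None
        return engine_cls(config or {})


engine = EngineFactory.create_engine("local", {"lang": "ch"})
print(type(engine).__name__)  # LocalEngine
```

With a registry, supporting a new mode means adding one dictionary entry instead of another `if` branch.
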
## Quick Start

### Install Dependencies

```bash
pip install paddleocr paddlepaddle
```

### Basic Usage

```python
from src.core.ocr import recognize_text

# Quick recognition
result = recognize_text(
    image="path/to/image.png",
    mode="local",
    lang="ch",
    use_gpu=False,
    preprocess=False
)

if result.success:
    print(f"Recognized text: {result.full_text}")
    print(f"Average confidence: {result.total_confidence:.2f}")
```

### Recognition with Preprocessing

```python
from src.core.ocr import recognize_text

result = recognize_text(
    image="path/to/image.png",
    mode="local",
    lang="ch",
    preprocess=True  # Enable preprocessing
)
```

### Batch Processing

```python
from src.core.ocr import PaddleOCREngine

# Create the engine once and reuse it for every image
engine = PaddleOCREngine({'lang': 'ch'})

image_list = ["page1.png", "page2.png"]  # example paths
for image_path in image_list:
    result = engine.recognize(image_path)
    print(f"{image_path}: {result.full_text[:50]}...")
```

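In a longer batch run it helps to keep failures separate from successes instead of printing everything. The sketch below shows one way to do that with the `success` and `error_message` fields. It uses a stub in place of the real engine so it runs without PaddleOCR; `StubEngine`, `BatchResult`, and `run_batch` are hypothetical names, and the stub's failure rule is invented for the demonstration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class BatchResult:  # stand-in with the fields used below
    full_text: str
    success: bool
    error_message: Optional[str] = None


class StubEngine:
    """Hypothetical engine that fails for paths ending in '.bad'."""

    def recognize(self, image_path: str) -> BatchResult:
        if image_path.endswith(".bad"):
            return BatchResult("", False, "cannot read image")
        return BatchResult(f"text from {image_path}", True)


def run_batch(engine, paths: List[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (texts by path, error messages by path)."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for path in paths:
        result = engine.recognize(path)
        if result.success:
            texts[path] = result.full_text
        else:
            errors[path] = result.error_message or "unknown error"
    return texts, errors


texts, errors = run_batch(StubEngine(), ["a.png", "b.bad", "c.png"])
print(sorted(texts))  # ['a.png', 'c.png']
print(errors)         # {'b.bad': 'cannot read image'}
```
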
### Custom Preprocessing

```python
from src.core.ocr import preprocess_image, recognize_text
from PIL import Image

# Preprocess the image
processed = preprocess_image(
    "input.png",
    resize=True,
    enhance_contrast=True,
    enhance_sharpness=True
)

# Recognize the preprocessed image
result = recognize_text(processed, mode="local", lang="ch")
```

## Testing

Run the test script:

```bash
# Basic test
python tests/test_ocr.py --image /path/to/image.png

# Specify the language
python tests/test_ocr.py --image /path/to/image.png --lang en

# Use the GPU
python tests/test_ocr.py --image /path/to/image.png --gpu

# Test preprocessing only
python tests/test_ocr.py --image /path/to/image.png --preprocess-only
```

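For readers who want to see how such flags are typically wired up, here is a minimal `argparse` sketch that accepts the same four options. It is an assumption about the shape of the command line, not the contents of `tests/test_ocr.py`; in particular, the default language of `ch` and the help strings are guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the commands above."""
    parser = argparse.ArgumentParser(description="OCR test script (sketch)")
    parser.add_argument("--image", required=True, help="path to the input image")
    parser.add_argument("--lang", default="ch", help="language code, e.g. ch or en")
    parser.add_argument("--gpu", action="store_true", help="use the GPU")
    parser.add_argument("--preprocess-only", action="store_true",
                        help="run preprocessing without recognition")
    return parser


args = build_parser().parse_args(
    ["--image", "/path/to/image.png", "--lang", "en", "--gpu"]
)
print(args.image, args.lang, args.gpu, args.preprocess_only)
# /path/to/image.png en True False
```
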
## Supported Input Formats

- **File path**: a path string
- **PIL Image**: a `PIL.Image.Image` object
- **NumPy array**: a `numpy.ndarray`

```python
import numpy
from PIL import Image

from src.core.ocr import recognize_text

# All three forms work
result1 = recognize_text("/path/to/image.png")
result2 = recognize_text(Image.open("/path/to/image.png"))
result3 = recognize_text(numpy.array(Image.open("/path/to/image.png")))
```

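## Performance Tips

1. **GPU acceleration**: if an NVIDIA GPU is available, set `use_gpu=True`.
2. **Image size**: images are automatically resized to a reasonable size (`max_width=2000`).
3. **Preprocessing**: enabling preprocessing for low-quality images can improve accuracy.
4. **Batch processing**: reuse a single engine instance when processing multiple images.
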
## FAQ

### Q: How can I improve recognition accuracy?

A:
1. Enable preprocessing for low-quality images (`preprocess=True`).
2. Make sure the image resolution is high enough.
3. Choose the correct language parameter.
4. Try different combinations of preprocessing steps.

### Q: How do I handle mixed Chinese and English text?

A:
```python
result = recognize_text(image, lang="chinese_chinese")
```

### Q: How do I get the coordinates of each line?

A:
```python
for line_result in result.results:
    print(f"Text: {line_result.text}")
    print(f"Coordinates: {line_result.bbox}")
```

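Once the coordinates are available, a common follow-up is to drop uncertain lines and put the rest in reading order. The sketch below is one hedged way to do it: `Line` is a stand-in for `OCRResult`, `confident_lines` is a hypothetical helper, and the 0.8 threshold is an arbitrary example value rather than a recommendation from the module.

```python
from typing import List, NamedTuple


class Line(NamedTuple):  # stand-in for OCRResult
    text: str
    confidence: float
    bbox: List[List[float]]


def confident_lines(lines: List[Line], min_confidence: float = 0.8) -> List[Line]:
    """Drop low-confidence lines, then sort by top edge and left edge."""
    kept = [line for line in lines if line.confidence >= min_confidence]
    return sorted(
        kept,
        key=lambda line: (min(y for _, y in line.bbox), min(x for x, _ in line.bbox)),
    )


lines = [
    Line("footer", 0.95, [[10, 200], [90, 200], [90, 220], [10, 220]]),
    Line("smudge", 0.40, [[10, 100], [90, 100], [90, 120], [10, 120]]),
    Line("title", 0.99, [[10, 10], [90, 10], [90, 30], [10, 30]]),
]
print([line.text for line in confident_lines(lines)])  # ['title', 'footer']
```
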
### Q: How do I use cloud OCR?

A: `CloudOCREngine` is a placeholder interface. You need to implement its `_send_request` method for the specific cloud service API.

## Extending Cloud OCR

To add a cloud OCR provider, subclass `CloudOCREngine` and implement the `_send_request` method:

```python
from typing import Any, Dict

from src.core.ocr import CloudOCREngine, OCRBatchResult

class CustomCloudOCREngine(CloudOCREngine):
    def _send_request(self, image_data: bytes) -> Dict[str, Any]:
        # Send the API request
        # Return the standard format: {"text": "...", "confidence": 0.95}
        raise NotImplementedError

    def recognize(self, image, preprocess=False) -> OCRBatchResult:
        # Implement the provider-specific logic
        raise NotImplementedError
```

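As a starting point for `_send_request`, the sketch below shows the two halves such a method usually has: building the HTTP request and normalizing the reply into the standard `{"text", "confidence"}` shape. Everything provider-specific here is an assumption for illustration: the `image` JSON field, the base64 encoding, and the `Bearer` authorization header are not defined by this module, and a real provider's API documentation takes precedence. The request is built but never sent.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_request(endpoint: str, api_key: str, image_data: bytes) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the image as base64."""
    body = json.dumps({"image": base64.b64encode(image_data).decode("ascii")})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def normalize_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map a provider response to the standard {"text", "confidence"} shape."""
    return {
        "text": str(payload.get("text", "")),
        "confidence": float(payload.get("confidence", 0.0)),
    }


request = build_request("https://api.example.com/ocr", "your_key", b"\x89PNG")
print(request.get_method(), request.full_url)  # POST https://api.example.com/ocr
print(normalize_response({"text": "hello", "confidence": "0.95"}))
# {'text': 'hello', 'confidence': 0.95}
```

Sending the request would then be a call such as `urllib.request.urlopen(request, timeout=30)`, with the timeout taken from the engine's `timeout` setting.
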
## API Reference

### recognize_text()

Convenience function for recognition.

```python
def recognize_text(
    image,                      # Image (path, PIL Image, or NumPy array)
    mode: str = "local",        # OCR mode
    lang: str = "ch",           # Language
    use_gpu: bool = False,      # Whether to use the GPU
    preprocess: bool = False,   # Whether to preprocess
    **kwargs
) -> OCRBatchResult: ...
```

### preprocess_image()

Convenience function for preprocessing.

```python
def preprocess_image(
    image_path: str,
    output_path: Optional[str] = None,
    resize: bool = True,
    enhance_contrast: bool = True,
    enhance_sharpness: bool = True,
    denoise: bool = False,
    binarize: bool = False
) -> Image.Image: ...
```