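fix: port changes & code-review fixes

Ports:
- Service port 8000 → 7329
- Frontend dev port 5173 → 7330

Security:
- Tighten CORS to an allowlist, disable credentials
- Complete the SPA route allowlist
- Escape HTML in the frontend (XSS)

Reliability:
- Unify time handling on datetime.now(timezone.utc)
- Article storage now uses in-memory dedup + incremental counting
- OPML import now receives its content as a body parameter
- XML-escape URLs in the OPML export
- First fetch now runs asynchronously via BackgroundTasks
- articles.py: move the HTTPException import to the top
- Log FTS5 errors explicitly
- Wrap FTS5 queries in quotes to prevent boolean-operator injection
- Chinese summaries now support Chinese punctuation
- Remove the unused hashlib import

Deployment:
- Dockerfile pins python:3.12.7-slim
- requirements pinned to exact versions
- healthcheck no longer uses curl (it is not in the image)
- docker-compose uses a .env file
- Add the .env config file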
@@ -0,0 +1,8 @@
+DATA_DIR=/app/data
+DATABASE_URL=/app/data/rsskeeper.db
+FETCH_CONCURRENCY=10
+FETCH_TIMEOUT=30
+DEFAULT_FETCH_INTERVAL=60
+MIN_FETCH_INTERVAL=15
+MAX_ARTICLE_CONTENT_LENGTH=50000
+MAX_SUMMARY_LENGTH=500

+8
-5
@@ -2,16 +2,16 @@
 # Stage 1: build the frontend
 FROM node:20-alpine AS frontend-builder
 WORKDIR /app/frontend
-COPY frontend/package.json frontend/package-lock.json* ./
+COPY frontend/package.json ./
 RUN npm install
 COPY frontend/ .
 RUN npm run build

 # Stage 2: Python backend
-FROM python:3.12-slim
+FROM python:3.12.7-slim
 WORKDIR /app

-# Install system dependencies
+# Install system dependencies (build time)
 RUN apt-get update && apt-get install -y --no-install-recommends \
     gcc \
     libxml2-dev \
@@ -22,6 +22,9 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY backend/requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt

+# Remove build dependencies
+RUN apt-get purge -y gcc && apt-get autoremove -y && rm -rf /var/lib/apt/lists/*
+
 # Copy the backend code
 COPY backend/ .

@@ -31,6 +34,6 @@ COPY --from=frontend-builder /app/frontend/dist ./static
 # Create the data directory
 RUN mkdir -p /app/data

-EXPOSE 8000
+EXPOSE 7329

-CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "1"]
+CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "7329", "--workers", "1"]

@@ -20,10 +20,10 @@ RSS fetching, management, and retrieval system. Supports Docker deployment, includes a Web UI and RE
 git clone <repo-url>
 cd rssKeeper

-# Start
+# Start (the service runs on port 7329)
 docker-compose up -d --build

-# Visit http://localhost:8000
+# Visit http://localhost:7329
 ```

 ### Development mode
@@ -32,9 +32,9 @@ docker-compose up -d --build
 # Backend
 cd backend
 pip install -r requirements.txt
-uvicorn main:app --reload --port 8000
+uvicorn main:app --reload --port 7329

-# Frontend (in another terminal)
+# Frontend (in another terminal, runs on port 7330)
 cd frontend
 npm install
 npm run dev
@@ -46,31 +46,31 @@ npm run dev

 ```bash
 # Get articles from the last 24 hours
-curl "http://localhost:8000/api/v1/external/recent?hours=24&limit=50"
+curl "http://localhost:7329/api/v1/external/recent?hours=24&limit=50"

 # Restrict to one RSS feed
-curl "http://localhost:8000/api/v1/external/recent?feed_id=1&hours=48"
+curl "http://localhost:7329/api/v1/external/recent?feed_id=1&hours=48"

 # Restrict to one category
-curl "http://localhost:8000/api/v1/external/recent?category=科技&hours=24"
+curl "http://localhost:7329/api/v1/external/recent?category=科技&hours=24"
 ```

 ### List feeds

 ```bash
-curl "http://localhost:8000/api/v1/external/feeds"
+curl "http://localhost:7329/api/v1/external/feeds"
 ```

 ### Get articles by feed

 ```bash
-curl "http://localhost:8000/api/v1/external/feeds/1/articles?limit=100"
+curl "http://localhost:7329/api/v1/external/feeds/1/articles?limit=100"
 ```

 ### Get the daily summary

 ```bash
-curl "http://localhost:8000/api/v1/external/summary?date=2024-06-01"
+curl "http://localhost:7329/api/v1/external/summary?date=2024-06-01"
 ```

 ## Tech stack

+7
-3
@@ -43,15 +43,19 @@ def init_fts5():
     conn = engine.raw_connection()
     cursor = conn.cursor()

+    import logging
+    logger = logging.getLogger(__name__)
+
     # Check whether the FTS5 extension is available
     try:
         cursor.execute("SELECT sqlite_compileoption_used('ENABLE_FTS5')")
         has_fts5 = cursor.fetchone()[0]
         if not has_fts5:
-            print("警告: SQLite 未启用 FTS5 扩展,全文搜索将不可用")
+            logger.warning("SQLite 未启用 FTS5 扩展,全文搜索将不可用")
             return
-    except Exception:
-        pass
+    except Exception as e:
+        logger.error(f"FTS5 检测失败: {e}")
+        return

     # Create the FTS5 virtual table
     cursor.execute("""

@@ -10,8 +10,13 @@ def search_articles(query: str, limit: int = 50, offset: int = 0):
     if not query or not query.strip():
         return [], 0

-    # Escape FTS5 special characters
+    # Escape FTS5 special characters (double quotes, *, etc.)
+    # Simple strategy: treat the user query as one whole phrase and wrap it in quotes
     query = query.replace('"', '""').strip()
+    if not query:
+        return [], 0
+    # Wrap in double quotes so FTS5 boolean operators are not misparsed
+    query = f'"{query}"'

     conn = engine.raw_connection()
     cursor = conn.cursor()

@@ -1,6 +1,7 @@
 """RSS feed health checks"""
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
 from typing import List, Dict
+from sqlalchemy import func
 from sqlalchemy.orm import Session
 from models import Feed, FetchLog

@@ -9,6 +10,7 @@ def get_feed_health(db: Session, feed_id: int = None) -> List[Dict]:
     """Get RSS feed health information
     Returns health status details for each feed
     """
+    now = datetime.now(timezone.utc)
     query = db.query(Feed)
     if feed_id:
         query = query.filter(Feed.id == feed_id)
@@ -22,15 +24,16 @@ def get_feed_health(db: Session, feed_id: int = None) -> List[Dict]:

         days_since_fetch = None
         if feed.last_fetch_at:
-            days_since_fetch = (datetime.utcnow() - feed.last_fetch_at).days
+            days_since_fetch = (now - feed.last_fetch_at).days

         # Fetch logs from the last 7 days
+        week_ago = now - timedelta(days=7)
         recent_logs = db.query(FetchLog).filter(
             FetchLog.feed_id == feed.id,
-            FetchLog.created_at >= datetime.utcnow() - timedelta(days=7)
+            FetchLog.created_at >= week_ago
         ).order_by(FetchLog.created_at.desc()).limit(10).all()

-        health = feed.health_status()
+        health = feed.health_status(now=now)

         results.append({
             "id": feed.id,
@@ -76,14 +79,14 @@ def get_overall_stats(db: Session) -> Dict:
     """Get overall statistics"""
     total_feeds = db.query(Feed).count()
     active_feeds = db.query(Feed).filter(Feed.is_active == True).count()
-    total_articles = db.query(Feed).with_entities(Feed.article_count).all()
-    total_articles_count = sum(a[0] for a in total_articles) if total_articles else 0
+    total_articles_count = db.query(func.sum(Feed.article_count)).scalar() or 0

     # Healthy-feed statistics
     feeds = db.query(Feed).all()
     healthy = warning = unhealthy = 0
+    now = datetime.now(timezone.utc)
     for feed in feeds:
-        status = feed.health_status()
+        status = feed.health_status(now=now)
         if status == "healthy":
             healthy += 1
         elif status == "warning":
@@ -92,8 +95,7 @@ def get_overall_stats(db: Session) -> Dict:
             unhealthy += 1

     # Today's fetches
-    today = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
-    from models import FetchLog
+    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
     today_fetches = db.query(FetchLog).filter(FetchLog.created_at >= today).count()
     today_success = db.query(FetchLog).filter(
         FetchLog.created_at >= today, FetchLog.status == "success"

+17
-7
@@ -35,13 +35,17 @@ app = FastAPI(
     lifespan=lifespan,
 )

-# CORS
+# CORS: allow only same-origin and development origins
 app.add_middleware(
     CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
+    allow_origins=[
+        "http://localhost:7329",
+        "http://localhost:7330",
+        "http://127.0.0.1:7329",
+    ],
+    allow_credentials=False,
+    allow_methods=["GET", "POST", "PUT", "DELETE"],
+    allow_headers=["Content-Type", "Authorization", "X-API-Key"],
 )

 # API routes
@@ -62,11 +66,17 @@ static_dir = os.path.join(config.BASE_DIR, "static")
 if os.path.exists(static_dir):
     app.mount("/static", StaticFiles(directory=static_dir), name="static")

+# API path allowlist: these paths must not fall through to the SPA
+_API_PATHS = {
+    "api", "docs", "openapi.json", "redoc",
+}
+
 @app.get("/{full_path:path}")
 async def serve_spa(full_path: str):
     """Vue SPA route fallback"""
-    # API routes are not handled here
-    if full_path.startswith("api/") or full_path.startswith("docs") or full_path.startswith("openapi.json"):
+    # API/docs routes do not fall through to the SPA
+    first_seg = full_path.split("/")[0] if full_path else ""
+    if first_seg in _API_PATHS:
         return {"detail": "Not found"}

     index_path = os.path.join(static_dir, "index.html")

+9
-6
@@ -1,5 +1,5 @@
 """SQLAlchemy data models"""
-from datetime import datetime
+from datetime import datetime, timezone
 from sqlalchemy import Column, Integer, String, Text, Boolean, DateTime, ForeignKey
 from sqlalchemy.orm import relationship
 from database import Base
@@ -25,13 +25,13 @@ class Feed(Base):
     fail_count = Column(Integer, default=0)
     article_count = Column(Integer, default=0)

-    created_at = Column(DateTime, default=datetime.utcnow)
+    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))

     # Relationships
     articles = relationship("Article", back_populates="feed", cascade="all, delete-orphan")
     fetch_logs = relationship("FetchLog", back_populates="feed", cascade="all, delete-orphan")

-    def health_status(self):
+    def health_status(self, now: datetime = None):
         """Compute the health status
         🟢 Healthy: success rate >= 90%, updated within the last 7 days
         🟡 Warning: success rate 50%-90%, or no update for more than 3 days
@@ -43,9 +43,12 @@ class Feed(Base):

         success_rate = self.success_count / total

+        if now is None:
+            now = datetime.now(timezone.utc)
+
         days_since_last_fetch = None
         if self.last_fetch_at:
-            days_since_last_fetch = (datetime.utcnow() - self.last_fetch_at).days
+            days_since_last_fetch = (now - self.last_fetch_at).days

         if success_rate >= 0.9 and (days_since_last_fetch is None or days_since_last_fetch <= 7):
             return "healthy"
@@ -68,7 +71,7 @@ class Article(Base):
     content = Column(Text, default="")
     summary = Column(Text, default="")
     is_read = Column(Boolean, default=False)
-    created_at = Column(DateTime, default=datetime.utcnow, index=True)
+    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc), index=True)

     # Relationships
     feed = relationship("Feed", back_populates="articles")
@@ -84,7 +87,7 @@ class FetchLog(Base):
     articles_fetched = Column(Integer, default=0)
     error_message = Column(Text, default="")
     response_time_ms = Column(Integer, nullable=True)
-    created_at = Column(DateTime, default=datetime.utcnow, index=True)
+    created_at = Column(DateTime, default=lambda: datetime.now(timezone.utc), index=True)

     # Relationships
     feed = relationship("Feed", back_populates="fetch_logs")

@@ -1,9 +1,9 @@
-fastapi>=0.110.0
-uvicorn[standard]>=0.29.0
-sqlalchemy>=2.0.0
-pydantic>=2.6.0
-feedparser>=6.0.11
-requests>=2.31.0
-beautifulsoup4>=4.12.0
-apscheduler>=3.10.4
-lxml>=5.1.0
+fastapi==0.115.0
+uvicorn[standard]==0.32.0
+sqlalchemy==2.0.36
+pydantic==2.9.2
+feedparser==6.0.11
+requests==2.32.3
+beautifulsoup4==4.12.3
+apscheduler==3.10.4
+lxml==5.3.0

@@ -1,6 +1,6 @@
 """Article management API"""
 from typing import Optional
-from fastapi import APIRouter, Depends
+from fastapi import APIRouter, Depends, HTTPException
 from pydantic import BaseModel
 from sqlalchemy.orm import Session
 from sqlalchemy import desc
@@ -130,4 +130,3 @@ def fulltext_search(
     return {"total": total, "items": results}


-from fastapi import HTTPException

@@ -1,6 +1,6 @@
 """External API (for AI / external systems to call)"""
 from typing import Optional
-from datetime import datetime, timedelta
+from datetime import datetime, timedelta, timezone
 from fastapi import APIRouter, Depends
 from sqlalchemy.orm import Session
 from sqlalchemy import desc
@@ -21,7 +21,7 @@ def get_recent_articles(
     """Get articles from the last N hours
     This is the main endpoint exposed for AI analysis
     """
-    since = datetime.utcnow() - timedelta(hours=hours)
+    since = datetime.now(timezone.utc) - timedelta(hours=hours)

     query = db.query(Article, Feed.title.label("feed_title"), Feed.category.label("category")).join(Feed)

@@ -136,7 +136,7 @@ def get_daily_summary(
         except ValueError:
             return {"error": "Invalid date format, use YYYY-MM-DD"}
     else:
-        day = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
+        day = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
     next_day = day + timedelta(days=1)

     query = db.query(Article, Feed.title.label("feed_title"), Feed.category.label("category")).join(Feed)

@@ -1,6 +1,6 @@
 """RSS feed management API"""
 from typing import List, Optional
-from fastapi import APIRouter, Depends, HTTPException
+from fastapi import APIRouter, Depends, HTTPException, BackgroundTasks
 from pydantic import BaseModel, HttpUrl
 from sqlalchemy.orm import Session
 from database import get_db
@@ -103,7 +103,11 @@ def list_categories(db: Session = Depends(get_db)):


 @router.post("", response_model=dict)
-def create_feed(data: FeedCreate, db: Session = Depends(get_db)):
+def create_feed(
+    data: FeedCreate,
+    background_tasks: BackgroundTasks,
+    db: Session = Depends(get_db),
+):
     """Add an RSS feed"""
     # Check whether it already exists
     existing = db.query(Feed).filter(Feed.url == data.url).first()
@@ -126,10 +130,10 @@ def create_feed(data: FeedCreate, db: Session = Depends(get_db)):
     if feed.is_active:
         add_feed_job(feed.id, feed.fetch_interval_minutes)

-    # Fetch once immediately
-    fetch_and_store_feed(feed.id)
+    # Run the first fetch in the background so the HTTP response is not blocked
+    background_tasks.add_task(fetch_and_store_feed, feed.id)

-    return {"id": feed.id, "message": "RSS 源添加成功", "url": feed.url}
+    return {"id": feed.id, "message": "RSS 源添加成功,正在后台抓取", "url": feed.url}


 @router.post("/discover")
@@ -217,13 +221,25 @@ def trigger_fetch(feed_id: int, db: Session = Depends(get_db)):
     return result


+class OpmlImport(BaseModel):
+    opml_content: str
+
+
 @router.post("/import-opml")
-def import_opml(opml_content: str, db: Session = Depends(get_db)):
+def import_opml(data: OpmlImport, db: Session = Depends(get_db)):
     """Import OPML file content"""
     import xml.etree.ElementTree as ET

+    content = data.opml_content.strip()
+    if not content:
+        raise HTTPException(status_code=400, detail="OPML 内容不能为空")
+
+    # Limit the size (to prevent abuse)
+    if len(content) > 5_000_000:  # 5MB
+        raise HTTPException(status_code=413, detail="OPML 文件过大")
+
     try:
-        root = ET.fromstring(opml_content)
+        root = ET.fromstring(content)
     except ET.ParseError:
         raise HTTPException(status_code=400, detail="无效的 OPML 文件")

@@ -261,12 +277,14 @@ def import_opml(opml_content: str, db: Session = Depends(get_db)):
 @router.get("/export-opml")
 def export_opml(db: Session = Depends(get_db)):
     """Export OPML file content"""
+    from xml.sax.saxutils import escape
     feeds = db.query(Feed).all()

     lines = ['<?xml version="1.0" encoding="UTF-8"?>', '<opml version="2.0">', '<head><title>rssKeeper Feeds</title></head>', '<body>']
     for feed in feeds:
-        title = (feed.title or feed.url).replace('"', '&quot;')
-        lines.append(f' <outline type="rss" text="{title}" xmlUrl="{feed.url}" />')
+        title = escape(feed.title or feed.url, {'"': '&quot;'})
+        url = escape(feed.url)
+        lines.append(f' <outline type="rss" text="{title}" xmlUrl="{url}" />')
     lines.append('</body>')
     lines.append('</opml>')


+49
-26
@@ -2,7 +2,6 @@
 import time
 import re
 import html
-import hashlib
 from datetime import datetime, timezone
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from urllib.parse import urljoin
@@ -99,16 +98,16 @@ def parse_article(entry, feed_id: int) -> dict:
     link = entry.get("link", "")
     author = entry.get("author", "")

-    # Publication time
+    # Publication time: always stored as a UTC-aware datetime
     published_at = None
     if hasattr(entry, "published_parsed") and entry.published_parsed:
         try:
-            published_at = datetime(*entry.published_parsed[:6], tzinfo=timezone.utc).replace(tzinfo=None)
+            published_at = datetime(*entry.published_parsed[:6], tzinfo=timezone.utc)
         except (ValueError, TypeError):
             pass
     if not published_at and hasattr(entry, "updated_parsed") and entry.updated_parsed:
         try:
-            published_at = datetime(*entry.updated_parsed[:6], tzinfo=timezone.utc).replace(tzinfo=None)
+            published_at = datetime(*entry.updated_parsed[:6], tzinfo=timezone.utc)
         except (ValueError, TypeError):
             pass

@@ -172,9 +171,14 @@ def generate_summary(content: str, max_length: int = 300) -> str:
     if len(text) <= max_length:
         return text

-    # Truncate at a sentence boundary
+    # Truncate at a sentence boundary (supports Chinese and English punctuation)
     truncated = text[:max_length]
-    last_period = max(truncated.rfind("。"), truncated.rfind(". "), truncated.rfind("! "), truncated.rfind("? "))
+    last_period = max(
+        truncated.rfind("。"), truncated.rfind(". "),
+        truncated.rfind("! "), truncated.rfind("? "),
+        truncated.rfind("?"), truncated.rfind("!"),
+        truncated.rfind(";"),
+    )
     if last_period > max_length * 0.5:
         return truncated[:last_period + 1]

@@ -195,7 +199,7 @@ def fetch_and_store_feed(feed_id: int) -> dict:

         if not result["success"]:
             # Record the failure
-            feed.last_fetch_at = datetime.utcnow()
+            feed.last_fetch_at = datetime.now(timezone.utc)
             feed.last_fetch_status = "fail"
             feed.last_error = result["error"]
             feed.fail_count += 1
@@ -218,34 +222,53 @@ def fetch_and_store_feed(feed_id: int) -> dict:
         if hasattr(parsed.feed, "description"):
             feed.description = parsed.feed.description[:1000]

-        # Store articles
-        new_count = 0
+        # Store articles: collect them all first, dedupe in memory, then insert in bulk
+        seen_links = set()
+        articles_to_add = []
+        articles_to_update = []
+
         for entry in parsed.entries:
             article_data = parse_article(entry, feed_id)
-            if not article_data["link"]:
+            link = article_data.get("link", "")
+            if not link or link in seen_links:
                 continue
+            seen_links.add(link)
+            articles_to_add.append(article_data)

-            # Check whether it already exists (by link)
-            existing = db.query(Article).filter(Article.link == article_data["link"]).first()
-            if existing:
-                # Update the existing article
-                existing.title = article_data["title"] or existing.title
-                existing.content = article_data["content"] or existing.content
-                existing.summary = article_data["summary"] or existing.summary
-                existing.author = article_data["author"] or existing.author
-                if article_data["published_at"]:
-                    existing.published_at = article_data["published_at"]
-            else:
-                article = Article(**article_data)
-                db.add(article)
-                new_count += 1
+        # Look up existing articles in one query
+        if articles_to_add:
+            existing_links = {
+                row[0] for row in db.query(Article.link).filter(
+                    Article.link.in_([a["link"] for a in articles_to_add])
+                ).all()
+            }
+
+        new_count = 0
+        for article_data in articles_to_add:
+            if article_data["link"] in existing_links:
+                articles_to_update.append(article_data)
+            else:
+                article = Article(**article_data)
+                db.add(article)
+                new_count += 1
+
+        # Update existing articles
+        for article_data in articles_to_update:
+            existing = db.query(Article).filter(Article.link == article_data["link"]).first()
+            if existing:
+                existing.title = article_data["title"] or existing.title
+                existing.content = article_data["content"] or existing.content
+                existing.summary = article_data["summary"] or existing.summary
+                existing.author = article_data["author"] or existing.author
+                if article_data["published_at"]:
+                    existing.published_at = article_data["published_at"]

         # Update feed statistics
-        feed.last_fetch_at = datetime.utcnow()
+        feed.last_fetch_at = datetime.now(timezone.utc)
         feed.last_fetch_status = "success"
         feed.last_error = ""
         feed.success_count += 1
-        feed.article_count = db.query(Article).filter(Article.feed_id == feed_id).count()
+        feed.article_count += new_count

         log = FetchLog(
             feed_id=feed_id,

+4
-11
@@ -7,21 +7,14 @@ services:
       dockerfile: Dockerfile
     container_name: rsskeeper
     ports:
-      - "8000:8000"
+      - "7329:7329"
     volumes:
       - ./data:/app/data
-    environment:
-      - DATA_DIR=/app/data
-      - DATABASE_URL=/app/data/rsskeeper.db
-      - FETCH_CONCURRENCY=10
-      - FETCH_TIMEOUT=30
-      - DEFAULT_FETCH_INTERVAL=60
-      - MIN_FETCH_INTERVAL=15
-      - MAX_ARTICLE_CONTENT_LENGTH=50000
-      - MAX_SUMMARY_LENGTH=500
+    env_file:
+      - .env
     restart: unless-stopped
     healthcheck:
-      test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
+      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:7329/api/health')"]
       interval: 30s
       timeout: 10s
       retries: 3

@@ -44,10 +44,19 @@ const formatTime = (iso) => {
   return d.toLocaleString('zh-CN')
 }

+// Simple HTML entity escaping to prevent XSS
+const escapeHtml = (text) => {
+  if (!text) return ''
+  const div = document.createElement('div')
+  div.textContent = text
+  return div.innerHTML
+}
+
 const formatContent = (content) => {
   if (!content) return '<p style="color: #718096;">暂无内容</p>'
-  // Convert newlines in plain text to HTML line breaks
-  return content
+  // Escape first, then convert newlines, to prevent XSS
+  const escaped = escapeHtml(content)
+  return escaped
     .replace(/\n/g, '<br>')
     .replace(/ /g, '&nbsp;')
 }

@@ -15,14 +15,14 @@ export default defineConfig({
     },
   },
   server: {
-    port: 5173,
+    port: 7330,
     proxy: {
       '/api': {
-        target: 'http://localhost:8000',
+        target: 'http://localhost:7329',
         changeOrigin: true,
       },
       '/api/v1': {
-        target: 'http://localhost:8000',
+        target: 'http://localhost:7329',
         changeOrigin: true,
       },
     },

@@ -0,0 +1,518 @@

# rssKeeper Code Review Report

> Review date: 2026-06-11
> Scope: backend (Python / FastAPI) + frontend (Vue 3) + deployment (Docker)
> Reviewed version: `54e7db0 feat: init rssKeeper - RSS 抓取、管理与检索系统`

---

## 0. Project Overview

| Item | Details |
|------|------|
| Project | rssKeeper: RSS fetching, management, and retrieval system |
| Backend stack | Python 3.12 + FastAPI + SQLAlchemy 2.0 + APScheduler + SQLite (FTS5) |
| Frontend stack | Vue 3 + Vue Router 4 + Element Plus + Vite |
| Deployment | Docker multi-stage build (frontend on Node 20 + backend on python:3.12-slim) |
| Code size | Backend: 8 core files + 4 routers; frontend: 1 root component + 4 views |

Overall assessment: the architecture is clear, the modules are divided sensibly, naming is consistent, and the technology choices are appropriate. However, **security, reliability, and configuration** each have several issues that need immediate attention.

### Overall Scores

| Dimension | Rating | Notes |
|------|------|------|
| Architectural clarity | ⭐⭐⭐⭐ | Clear layering; routers and services are well separated |
| Readability | ⭐⭐⭐⭐ | Short functions and sensible names; a few places hand-build too many dicts |
| Security | ⭐⭐ | CORS, authentication, and XSS all have weaknesses |
| Reliability | ⭐⭐⭐ | Key unit tests are missing; some exceptions are silently swallowed |
| Performance | ⭐⭐⭐ | Fine at small data volumes; at scale, `count`/`contains`/synchronous fetching become bottlenecks |
| Engineering practices | ⭐⭐ | No tests, CI, linting, logging, or type checking |

---

## 1. Security Issues (High Priority)

### 1.1 CORS policy is too permissive
- **Location**: `backend/main.py:39-45`
- **Risk level**: 🔴 High
- **Current state**:
  ```python
  app.add_middleware(
      CORSMiddleware,
      allow_origins=["*"],
      allow_credentials=True,
      ...
  )
  ```
- **Problems**:
  - Enabling `allow_origins=["*"]` together with `allow_credentials=True` already violates the CORS specification (some browsers will reject it);
  - the externally exposed `/api/v1/external/*` endpoints are effectively unprotected;
  - a page from any origin can call the API with the user's credentials.
- **Recommendation**: switch to an allowlist (e.g. `["http://localhost:5173", "https://your-domain"]`) and drop credentials.

### 1.2 XSS risk
- **Location**: `frontend/src/views/ArticleDetail.vue:47-53`
- **Risk level**: 🟠 Medium
- **Current state**:
  ```js
  const formatContent = (content) => {
    if (!content) return '<p style="color: #718096;">暂无内容</p>'
    return content.replace(/\n/g, '<br>').replace(/ /g, '&nbsp;')
  }
  ```
- **Problems**:
  - `v-html` is combined with direct concatenation of the raw content;
  - the plain text that the backend's `clean_html` extracts with `BeautifulSoup.get_text()` does not inject XSS **today**, but **as soon as the backend switches to preserving HTML** (for example to support code blocks or links) this becomes a vulnerability.
- **Recommendation**:
  - explicitly escape `<>&"'`, or
  - take an allowlist rich-text path with `marked` + `DOMPurify`, and document explicitly that "content has been sanitized".
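
The first recommendation (escape, then add markup) can be sketched in Python with the standard library. This is only a server-side analogue of the frontend fix, not the project's code: `format_content` is a hypothetical name that mirrors the frontend's `formatContent`.

```python
import html


def format_content(content: str) -> str:
    """Escape untrusted text first, then turn newlines into <br> tags."""
    if not content:
        return '<p style="color: #718096;">暂无内容</p>'
    # html.escape handles & < > and, with quote=True, also " and '.
    escaped = html.escape(content, quote=True)
    # The only markup in the result is the <br> we add ourselves.
    return escaped.replace("\n", "<br>")


print(format_content("<script>alert(1)</script>\nline 2"))
```

The order matters: escaping after inserting `<br>` would escape the line breaks as well.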
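
### 1.3 No authentication / rate limiting
- **Location**: all `/api` and `/api/v1/external` endpoints
- **Risk level**: 🔴 High
- **Problems**:
  - the backend has no authentication, rate limiting, or abuse protection of any kind;
  - `routers/external_api.py` explicitly says it is "for AI / external systems to call", yet it has no API key, token, or rate limit;
  - any client that can reach port 8000 can add or delete RSS feeds, delete articles, and trigger fetches;
  - `import-opml` accepts an arbitrary string and parses it as XML, so **there is an XXE risk** (Python 3.x `xml.etree` blocks external entities by default, so the impact is small, but it deserves attention).
- **Recommendation**:
  - add at least an `X-API-Key` middleware;
  - or use a separate key prefix under `external_api`;
  - limit the file size and entry count for OPML imports.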
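
A minimal sketch of the key check that such an `X-API-Key` middleware would perform, using only the standard library. `is_authorized` is a hypothetical helper; how the expected key is configured and how the check is wired into FastAPI are left open here.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_key: Optional[str]) -> bool:
    """Return True only when the X-API-Key header matches the configured key."""
    if not expected_key:
        # Fail closed: with no key configured, nobody is authorized.
        return False
    supplied = headers.get("X-API-Key", "")
    # compare_digest takes time independent of where the strings differ,
    # which avoids leaking the key through response timing.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


print(is_authorized({"X-API-Key": "s3cret"}, "s3cret"))
print(is_authorized({"X-API-Key": "guess"}, "s3cret"))
```

A plain `dict` is case-sensitive, so this sketch expects the exact header spelling; framework header objects usually look names up case-insensitively.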

### 1.4 Incomplete allowlist for the static-file SPA fallback
- **Location**: `backend/main.py:65-74`
- **Risk level**: 🟡 Low
- **Current state**:
  ```python
  @app.get("/{full_path:path}")
  async def serve_spa(full_path: str):
      if full_path.startswith("api/") or full_path.startswith("docs") or full_path.startswith("openapi.json"):
          return {"detail": "Not found"}
      ...
      return FileResponse(index_path)
  ```
- **Problem**: because the handler ultimately only returns `index.html`, **there is no directory-traversal risk**; the allowlist is incomplete, however (it misses `redoc`, `/docs/oauth2-redirect`, OpenAPI variants, and so on).
- **Recommendation**: explicitly mount an SPA route prefix, or use an `APIRouter` as the catch-all for all non-API routes.

---

## 2. Reliability / Robustness (High Priority)

### 2.1 Inconsistent timezone handling
- **Location**: `backend/rss_fetcher.py:106, 111`, `models.py:48`, `health_checker.py:25, 30, 95`, `external_api.py:24, 139`
- **Risk level**: 🟡 Medium
- **Current state**:
  ```python
  # rss_fetcher.py
  published_at = datetime(*entry.published_parsed[:6], tzinfo=timezone.utc).replace(tzinfo=None)
  ```
- **Problems**:
  - after obtaining a UTC time, the code **strips the timezone information** and stores the value in a naive `DateTime` column;
  - every time in the database is UTC, yet `health_status` at `models.py:48` compares against `datetime.utcnow()`;
  - **users in other time zones** see a skewed "last fetched" time;
  - the frontend's `formatTime` uses `toLocaleString`, which renders in the browser's time zone → the mismatched time bases between frontend and backend can produce "future times".
- **Recommendation**: store every DB value as `datetime` (naive UTC) and explicitly add a `Z` suffix or convert to local time when returning it; or switch to `datetime.now(timezone.utc)` to unify the time zone.
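
One thing to watch when moving to `datetime.now(timezone.utc)`: Python refuses to subtract a naive datetime from an aware one. The sketch below assumes stored values come back naive, which is how SQLAlchemy's `DateTime` without `timezone=True` typically behaves on SQLite. `as_utc` and `to_api_string` are hypothetical helpers, not project code.

```python
from datetime import datetime, timezone
from typing import Optional


def as_utc(value: Optional[datetime]) -> Optional[datetime]:
    """Return an aware UTC datetime; naive input is assumed to already be UTC."""
    if value is None:
        return None
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


def to_api_string(value: datetime) -> str:
    """Serialize as ISO 8601 with an explicit Z suffix for API responses."""
    return as_utc(value).isoformat().replace("+00:00", "Z")


now = datetime.now(timezone.utc)
stored = datetime(2024, 6, 1, 12, 0, 0)  # naive, as read back from the DB (assumption)

try:
    now - stored
except TypeError as exc:
    print("mixing aware and naive fails:", exc)

print((now - as_utc(stored)).days >= 0)
print(to_api_string(stored))
```

Normalizing at the boundary (on read and on serialization) keeps comparisons such as `now - feed.last_fetch_at` safe regardless of how a given column was written.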

### 2.2 Missing type validation of the ArticleDetail ID
- **Location**: `frontend/src/views/ArticleDetail.vue:56`
- **Risk level**: 🟡 Medium
- **Current state**:
  ```js
  const id = parseInt(route.params.id)
  if (!id) return
  ```
- **Problem**: if `id` is `"3abc"`, `parseInt` yields `3`; the request then fails while the UI stays silent.
- **Recommendation**: validate with a regular expression or `Number.isInteger`.

### 2.3 Per-article duplicate lookup + unique-constraint conflicts
- **Location**: `backend/rss_fetcher.py:229-241`
- **Risk level**: 🟠 Medium
- **Current state**:
  ```python
  existing = db.query(Article).filter(Article.link == article_data["link"]).first()
  if existing:
      existing.title = article_data["title"] or existing.title
      ...
  else:
      article = Article(**article_data)
      db.add(article)
      new_count += 1
  ```
- **Problems**:
  - every article in every fetch triggers its own `SELECT`, which performs poorly at large data volumes;
  - nothing is committed until the fetch finishes → a duplicate link within the same feed raises an `Article.link` unique-constraint **exception** (the outer `try/except` rolls back the whole feed's insert), so **none of that feed's new articles from that run are saved**.
- **Recommendation**:
  - deduplicate in memory first with `set(article_data["link"] for ...)`;
  - or use `bulk_save_objects` together with `INSERT ... ON CONFLICT DO NOTHING` (supported by SQLite).
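
Both recommendations can be combined. The sketch below uses the standard `sqlite3` module rather than SQLAlchemy, with a simplified, hypothetical `articles` table and helper name; it assumes SQLite 3.24 or newer, which introduced the `ON CONFLICT` upsert clause.

```python
import sqlite3


def insert_new_articles(conn: sqlite3.Connection, feed_id: int, entries: list) -> int:
    """Insert entries, skipping duplicate links; return the number of rows added."""
    # Step 1: dedupe within the batch (the dict keeps the first occurrence).
    unique = {}
    for entry in entries:
        link = entry.get("link")
        if link and link not in unique:
            unique[link] = entry
    # Step 2: let SQLite skip links that are already stored.
    before = conn.total_changes
    conn.executemany(
        "INSERT INTO articles (feed_id, link, title) VALUES (?, ?, ?) "
        "ON CONFLICT(link) DO NOTHING",
        [(feed_id, link, e.get("title", "")) for link, e in unique.items()],
    )
    conn.commit()
    return conn.total_changes - before


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles ("
    "id INTEGER PRIMARY KEY, feed_id INTEGER, link TEXT UNIQUE, title TEXT)"
)
batch = [
    {"link": "https://example.com/1", "title": "A"},
    {"link": "https://example.com/1", "title": "A (duplicate)"},
    {"link": "https://example.com/2", "title": "B"},
    {"link": ""},
]
first_run = insert_new_articles(conn, 1, batch)
second_run = insert_new_articles(conn, 1, batch)
print(first_run, second_run)
```

The return value can feed an incremental counter directly, and a repeated fetch of the same batch adds nothing.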

### 2.4 Expensive article_count computation
- **Location**: `backend/rss_fetcher.py:248`
- **Risk level**: 🟡 Medium
- **Current state**:
  ```python
  feed.article_count = db.query(Article).filter(Article.feed_id == feed_id).count()
  ```
- **Problem**: every successful fetch runs a `COUNT(*)`, which is unacceptable once the article table is large.
- **Recommendation**: maintain the counter with `feed.article_count += new_count`, or add a trigger on Article that maintains the count.
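
The trigger variant might look like the following in SQLite. The table layout and trigger names are simplified assumptions for illustration, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feeds (
    id INTEGER PRIMARY KEY,
    article_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    feed_id INTEGER NOT NULL,
    link TEXT UNIQUE
);

-- Keep feeds.article_count in sync without ever running COUNT(*).
CREATE TRIGGER articles_count_insert AFTER INSERT ON articles
BEGIN
    UPDATE feeds SET article_count = article_count + 1 WHERE id = NEW.feed_id;
END;

CREATE TRIGGER articles_count_delete AFTER DELETE ON articles
BEGIN
    UPDATE feeds SET article_count = article_count - 1 WHERE id = OLD.feed_id;
END;
""")


def article_count(feed_id: int) -> int:
    row = conn.execute(
        "SELECT article_count FROM feeds WHERE id = ?", (feed_id,)
    ).fetchone()
    return row[0]


conn.execute("INSERT INTO feeds (id) VALUES (1)")
conn.executemany(
    "INSERT INTO articles (feed_id, link) VALUES (1, ?)",
    [("https://example.com/1",), ("https://example.com/2",), ("https://example.com/3",)],
)
conn.execute("DELETE FROM articles WHERE link = ?", ("https://example.com/2",))
print(article_count(1))
```

Compared with `+= new_count` in application code, the trigger also covers deletions and any writer that bypasses the fetcher.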

### 2.5 Synchronous "fetch immediately after adding" behavior
- **Location**: `backend/routers/feeds.py:127, 130`
- **Risk level**: 🟡 Medium
- **Current state**:
  ```python
  add_feed_job(feed.id, feed.fetch_interval_minutes)
  fetch_and_store_feed(feed.id)  # blocks the HTTP request synchronously
  ```
- **Problems**:
  - fetching in the thread pool blocks the request thread for seconds to tens of seconds;
  - when 1000 RSS feeds are imported in bulk (`import_opml`), each one is fetched synchronously → the whole HTTP request **hangs for several minutes**.
- **Recommendation**: turn the first fetch into a background task (`BackgroundTasks`, or hand it straight to the scheduler via `next_run_time`).

### 2.6 Global scheduler state is coupled to startup order
- **Location**: `backend/scheduler.py:10-15, 60-65`
- **Risk level**: 🟡 Medium
- **Current state**:
  ```python
  _scheduler = None

  def get_scheduler():
      global _scheduler
      if _scheduler is None:
          _scheduler = BackgroundScheduler()
      return _scheduler

  def stop_scheduler():
      global _scheduler
      if _scheduler and _scheduler.running:
          _scheduler.shutdown(wait=False)
      _scheduler = None
  ```
- **Problems**:
  - `BackgroundScheduler` is by default started first and then given jobs (it is actually lazy), but this is coupled to the startup order of the FastAPI process and `lifespan`;
  - there is no mutual exclusion between `add_feed_job` writes and the `init_feed_jobs` startup;
  - `stop_scheduler` sets `_scheduler = None`, but APScheduler's executor/shutdown may not have actually finished → the next start creates a **second** instance.

### 2.7 FTS5 initialization errors are silently swallowed
- **Location**: `backend/database.py:52-54`
- **Risk level**: 🟠 Medium
- **Current state**:
  ```python
  try:
      cursor.execute("SELECT sqlite_compileoption_used('ENABLE_FTS5')")
      has_fts5 = cursor.fetchone()[0]
      if not has_fts5:
          print("警告: ...")
          return
  except Exception:
      pass
  ```
- **Problem**:
  - `pass` silently swallows the exception, so when FTS5 detection fails execution **carries on** (into `CREATE VIRTUAL TABLE`), and the error the user eventually sees is very unhelpful.
- **Recommendation**: distinguish `OperationalError` from `ProgrammingError`; call `logger.error` explicitly on failure.
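
A sketch of detection that logs and fails closed, written against the standard `sqlite3` module. It swaps in a different technique from the reviewed code: instead of querying `sqlite_compileoption_used`, it probes by creating a throwaway FTS5 table. `fts5_available` is a hypothetical name.

```python
import logging
import sqlite3

logger = logging.getLogger(__name__)


def fts5_available(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build can create an FTS5 virtual table."""
    try:
        # Probe in the temp schema so the main database is left untouched.
        conn.execute("CREATE VIRTUAL TABLE temp.fts5_probe USING fts5(body)")
        conn.execute("DROP TABLE temp.fts5_probe")
        return True
    except sqlite3.OperationalError as exc:
        # Typically "no such module: fts5" when the extension is not compiled in.
        logger.warning("FTS5 unavailable, full-text search disabled: %s", exc)
        return False
    except sqlite3.Error:
        # Anything else is unexpected: log the traceback and fail closed.
        logger.exception("FTS5 detection failed")
        return False


print(fts5_available(sqlite3.connect(":memory:")))
```

Returning `False` on every failure path means the caller never reaches `CREATE VIRTUAL TABLE` on a build that cannot support it.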

### 2.8 Incomplete escaping of user input for FTS5
- **Location**: `backend/fulltext_search.py:14`
- **Risk level**: 🟡 Medium
- **Current state**:
  ```python
  query = query.replace('"', '""').strip()
  ```
- **Problem**: only double quotes are escaped, but in FTS5 syntax `*` `:` `(` `)` `OR` `AND` `NOT` all have special meaning:
  - user input such as `python AND java` is interpreted as boolean operators, which may raise an error or return unexpected results.
- **Recommendation**: use an FTS5-safe `query_quote`, or wrap each term in `""`.
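
The per-term quoting variant can be sketched as below. `to_fts5_query` is a hypothetical helper; it relies on two FTS5 syntax rules: a double quote inside a string is escaped by doubling it, and adjacent strings are combined with an implicit AND.

```python
def to_fts5_query(user_input: str) -> str:
    """Quote every whitespace-separated term so FTS5 treats it as a literal."""
    terms = user_input.split()
    # Double any embedded quote, then wrap the term in quotes.
    quoted = ['"' + term.replace('"', '""') + '"' for term in terms]
    # Adjacent quoted strings are joined by FTS5 with an implicit AND.
    return " ".join(quoted)


print(to_fts5_query("python AND java"))
print(to_fts5_query('c++ "quoted" (test)*'))
```

Unlike wrapping the whole input in a single pair of quotes, this matches the terms in any order rather than as one exact phrase. Blank input yields an empty string, which the caller still has to reject before running `MATCH`.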
|
||||
|
||||
### 2.9 重复 / 延迟 import 掩盖问题
|
||||
- **位置**:`backend/routers/articles.py:133`
|
||||
- **风险等级**:🟠 中
|
||||
- **现状**:
|
||||
```python
|
||||
from fastapi import HTTPException # 文件末尾
|
||||
```
|
||||
- **问题**:
|
||||
- `articles.py:3` 只 import 了 `APIRouter, Depends`;
|
||||
- 文件中 `HTTPException` 在 line 90/115 用到,**靠底部 line 133 那个延迟 import 才工作**;
|
||||
- **bug 风险**:重构时一旦删掉 line 133 就 500。
|
||||
- **建议**:在文件顶部一次性 import。

---

## 3. Performance issues (medium priority)

### 3.1 Three-column LIKE causes a full table scan
- **Location**: `backend/routers/feeds.py:69`
- **Risk level**: 🟡 Medium
- **Current state**:
```python
query = query.filter(
    Feed.title.contains(search) | Feed.url.contains(search) | Feed.description.contains(search)
)
```
- **Problem**: an `OR` across three columns with leading wildcards forces a full table scan; this becomes a bottleneck as the data grows.
- **Recommendation**: use SQLite `FTS5` (`articles_fts` already exists; add a matching `feeds_fts`).

### 3.2 Sorting by created_at without an index
- **Location**: `backend/routers/feeds.py:73`
- **Risk level**: 🟡 Medium
- **Problem**: `Feed` has no index on `created_at`, so paginated sorting slows down as the table grows.

### 3.3 recent_activity join + sort is not optimized
- **Location**: `backend/routers/dashboard.py:40-42`
- **Risk level**: 🟡 Medium
- **Current state**:
```python
logs = db.query(FetchLog, Feed.title.label("feed_title")).join(Feed).order_by(
    desc(FetchLog.created_at)
).limit(limit).all()
```
- **Recommendation**: add a composite index `(feed_id, created_at DESC)`. In addition, `get_overall_stats` calls `feeds.all()` and loads the whole table; with tens of thousands of feeds, calling `health_status()` on each one in memory is slow.
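
### 3.4 discover_feed_url issues sequential HEAD requests
- **Location**: `backend/rss_fetcher.py:60-89`
- **Risk level**: 🟡 Medium
- **Problem**: one `requests.head` is issued per common path, with no concurrency. **When the URL entered by the user responds slowly, the whole discovery blocks.**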
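
A possible shape for concurrent probing, sketched with the standard library only. The candidate list `COMMON_FEED_PATHS`, the function name, and the injected `probe` callable are all assumptions; in the real module `probe` would wrap `requests.head` with a timeout and return `False` on any exception.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Optional
from urllib.parse import urljoin

# Hypothetical candidate list; the real one lives in rss_fetcher.py.
COMMON_FEED_PATHS = ("/feed", "/rss", "/atom.xml", "/rss.xml", "/index.xml")


def discover_feed_url_concurrently(
    base_url: str,
    probe: Callable[[str], bool],
    paths: Iterable[str] = COMMON_FEED_PATHS,
    max_workers: int = 5,
) -> Optional[str]:
    """Probe all candidates in parallel; return the first hit in list order."""
    candidates = [urljoin(base_url, path) for path in paths]
    if not candidates:
        return None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so the result stays deterministic.
        results = list(pool.map(probe, candidates))
    for url, ok in zip(candidates, results):
        if ok:
            return url
    return None
```

Total latency is then bounded by the slowest single probe instead of the sum of all probes, while the preference order of the path list is preserved.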

### 3.5 Frontend full-text search is not paginated
- **Location**: `frontend/src/views/Articles.vue:127-129`
- **Risk level**: 🟡 Medium
- **Current state**:
```js
if (searchQuery.value && searchQuery.value.trim()) {
  const res = await articlesApi.search(searchQuery.value.trim())
  articles.value = res.items || []
}
```
- **Problem**: the full-text search API supports `skip/limit`, but the frontend does not pass them, so large result sets are not paginated.

### 3.6 Overall stats load the whole table and sum in Python
- **Location**: `backend/health_checker.py:79-80`
- **Risk level**: 🟡 Medium
- **Current state**:
```python
total_articles = db.query(Feed).with_entities(Feed.article_count).all()
total_articles_count = sum(a[0] for a in total_articles) if total_articles else 0
```
- **Recommendation**: let the database aggregate with `db.query(func.sum(Feed.article_count)).scalar() or 0` (the `or 0` covers the empty table, where `SUM` yields `None`).

### 3.7 Frontend icons are registered twice
- **Location**: `frontend/src/main.js:29-31` plus the imports in each view
- **Risk level**: 🟢 Low
- **Current state**:
```js
for (const [key, component] of Object.entries(ElementPlusIconsVue)) {
  app.component(key, component)
}
```
- **Problem**: `main.js` registers **all** Element Plus icons as global components, while each view also imports icons on demand with `import { Plus, Upload, ... }`; this is **redundant and inflates the bundle**.
- **Recommendation**: either globally register only the few icons that are used, or drop the per-view imports.

---

## 4. Functional bugs

### 4.1 OPML import API: body/query mismatch
- **Location**: `frontend/src/api/index.js:36` vs `backend/routers/feeds.py:221`
- **Risk level**: 🔴 High
- **Current state**:
```js
// frontend
importOpml: (content) => api.post('/api/feeds/import-opml', { opml_content: content })
```
```python
# backend
def import_opml(opml_content: str, db: Session = Depends(get_db)):
```
- **Problem**:
  - the backend expects `opml_content` as a **query parameter** (FastAPI treats a bare `str` parameter as a query parameter);
  - the frontend sends it in the **body**, so the request fails validation and **the import feature is effectively unusable**;
  - this has probably gone unnoticed because nobody has actually clicked the button.
- **Fix**: add a Pydantic model on the backend, `class OpmlImport(BaseModel): opml_content: str`, and keep sending the body from the frontend.

### 4.2 OPML export does not escape the URL
- **Location**: `backend/routers/feeds.py:262-272`
- **Risk level**: 🟡 Medium
- **Current state**:
```python
lines.append(f' <outline type="rss" text="{title}" xmlUrl="{feed.url}" />')
```
- **Problem**: `"` is escaped in `title`, but `feed.url` is not escaped at all, so a URL containing `&` produces invalid XML.
- **Recommendation**: build the document with `xml.etree.ElementTree`, or escape values with `xml.sax.saxutils.escape`.
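
### 4.3 Inaccurate truncation of Chinese summaries
- **Location**: `backend/rss_fetcher.py:177`
- **Risk level**: 🟢 Low
- **Current state**:
```python
last_period = max(truncated.rfind("。"), truncated.rfind(". "), truncated.rfind("! "), truncated.rfind("? "))
```
- **Problem**: only `。` covers Chinese text; the other three patterns are ASCII punctuation followed by a space, which Chinese text does not use, so Chinese summaries almost always fall through to the `+ "..."` fallback.
- **Recommendation**: also match the full-width Chinese marks `?`, `!`, and `;`.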
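
The extended matching could be sketched as follows. The function name `truncate_at_sentence` and its exact fallback rule are assumptions; the project's `generate_summary` may apply additional thresholds that are not shown in the review.

```python
# Full-width terminators need no trailing space; ASCII ones only count as
# sentence ends when followed by a space (avoids cutting "3.14" or URLs).
CJK_TERMINATORS = ("。", "!", "?", ";")
ASCII_TERMINATORS = (". ", "! ", "? ")


def truncate_at_sentence(text: str, max_length: int) -> str:
    """Cut text to max_length, preferring the last sentence boundary."""
    if len(text) <= max_length:
        return text
    truncated = text[:max_length]
    last_end = max(
        truncated.rfind(mark) for mark in CJK_TERMINATORS + ASCII_TERMINATORS
    )
    if last_end > 0:
        # Keep the terminator itself; the trailing space of ASCII marks is dropped.
        return truncated[: last_end + 1]
    # Assumed fallback, mirroring the `+ "..."` behaviour described above.
    return truncated + "..."
```

Keeping the terminators in two tuples makes it easy to add further marks later without touching the search logic.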

### 4.4 Unused import
- **Location**: `backend/rss_fetcher.py:5`
- **Risk level**: 🟢 Low
- **Current state**:
```python
import hashlib
```
- **Problem**: the module is never used in the code.

---

## 5. Code quality / maintainability

### 5.1 Repeated imports
- **Location**: `backend/routers/dashboard.py:37, 96`, among others
- **Risk level**: 🟢 Low
- **Problem**: `from models import FetchLog` is imported locally inside function bodies, although a single top-level import would do.
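
### 5.2 health_status() is coupled to the time source
- **Location**: `backend/models.py:34-55`
- **Risk level**: 🟢 Low
- **Problem**: the business logic calls `datetime.utcnow()` directly, which makes it hard to test.
- **Recommendation**: accept a `now: datetime` parameter, or extract the logic into a standalone function.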
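
As an illustration of the injectable clock, here is a standalone sketch. The parameter names, the status labels, and the "three intervals" rule are all hypothetical and do not reflect the actual logic in `models.py`; only the optional `now` argument is the point.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def health_status(
    last_success_at: Optional[datetime],
    fetch_interval_minutes: int,
    now: Optional[datetime] = None,
) -> str:
    """Classify a feed as 'healthy', 'stale', or 'unknown' (hypothetical labels)."""
    # Production callers omit `now`; tests pass a fixed timestamp.
    now = now or datetime.now(timezone.utc)
    if last_success_at is None:
        return "unknown"
    # Assumed rule: stale once three fetch intervals pass without a success.
    if now - last_success_at > timedelta(minutes=3 * fetch_interval_minutes):
        return "stale"
    return "healthy"
```

A test can then pin the clock, for example `health_status(ts, 60, now=fixed)`, without patching `datetime`.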
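
### 5.3 Duplicated field mapping
- **Location**: `backend/routers/feeds.py:51-95, 142-165`
- **Risk level**: 🟢 Low
- **Problem**: both ranges contain large hand-written dict conversions, so a field change is very easy to miss in one of them.
- **Recommendation**: define a `FeedOut` Pydantic model and populate it directly via `from_attributes`.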

### 5.4 No TypeScript
- **Location**: frontend
- **Risk level**: 🟡 Medium
- **Problem**: with `.vue` + `js`, field-name mistakes (such as writing `articlesCount` for `articles_count`) are not caught at build time.
- **Recommendation**: at minimum, add `vetur`/Volar plus JSDoc annotations.
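
### 5.5 Thin OpenAPI documentation
- **Location**: `backend/routers/external_api.py`
- **Risk level**: 🟢 Low
- **Problem**: these endpoints are meant for AI consumers but declare no `response_model` at all, which weakens the OpenAPI schema (the description states the purpose, but there is no sample).
- **Recommendation**: add explicit schemas such as `response_model=ExternalRecent` **to make integration easier for AI consumers**.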
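
---

## 6. Configuration / deployment issues

### 6.1 COPY overwrite in the Docker multi-stage build
- **Location**: `Dockerfile:5-8`
- **Risk level**: 🟡 Medium
- **Current state**:
```dockerfile
WORKDIR /app/frontend
COPY frontend/package.json frontend/package-lock.json* ./
RUN npm install
COPY frontend/ .
```
- **Problem**:
  - the second `COPY frontend/ .` copies the whole build context on top of `WORKDIR`; a local `node_modules` that is not excluded via `.dockerignore` overwrites what `npm install` just produced;
  - `frontend/package-lock.json*` (with the trailing `*`) matches nothing when the lockfile is missing, so `npm install` then resolves versions without a lockfile and builds are not reproducible.
- **Recommendation**: commit the lockfile and install with `npm ci`, caching npm's download directory rather than `node_modules` (a cache mount is not persisted into the image layer):
```dockerfile
RUN --mount=type=cache,target=/root/.npm npm ci
```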
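
### 6.2 Build dependencies are kept at runtime
- **Location**: `Dockerfile:14-19`
- **Risk level**: 🟡 Medium
- **Problem**: `gcc/libxml2-dev/libxslt1-dev` are needed only during `pip install`, not at runtime, and they enlarge the image.
- **Recommendation**: split out a builder stage with a BuildKit multi-stage build, so the final image only runs `pip install` on prebuilt wheels.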
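
### 6.3 Python base image version is not pinned
- **Location**: `Dockerfile:11`
- **Risk level**: 🟡 Medium
- **Problem**: `python:3.12-slim` floats to the latest 3.12.x release, so future builds may produce different results.
- **Recommendation**: pin the patch version (for example `python:3.12.4-slim`).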

### 6.4 healthcheck uses curl, which the image does not contain
- **Location**: `docker-compose.yml:24-27` + `Dockerfile`
- **Risk level**: 🟠 Medium
- **Current state**:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8000/api/health"]
```
- **Problem**: the `python:3.12-slim` image does **not** ship `curl` by default; the healthcheck always exits non-zero, so the container is permanently marked unhealthy.
- **Recommendation**:
```yaml
test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/api/health')"]
```

### 6.5 Database URL concatenation yields a double slash
- **Location**: `docker-compose.yml:14-15` + `backend/database.py:8`
- **Risk level**: 🟢 Low
- **Problem**:
```yaml
- DATABASE_URL=/app/data/rsskeeper.db
```
```python
engine = create_engine(
    f"sqlite:///{DATABASE_URL}",
    ...
)
```
  - the result is `sqlite:////app/data/rsskeeper.db` (four `/`); this is the valid SQLAlchemy form for an absolute path, but it is hard to read.
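
### 6.6 No logging configuration
- **Location**: backend, global
- **Risk level**: 🟡 Medium
- **Problem**: the backend emits a single FTS5 warning via `print` and nothing else. Production deployments should configure `logging` (file or stdout, structured JSON). APScheduler logs through the standard `logging` module, so without configuration its INFO messages are dropped and only warnings and errors reach stderr.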
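
A minimal sketch of structured stdout logging using only the standard library. The names `JsonFormatter` and `configure_logging` are hypothetical; calling `configure_logging()` once at startup would also capture APScheduler's loggers, since they propagate to the root logger.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        # ensure_ascii=False keeps Chinese log messages readable.
        return json.dumps(payload, ensure_ascii=False)


def configure_logging(level: int = logging.INFO) -> None:
    """Route all loggers through one JSON handler on stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously installed handlers
    root.setLevel(level)
```

Writing to stdout fits the Docker deployment described in this section, where the container runtime collects the stream.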
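
### 6.7 requirements lack upper version bounds
- **Location**: `backend/requirements.txt`
- **Risk level**: 🟡 Medium
- **Problem**: all 7 packages are specified with `>=`; an incompatible upstream release can break the build in ways that are hard to reproduce.
- **Recommendation**: pin with `==` or use `~=`.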

### 6.8 Environment variables hard-coded in docker-compose
- **Location**: `docker-compose.yml:14-22`
- **Risk level**: 🟢 Low
- **Recommendation**: move tunable parameters into a `.env` file.

---

## 7. Testing / engineering

### 7.1 No tests at all
- **Location**: project root
- **Risk level**: 🟠 Medium
- **Problem**: there is no `tests/` directory, and neither `pytest` nor `unittest` is used.
- **Recommendation**:
  - unit tests for `parse_article`, `generate_summary`, and `clean_html`;
  - route-level integration tests for `feeds.py` (using `httpx.AsyncClient`).

### 7.2 No CI / lint configuration
- **Location**: project root
- **Risk level**: 🟡 Medium
- **Problem**: there is no configuration for `.github/workflows`, ESLint, Prettier, black, ruff, or mypy.
- **Recommendation**: at minimum, add `ruff check` + `black --check` to the PR workflow.

---

## 8. Top 10 priority fixes

| # | Location | Severity | Summary |
|---|----------|----------|---------|
| 1 | `backend/main.py:39-45` | 🔴 High | Tighten CORS |
| 2 | `frontend/src/api/index.js:36` ↔ `backend/routers/feeds.py:221` | 🔴 High | OPML import body/query mismatch, **feature unusable** |
| 3 | `backend/routers/external_api.py` (all of it) | 🔴 High | External API lacks authentication / rate limiting |
| 4 | `backend/database.py:41-89` | 🟠 Medium | FTS5 initialization errors silently swallowed |
| 5 | `backend/rss_fetcher.py:229-241` | 🟠 Medium | A unique-constraint conflict on insert rolls back the whole batch |
| 6 | `backend/routers/articles.py:133` | 🟠 Medium | Bottom-of-file import masks the missing top-level import |
| 7 | `docker-compose.yml:24-27` | 🟠 Medium | healthcheck uses `curl`, which is not in the image |
| 8 | `backend/routers/feeds.py:130` | 🟡 Medium | Synchronous fetch when adding a feed blocks the HTTP request |
| 9 | `backend/rss_fetcher.py:106` | 🟡 Medium | Timezone handling causes display offsets |
| 10 | `backend/routers/feeds.py:262-272` | 🟡 Medium | OPML export does not escape the URL |
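
---

## 9. Summary

`rssKeeper` is a clearly scoped, moderately sized RSS management system for personal use. Its code style and module layout show that the author has solid engineering habits; for long-term operation in production, however, three classes of problems should be addressed first:

1. **Security baseline**: CORS / authentication / XSS. Any externally exposed service must close these gaps first;
2. **Correctness bug**: OPML import (`/api/feeds/import-opml`) is currently unusable and is a P0 that must be fixed immediately;
3. **Deployment reliability**: a broken Docker healthcheck keeps the container marked unhealthy (and can trigger repeated restarts under an orchestrator); it looks harmless but easily masks real failures.

If the feature set is to grow later (multi-user support, subscription push, tags, reading lists, and so on), build the "tests + CI + logging" engineering foundation first and add features afterwards, so that technical debt does not pile up quickly.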