feat: add firecrawl and vane applications, fix lxserver form config

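Adds two new applications, Firecrawl and Vane, each with complete app configuration, docker-compose orchestration, documentation, and logo assets; also fixes the redundant rule parameter in the lxserver time zone configuration item.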
arch3rPro
2026-05-17 17:52:54 +08:00
parent 87bc4e7f86
commit e98811cd04
28 changed files with 705 additions and 2 deletions
+65
@@ -0,0 +1,65 @@
# Firecrawl
将任意网站转换为适合大语言模型(LLM)的结构化数据。强大的网页抓取、爬取、搜索和数据提取平台。
## 功能特点
- **单页抓取**:将任意 URL 转换为 Markdown、HTML、截图或结构化 JSON
- **全站爬取**:递归抓取整个网站,智能过滤链接
- **URL 发现**:通过站点地图、索引查询或搜索快速发现网站所有 URL
- **网络搜索**:搜索网络并一次性获取结果的完整页面内容
- **AI 提取**:基于 LLM 的结构化数据提取,支持 Schema 验证
- **智能代理**:自主研究代理,自动导航并提取数据
- **远程浏览器**:支持远程浏览器会话,提供 CDP 访问和代码执行能力
- **批量操作**:异步批量抓取多个 URL
- **自托管支持**:完全开源,支持本地部署,数据掌握在自己手中
## 使用说明
### 默认端口
- API服务: 3002
- 队列管理界面: http://your-ip:3002/admin/YOUR_BULL_AUTH_KEY/queues
### API 访问
部署后可以通过 `http://your-ip:3002` 访问 API 服务。
测试爬取端点:
```bash
curl -X POST http://localhost:3002/v1/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://firecrawl.dev"
    }'
```
### 数据目录
应用数据存储在以下目录:
- `./data/api` - API 服务数据
- `./data/postgres` - PostgreSQL 数据库数据
- `./data/redis` - Redis 缓存数据
- `./data/playwright` - Playwright 浏览器缓存
### 环境变量
- `POSTGRES_USER` / `POSTGRES_PASSWORD`:PostgreSQL 数据库凭据
- `BULL_AUTH_KEY`:队列管理界面的访问密钥
- `OPENAI_API_KEY`:OpenAI API 密钥(用于 AI 相关功能,可选)
### 架构说明
Firecrawl 自托管版本包含以下服务组件:
- **API 服务**:主 API 服务器,处理所有请求(4核CPU,8GB内存限制)
- **Playwright 服务**:浏览器自动化服务(2核CPU,4GB内存限制)
- **Redis**:任务队列和缓存后端
- **RabbitMQ**:NuQ 消息代理
- **PostgreSQL**:任务状态管理数据库
## 相关链接
- 官方网站: https://www.firecrawl.dev
- GitHub: https://github.com/firecrawl/firecrawl
- 文档: https://docs.firecrawl.dev
- Discord社区: https://discord.gg/firecrawl
+65
@@ -0,0 +1,65 @@
# Firecrawl
Turn any website into LLM-ready structured data. A powerful web scraping, crawling, search and data extraction platform.
## Features
- **Single Page Scraping**: Convert any URL to Markdown, HTML, screenshots, or structured JSON
- **Multi-Page Crawling**: Recursively scrape entire websites with intelligent link filtering
- **URL Discovery**: Discover all URLs on a website instantly via sitemaps, index queries, or search
- **Web Search**: Search the web and get full page content from results in a single call
- **AI Extraction**: LLM-powered structured data extraction with schema validation
- **Autonomous Agent**: AI research agent that automatically navigates and extracts data
- **Remote Browser**: Remote browser sessions with CDP access and code execution
- **Batch Operations**: Asynchronous bulk scraping of multiple URLs
- **Self-Hosted**: Fully open source, supports local deployment with complete data control
## Usage
### Default Port
- API Service: 3002
- Queue Admin UI: http://your-ip:3002/admin/YOUR_BULL_AUTH_KEY/queues
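
As a small illustration of how the queue admin URL above is put together, the sketch below assembles it from a host, a port, and the value of `BULL_AUTH_KEY`. The function name `queue_admin_url` is hypothetical and not part of Firecrawl; the values passed in are placeholders.

```bash
# Hypothetical helper: build the queue admin URL from host, port and key.
queue_admin_url() {
  local host="$1" port="$2" key="$3"
  printf 'http://%s:%s/admin/%s/queues\n' "$host" "$port" "$key"
}

# Placeholder values only; substitute the real host and BULL_AUTH_KEY.
queue_admin_url "your-ip" "3002" "YOUR_BULL_AUTH_KEY"
```

Because the key is embedded in the URL path, anyone who sees the URL can open the queue UI, so the default `CHANGEME` value is worth replacing before exposing the port.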
### API Access
After deployment, access the API at `http://your-ip:3002`.
Test the crawl endpoint:
```bash
curl -X POST http://localhost:3002/v1/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://firecrawl.dev"
    }'
```
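
The request above can be wrapped in a few small shell functions. This is a minimal sketch, not an official client: the function names and the `API_BASE` variable are hypothetical, and both the `limit` field and the `GET /v1/crawl/<id>` status route are assumptions about the v1 API that should be checked against the Firecrawl documentation. Only the payload builder is executed here, so the snippet runs without a live server.

```bash
# Assumed base URL of the self-hosted API; override via the environment.
API_BASE="${API_BASE:-http://localhost:3002}"

# Build the JSON body for a crawl request.
# Assumes the URL contains no characters that need JSON escaping.
crawl_payload() {
  printf '{"url": "%s", "limit": %d}\n' "$1" "${2:-10}"
}

# Submit a crawl job and print the raw JSON response.
start_crawl() {
  curl -sS -X POST "$API_BASE/v1/crawl" \
    -H 'Content-Type: application/json' \
    -d "$(crawl_payload "$1" "${2:-10}")"
}

# Fetch the status of a crawl job by its id (assumed route).
crawl_status() {
  curl -sS "$API_BASE/v1/crawl/$1"
}

# Show the payload that start_crawl would send.
crawl_payload "https://firecrawl.dev" 5
```

With the service running, `start_crawl "https://firecrawl.dev" 5` would submit the job, and the id in its response could then be passed to `crawl_status`.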
### Data Directories
Application data is stored in the following directories:
- `./data/api` - API service data
- `./data/postgres` - PostgreSQL database data
- `./data/redis` - Redis cache data
- `./data/playwright` - Playwright browser cache
### Environment Variables
- `POSTGRES_USER` / `POSTGRES_PASSWORD`: PostgreSQL database credentials
- `BULL_AUTH_KEY`: Access key for the queue admin UI
- `OPENAI_API_KEY`: OpenAI API key for AI-powered features (optional)
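
Since `BULL_AUTH_KEY` and `POSTGRES_PASSWORD` act as secrets, one way to fill them is with random values. The sketch below is only a suggestion: the helper name `random_hex` is hypothetical, and it relies on nothing beyond `/dev/urandom`, `od`, and `tr`. Whether the form validation rules accept a given format is not checked here.

```bash
# Hypothetical helper: print N random bytes as lowercase hex (2*N characters).
random_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

BULL_AUTH_KEY="$(random_hex 16)"
POSTGRES_PASSWORD="$(random_hex 16)"

# Print the values in .env style so they can be copied into the install form.
printf 'BULL_AUTH_KEY=%s\nPOSTGRES_PASSWORD=%s\n' "$BULL_AUTH_KEY" "$POSTGRES_PASSWORD"
```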
### Architecture
The self-hosted version includes the following service components:
- **API Service**: Main API server handling all requests (4 CPU cores, 8GB RAM limit)
- **Playwright Service**: Browser automation service (2 CPU cores, 4GB RAM limit)
- **Redis**: Job queue and cache backend
- **RabbitMQ**: NuQ message broker
- **PostgreSQL**: Job state management database
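
The two limits listed above add up to 6 CPU cores and 12 GB of RAM. They are ceilings rather than reservations, so a smaller host can still start the stack, but the services would then be unable to reach their configured limits. The sketch below shows that arithmetic as a simple check; the function name and the example host sizes are hypothetical, and the other three services are left out because no limits are listed for them.

```bash
REQUIRED_CPUS=$((4 + 2))     # API service + Playwright service
REQUIRED_MEM_GB=$((8 + 4))   # API service + Playwright service

# Succeeds when a host with the given CPU count and RAM (in GB)
# can accommodate both services at their configured limits.
host_can_reach_limits() {
  local cpus="$1" mem_gb="$2"
  [ "$cpus" -ge "$REQUIRED_CPUS" ] && [ "$mem_gb" -ge "$REQUIRED_MEM_GB" ]
}

# Example with a hypothetical 8-core, 16 GB host.
if host_can_reach_limits 8 16; then
  echo "host covers the combined limits"
else
  echo "combined limits exceed host capacity"
fi
```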
## Links
- Website: https://www.firecrawl.dev
- GitHub: https://github.com/firecrawl/firecrawl
- Documentation: https://docs.firecrawl.dev
- Discord: https://discord.gg/firecrawl
+25
@@ -0,0 +1,25 @@
name: Firecrawl
tags:
  - 开发工具
  - AI
  - 爬虫
title: 将任意网站转换为适合大语言模型的结构化数据
description:
  en: Turn any website into LLM-ready structured data. Scrape, crawl, search and extract clean markdown, structured JSON or screenshots from websites
  zh: 将任意网站转换为适合大语言模型的结构化数据。支持抓取、爬取、搜索和提取干净的 Markdown、结构化 JSON 或截图
additionalProperties:
  key: firecrawl
  name: Firecrawl
  tags:
    - DevTool
    - AI
    - Crawler
  shortDescZh: 将任意网站转换为适合大语言模型的结构化数据
  shortDescEn: Turn any website into LLM-ready structured data
  type: website
  crossVersionUpdate: true
  limit: 0
  recommend: 0
  website: https://www.firecrawl.dev
  github: https://github.com/firecrawl/firecrawl
  document: https://docs.firecrawl.dev
+47
@@ -0,0 +1,47 @@
additionalProperties:
  formFields:
    - default: "3002"
      envKey: PANEL_APP_PORT_HTTP
      label:
        en: API Port
        zh: API端口
      required: true
      type: number
      edit: true
      rule: paramPort
    - default: "CHANGEME"
      envKey: BULL_AUTH_KEY
      label:
        en: Bull Queue Admin Key
        zh: 队列管理密钥
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: "firecrawl"
      envKey: POSTGRES_USER
      label:
        en: PostgreSQL Username
        zh: 数据库用户名
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: ""
      envKey: POSTGRES_PASSWORD
      label:
        en: PostgreSQL Password
        zh: 数据库密码
      required: true
      type: password
      edit: true
      rule: paramCommon
    - default: ""
      envKey: OPENAI_API_KEY
      label:
        en: OpenAI API Key (Optional)
        zh: OpenAI API密钥(可选)
      required: false
      type: text
      edit: true
      rule: ""
+121
@@ -0,0 +1,121 @@
services:
  firecrawl-api:
    image: ghcr.io/firecrawl/firecrawl:latest
    container_name: ${CONTAINER_NAME}
    restart: always
    environment:
      - HOST=0.0.0.0
      # Container-internal port; it must match the right-hand side of the
      # ports mapping below, so it stays fixed while the host port is configurable.
      - PORT=3002
      - REDIS_URL=redis://firecrawl-redis:6379
      - REDIS_RATE_LIMIT_URL=redis://firecrawl-redis:6379
      - PLAYWRIGHT_MICROSERVICE_URL=http://firecrawl-playwright:3000/scrape
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - POSTGRES_HOST=firecrawl-postgres
      - POSTGRES_PORT=5432
      - USE_DB_AUTHENTICATION=false
      - NUM_WORKERS_PER_QUEUE=8
      - CRAWL_CONCURRENT_REQUESTS=10
      - MAX_CONCURRENT_JOBS=5
      - BROWSER_POOL_SIZE=5
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - BULL_AUTH_KEY=${BULL_AUTH_KEY:-CHANGEME}
      - NUQ_RABBITMQ_URL=amqp://firecrawl-rabbitmq:5672
      - ENV=local
      - EXTRACT_WORKER_PORT=3004
      - WORKER_PORT=3005
      - HARNESS_STARTUP_TIMEOUT_MS=60000
      - TZ=Asia/Shanghai
    depends_on:
      firecrawl-redis:
        condition: service_started
      firecrawl-playwright:
        condition: service_started
      firecrawl-rabbitmq:
        condition: service_healthy
      firecrawl-postgres:
        condition: service_started
    ports:
      - "${PANEL_APP_PORT_HTTP}:3002"
    command: node dist/src/harness.js --start-docker
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    volumes:
      - ./data/api:/app/data
    networks:
      - 1panel-network
    labels:
      createdBy: "Apps"
    cpus: 4.0
    mem_limit: 8G
    memswap_limit: 8G
  firecrawl-playwright:
    image: ghcr.io/firecrawl/playwright-service:latest
    container_name: ${CONTAINER_NAME}-playwright
    restart: always
    environment:
      - PORT=3000
      - PROXY_SERVER=
      - PROXY_USERNAME=
      - PROXY_PASSWORD=
      - ALLOW_LOCAL_WEBHOOKS=false
      - BLOCK_MEDIA=false
      - MAX_CONCURRENT_PAGES=10
      - TZ=Asia/Shanghai
    volumes:
      - ./data/playwright:/tmp/.cache
    networks:
      - 1panel-network
    # NOTE: this tmpfs targets the same path as the bind mount above. Docker may
    # reject the duplicate mount point, in which case one of the two should be dropped.
    tmpfs:
      - /tmp/.cache:noexec,nosuid,size=1g
    labels:
      createdBy: "Apps"
    cpus: 2.0
    mem_limit: 4G
    memswap_limit: 4G
  firecrawl-redis:
    image: redis:alpine
    container_name: ${CONTAINER_NAME}-redis
    restart: always
    command: redis-server --bind 0.0.0.0
    networks:
      - 1panel-network
    volumes:
      - ./data/redis:/data
  firecrawl-rabbitmq:
    image: rabbitmq:3-management
    container_name: ${CONTAINER_NAME}-rabbitmq
    restart: always
    command: rabbitmq-server
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 5s
      timeout: 5s
      retries: 3
      start_period: 5s
    networks:
      - 1panel-network
  firecrawl-postgres:
    image: postgres:16-alpine
    container_name: ${CONTAINER_NAME}-postgres
    restart: always
    environment:
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - TZ=Asia/Shanghai
    networks:
      - 1panel-network
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
networks:
  1panel-network:
    external: true
Binary file not shown (size: 8.7 KiB).
+47
@@ -0,0 +1,47 @@
additionalProperties:
  formFields:
    - default: "3002"
      envKey: PANEL_APP_PORT_HTTP
      label:
        en: API Port
        zh: API端口
      required: true
      type: number
      edit: true
      rule: paramPort
    - default: "CHANGEME"
      envKey: BULL_AUTH_KEY
      label:
        en: Bull Queue Admin Key
        zh: 队列管理密钥
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: "firecrawl"
      envKey: POSTGRES_USER
      label:
        en: PostgreSQL Username
        zh: 数据库用户名
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: ""
      envKey: POSTGRES_PASSWORD
      label:
        en: PostgreSQL Password
        zh: 数据库密码
      required: true
      type: password
      edit: true
      rule: paramCommon
    - default: ""
      envKey: OPENAI_API_KEY
      label:
        en: OpenAI API Key (Optional)
        zh: OpenAI API密钥(可选)
      required: false
      type: text
      edit: true
      rule: ""
+121
@@ -0,0 +1,121 @@
services:
  firecrawl-api:
    image: ghcr.io/firecrawl/firecrawl:v2.10.0
    container_name: ${CONTAINER_NAME}
    restart: always
    environment:
      - HOST=0.0.0.0
      # Container-internal port; it must match the right-hand side of the
      # ports mapping below, so it stays fixed while the host port is configurable.
      - PORT=3002
      - REDIS_URL=redis://firecrawl-redis:6379
      - REDIS_RATE_LIMIT_URL=redis://firecrawl-redis:6379
      - PLAYWRIGHT_MICROSERVICE_URL=http://firecrawl-playwright:3000/scrape
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - POSTGRES_HOST=firecrawl-postgres
      - POSTGRES_PORT=5432
      - USE_DB_AUTHENTICATION=false
      - NUM_WORKERS_PER_QUEUE=8
      - CRAWL_CONCURRENT_REQUESTS=10
      - MAX_CONCURRENT_JOBS=5
      - BROWSER_POOL_SIZE=5
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - BULL_AUTH_KEY=${BULL_AUTH_KEY:-CHANGEME}
      - NUQ_RABBITMQ_URL=amqp://firecrawl-rabbitmq:5672
      - ENV=local
      - EXTRACT_WORKER_PORT=3004
      - WORKER_PORT=3005
      - HARNESS_STARTUP_TIMEOUT_MS=60000
      - TZ=Asia/Shanghai
    depends_on:
      firecrawl-redis:
        condition: service_started
      firecrawl-playwright:
        condition: service_started
      firecrawl-rabbitmq:
        condition: service_healthy
      firecrawl-postgres:
        condition: service_started
    ports:
      - "${PANEL_APP_PORT_HTTP}:3002"
    command: node dist/src/harness.js --start-docker
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    volumes:
      - ./data/api:/app/data
    networks:
      - 1panel-network
    labels:
      createdBy: "Apps"
    cpus: 4.0
    mem_limit: 8G
    memswap_limit: 8G
  firecrawl-playwright:
    image: ghcr.io/firecrawl/playwright-service:v2.10.0
    container_name: ${CONTAINER_NAME}-playwright
    restart: always
    environment:
      - PORT=3000
      - PROXY_SERVER=
      - PROXY_USERNAME=
      - PROXY_PASSWORD=
      - ALLOW_LOCAL_WEBHOOKS=false
      - BLOCK_MEDIA=false
      - MAX_CONCURRENT_PAGES=10
      - TZ=Asia/Shanghai
    volumes:
      - ./data/playwright:/tmp/.cache
    networks:
      - 1panel-network
    # NOTE: this tmpfs targets the same path as the bind mount above. Docker may
    # reject the duplicate mount point, in which case one of the two should be dropped.
    tmpfs:
      - /tmp/.cache:noexec,nosuid,size=1g
    labels:
      createdBy: "Apps"
    cpus: 2.0
    mem_limit: 4G
    memswap_limit: 4G
  firecrawl-redis:
    image: redis:alpine
    container_name: ${CONTAINER_NAME}-redis
    restart: always
    command: redis-server --bind 0.0.0.0
    networks:
      - 1panel-network
    volumes:
      - ./data/redis:/data
  firecrawl-rabbitmq:
    image: rabbitmq:3-management
    container_name: ${CONTAINER_NAME}-rabbitmq
    restart: always
    command: rabbitmq-server
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 5s
      timeout: 5s
      retries: 3
      start_period: 5s
    networks:
      - 1panel-network
  firecrawl-postgres:
    image: postgres:16-alpine
    container_name: ${CONTAINER_NAME}-postgres
    restart: always
    environment:
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - TZ=Asia/Shanghai
    networks:
      - 1panel-network
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
networks:
  1panel-network:
    external: true