mirror of
https://github.com/arch3rPro/1Panel-Appstore.git
synced 2026-06-10 16:39:39 +08:00
feat: add firecrawl and vane applications, fix lxserver form config
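Adds two new applications, Firecrawl and Vane, each with a complete app configuration, docker-compose orchestration, documentation, and a logo; also removes a redundant `rule` parameter from the lxserver timezone form field.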
@@ -0,0 +1,65 @@
# Firecrawl

将任意网站转换为适合大语言模型(LLM)的结构化数据。强大的网页抓取、爬取、搜索和数据提取平台。

## 功能特点

- **单页抓取**:将任意 URL 转换为 Markdown、HTML、截图或结构化 JSON
- **全站爬取**:递归抓取整个网站,智能过滤链接
- **URL 发现**:通过站点地图、索引查询或搜索快速发现网站所有 URL
- **网络搜索**:搜索网络并一次性获取结果的完整页面内容
- **AI 提取**:基于 LLM 的结构化数据提取,支持 Schema 验证
- **智能代理**:自主研究代理,自动导航并提取数据
- **远程浏览器**:支持远程浏览器会话,提供 CDP 访问和代码执行能力
- **批量操作**:异步批量抓取多个 URL
- **自托管支持**:完全开源,支持本地部署,数据掌握在自己手中

## 使用说明

### 默认端口

- API服务: 3002
- 队列管理界面: http://your-ip:3002/admin/YOUR_BULL_AUTH_KEY/queues

### API 访问

部署后可以通过 `http://your-ip:3002` 访问 API 服务。

测试爬取端点:

```bash
curl -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://firecrawl.dev"
  }'
```

### 数据目录

应用数据存储在以下目录:

- `./data/api` - API 服务数据
- `./data/postgres` - PostgreSQL 数据库数据
- `./data/redis` - Redis 缓存数据
- `./data/playwright` - Playwright 浏览器缓存

### 环境变量

- `POSTGRES_USER` / `POSTGRES_PASSWORD`:PostgreSQL 数据库凭据
- `BULL_AUTH_KEY`:队列管理界面的访问密钥
- `OPENAI_API_KEY`:OpenAI API 密钥(用于 AI 相关功能,可选)

### 架构说明

Firecrawl 自托管版本包含以下服务组件:

- **API 服务**:主 API 服务器,处理所有请求(4核CPU,8GB内存限制)
- **Playwright 服务**:浏览器自动化服务(2核CPU,4GB内存限制)
- **Redis**:任务队列和缓存后端
- **RabbitMQ**:NuQ 消息代理
- **PostgreSQL**:任务状态管理数据库

## 相关链接

- 官方网站: https://www.firecrawl.dev
- GitHub: https://github.com/firecrawl/firecrawl
- 文档: https://docs.firecrawl.dev
- Discord社区: https://discord.gg/firecrawl
@@ -0,0 +1,65 @@
# Firecrawl

Turn any website into LLM-ready structured data. Firecrawl is a web scraping, crawling, search, and data extraction platform.

## Features

- **Single Page Scraping**: Convert any URL to Markdown, HTML, a screenshot, or structured JSON
- **Multi-Page Crawling**: Recursively scrape an entire website with intelligent link filtering
- **URL Discovery**: Quickly discover all URLs on a website via sitemaps, index queries, or search
- **Web Search**: Search the web and get the full page content of each result in a single call
- **AI Extraction**: LLM-powered structured data extraction with schema validation
- **Autonomous Agent**: An AI research agent that navigates sites and extracts data on its own
- **Remote Browser**: Remote browser sessions with CDP access and code execution
- **Batch Operations**: Asynchronous bulk scraping of multiple URLs
- **Self-Hosted**: Fully open source and deployable locally, so you keep full control of your data
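
Single-page scraping is exposed as an HTTP endpoint on the self-hosted API. The following is a minimal sketch. It assumes the v1 `POST /v1/scrape` endpoint accepts a JSON body with `url` and a `formats` array (for example `markdown` or `html`), that the API is on the default port `3002`, and that no API key is needed (the compose file sets `USE_DB_AUTHENTICATION=false`). `scrape_payload`, `scrape_url`, and `FIRECRAWL_URL` are hypothetical names. Check the Firecrawl API reference for the request fields supported by the image version you deploy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the endpoint and body shape assume Firecrawl's v1 scrape API.
FIRECRAWL_URL="${FIRECRAWL_URL:-http://localhost:3002}"

# Build the JSON body for a scrape request.
# Naive string formatting: the URL must not contain double quotes or backslashes.
scrape_payload() {
  local target="$1" format="${2:-markdown}"
  printf '{"url": "%s", "formats": ["%s"]}' "$target" "$format"
}

# Send the scrape request and print the raw JSON response.
scrape_url() {
  curl -s -X POST "${FIRECRAWL_URL}/v1/scrape" \
    -H 'Content-Type: application/json' \
    -d "$(scrape_payload "$1" "${2:-markdown}")"
}
```

For example, `scrape_url https://firecrawl.dev` requests Markdown, and `scrape_url https://firecrawl.dev html` requests HTML.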

## Usage

### Default Ports

- API Service: 3002 (configurable with the **API Port** field during installation)
- Queue Admin UI: http://your-ip:3002/admin/YOUR_BULL_AUTH_KEY/queues (replace `YOUR_BULL_AUTH_KEY` with the value of `BULL_AUTH_KEY`)
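
The installation form ships with `CHANGEME` as the default `BULL_AUTH_KEY`. Because that key appears in the admin URL path, anyone who knows the default can open the queue dashboard. The following sketch generates a random key and builds the dashboard URL from it. It reads `/dev/urandom` through `od`, and `openssl rand -hex 16` is an equivalent alternative if OpenSSL is installed. `gen_bull_key` and `bull_admin_url` are hypothetical helper names, not part of Firecrawl or 1Panel.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the queue admin UI (not part of Firecrawl or 1Panel).

# Print a random 32-character lowercase hex string for BULL_AUTH_KEY.
# Reads 16 bytes from /dev/urandom and hex-encodes them with od.
gen_bull_key() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Build the Bull queue dashboard URL: http://<host>:<port>/admin/<key>/queues
bull_admin_url() {
  local host="$1" port="$2" key="$3"
  printf 'http://%s:%s/admin/%s/queues\n' "$host" "$port" "$key"
}
```

To use them, paste the output of `gen_bull_key` into the **Bull Queue Admin Key** field. Then open the URL printed by `bull_admin_url your-ip 3002 "$key"`.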

### API Access

After deployment, the API is available at `http://your-ip:3002` (or on the port you chose during installation). Because the compose file sets `USE_DB_AUTHENTICATION=false`, requests do not need an API key header.

Test the crawl endpoint:

```bash
# Start a crawl job for https://firecrawl.dev; the response contains the job ID
curl -X POST http://localhost:3002/v1/crawl \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://firecrawl.dev"
  }'
```
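
A crawl runs asynchronously, so the request above only starts a job. The following sketch starts a crawl and polls it until the status looks terminal. It assumes the v1 API returns a job `id` from `POST /v1/crawl` and a top-level `status` field from `GET /v1/crawl/<id>`. The terminal status values (`completed`, `failed`, `cancelled`) are also assumptions. `json_string_field`, `start_crawl`, and `wait_for_crawl` are hypothetical names. The `grep`/`sed` JSON parsing is deliberately naive, and `jq` is the better choice in real scripts.

```bash
#!/usr/bin/env bash
# Hypothetical crawl helpers; endpoint paths and response fields assume the v1 API.
FIRECRAWL_URL="${FIRECRAWL_URL:-http://localhost:3002}"

# Read JSON on stdin and print the value of the first "<field>": "<string>" pair.
# Naive: does not handle escaped quotes or distinguish nested fields.
json_string_field() {
  local field="$1"
  grep -o "\"${field}\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -n 1 \
    | sed -e 's/^"[^"]*"[[:space:]]*:[[:space:]]*"//' -e 's/"$//'
}

# Start a crawl of the given URL and print the returned job id.
start_crawl() {
  curl -s -X POST "${FIRECRAWL_URL}/v1/crawl" \
    -H 'Content-Type: application/json' \
    -d "{\"url\": \"$1\"}" \
    | json_string_field id
}

# Poll the job every 5 seconds (at most max_tries times) and print the last status seen.
wait_for_crawl() {
  local id="$1" max_tries="${2:-60}" status="" i=0
  while [ "$i" -lt "$max_tries" ]; do
    status="$(curl -s "${FIRECRAWL_URL}/v1/crawl/${id}" | json_string_field status)"
    case "$status" in
      completed | failed | cancelled) break ;;
    esac
    sleep 5
    i=$((i + 1))
  done
  printf '%s\n' "${status:-unknown}"
}
```

Typical use is `id="$(start_crawl https://firecrawl.dev)" && wait_for_crawl "$id"`. After that, `curl -s "http://localhost:3002/v1/crawl/$id"` returns the full job response with the crawled pages.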

### Data Directories

Application data is stored in the following directories:

- `./data/api` - API service data
- `./data/postgres` - PostgreSQL database data
- `./data/redis` - Redis cache data
- `./data/playwright` - Playwright browser cache
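
All persistent state lives under `./data`, so a file-level backup can be a single archive of that directory. The following is a sketch. It assumes you pass the directory that contains `docker-compose.yml`, which in 1Panel is typically under `/opt/1panel/apps/` (check your own installation). The function name `backup_firecrawl_data` is hypothetical. Reading `./data/postgres` may require root, because the PostgreSQL container restricts permissions on its data directory.

```bash
#!/usr/bin/env bash
# Hypothetical backup helper: archive the ./data tree of a Firecrawl install.
# Stop the stack first (e.g. `docker compose stop` in the app directory) so the
# PostgreSQL and Redis files are not changing while tar reads them.
backup_firecrawl_data() {
  local app_dir="$1" out_dir="${2:-.}" archive
  mkdir -p "$out_dir"
  archive="${out_dir%/}/firecrawl-data-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches into app_dir so the archive stores paths as data/...
  tar -czf "$archive" -C "$app_dir" data || return 1
  printf '%s\n' "$archive"
}
```

The function prints the archive path. Restoring is the reverse operation: stop the stack, then run `tar -xzf <archive> -C <app_dir>`.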

### Environment Variables

- `PANEL_APP_PORT_HTTP`: Host port for the API service (default `3002`)
- `POSTGRES_USER` / `POSTGRES_PASSWORD`: PostgreSQL database credentials
- `BULL_AUTH_KEY`: Access key for the queue admin UI. The default is `CHANGEME`; replace it with a random value before exposing the port.
- `OPENAI_API_KEY`: OpenAI API key for AI-powered features (optional)
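
A quick pre-flight check can catch an empty credential or the default `BULL_AUTH_KEY` before the stack starts. The following is a sketch that reads the variables described above from the current shell. To check an installed app, you can load its `.env` file first with `set -a; . ./.env; set +a`. This assumes 1Panel wrote a `.env` file next to `docker-compose.yml`, so verify that on your system. The function name `check_firecrawl_env` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the Firecrawl environment variables.
# Returns non-zero if a required value is empty or BULL_AUTH_KEY is still the default.
check_firecrawl_env() {
  local status=0 pair name value
  for pair in \
    "POSTGRES_USER=${POSTGRES_USER:-}" \
    "POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-}" \
    "BULL_AUTH_KEY=${BULL_AUTH_KEY:-}"; do
    name="${pair%%=*}"   # text before the first '='
    value="${pair#*=}"   # text after the first '='
    if [ -z "$value" ]; then
      echo "error: ${name} is empty" >&2
      status=1
    fi
  done
  if [ "${BULL_AUTH_KEY:-}" = "CHANGEME" ]; then
    echo "error: BULL_AUTH_KEY is still the default CHANGEME" >&2
    status=1
  fi
  # OPENAI_API_KEY is optional, so only print a note when it is missing.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "note: OPENAI_API_KEY is empty; AI-powered features may be unavailable" >&2
  fi
  return "$status"
}
```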

### Architecture

The self-hosted deployment consists of the following services:

- **API Service**: Main API server that handles all requests (limited to 4 CPU cores and 8 GB RAM)
- **Playwright Service**: Browser automation service (limited to 2 CPU cores and 4 GB RAM)
- **Redis**: Job queue and cache backend
- **RabbitMQ**: Message broker for NuQ
- **PostgreSQL**: Database for job state management
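
Every service's `container_name` is derived from `${CONTAINER_NAME}`. The API container uses it as is, and the other services append `-playwright`, `-redis`, `-rabbitmq`, or `-postgres`. That makes a quick status check easy to script. The following sketch assumes the Docker CLI is available on the host and that you know the `CONTAINER_NAME` value 1Panel assigned. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical status helpers. The suffixes mirror the container_name values in
# docker-compose.yml: ${CONTAINER_NAME}, ${CONTAINER_NAME}-playwright, and so on.

# Print the five container names derived from a CONTAINER_NAME value.
firecrawl_containers() {
  local base="$1" suffix
  printf '%s\n' "$base"
  for suffix in playwright redis rabbitmq postgres; do
    printf '%s-%s\n' "$base" "$suffix"
  done
}

# Print each container's state (running, exited, ...) or "missing"; needs the Docker CLI.
firecrawl_status() {
  local name state
  firecrawl_containers "$1" | while IFS= read -r name; do
    state="$(docker inspect -f '{{.State.Status}}' "$name" 2>/dev/null)" || state="missing"
    printf '%-40s %s\n' "$name" "$state"
  done
}
```

Running `firecrawl_status <your CONTAINER_NAME>` prints one line per service. A `missing` entry usually means the name does not match the one 1Panel generated.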

## Links

- Website: https://www.firecrawl.dev
- GitHub: https://github.com/firecrawl/firecrawl
- Documentation: https://docs.firecrawl.dev
- Discord: https://discord.gg/firecrawl
@@ -0,0 +1,25 @@
name: Firecrawl
tags:
  - 开发工具
  - AI
  - 爬虫
title: 将任意网站转换为适合大语言模型的结构化数据
description:
  en: Turn any website into LLM-ready structured data. Scrape, crawl, search and extract clean markdown, structured JSON or screenshots from websites
  zh: 将任意网站转换为适合大语言模型的结构化数据。支持抓取、爬取、搜索和提取干净的 Markdown、结构化 JSON 或截图
additionalProperties:
  key: firecrawl
  name: Firecrawl
  tags:
    - DevTool
    - AI
    - Crawler
  shortDescZh: 将任意网站转换为适合大语言模型的结构化数据
  shortDescEn: Turn any website into LLM-ready structured data
  type: website
  crossVersionUpdate: true
  limit: 0
  recommend: 0
  website: https://www.firecrawl.dev
  github: https://github.com/firecrawl/firecrawl
  document: https://docs.firecrawl.dev
@@ -0,0 +1,47 @@
additionalProperties:
  formFields:
    - default: "3002"
      envKey: PANEL_APP_PORT_HTTP
      label:
        en: API Port
        zh: API端口
      required: true
      type: number
      edit: true
      rule: paramPort
    - default: "CHANGEME"
      envKey: BULL_AUTH_KEY
      label:
        en: Bull Queue Admin Key
        zh: 队列管理密钥
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: "firecrawl"
      envKey: POSTGRES_USER
      label:
        en: PostgreSQL Username
        zh: 数据库用户名
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: ""
      envKey: POSTGRES_PASSWORD
      label:
        en: PostgreSQL Password
        zh: 数据库密码
      required: true
      type: password
      edit: true
      rule: paramCommon
    - default: ""
      envKey: OPENAI_API_KEY
      label:
        en: OpenAI API Key (Optional)
        zh: OpenAI API密钥(可选)
      required: false
      type: text
      edit: true
      rule: ""
@@ -0,0 +1,121 @@
services:
  firecrawl-api:
    image: ghcr.io/firecrawl/firecrawl:latest
    container_name: ${CONTAINER_NAME}
    restart: always
    environment:
      - HOST=0.0.0.0
      - PORT=3002
      - REDIS_URL=redis://firecrawl-redis:6379
      - REDIS_RATE_LIMIT_URL=redis://firecrawl-redis:6379
      - PLAYWRIGHT_MICROSERVICE_URL=http://firecrawl-playwright:3000/scrape
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - POSTGRES_HOST=firecrawl-postgres
      - POSTGRES_PORT=5432
      - USE_DB_AUTHENTICATION=false
      - NUM_WORKERS_PER_QUEUE=8
      - CRAWL_CONCURRENT_REQUESTS=10
      - MAX_CONCURRENT_JOBS=5
      - BROWSER_POOL_SIZE=5
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - BULL_AUTH_KEY=${BULL_AUTH_KEY:-CHANGEME}
      - NUQ_RABBITMQ_URL=amqp://firecrawl-rabbitmq:5672
      - ENV=local
      - EXTRACT_WORKER_PORT=3004
      - WORKER_PORT=3005
      - HARNESS_STARTUP_TIMEOUT_MS=60000
      - TZ=Asia/Shanghai
    depends_on:
      firecrawl-redis:
        condition: service_started
      firecrawl-playwright:
        condition: service_started
      firecrawl-rabbitmq:
        condition: service_healthy
      firecrawl-postgres:
        condition: service_started
    ports:
      - "${PANEL_APP_PORT_HTTP}:3002"
    command: node dist/src/harness.js --start-docker
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    volumes:
      - ./data/api:/app/data
    networks:
      - 1panel-network
    labels:
      createdBy: "Apps"
    cpus: 4.0
    mem_limit: 8G
    memswap_limit: 8G

  firecrawl-playwright:
    image: ghcr.io/firecrawl/playwright-service:latest
    container_name: ${CONTAINER_NAME}-playwright
    restart: always
    environment:
      - PORT=3000
      - PROXY_SERVER=
      - PROXY_USERNAME=
      - PROXY_PASSWORD=
      - ALLOW_LOCAL_WEBHOOKS=false
      - BLOCK_MEDIA=false
      - MAX_CONCURRENT_PAGES=10
      - TZ=Asia/Shanghai
    volumes:
      - ./data/playwright:/tmp/.cache
    networks:
      - 1panel-network
    tmpfs:
      - /tmp/.cache:noexec,nosuid,size=1g
    labels:
      createdBy: "Apps"
    cpus: 2.0
    mem_limit: 4G
    memswap_limit: 4G

  firecrawl-redis:
    image: redis:alpine
    container_name: ${CONTAINER_NAME}-redis
    restart: always
    command: redis-server --bind 0.0.0.0
    networks:
      - 1panel-network
    volumes:
      - ./data/redis:/data

  firecrawl-rabbitmq:
    image: rabbitmq:3-management
    container_name: ${CONTAINER_NAME}-rabbitmq
    restart: always
    command: rabbitmq-server
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 5s
      timeout: 5s
      retries: 3
      start_period: 5s
    networks:
      - 1panel-network

  firecrawl-postgres:
    image: postgres:16-alpine
    container_name: ${CONTAINER_NAME}-postgres
    restart: always
    environment:
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - TZ=Asia/Shanghai
    networks:
      - 1panel-network
    volumes:
      - ./data/postgres:/var/lib/postgresql/data

networks:
  1panel-network:
    external: true
Binary file not shown (image, 8.7 KiB).
@@ -0,0 +1,47 @@
additionalProperties:
  formFields:
    - default: "3002"
      envKey: PANEL_APP_PORT_HTTP
      label:
        en: API Port
        zh: API端口
      required: true
      type: number
      edit: true
      rule: paramPort
    - default: "CHANGEME"
      envKey: BULL_AUTH_KEY
      label:
        en: Bull Queue Admin Key
        zh: 队列管理密钥
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: "firecrawl"
      envKey: POSTGRES_USER
      label:
        en: PostgreSQL Username
        zh: 数据库用户名
      required: true
      type: text
      edit: true
      rule: paramCommon
    - default: ""
      envKey: POSTGRES_PASSWORD
      label:
        en: PostgreSQL Password
        zh: 数据库密码
      required: true
      type: password
      edit: true
      rule: paramCommon
    - default: ""
      envKey: OPENAI_API_KEY
      label:
        en: OpenAI API Key (Optional)
        zh: OpenAI API密钥(可选)
      required: false
      type: text
      edit: true
      rule: ""
@@ -0,0 +1,121 @@
services:
  firecrawl-api:
    image: ghcr.io/firecrawl/firecrawl:v2.10.0
    container_name: ${CONTAINER_NAME}
    restart: always
    environment:
      - HOST=0.0.0.0
      - PORT=3002
      - REDIS_URL=redis://firecrawl-redis:6379
      - REDIS_RATE_LIMIT_URL=redis://firecrawl-redis:6379
      - PLAYWRIGHT_MICROSERVICE_URL=http://firecrawl-playwright:3000/scrape
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - POSTGRES_HOST=firecrawl-postgres
      - POSTGRES_PORT=5432
      - USE_DB_AUTHENTICATION=false
      - NUM_WORKERS_PER_QUEUE=8
      - CRAWL_CONCURRENT_REQUESTS=10
      - MAX_CONCURRENT_JOBS=5
      - BROWSER_POOL_SIZE=5
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      - BULL_AUTH_KEY=${BULL_AUTH_KEY:-CHANGEME}
      - NUQ_RABBITMQ_URL=amqp://firecrawl-rabbitmq:5672
      - ENV=local
      - EXTRACT_WORKER_PORT=3004
      - WORKER_PORT=3005
      - HARNESS_STARTUP_TIMEOUT_MS=60000
      - TZ=Asia/Shanghai
    depends_on:
      firecrawl-redis:
        condition: service_started
      firecrawl-playwright:
        condition: service_started
      firecrawl-rabbitmq:
        condition: service_healthy
      firecrawl-postgres:
        condition: service_started
    ports:
      - "${PANEL_APP_PORT_HTTP}:3002"
    command: node dist/src/harness.js --start-docker
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    volumes:
      - ./data/api:/app/data
    networks:
      - 1panel-network
    labels:
      createdBy: "Apps"
    cpus: 4.0
    mem_limit: 8G
    memswap_limit: 8G

  firecrawl-playwright:
    image: ghcr.io/firecrawl/playwright-service:v2.10.0
    container_name: ${CONTAINER_NAME}-playwright
    restart: always
    environment:
      - PORT=3000
      - PROXY_SERVER=
      - PROXY_USERNAME=
      - PROXY_PASSWORD=
      - ALLOW_LOCAL_WEBHOOKS=false
      - BLOCK_MEDIA=false
      - MAX_CONCURRENT_PAGES=10
      - TZ=Asia/Shanghai
    volumes:
      - ./data/playwright:/tmp/.cache
    networks:
      - 1panel-network
    tmpfs:
      - /tmp/.cache:noexec,nosuid,size=1g
    labels:
      createdBy: "Apps"
    cpus: 2.0
    mem_limit: 4G
    memswap_limit: 4G

  firecrawl-redis:
    image: redis:alpine
    container_name: ${CONTAINER_NAME}-redis
    restart: always
    command: redis-server --bind 0.0.0.0
    networks:
      - 1panel-network
    volumes:
      - ./data/redis:/data

  firecrawl-rabbitmq:
    image: rabbitmq:3-management
    container_name: ${CONTAINER_NAME}-rabbitmq
    restart: always
    command: rabbitmq-server
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_running"]
      interval: 5s
      timeout: 5s
      retries: 3
      start_period: 5s
    networks:
      - 1panel-network

  firecrawl-postgres:
    image: postgres:16-alpine
    container_name: ${CONTAINER_NAME}-postgres
    restart: always
    environment:
      - POSTGRES_USER=${POSTGRES_USER:-firecrawl}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-firecrawl}
      - POSTGRES_DB=firecrawl
      - TZ=Asia/Shanghai
    networks:
      - 1panel-network
    volumes:
      - ./data/postgres:/var/lib/postgresql/data

networks:
  1panel-network:
    external: true