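arch3rPro e97f8b7a79 fix(firecrawl): adjust resource limits and update image tags
1. Remove the hard-coded CPU and memory limits from the firecrawl-api and playwright services
2. Change the firecrawl-api image tag from v2.10.0 to latest
3. Keep the existing version configuration for the playwright service image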
2026-05-18 00:00:35 +08:00

Firecrawl

Turn any website into LLM-ready structured data. A powerful web scraping, crawling, search and data extraction platform.

Features

  • Single-Page Scraping: Convert any URL to Markdown, HTML, screenshots, or structured JSON
  • Multi-Page Crawling: Recursively scrape entire websites with intelligent link filtering
  • URL Discovery: Quickly list a website's URLs via sitemaps, index queries, or search
  • Web Search: Search the web and get full page content from results in a single call
  • AI Extraction: LLM-powered structured data extraction with schema validation
  • Autonomous Agent: AI research agent that automatically navigates and extracts data
  • Remote Browser: Remote browser sessions with CDP access and code execution
  • Batch Operations: Asynchronous bulk scraping of multiple URLs
  • Self-Hosted: Fully open source, supports local deployment with complete data control
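Single-page scraping uses its own endpoint rather than the crawl endpoint shown under Usage. The sketch below assumes the self-hosted API exposes `POST /v1/scrape` on port 3002 (the same v1 API as the crawl test) and accepts a `formats` array; the helper names `scrape_payload` and `scrape_url` and the `FIRECRAWL_URL` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for a self-hosted Firecrawl instance.
# FIRECRAWL_URL is an assumed variable; override it if the API is not on localhost.
FIRECRAWL_URL="${FIRECRAWL_URL:-http://localhost:3002}"

# Build the JSON body for a scrape request.
# $1 = target URL, remaining args = output formats (default: markdown).
# Quotes inside the URL are not escaped, which is fine for ordinary URLs.
scrape_payload() {
  target="$1"
  shift
  if [ "$#" -eq 0 ]; then
    set -- markdown
  fi
  formats=""
  for f in "$@"; do
    # Join the formats as a comma-separated list of JSON strings.
    formats="${formats:+$formats,}\"$f\""
  done
  printf '{"url":"%s","formats":[%s]}' "$target" "$formats"
}

# Send the scrape request and print the response body.
scrape_url() {
  curl -sS -X POST "$FIRECRAWL_URL/v1/scrape" \
    -H 'Content-Type: application/json' \
    -d "$(scrape_payload "$@")"
}
```

For example, `scrape_url https://firecrawl.dev markdown html` would request both Markdown and HTML for one page.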

Usage

Default Port

The API service listens on port 3002.

API Access

After deployment, access the API at http://your-ip:3002.

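Test the crawl endpoint from the server itself (replace localhost with the server's IP when calling from another machine). The request starts an asynchronous crawl job and returns its ID: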

curl -X POST http://localhost:3002/v1/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://firecrawl.dev"
    }'
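Because a crawl runs in the background, the job has to be polled until it finishes. The sketch below assumes the v1 behaviour of the endpoint tested above: `POST /v1/crawl` returns a JSON body with an `id`, and `GET /v1/crawl/<id>` reports a `status` that stays `scraping` while the job runs. The helper names, the `FIRECRAWL_URL` variable, and the default page `limit` of 10 are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: start a crawl and poll it until it leaves the "scraping" state.
# FIRECRAWL_URL is an assumed variable; the endpoints follow the v1 crawl API.
FIRECRAWL_URL="${FIRECRAWL_URL:-http://localhost:3002}"

# Extract a string field ($1) from a JSON document ($2).
# This is a regex, not a JSON parser; prefer jq when it is installed.
json_field() {
  printf '%s\n' "$2" | sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# $1 = start URL, $2 = maximum number of pages (assumed default: 10).
start_crawl() {
  curl -sS -X POST "$FIRECRAWL_URL/v1/crawl" \
    -H 'Content-Type: application/json' \
    -d "{\"url\":\"$1\",\"limit\":${2:-10}}"
}

# $1 = crawl job ID.
crawl_status() {
  curl -sS "$FIRECRAWL_URL/v1/crawl/$1"
}

# Poll every 5 seconds; print the final response once the job is no longer scraping.
wait_for_crawl() {
  while :; do
    body="$(crawl_status "$1")"
    state="$(json_field status "$body")"
    if [ "$state" != "scraping" ]; then
      printf '%s\n' "$body"
      return 0
    fi
    sleep 5
  done
}
```

A typical session: `id="$(json_field id "$(start_crawl https://firecrawl.dev 5)")"` followed by `wait_for_crawl "$id"`.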

Data Directories

Application data is stored in the following directories:

  • ./data/api - API service data
  • ./data/postgres - PostgreSQL database data
  • ./data/redis - Redis cache data
  • ./data/playwright - Playwright browser cache
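Since all state lives under ./data, a backup can be as simple as archiving that directory. This is a minimal sketch, assuming the stack has been stopped first (from the 1Panel UI, or with `docker compose down` in the app directory) so PostgreSQL and Redis are not writing while files are copied; the `backup_data` helper is hypothetical.

```shell
#!/bin/sh
# Sketch: archive the app's data directory (API, PostgreSQL, Redis, Playwright cache).
# Stop the containers before running this so the database files are consistent.

# $1 = data directory (default ./data), $2 = output archive (default: timestamped name).
backup_data() {
  src="${1:-./data}"
  out="${2:-firecrawl-data-$(date +%Y%m%d-%H%M%S).tar.gz}"
  if [ ! -d "$src" ]; then
    echo "data directory not found: $src" >&2
    return 1
  fi
  # -C stores paths relative to the parent directory, e.g. data/postgres/...
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
  echo "$out"
}
```

The PostgreSQL files may be owned by the container's user rather than yours, so the backup may need to run as root.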

Environment Variables

  • POSTGRES_USER / POSTGRES_PASSWORD: PostgreSQL database credentials
  • BULL_AUTH_KEY: Access key for the queue admin UI
  • OPENAI_API_KEY: OpenAI API key for AI-powered features (optional)
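When installing through 1Panel these values are typically entered in the install form; for a manual deployment, the sketch below generates random secrets and writes them to an env file. The file name, the `firecrawl` user name, and the helpers `gen_secret` and `write_env` are placeholders, not part of the app.

```shell
#!/bin/sh
# Sketch: write the variables listed above to an env file with random secrets.

# Print 32 random hex characters (16 bytes from /dev/urandom).
gen_secret() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# $1 = output file, $2 = optional OpenAI API key (left empty when omitted).
write_env() {
  cat > "$1" <<EOF
POSTGRES_USER=firecrawl
POSTGRES_PASSWORD=$(gen_secret)
BULL_AUTH_KEY=$(gen_secret)
OPENAI_API_KEY=${2:-}
EOF
}
```

Upstream Firecrawl's self-hosting guide serves the queue admin UI under a path containing the key (`/admin/<BULL_AUTH_KEY>/queues`); treat that path as an assumption for this packaged version.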

Architecture

The self-hosted version includes the following service components:

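  • API Service: Main API server handling all requests (no hard-coded CPU or memory limit; earlier versions capped it at 4 CPU cores and 8GB RAM)
  • Playwright Service: Browser automation service (no hard-coded limit; earlier versions capped it at 2 CPU cores and 4GB RAM)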
  • Redis: Job queue and cache backend
  • RabbitMQ: Message broker used by NuQ, Firecrawl's job queue
  • PostgreSQL: Job state management database
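With the hard-coded limits removed, the API and Playwright containers can use as much CPU and memory as the host allows. If you want caps back, one option is a Compose override file. This sketch assumes the service names firecrawl-api and playwright from the commit message at the top of this page (check them against the app's docker-compose.yml) and reuses the former values; the `write_limits_override` helper is hypothetical.

```shell
#!/bin/sh
# Sketch: write a Compose override that reapplies CPU and memory limits.
# Service names are assumptions; match them to the app's docker-compose.yml.

# $1 = app directory containing docker-compose.yml.
write_limits_override() {
  cat > "$1/docker-compose.override.yml" <<'EOF'
services:
  firecrawl-api:
    deploy:
      resources:
        limits:
          cpus: "4"
          memory: 8G
  playwright:
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 4G
EOF
}
```

Docker Compose merges docker-compose.override.yml automatically only when no `-f` option is given; if the stack is started with explicit files, pass the override with another `-f` instead.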