Arch1Panel/apps/firecrawl/README_en.md
arch3rPro e98811cd04 feat: add firecrawl and vane applications, fix lxserver form config
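Adds two new applications, Firecrawl and Vane, including complete app configuration, docker-compose orchestration, documentation, and logo assets; also removes a redundant rule parameter from the lxserver timezone configuration item.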
2026-05-17 17:52:54 +08:00


Firecrawl

Turn any website into LLM-ready structured data. Firecrawl is a platform for web scraping, crawling, search, and data extraction.

Features

  • Single Page Scraping: Convert any URL to Markdown, HTML, screenshots, or structured JSON
  • Multi-Page Crawling: Recursively scrape entire websites with intelligent link filtering
  • URL Discovery: Discover all URLs on a website instantly via sitemaps, index queries, or search
  • Web Search: Search the web and get full page content from results in a single call
  • AI Extraction: LLM-powered structured data extraction with schema validation
  • Autonomous Agent: AI research agent that automatically navigates and extracts data
  • Remote Browser: Remote browser sessions with CDP access and code execution
  • Batch Operations: Asynchronous bulk scraping of multiple URLs
  • Self-Hosted: Fully open source, supports local deployment with complete data control
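A single-page scrape can be tried from the shell once the stack is running. The sketch below is illustrative and not shipped with this app: it assumes the deployment is reachable at `http://localhost:3002` (overridable through a hypothetical `FIRECRAWL_URL` variable), that no API key is required, and that the Firecrawl v1 `/v1/scrape` endpoint accepts a `formats` list. The function name `scrape_markdown` is hypothetical.

```shell
# Sketch: scrape one page to Markdown via the v1 scrape endpoint.
# Assumptions: self-hosted instance at FIRECRAWL_URL, no API key needed.
FIRECRAWL_URL="${FIRECRAWL_URL:-http://localhost:3002}"

scrape_markdown() {
  # $1 = page URL; requests only the "markdown" output format.
  # The URL is inserted into the JSON body verbatim, so it must not
  # contain double quotes.
  curl -sS -X POST "$FIRECRAWL_URL/v1/scrape" \
    -H 'Content-Type: application/json' \
    -d "{\"url\": \"$1\", \"formats\": [\"markdown\"]}"
}
```

For example, `scrape_markdown https://firecrawl.dev` should print a JSON response whose `data.markdown` field holds the page content.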

Usage

Default Port

By default, the API is exposed on port 3002.

API Access

After deployment, access the API at http://your-ip:3002.

Test the crawl endpoint:

curl -X POST http://localhost:3002/v1/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://firecrawl.dev"
    }'
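A crawl is asynchronous: the POST above returns a job `id` rather than the pages themselves, and progress and results are read from `GET /v1/crawl/<id>`. A minimal sketch of that round trip, assuming the same `http://localhost:3002` address, no API key, and the Firecrawl v1 crawl API. The helper names, the hypothetical `FIRECRAWL_URL` variable, and the default page `limit` of 10 are illustrative choices; `extract_id` is a deliberately crude `sed` parser so the sketch does not depend on `jq`.

```shell
# Sketch only: helper names are hypothetical; paths follow the Firecrawl
# v1 crawl API on a self-hosted instance without an API key.
FIRECRAWL_URL="${FIRECRAWL_URL:-http://localhost:3002}"

start_crawl() {
  # $1 = site to crawl, $2 = max pages to crawl (defaults to 10 here).
  curl -sS -X POST "$FIRECRAWL_URL/v1/crawl" \
    -H 'Content-Type: application/json' \
    -d "{\"url\": \"$1\", \"limit\": ${2:-10}}"
}

extract_id() {
  # Print the value of the "id" field from a one-line JSON response on
  # stdin. If several "id" keys share a line, the greedy match takes the last.
  sed -n 's/.*"id": *"\([^"]*\)".*/\1/p'
}

crawl_status() {
  # $1 = job id; the response reports "status" plus pages scraped so far.
  curl -sS "$FIRECRAWL_URL/v1/crawl/$1"
}
```

Typical use: `id="$(start_crawl https://firecrawl.dev 5 | extract_id)"`, then repeat `crawl_status "$id"` until the reported `status` is `completed`.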

Data Directories

Application data is stored in the following directories:

  • ./data/api - API service data
  • ./data/postgres - PostgreSQL database data
  • ./data/redis - Redis cache data
  • ./data/playwright - Playwright browser cache
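Since the application data lives in these directories, a backup can be as simple as archiving them. A minimal sketch, assuming it is run from the app directory that contains `./data` and that the app has been stopped first so PostgreSQL and Redis are not writing to disk. The function name and archive naming are hypothetical, and reading the PostgreSQL files may require `sudo` because they are usually owned by the container's user.

```shell
# Sketch: archive the data directories listed above into one tarball.
# Assumptions: run from the app directory with the app stopped; the
# function and archive names are hypothetical.
FIRECRAWL_DATA_DIRS="./data/api ./data/postgres ./data/redis ./data/playwright"

backup_firecrawl_data() {
  # $1 = archive path; defaults to a timestamped name in the current dir.
  local archive d
  archive="${1:-firecrawl-data-$(date +%Y%m%d-%H%M%S).tar.gz}"
  # Refuse to write a partial backup if any expected directory is missing.
  for d in $FIRECRAWL_DATA_DIRS; do
    if [ ! -d "$d" ]; then
      echo "missing directory: $d" >&2
      return 1
    fi
  done
  # Unquoted expansion is intentional: the paths contain no spaces.
  tar -czf "$archive" $FIRECRAWL_DATA_DIRS && echo "$archive"
}
```

Restoring is the reverse: with the app stopped, extract the archive with `tar -xzf` in the same app directory.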

Environment Variables

  • POSTGRES_USER / POSTGRES_PASSWORD: PostgreSQL database credentials
  • BULL_AUTH_KEY: Access key for the queue admin UI
  • OPENAI_API_KEY: OpenAI API key for AI-powered features (optional)
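Random strings are a reasonable choice for the database password and the queue admin key. The sketch below prints candidate `KEY=value` lines, for example to copy into the app's settings or into a `.env` file if you run the compose file by hand. Assumptions: the user name `firecrawl` is a hypothetical example, hex output is used so the values need no quoting or escaping, and `OPENAI_API_KEY` is left empty because it is optional.

```shell
# Sketch: print candidate values for the environment variables above.
# "firecrawl" is only an example user name; OPENAI_API_KEY stays empty
# unless you want the AI-powered features.

random_hex() {
  # Read $1 random bytes (default 24) and print them as lowercase hex,
  # two characters per byte. -v stops od from collapsing repeated lines.
  head -c "${1:-24}" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

print_firecrawl_env() {
  # $1 = database user name (hypothetical default: firecrawl)
  printf 'POSTGRES_USER=%s\n' "${1:-firecrawl}"
  printf 'POSTGRES_PASSWORD=%s\n' "$(random_hex 24)"
  printf 'BULL_AUTH_KEY=%s\n' "$(random_hex 24)"
  printf 'OPENAI_API_KEY=\n'
}
```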

Architecture

The self-hosted version includes the following service components:

  • API Service: Main API server handling all requests (4 CPU cores, 8GB RAM limit)
  • Playwright Service: Browser automation service (2 CPU cores, 4GB RAM limit)
  • Redis: Job queue and cache backend
  • RabbitMQ: NuQ message broker
  • PostgreSQL: Job state management database
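The API and Playwright limits above add up to 6 CPU cores and 12 GB of RAM. Limits are caps rather than reservations, so a smaller host can still start the stack, but it may not sustain both services at full load. A rough sizing check under stated assumptions: `host_check` reads `/proc/meminfo` and so is Linux-only, memory is compared in whole GiB, and Redis, RabbitMQ, and PostgreSQL are not counted because no limits are listed for them. The function names are hypothetical.

```shell
# Sketch: compare a host against the combined API + Playwright limits.
# Sizing hint only; limits are caps, not reservations.
REQUIRED_CPUS=$((4 + 2))     # API + Playwright CPU core limits
REQUIRED_MEM_GB=$((8 + 4))   # API + Playwright RAM limits

check_capacity() {
  # $1 = CPU cores available, $2 = memory available in whole GiB.
  local status=0
  if [ "$1" -lt "$REQUIRED_CPUS" ]; then
    echo "warning: $1 CPU cores < $REQUIRED_CPUS (combined limit)"
    status=1
  fi
  if [ "$2" -lt "$REQUIRED_MEM_GB" ]; then
    echo "warning: $2 GiB RAM < $REQUIRED_MEM_GB GiB (combined limit)"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "ok: host covers the combined API + Playwright limits"
  fi
  return "$status"
}

host_check() {
  # MemTotal is reported in KiB; integer division rounds down to GiB.
  local mem_kib
  mem_kib="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  check_capacity "$(nproc)" "$((mem_kib / 1024 / 1024))"
}
```

Because `MemTotal` is slightly below the installed RAM, a host with exactly 12 GB may be flagged; treat a warning as a hint, not a hard requirement.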