mirror of https://github.com/arch3rPro/1Panel-Appstore.git (synced 2026-06-11 00:59:40 +08:00)
feat: add firecrawl and vane applications, fix lxserver form config
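Adds two new applications, Firecrawl and Vane, each with complete app configuration, docker-compose orchestration, documentation, and logo assets; also removes the redundant rule parameter from the lxserver timezone configuration item.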
@@ -0,0 +1,65 @@
# Firecrawl

Turn any website into LLM-ready structured data. A powerful web scraping, crawling, search and data extraction platform.

## Features

- **Single Page Scraping**: Convert any URL to Markdown, HTML, screenshots, or structured JSON
- **Multi-Page Crawling**: Recursively scrape entire websites with intelligent link filtering
- **URL Discovery**: Discover all URLs on a website instantly via sitemaps, index queries, or search
- **Web Search**: Search the web and get full page content from results in a single call
- **AI Extraction**: LLM-powered structured data extraction with schema validation
- **Autonomous Agent**: AI research agent that automatically navigates and extracts data
- **Remote Browser**: Remote browser sessions with CDP access and code execution
- **Batch Operations**: Asynchronous bulk scraping of multiple URLs
- **Self-Hosted**: Fully open source, supports local deployment with complete data control

## Usage

### Default Port

- API Service: 3002
- Queue Admin UI: http://your-ip:3002/admin/YOUR_BULL_AUTH_KEY/queues
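
The queue admin URL embeds the `BULL_AUTH_KEY` value in its path, so it can be assembled from the host and the key. A minimal sketch follows; the function name `queue_admin_url` and the `FIRECRAWL_PORT` override variable are illustrative and not part of Firecrawl or 1Panel.

```bash
# Build the queue admin UI URL from a host and the BULL_AUTH_KEY value.
# queue_admin_url and FIRECRAWL_PORT are hypothetical names used for this sketch.
queue_admin_url() {
  local host="$1"
  local key="$2"
  local port="${FIRECRAWL_PORT:-3002}"   # 3002 is the default API port listed above
  printf 'http://%s:%s/admin/%s/queues\n' "$host" "$port" "$key"
}

# Example call (prints the URL to open in a browser):
# queue_admin_url your-ip "$BULL_AUTH_KEY"
```

Because the key is part of the URL, treat the full URL as a secret and avoid pasting it into shared logs or chat.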

### API Access

After deployment, access the API at `http://your-ip:3002`.

Test the crawl endpoint:

```bash
curl -X POST http://localhost:3002/v1/crawl \
    -H 'Content-Type: application/json' \
    -d '{
      "url": "https://firecrawl.dev"
    }'
```
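
A crawl runs asynchronously, so the request above only starts a job. The sketch below assumes the v1 API behaves as described in the upstream Firecrawl documentation: `POST /v1/crawl` responds with a job `id`, and `GET /v1/crawl/<id>` reports a `status` field that stays `scraping` while the job runs. The helper names are hypothetical, and the `sed`-based field extraction is deliberately simplistic; a real JSON parser such as `jq` is a better choice where available.

```bash
# Extract a top-level string field from a compact JSON document on stdin.
# Simplistic sed-based helper for illustration only; not a general JSON parser.
json_string_field() {
  local field="$1"
  sed -n 's/.*"'"$field"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Start a crawl and print the job ID (assumes the response carries an "id" field).
start_crawl() {
  local base_url="$1"
  local target="$2"
  curl -sS -X POST "$base_url/v1/crawl" \
    -H 'Content-Type: application/json' \
    -d "{\"url\": \"$target\"}" | json_string_field id
}

# Poll the job every 5 seconds until its status is no longer "scraping".
wait_for_crawl() {
  local base_url="$1"
  local job_id="$2"
  local status
  while :; do
    status="$(curl -sS "$base_url/v1/crawl/$job_id" | json_string_field status)"
    echo "status: ${status:-unknown}"
    [ "$status" = "scraping" ] || break
    sleep 5
  done
}

# Example usage against a local deployment:
# job_id="$(start_crawl http://localhost:3002 https://firecrawl.dev)"
# wait_for_crawl http://localhost:3002 "$job_id"
```

The loop stops on any status other than `scraping`, including an empty one, so a failed request does not poll forever.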

### Data Directories

Application data is stored in the following directories:
- `./data/api` - API service data
- `./data/postgres` - PostgreSQL database data
- `./data/redis` - Redis cache data
- `./data/playwright` - Playwright browser cache
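
If these paths are bind mounts relative to the compose file (an assumption based on the `./data/...` form), Docker normally creates them on first start. Creating them beforehand can still be useful, for example to set ownership or to back them up as a unit. A minimal sketch, with the hypothetical helper name `init_data_dirs`:

```bash
# Create the four data directories listed above under an application root.
# init_data_dirs is a hypothetical helper; the root defaults to the current directory.
init_data_dirs() {
  local root="${1:-.}"
  local name
  for name in api postgres redis playwright; do
    mkdir -p "$root/data/$name"
  done
}

# Example: run from the directory that holds docker-compose.yml
# init_data_dirs .
```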

### Environment Variables

- `POSTGRES_USER` / `POSTGRES_PASSWORD`: PostgreSQL database credentials
- `BULL_AUTH_KEY`: Access key for the queue admin UI
- `OPENAI_API_KEY`: OpenAI API key for AI-powered features (optional)
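
When deploying with plain Docker Compose rather than through the 1Panel install form, these values are typically supplied through an env file. The sketch below generates random secrets for the password and the admin key. The helper names, the `firecrawl` user name, and the target file path are assumptions for illustration; adjust them to match the actual compose file.

```bash
# Print N random bytes as a lowercase hex string (2N characters).
random_hex() {
  local bytes="${1:-16}"
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Write an env file containing the variables listed above.
# The file is created with owner-only permissions because it holds secrets.
write_env_file() {
  local path="$1"
  (
    umask 077
    {
      printf 'POSTGRES_USER=%s\n' "firecrawl"              # hypothetical user name
      printf 'POSTGRES_PASSWORD=%s\n' "$(random_hex 16)"
      printf 'BULL_AUTH_KEY=%s\n' "$(random_hex 16)"
      printf 'OPENAI_API_KEY=%s\n' "${OPENAI_API_KEY:-}"   # optional, may stay empty
    } > "$path"
  )
}

# Example:
# write_env_file ./.env
```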

### Architecture

The self-hosted version includes the following service components:
- **API Service**: Main API server handling all requests (4 CPU cores, 8GB RAM limit)
- **Playwright Service**: Browser automation service (2 CPU cores, 4GB RAM limit)
- **Redis**: Job queue and cache backend
- **RabbitMQ**: NuQ message broker
- **PostgreSQL**: Job state management database
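
The limits above are caps, not minimum requirements, so a smaller host can still run the stack. If you want both limited services to be able to reach their caps at the same time, the host needs at least 4 + 2 = 6 CPU cores and 8 + 4 = 12 GB of RAM, before counting Redis, RabbitMQ, and PostgreSQL. A rough preflight check, with hypothetical names throughout:

```bash
# Combined caps from the list above: 4 + 2 CPU cores and 8 + 4 GB of RAM.
REQUIRED_CORES=6
REQUIRED_MEM_GB=12

# Succeed (exit status 0) if the given core count and memory in GB
# cover both service caps simultaneously.
covers_service_limits() {
  local cores="$1"
  local mem_gb="$2"
  [ "$cores" -ge "$REQUIRED_CORES" ] && [ "$mem_gb" -ge "$REQUIRED_MEM_GB" ]
}

# Example on Linux (MemTotal is reported in kB and rounded down to whole GB here):
# covers_service_limits "$(nproc)" \
#   "$(awk '/MemTotal/ {print int($2 / 1024 / 1024)}' /proc/meminfo)" \
#   && echo "host covers both caps" || echo "host is smaller than the combined caps"
```

Rounding down means a machine sold as 12 GB may report 11 GB and fail the check, which errs on the cautious side.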

## Links

- Website: https://www.firecrawl.dev
- GitHub: https://github.com/firecrawl/firecrawl
- Documentation: https://docs.firecrawl.dev
- Discord: https://discord.gg/firecrawl