Firecrawl Scraping
Overview
Scrape individual web pages and convert them to clean, LLM-ready markdown. Handles JavaScript rendering, anti-bot protection, and dynamic content.
Quick Decision Tree
What are you scraping?
│
├── Single page (article, blog, docs)
│   └── references/single-page.md
│   └── Script: scripts/firecrawl_scrape.py
│
└── Entire website (multiple pages, crawling)
    └── references/website-crawler.md
    └── (Use Apify Website Content Crawler for multi-page)
Environment Setup
# Required in .env
FIRECRAWL_API_KEY=fc-your-api-key-here
Get your API key: https://firecrawl.dev/app/api-keys
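As a minimal sketch of how a script might pick up this variable, the Python helper below reads `FIRECRAWL_API_KEY` from the process environment and fails early if it is missing. The helper name `load_api_key` is hypothetical, and the sketch assumes the values in `.env` have already been loaded into the environment (for example by a tool such as python-dotenv or by the shell).

```python
import os


def load_api_key(env=None):
    """Return the Firecrawl API key from the environment, or raise if it is missing.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set; add it to .env or export it")
    return key
```

Failing at startup with a clear message is usually easier to debug than an authentication error returned mid-run.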
Common Usage
Simple Scrape
python scripts/firecrawl_scrape.py "https://example.com/article"
With Options
python scripts/firecrawl_scrape.py "https://wsj.com/article" \
  --proxy stealth \
  --format markdown summary \
  --timeout 60000
Proxy Modes
| Mode | Use Case |
|---|---|
| `basic` | Standard sites, fastest |
| `stealth` | Anti-bot protection, premium content (WSJ, NYT) |
| `auto` | Let Firecrawl decide (recommended) |
Output Formats
- `markdown` - Clean markdown content (default)
- `html` - Raw HTML
- `summary` - AI-generated summary
- `screenshot` - Page screenshot
- `links` - All links on page
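To make the proxy modes and output formats concrete, here is a minimal Python sketch of how the CLI options might be validated and turned into a request body. The function name `build_scrape_payload`, the JSON field names (`url`, `formats`, `proxy`, `timeout`), and the 30000 ms default are assumptions for illustration only; check `scripts/firecrawl_scrape.py` and the Firecrawl API reference for the actual names.

```python
# Values taken from the Proxy Modes and Output Formats sections of this document.
PROXY_MODES = {"basic", "stealth", "auto"}
OUTPUT_FORMATS = {"markdown", "html", "summary", "screenshot", "links"}


def build_scrape_payload(url, formats=("markdown",), proxy="auto", timeout_ms=30000):
    """Validate the options and return a request body as a dict.

    The field names are hypothetical; they mirror the CLI flags, not a confirmed API.
    """
    if proxy not in PROXY_MODES:
        raise ValueError(f"unknown proxy mode: {proxy!r}")
    unknown = [f for f in formats if f not in OUTPUT_FORMATS]
    if unknown:
        raise ValueError(f"unknown output format(s): {unknown}")
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    return {"url": url, "formats": list(formats), "proxy": proxy, "timeout": timeout_ms}
```

Validating locally catches a mistyped mode or format before a request is sent.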
Cost
~1 credit per page. Stealth proxy may use additional credits.
Security Notes
Credential Handling
- Store `FIRECRAWL_API_KEY` in a `.env` file (never commit it to git)
- API keys can be regenerated at https://firecrawl.dev/app/api-keys
- Never log or print API keys in script output
- Use environment variables, not hardcoded values
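One way to follow the "never log or print API keys" rule while still being able to tell which key is in use is to mask the key before it reaches any log line. The helper below is a minimal sketch; the name `mask_key` and the choice to show the last four characters are assumptions, not part of the existing script.

```python
def mask_key(key, visible=4):
    """Return the key with all but the last `visible` characters replaced by '*'."""
    if len(key) <= visible:
        # Too short to reveal anything safely; hide it entirely.
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

A call such as `print(f"Using key {mask_key(api_key)}")` then identifies the key without exposing it.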
Data Privacy
- Only scrapes publicly accessible web pages
- Scraped content is processed by Firecrawl servers temporarily
- Markdown output stored locally in `.tmp/` directory
- Screenshots (if requested) are stored locally
- No persistent data retention by Firecrawl after request
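The local storage step can be sketched as follows. This is an illustration only: the function name `save_markdown` and the file naming scheme (host and path joined with hyphens) are assumptions, and the actual script may name its files differently.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def save_markdown(url, markdown, out_dir=".tmp"):
    """Write scraped markdown to out_dir and return the path of the new file."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}"
    # Replace anything that is not a letter, digit, or dot with a hyphen.
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", raw).strip("-") or "page"
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create .tmp/ on first use
    path = target / f"{slug}.md"
    path.write_text(markdown, encoding="utf-8")
    return path
```

Because the output may contain scraped personal data, it is worth keeping `.tmp/` out of version control as well.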
Access Scopes
- API key provides full access to scraping features
- No granular permission scopes available
- Monitor usage via Firecrawl dashboard
Compliance Considerations
- Robots.txt: Firecrawl respects robots.txt by default
- Public Content Only: Only scrape publicly accessible pages
- Terms of Service: Respect target site ToS
- Rate Limiting: Built-in rate limiting prevents abuse
- Stealth Proxy: Use stealth mode only when necessary (paywalled news, not auth bypass)
- GDPR: Scraped content may contain PII - handle accordingly
- Copyright: Respect intellectual property rights of scraped content
Troubleshooting
Common Issues
Issue: Credits exhausted
Symptoms: API returns "insufficient credits" or quota exceeded error
Cause: Account credits depleted
Solution:
- Check credit balance at https://firecrawl.dev/app
- Upgrade plan or purchase additional credits
- Reduce scraping frequency
- Use `basic` proxy mode to conserve credits
Issue: Page not rendering correctly
Symptoms: Empty content or partial HTML returned
Cause: JavaScript-heavy page not fully loading
Solution:
- Enable JavaScript rendering with the `--js-render` flag
- Increase the timeout with `--timeout 60000` (60 seconds)
- Try `stealth` proxy mode for protected sites
- Wait for specific elements with a `--wait-for` selector
Issue: 403 Forbidden error
Symptoms: Script returns 403 status code
Cause: Site blocking automated access
Solution:
- Enable `stealth` proxy mode
- Add delay between requests
- Try at different times (some sites rate limit by time)
- Check if site requires login (not supported)
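The first two suggestions for 403 errors (switch to `stealth`, add a delay) can be combined into a small fallback wrapper. The sketch below is hypothetical: `scrape_fn` stands in for whatever function performs the scrape, and `ForbiddenError` is an invented exception type representing a 403 response, not something the script or Firecrawl is known to define.

```python
import time


class ForbiddenError(Exception):
    """Hypothetical error a scrape helper might raise on an HTTP 403."""


def scrape_with_proxy_fallback(scrape_fn, url, delay_s=2.0, sleep=time.sleep):
    """Try the `basic` proxy first; on a 403, wait and retry once with `stealth`.

    scrape_fn(url, proxy) is a caller-supplied callable (an assumption of this sketch).
    """
    try:
        return scrape_fn(url, "basic")
    except ForbiddenError:
        sleep(delay_s)  # pause so the retry does not follow immediately
        return scrape_fn(url, "stealth")
```

Starting with `basic` keeps credit use low, since the stealth proxy may cost additional credits.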
Issue: Empty markdown output
Symptoms: Scrape succeeds but markdown is empty or malformed
Cause: Dynamic content loaded after page load, or unusual page structure
Solution:
- Increase wait time for JavaScript to execute
- Use `--wait-for` to wait for specific content
- Try `html` format to see raw content
- Check if content is in an iframe (not always supported)
Issue: Timeout errors
Symptoms: Request times out before completion
Cause: Slow page load or large page content
Solution:
- Increase timeout value (up to 120000ms)
- Use `basic` proxy for faster response
- Target specific page sections if possible
- Check if site is experiencing issues
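Raising the timeout step by step, up to the 120000 ms ceiling mentioned above, can be automated with a small retry loop. This is a sketch under stated assumptions: `scrape_fn` is a hypothetical caller-supplied function that raises Python's built-in `TimeoutError` when a request times out, and the intermediate values 30000 and 60000 are illustrative choices.

```python
def scrape_with_escalating_timeout(scrape_fn, url, timeouts_ms=(30000, 60000, 120000)):
    """Call scrape_fn(url, timeout_ms) with growing timeouts until one succeeds.

    scrape_fn is a hypothetical callable; it is expected to raise TimeoutError
    when the request does not complete in time.
    """
    last_error = None
    for timeout_ms in timeouts_ms:
        try:
            return scrape_fn(url, timeout_ms)
        except TimeoutError as exc:
            last_error = exc  # remember the failure and try the next, longer timeout
    raise TimeoutError(f"all attempts timed out for {url}") from last_error
```

If every attempt fails, the final error is a signal to check whether the site itself is having issues rather than to keep retrying.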
Resources
- references/single-page.md - Single page scraping details
- references/website-crawler.md - Multi-page website crawling
Integration Patterns
Scrape and Analyze
Skills: firecrawl-scraping → parallel-research
Use case: Scrape competitor pages, then analyze content strategy
Flow:
- Scrape competitor website pages with Firecrawl
- Convert to clean markdown
- Use parallel-research to analyze positioning, messaging, features
Scrape and Document
Skills: firecrawl-scraping → content-generation
Use case: Create summary documents from web research
Flow:
- Scrape multiple article pages on a topic
- Combine markdown content
- Generate summary document via content-generation
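The "combine markdown content" step in this flow might look like the sketch below. The function name `combine_markdown`, the `Source:` label, and the `---` separator are illustrative assumptions; the content-generation skill may expect a different input layout.

```python
def combine_markdown(pages):
    """Join (url, markdown) pairs into one document, labelling each source."""
    sections = []
    for url, markdown in pages:
        body = markdown.strip()
        if not body:
            continue  # skip pages that produced no content
        sections.append(f"Source: {url}\n\n{body}")
    return "\n\n---\n\n".join(sections)
```

Keeping the source URL next to each section lets the generated summary be traced back to the pages it came from.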
Scrape and Enrich CRM
Skills: firecrawl-scraping → attio-crm
Use case: Enrich company records with website data
Flow:
- Scrape company website (about page, team page, product pages)
- Extract key information (funding, team size, products)
- Update company record in Attio CRM with enriched data