Crawling discovers and fetches every reachable page under a URL, not just one. Unlike scraping a single URL, a crawl runs as an asynchronous background job: you start it, receive a job ID, and poll for results as pages come in. This design lets you crawl large sites without holding an open HTTP connection. Credits are consumed per page successfully fetched — if a crawl stops early because you run out of credits, you are only billed for the pages that completed.

Start a crawl

Send POST /v1/crawl with a starting URL. Use limit to cap the total number of pages and maxDepth to restrict how deep the crawler follows links from the seed URL.
# Kick off the crawl — returns a job ID.
curl -X POST https://api.zapfetch.com/v1/crawl \
  -H "Authorization: Bearer $ZAPFETCH_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 50
  }'
The response from POST /v1/crawl contains a jobId you use to poll for results.
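A successful start response might look roughly like the following sketch — only `jobId` is documented above; any other fields shown are illustrative assumptions, not a guaranteed schema:

```json
{
  "jobId": "d3f1a2b4-0000-0000-0000-000000000000"
}
```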

Poll for status and results

Call GET /v1/crawl/{job_id} to check whether the crawl is still running and to retrieve the pages fetched so far. Poll this endpoint periodically until status is completed or failed.
curl https://api.zapfetch.com/v1/crawl/JOB_ID \
  -H "Authorization: Bearer $ZAPFETCH_KEY"
The Python and Node.js SDK helpers (crawl_url with wait_until_done=True / crawlUrl with the third argument true) handle polling automatically — they block until the job finishes and return all pages in one go. Use the curl pattern above when you need to poll manually or from a language without an official SDK.
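When polling manually, a minimal loop in Python can be sketched as below. The terminal statuses (`completed` / `failed`) come from the section above; the exact shape of the status payload is otherwise an assumption. The HTTP call itself is injected as a callable so you can wire in `requests`, `urllib`, or an SDK client:

```python
import time

def poll_crawl(fetch_status, interval=2.0, timeout=600.0):
    """Poll a crawl job until it reaches a terminal state.

    fetch_status: a zero-argument callable that performs
    GET /v1/crawl/{job_id} and returns the parsed JSON body.
    Assumes the body has a "status" field that ends up as
    "completed" or "failed" (per the docs above).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job          # terminal state: return the full payload
        time.sleep(interval)    # back off before the next poll
    raise TimeoutError("crawl did not finish within the timeout")
```

In practice you would pass something like `lambda: requests.get(url, headers=auth).json()` as `fetch_status`, and consider raising `interval` for large crawls to avoid unnecessary requests.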

Key parameters

Parameter | Type    | Description
url       | string  | Seed URL. The crawler stays within this domain by default.
limit     | integer | Maximum number of pages to fetch. Defaults vary by plan.
maxDepth  | integer | Maximum link depth from the seed URL.
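To combine these parameters in one request body, a small helper like the following sketch works (the helper name is ours, not part of any SDK); omitting a parameter lets the server apply your plan's defaults:

```python
import json

def build_crawl_request(url, limit=None, max_depth=None):
    """Build the JSON body for POST /v1/crawl from the
    documented parameters. Options left as None are omitted
    so server-side plan defaults apply."""
    body = {"url": url}
    if limit is not None:
        body["limit"] = limit
    if max_depth is not None:
        body["maxDepth"] = max_depth
    return json.dumps(body)
```

The resulting string can be passed directly as the `-d` payload of the curl call shown earlier, or as the request body in any HTTP client.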

Credit cost and billing behavior

Crawls are billed at 1 credit per page fetched (HTTP 2xx). Pages that fail — 4xx, 5xx, timeouts, robots.txt denials — do not consume credits. Each plan also sets a concurrent crawl limit:
Plan       | Concurrent crawls
Free       | 2
Starter    | 10
Pro        | 20
Scale      | 50
Business   | 100
Enterprise | Custom
If your credit balance hits zero mid-crawl, pages already dispatched to workers finish and are billed normally. No new pages are scheduled once the balance is exhausted. The job status will reflect how far the crawl got. To avoid interruptions, monitor remainingCredits in responses or enable opt-in overage from the Billing page.
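The billing rule above — 1 credit per 2xx page, nothing for failures — can be sketched as a small helper for estimating a crawl's cost from per-page results (illustrative only; the API's own usage reporting is authoritative):

```python
def credits_consumed(page_statuses):
    """Estimate credits billed for a crawl: 1 credit per page
    fetched with an HTTP 2xx status. Failed pages (4xx, 5xx)
    and pages with no response (None, e.g. timeouts or
    robots.txt denials) consume no credits."""
    return sum(1 for s in page_statuses if s is not None and 200 <= s < 300)
```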

Next steps

  • Scrape a single URL with Scrape.
  • Discover all URLs on a domain without fetching their content using the Map endpoint.
  • Pull structured data from crawled pages with Extract.