Crawling discovers and fetches every reachable page under a URL, not just one. Unlike scraping a single URL, a crawl runs as an asynchronous background job: you start it, receive a job ID, and poll for results as pages come in. This design lets you crawl large sites without holding an open HTTP connection. Credits are consumed per page successfully fetched — if a crawl stops early because you run out of credits, you are only billed for the pages that completed.

Start a crawl

Send POST /v1/crawl with a starting URL. Use limit to cap the total number of pages and maxDepth to restrict how deep the crawler follows links from the seed URL.
# Kick off the crawl — returns a job ID.
curl -X POST https://api.zapfetch.com/v1/crawl \
  -H "Authorization: Bearer $ZAPFETCH_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "limit": 50
  }'
The response from POST /v1/crawl contains a jobId you use to poll for results.
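A successful start response might look roughly like the following sketch — only `jobId` is documented above; any other fields shown are illustrative assumptions, not a guaranteed schema:

```json
{
  "jobId": "d3f1a2b4-0000-0000-0000-000000000000"
}
```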

Poll for status and results

Call GET /v1/crawl/{job_id} to check whether the crawl is still running and to retrieve the pages fetched so far. Poll this endpoint periodically until status is completed or failed.
curl https://api.zapfetch.com/v1/crawl/JOB_ID \
  -H "Authorization: Bearer $ZAPFETCH_KEY"
The Python and Node.js SDK helpers (crawl_url with wait_until_done=True / crawlUrl with the third argument true) handle polling automatically — they block until the job finishes and return all pages in one go. Use the curl pattern above when you need to poll manually or from a language without an official SDK.
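When polling manually, a minimal loop in Python can be sketched as below. The terminal statuses (`completed` / `failed`) come from the section above; the exact shape of the status payload is otherwise an assumption. The HTTP call itself is injected as a callable so you can wire in `requests`, `urllib`, or an SDK client:

```python
import time

def poll_crawl(fetch_status, interval=2.0, timeout=600.0):
    """Poll a crawl job until it reaches a terminal state.

    fetch_status: a zero-argument callable that performs
    GET /v1/crawl/{job_id} and returns the parsed JSON body.
    Assumes the body has a "status" field that ends up as
    "completed" or "failed" (per the docs above).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job          # terminal state: return the full payload
        time.sleep(interval)    # back off before the next poll
    raise TimeoutError("crawl did not finish within the timeout")
```

In practice you would pass something like `lambda: requests.get(url, headers=auth).json()` as `fetch_status`, and consider raising `interval` for large crawls to avoid unnecessary requests.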

Key parameters

Parameter | Type    | Description
url       | string  | Seed URL. The crawler stays within this domain by default.
limit     | integer | Maximum number of pages to fetch. Defaults vary by plan.
maxDepth  | integer | Maximum link depth from the seed URL.
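To combine these parameters in one request body, a small helper like the following sketch works (the helper name is ours, not part of any SDK); omitting a parameter lets the server apply your plan's defaults:

```python
import json

def build_crawl_request(url, limit=None, max_depth=None):
    """Build the JSON body for POST /v1/crawl from the
    documented parameters. Options left as None are omitted
    so server-side plan defaults apply."""
    body = {"url": url}
    if limit is not None:
        body["limit"] = limit
    if max_depth is not None:
        body["maxDepth"] = max_depth
    return json.dumps(body)
```

The resulting string can be passed directly as the `-d` payload of the curl call shown earlier, or as the request body in any HTTP client.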

Credit cost and billing behavior

Crawls are billed at 1 credit per page fetched (HTTP 2xx). Pages that fail — 4xx, 5xx, timeouts, robots.txt denials — do not consume credits. Each plan also sets a concurrent crawl limit:
Plan       | Concurrent crawls
Free       | 2
Starter    | 10
Pro        | 20
Scale      | 50
Business   | 100
Enterprise | Custom
If your credit balance hits zero mid-crawl, pages already dispatched to workers finish and are billed normally. No new pages are scheduled once the balance is exhausted. The job status will reflect how far the crawl got. To avoid interruptions, monitor remainingCredits in responses or enable opt-in overage from the Billing page.
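The billing rule above — 1 credit per 2xx page, nothing for failures — can be sketched as a small helper for estimating a crawl's cost from per-page results (illustrative only; the API's own usage reporting is authoritative):

```python
def credits_consumed(page_statuses):
    """Estimate credits billed for a crawl: 1 credit per page
    fetched with an HTTP 2xx status. Failed pages (4xx, 5xx)
    and pages with no response (None, e.g. timeouts or
    robots.txt denials) consume no credits."""
    return sum(1 for s in page_statuses if s is not None and 200 <= s < 300)
```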

Next steps

  • Scrape a single URL with Scrape.
  • Discover all URLs on a domain without fetching their content using the Map endpoint.
  • Pull structured data from crawled pages with Extract.