Website Crawling

The Website Crawling service allows you to systematically explore and extract content from websites.

Required Parameters

url

string

required

The base URL to start crawling from

Optional Parameters

Crawl Scope

maxDepth

integer

default: 2

Maximum depth to crawl relative to the base URL

limit

integer

default: 10000

Maximum number of pages to crawl

allowBackwardLinks

boolean

default: false

Enable navigation to previously linked pages

allowExternalLinks

boolean

default: false

Allow following links to external websites

URL Filtering

includePaths

array

Regex patterns for URLs to include (e.g., ["blog/*"])

excludePaths

array

Regex patterns for URLs to exclude (e.g., ["admin/*"])
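The filtering rules above can be approximated client-side. The following is a minimal sketch, assuming the patterns are searched as regular expressions against the URL path and that exclusions take precedence over inclusions; neither detail is confirmed here, and `should_crawl` is a hypothetical helper, not part of the API. Note that, read strictly as a regex, `blog/*` means "blog" followed by zero or more slashes, so it also matches paths such as `blogroll`.

```python
import re
from urllib.parse import urlparse


def should_crawl(url, include_paths=(), exclude_paths=()):
    """Hypothetical filter; the service's real matching rules may differ."""
    path = urlparse(url).path.lstrip("/")
    # Assumption: a matching exclude pattern always wins.
    if any(re.search(pattern, path) for pattern in exclude_paths):
        return False
    # Assumption: an empty include list means "include everything".
    if not include_paths:
        return True
    return any(re.search(pattern, path) for pattern in include_paths)
```

For example, `should_crawl("https://example.com/blog/post-1", ["blog/*"], ["admin/*"])` passes the filter, while a URL under `/admin/` is rejected.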

Crawl Settings

ignoreSitemap

boolean

default: false

Ignore the website’s sitemap.xml

webhook

string

Webhook URL for crawl events:

crawl.started
crawl.page
crawl.completed
crawl.failed
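A receiver for these events could dispatch on the event name. This is a minimal sketch, assuming each webhook delivery is a JSON body whose `type` field carries one of the four event names and whose `data` and `error` fields carry the page content and failure reason; those field names are assumptions, as the payload shape is not documented here. `handle_crawl_event` is a hypothetical helper.

```python
import json


def handle_crawl_event(raw_body, state):
    """Update a plain dict tracking one crawl job from a webhook delivery."""
    event = json.loads(raw_body)
    kind = event.get("type")  # assumed field name
    if kind == "crawl.started":
        state["status"] = "running"
    elif kind == "crawl.page":
        # Assumed: the scraped page arrives under "data".
        state.setdefault("pages", []).append(event.get("data"))
    elif kind == "crawl.completed":
        state["status"] = "completed"
    elif kind == "crawl.failed":
        state["status"] = "failed"
        state["error"] = event.get("error")  # assumed field name
    # Unknown event types are ignored so new events do not break the handler.
    return state
```

Keeping the handler a pure function over the request body makes it easy to plug into any web framework and to test without a running server.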

Scrape Options

scrapeOptions

object

Configure how each page is scraped:

{
  "formats": ["markdown"],
  "onlyMainContent": true,
  "removeBase64Images": true,
  "mobile": false,
  "waitFor": 0,
  "headers": {},
  "includeTags": [],
  "excludeTags": []
}

Example Request

curl --request POST \
  --url https://crawl.taam.cloud/v1/web \
  --header 'Authorization: Bearer YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "url": "https://example.com",
    "maxDepth": 2,
    "includePaths": ["blog/*"],
    "excludePaths": ["admin/*"],
    "limit": 100,
    "scrapeOptions": {
      "formats": ["markdown"],
      "onlyMainContent": true
    }
  }'
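
The same request can be built in Python with only the standard library. This sketch mirrors the curl command above; `build_crawl_request` is a hypothetical helper name, and the request is constructed but not sent.

```python
import json
import urllib.request

API_URL = "https://crawl.taam.cloud/v1/web"  # endpoint from the curl example


def build_crawl_request(api_key, payload):
    """Build (but do not send) a POST request equivalent to the curl call."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "url": "https://example.com",
    "maxDepth": 2,
    "includePaths": ["blog/*"],
    "excludePaths": ["admin/*"],
    "limit": 100,
    "scrapeOptions": {"formats": ["markdown"], "onlyMainContent": True},
}
request = build_crawl_request("YOUR_API_KEY", payload)
```

To send it, pass `request` to `urllib.request.urlopen` inside a `with` block and decode the body with `json.load`.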

Example Response

{
  "success": true,
  "id": "crawl_123abc",
  "url": "https://example.com"
}

Status Codes

200 - Success

Crawl job started successfully

402 - Payment Required

Insufficient credits, or payment is required

429 - Too Many Requests

Rate limit exceeded

500 - Server Error

Internal server error
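
A client might map these codes to actions as follows. This is a sketch of one common pattern, not documented behavior: retrying 429 and 500 responses with exponential backoff is an assumption, and `classify_status` and `backoff_delays` are hypothetical helpers.

```python
RETRYABLE = {"rate_limited", "server_error"}  # assumption: these are worth retrying


def classify_status(code):
    """Translate a documented HTTP status code into a short label."""
    if code == 200:
        return "ok"
    if code == 402:
        return "payment_required"
    if code == 429:
        return "rate_limited"
    if code >= 500:
        return "server_error"
    return "unexpected"


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Return exponentially growing wait times in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

A caller would retry only while `classify_status(code) in RETRYABLE`, sleeping for each value from `backoff_delays` in turn; a 402 is not retried because it will not succeed until credits are added.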

Authorizations

Authorization

string

header

required

Enter your API key prefixed with 'Bearer '

Body

application/json

model

enum<string>

required

Type of web service to use

Available options:

scrape,

crawl,

map,

taam-ai-search,

crawl-status

params

object

Parameters specific to the selected model
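
The body schema above can be assembled and validated before sending. This is a minimal sketch, assuming the crawl parameters described earlier are nested under `params` when `model` is `crawl`; that nesting is an assumption, and `build_body` is a hypothetical helper.

```python
import json
from typing import Optional

# The enum values listed for the `model` field.
WEB_MODELS = {"scrape", "crawl", "map", "taam-ai-search", "crawl-status"}


def build_body(model: str, params: Optional[dict] = None) -> str:
    """Serialize a request body, rejecting model names outside the enum."""
    if model not in WEB_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    body = {"model": model}
    if params is not None:
        body["params"] = params  # `params` is optional in the schema
    return json.dumps(body)
```

Validating the enum locally surfaces typos such as `crawl_status` before a request is spent on them.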

Response

200 - application/json

Successful response

id

string

Unique identifier for the request

object

string

Type of completion (e.g., scrape.completion)

created

integer

Unix timestamp of when the request was created

model

string

Model used for the request

data

object

Model-specific response data

usage

object

system_fingerprint

string
