Add Website

Add Website crawls website pages and links the crawled content to a chatbot as a knowledge source.

Use this endpoint to add public website content, such as documentation pages, help center articles, product pages, blogs, or FAQs, to an AI agent. Once the crawl starts, the content is processed and indexed so that the chatbot can answer questions from the linked website data.

The API returns crawl run details, including the crawl run ID, current status, and start time.

Endpoint

POST https://agents.robofy.ai/v1/ai-agent/crawl_website

Request Body

{
  "project_id": "3..",
  "chatbot_id": "DKDvwn...",
  "vector_provider": 0,
  "start_urls": [
    {
      "url": "https://example.com/docs"
    }
  ],
  "max_crawl_pages": 50,
  "max_crawl_depth": 3,
  "include_urls": [
    "https://example.com/docs/*"
  ],
  "exclude_urls": [
    "https://example.com/blog/*",
    "https://example.com/login",
    "https://example.com/pricing"
  ],
  "use_sitemap": "1",
  "is_main_website": true,
  "crawl_mode": 2
}

Request Parameters

Field	Type	Required	Description
`project_id`	string	No	Project ID associated with the chatbot.
`chatbot_id`	string	No	Unique chatbot identifier that should be linked with the crawled website content.
`vector_provider`	integer	No	Vector provider used to process and index the crawled website content.
`start_urls`	array	Yes	List of starting URLs from where the website crawl should begin.
`start_urls[].url`	string	Yes	Website URL to crawl.
`max_crawl_pages`	integer	No	Maximum number of pages the crawler should crawl.
`max_crawl_depth`	integer	No	Maximum depth the crawler should follow from the starting URL.
`include_urls`	array of strings	No	URL patterns or specific URLs that should be included during crawling.
`exclude_urls`	array of strings	No	URL patterns or specific URLs that should be excluded during crawling.
`use_sitemap`	string	No	Indicates whether the crawler should use the website sitemap when available. Example: `1`.
`is_main_website`	boolean	No	Indicates whether this website should be marked as the main website source for the chatbot.
`crawl_mode`	integer	No	Crawl mode that controls how the website should be crawled.
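
The request above can be assembled in Python with only the standard library. This is a minimal sketch, not an official client: the function name `build_crawl_request` and the placeholder values are illustrative, the URL is the one used by the curl example in this document, and sending the bearer token in an `Authorization` header is an assumption based on the 401 description.

```python
import json
import urllib.request

# Endpoint URL taken from the curl example in this document.
CRAWL_URL = "https://agents.robofy.ai/v1/ai-agent/crawl_website"


def build_crawl_request(token, chatbot_id, project_id, start_url,
                        include_urls=None, exclude_urls=None,
                        max_crawl_pages=50, max_crawl_depth=3):
    """Build (but do not send) a POST request that starts a crawl."""
    payload = {
        "project_id": project_id,
        "chatbot_id": chatbot_id,
        "start_urls": [{"url": start_url}],
        "max_crawl_pages": max_crawl_pages,
        "max_crawl_depth": max_crawl_depth,
        "include_urls": include_urls or [],
        "exclude_urls": exclude_urls or [],
        "use_sitemap": "1",  # documented as a string, not a boolean
    }
    return urllib.request.Request(
        CRAWL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the bearer token is sent in the Authorization header.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_crawl_request(
    token="YOUR_API_TOKEN",
    chatbot_id="YOUR_CHATBOT_ID",
    project_id="YOUR_PROJECT_ID",
    start_url="https://example.com/docs",
    include_urls=["https://example.com/docs/*"],
)
```

Passing `request` to `urllib.request.urlopen` would send it. Building the request separately keeps the payload easy to inspect before anything goes over the network.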

Success Response

{
  "run_id": "crawl_run_12345",
  "status": "crawling",
  "started_at": "2026-02-20 14:45:00"
}

Response Fields

Field	Type	Description
`run_id`	string	Unique identifier of the website crawl run.
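`status`	string	Current status of the crawl run. Example values include `crawling` (as in the sample response above), `running`, `completed`, or `failed`.
`started_at`	string	Date and time when the crawl process started, formatted as in the sample response (`YYYY-MM-DD HH:MM:SS`).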
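
A client will usually want `started_at` as a real timestamp rather than a string. The sketch below parses the sample response. The helper name `parse_crawl_run` is illustrative, and it assumes the timestamp always follows the format of the sample; the time zone is not stated in this document, so the result is a naive `datetime`.

```python
import json
from datetime import datetime


def parse_crawl_run(body):
    """Turn the JSON success response into a dict with a parsed start time."""
    data = json.loads(body)
    return {
        "run_id": data["run_id"],
        "status": data["status"],
        # Assumption: started_at always uses the format of the sample response.
        "started_at": datetime.strptime(data["started_at"], "%Y-%m-%d %H:%M:%S"),
    }


sample = (
    '{"run_id": "crawl_run_12345", "status": "crawling", '
    '"started_at": "2026-02-20 14:45:00"}'
)
run = parse_crawl_run(sample)
print(run["run_id"], run["status"], run["started_at"].isoformat())
```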

Error Responses

400 Bad Request

Returned when the request body is invalid, required values are missing, or one or more URLs are not valid.

{
  "message": "Bad Request",
  "error": {
    "code": "BAD_REQUEST",
    "detail": "Invalid request body or missing required fields."
  }
}

401 Unauthorized

Returned when the request does not include a valid bearer token.

{
  "message": "Unauthorized",
  "error": {
    "code": "UNAUTHORIZED",
    "detail": "Authentication token is missing or invalid."
  }
}

403 Forbidden

Returned when the authenticated user does not have permission to add website content to the specified chatbot.

{
  "message": "Forbidden",
  "error": {
    "code": "FORBIDDEN",
    "detail": "You do not have permission to update this chatbot."
  }
}

404 Not Found

Returned when the project or chatbot does not exist.

{
  "message": "Not Found",
  "error": {
    "code": "CHATBOT_NOT_FOUND",
    "detail": "Chatbot not found."
  }
}

500 Internal Server Error

Returned when an unexpected error occurs while starting the crawl.

{
  "message": "Internal Server Error",
  "error": {
    "code": "INTERNAL_SERVER_ERROR",
    "detail": "Something went wrong. Please try again later."
  }
}
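
All five error bodies share the same shape (`message`, plus `error.code` and `error.detail`), so one helper can convert any of them into an exception. The following is a sketch under that assumption; `CrawlApiError` and `error_from_response` are hypothetical names, not part of the API.

```python
import json


class CrawlApiError(Exception):
    """Hypothetical exception type carrying the documented error fields."""

    def __init__(self, http_status, message, code, detail):
        super().__init__(f"{http_status} {code}: {detail}")
        self.http_status = http_status
        self.message = message
        self.code = code
        self.detail = detail


def error_from_response(http_status, body):
    """Build a CrawlApiError from an error response body."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    error = data.get("error")
    if not isinstance(error, dict):
        error = {}
    return CrawlApiError(
        http_status,
        data.get("message", ""),
        error.get("code", "UNKNOWN"),  # fallback when the body is not as documented
        error.get("detail", ""),
    )


body = (
    '{"message": "Not Found", '
    '"error": {"code": "CHATBOT_NOT_FOUND", "detail": "Chatbot not found."}}'
)
err = error_from_response(404, body)
print(err.http_status, err.code, err.detail)
```

With `urllib`, a non-2xx response is raised as `urllib.error.HTTPError`; its `code` attribute and `read()` method supply the two arguments.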

Notes

Use this endpoint when you want to add website content as a knowledge source for a chatbot.

The API starts a crawl run and returns the crawl run details. Crawling and indexing may continue after the response is returned.

Use `max_crawl_pages` and `max_crawl_depth` to control how much website content is crawled.
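
To build intuition for how the two limits interact, the sketch below walks a small in-memory link graph breadth-first. It is an illustration only and not the service's crawler: it assumes the start URL counts as depth 0 and counts toward the page limit, neither of which is stated in this document. The function name `bounded_crawl` and the `site` graph are made up for the example.

```python
from collections import deque


def bounded_crawl(links, start_url, max_crawl_pages, max_crawl_depth):
    """Breadth-first walk over an in-memory link graph with two limits."""
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])  # assumption: the start URL is depth 0
    while queue:
        url, depth = queue.popleft()
        if depth == max_crawl_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if len(visited) >= max_crawl_pages:
                return visited  # page budget exhausted
            if nxt not in seen:
                seen.add(nxt)
                visited.append(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Toy site: each key links to the pages in its list.
site = {
    "/docs": ["/docs/a", "/docs/b"],
    "/docs/a": ["/docs/a/1"],
    "/docs/a/1": ["/docs/a/1/x"],
}

print(bounded_crawl(site, "/docs", max_crawl_pages=50, max_crawl_depth=1))
print(bounded_crawl(site, "/docs", max_crawl_pages=2, max_crawl_depth=3))
```

In the first call the depth limit stops the walk; in the second the page limit does, even though deeper pages are still reachable.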

Use `include_urls` to limit crawling to specific website sections, such as documentation or help center pages.

Use `exclude_urls` to prevent unnecessary pages from being crawled, such as login pages, checkout pages, pricing pages, or blog sections.
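
Before starting a crawl, it can help to preview which URLs a set of patterns would let through. The sketch below does this on the client side with glob matching from Python's `fnmatch` module. How the service itself matches patterns and resolves conflicts is not documented here, so the rule used (exclusions win, then a URL must match an include pattern if any are given) is an assumption, and `is_crawlable` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def is_crawlable(url, include_urls, exclude_urls):
    """Assumed rule: exclusions win, then inclusions must match if any are given."""
    if any(fnmatchcase(url, pattern) for pattern in exclude_urls):
        return False
    if not include_urls:
        return True
    return any(fnmatchcase(url, pattern) for pattern in include_urls)


include = ["https://example.com/docs/*"]
exclude = [
    "https://example.com/blog/*",
    "https://example.com/login",
    "https://example.com/pricing",
]

for url in [
    "https://example.com/docs/setup",
    "https://example.com/blog/news",
    "https://example.com/about",
]:
    print(url, is_crawlable(url, include, exclude))
```

Under plain glob matching, the start URL `https://example.com/docs` does not itself match `https://example.com/docs/*`; whether the service treats start URLs specially is not stated here. Note also that `fnmatch` interprets `?` and `[` as wildcards, which may differ from the service's pattern syntax.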

Use `use_sitemap` when you want the crawler to discover pages from the website sitemap.

After starting the crawl, use the Get Crawl Status API with the returned `run_id` to check the crawl progress.
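
Because crawling continues after the response is returned, a client typically polls until the run finishes. The Get Crawl Status request is not described in this document, so the sketch below takes it as a caller-supplied function, `fetch_status` (hypothetical), that returns the status string for a `run_id`. Treating `completed` and `failed` as the only terminal statuses is an assumption based on the example values listed under Response Fields.

```python
import time


def wait_for_crawl(run_id, fetch_status, interval_seconds=5.0,
                   max_attempts=60, sleep=time.sleep):
    """Poll until the run reports a terminal status or attempts run out.

    fetch_status is a caller-supplied function (hypothetical here) that calls
    the Get Crawl Status API for run_id and returns the status string.
    """
    terminal = {"completed", "failed"}  # assumed terminal statuses
    status = None
    for attempt in range(max_attempts):
        status = fetch_status(run_id)
        if status in terminal:
            return status
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next check
    raise TimeoutError(
        f"crawl {run_id} still {status!r} after {max_attempts} checks"
    )


# Stand-in for the real status call: reports "crawling" twice, then "completed".
fake_statuses = iter(["crawling", "crawling", "completed"])
result = wait_for_crawl(
    "crawl_run_12345",
    fetch_status=lambda run_id: next(fake_statuses),
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(result)
```

Capping the number of attempts keeps a stuck crawl from blocking the caller indefinitely.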

Example Request

curl --location 'https://agents.robofy.ai/v1/ai-agent/crawl_website' \
  --header 'Content-Type: application/json' \
  --header 'Authorization: Bearer <YOUR_API_TOKEN>' \
  --data '{
    "project_id": "3..",
    "chatbot_id": "DKDvwn...",
    "vector_provider": 0,
    "start_urls": [
      {
        "url": "https://example.com/docs"
      }
    ],
    "max_crawl_pages": 50,
    "max_crawl_depth": 3,
    "include_urls": [
      "https://example.com/docs/*"
    ],
    "exclude_urls": [
      "https://example.com/blog/*",
      "https://example.com/login",
      "https://example.com/pricing"
    ],
    "use_sitemap": "1",
    "is_main_website": true,
    "crawl_mode": 2
  }'
