Web Browsing API

The Web Browsing API allows Oliver to browse and extract content from web pages to provide detailed information in conversations.

Note: Web browsing is subject to rate limits and content policies. Some websites may block automated access.

Browse Web Page

Extract content from a specific web page URL.

POST /api/v1/browsing/browse

Browse a web page and extract its content for analysis.

Headers

Authorization: Bearer <your_api_token>
Content-Type: application/json

Request Body

{
  "url": "https://example.com/financial-news/market-update",
  "chat_id": 123,
  "message_id": 456,
  "purpose": "Retrieve current market information for client query",
  "extract_links": true,
  "extract_images": false,
  "max_content_length": 10000
}

Parameters

Parameter          | Type    | Required | Description
url                | string  | Yes      | The URL to browse (must be HTTPS)
chat_id            | integer | No       | Chat ID to associate browsing session with
message_id         | integer | No       | Message ID that triggered the browse
purpose            | string  | No       | Description of why the page is being browsed
extract_links      | boolean | No       | Whether to extract links from the page (default: false)
extract_images     | boolean | No       | Whether to extract image URLs (default: false)
max_content_length | integer | No       | Maximum content length to extract (default: 50000)
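
The request described above can be assembled with the Python standard library. This is a minimal sketch, not an official client: the base URL `https://api.example.com` and the helper name `build_browse_request` are assumptions made for illustration, since the documentation does not state a host. The helper only builds the request; nothing is sent.

```python
import json
import urllib.request
from urllib.parse import urlparse

# Hypothetical host: the documentation does not state a base URL.
BASE_URL = "https://api.example.com"

# Optional body parameters listed in the table above.
OPTIONAL_PARAMS = {
    "chat_id", "message_id", "purpose",
    "extract_links", "extract_images", "max_content_length",
}


def build_browse_request(token, url, **options):
    """Build (but do not send) a POST request for /api/v1/browsing/browse."""
    # The parameter table requires an HTTPS URL, so reject anything else early.
    if urlparse(url).scheme != "https":
        raise ValueError("url must use HTTPS")
    unknown = set(options) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    body = {"url": url, **options}
    return urllib.request.Request(
        BASE_URL + "/api/v1/browsing/browse",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_browse_request(
    "your_api_token",
    "https://example.com/financial-news/market-update",
    chat_id=123,
    extract_links=True,
    max_content_length=10000,
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch runs without network access.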

Response

{
  "success": true,
  "data": {
    "browse_session": {
      "id": "browse_xyz789abc",
      "url": "https://example.com/financial-news/market-update",
      "chat_id": 123,
      "message_id": 456,
      "purpose": "Retrieve current market information for client query",
      "status": "completed",
      "created_at": "2025-03-18T14:30:00Z",
      "completed_at": "2025-03-18T14:30:15Z"
    },
    "web_page": {
      "id": "webpage_abc123xyz",
      "url": "https://example.com/financial-news/market-update",
      "title": "Market Update: Q1 2025 Financial Outlook",
      "description": "Latest analysis of market trends and financial outlook for Q1 2025",
      "content": "The financial markets have shown resilience in Q1 2025, with major indices posting gains...",
      "text_content": "Market Update: Q1 2025 Financial Outlook\n\nThe financial markets have shown resilience...",
      "word_count": 1250,
      "metadata": {
        "author": "Jane Smith",
        "published_date": "2025-03-18T08:00:00Z",
        "publisher": "Financial News Today",
        "language": "en",
        "keywords": ["market", "financial", "Q1", "2025", "outlook"],
        "canonical_url": "https://example.com/financial-news/market-update"
      },
      "extracted_links": [
        {
          "url": "https://example.com/sec-filing/xyz-corp",
          "text": "XYZ Corp SEC Filing",
          "type": "internal"
        },
        {
          "url": "https://sec.gov/edgar/search/",
          "text": "SEC EDGAR Database",
          "type": "external"
        }
      ],
      "images": [
        {
          "url": "https://example.com/images/market-chart-q1-2025.png",
          "alt": "Market performance chart Q1 2025",
          "caption": "Market trends for Q1 2025"
        }
      ],
      "last_updated": "2025-03-18T14:30:15Z"
    }
  },
  "message": "Web page browsed successfully"
}
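
A caller will usually want only a few fields from this response. The sketch below pulls out the title, word count, author, and links grouped by their `type` value. The function name `summarize_page` is hypothetical, and the code assumes that `extracted_links` and `metadata` may be absent (for example when `extract_links` is false), which the documentation does not confirm.

```python
def summarize_page(payload):
    """Pull commonly used fields out of a parsed browse response body."""
    if not payload.get("success"):
        raise ValueError("browse request did not succeed")
    page = payload["data"]["web_page"]
    # Assumption: these keys may be missing, so fall back to empty values.
    links = page.get("extracted_links", [])
    metadata = page.get("metadata", {})
    return {
        "title": page["title"],
        "word_count": page["word_count"],
        "author": metadata.get("author"),
        "internal_links": [l["url"] for l in links if l["type"] == "internal"],
        "external_links": [l["url"] for l in links if l["type"] == "external"],
    }


# Trimmed copy of the sample response shown above.
sample = {
    "success": True,
    "data": {
        "web_page": {
            "title": "Market Update: Q1 2025 Financial Outlook",
            "word_count": 1250,
            "metadata": {"author": "Jane Smith"},
            "extracted_links": [
                {"url": "https://example.com/sec-filing/xyz-corp",
                 "text": "XYZ Corp SEC Filing", "type": "internal"},
                {"url": "https://sec.gov/edgar/search/",
                 "text": "SEC EDGAR Database", "type": "external"},
            ],
        }
    },
}

summary = summarize_page(sample)
print(summary["title"], summary["external_links"])
```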

Browsing History

Retrieve browsing history for a user or chat.

GET /api/v1/browsing/history

Get paginated browsing history with optional filtering.

Headers

Authorization: Bearer <your_api_token>

Query Parameters

Parameter | Type    | Required | Description
chat_id   | integer | No       | Filter by specific chat ID
status    | string  | No       | Filter by status: pending, completed, failed
date_from | string  | No       | Start date filter (ISO 8601 format)
date_to   | string  | No       | End date filter (ISO 8601 format)
page      | integer | No       | Page number (default: 1)
per_page  | integer | No       | Results per page (default: 20, max: 100)

Response

{
  "success": true,
  "data": [
    {
      "id": "browse_xyz789abc",
      "url": "https://example.com/financial-news/market-update",
      "title": "Market Update: Q1 2025 Financial Outlook",
      "chat_id": 123,
      "message_id": 456,
      "status": "completed",
      "word_count": 1250,
      "created_at": "2025-03-18T14:30:00Z",
      "completed_at": "2025-03-18T14:30:15Z"
    },
    {
      "id": "browse_abc456def",
      "url": "https://sec.gov/news/press-release/2025-12",
      "title": "SEC Announces New Investment Advisor Rules",
      "chat_id": 124,
      "message_id": 789,
      "status": "completed",
      "word_count": 890,
      "created_at": "2025-03-18T13:15:00Z",
      "completed_at": "2025-03-18T13:15:08Z"
    }
  ],
  "meta": {
    "current_page": 1,
    "per_page": 20,
    "total": 2,
    "total_pages": 1,
    "has_next_page": false,
    "has_previous_page": false
  }
}
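
To read the full history, a client can keep requesting pages until `meta.has_next_page` is false. The sketch below shows one way to do that. `fetch_json` is a hypothetical callable that performs the authenticated GET for a path and returns the parsed JSON body; it is injected so the example runs without a network connection.

```python
from urllib.parse import urlencode


def history_path(**params):
    """Build the history path, dropping filters that are not set."""
    query = {k: v for k, v in params.items() if v is not None}
    return "/api/v1/browsing/history?" + urlencode(query)


def iter_history(fetch_json, **filters):
    """Yield every history entry, following meta.has_next_page."""
    page = 1
    while True:
        body = fetch_json(history_path(page=page, **filters))
        yield from body["data"]
        if not body["meta"]["has_next_page"]:
            break
        page += 1


# Stand-in for a real HTTP call: serves two canned pages in order.
canned = iter([
    {"data": [{"id": "browse_xyz789abc"}], "meta": {"has_next_page": True}},
    {"data": [{"id": "browse_abc456def"}], "meta": {"has_next_page": False}},
])
requested = []


def fake_fetch(path):
    requested.append(path)
    return next(canned)


ids = [entry["id"] for entry in iter_history(fake_fetch, status="completed")]
print(ids)
```

Because `iter_history` is a generator, a caller that stops early never requests the remaining pages, which helps stay under the history rate limit.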

Browse Session Details

Get detailed information about a specific browsing session.

GET /api/v1/browsing/{session_id}

Retrieve detailed information about a browsing session.

Headers

Authorization: Bearer <your_api_token>

Path Parameters

Parameter  | Type   | Required | Description
session_id | string | Yes      | Unique identifier for the browsing session

Response

{
  "success": true,
  "data": {
    "id": "browse_xyz789abc",
    "url": "https://example.com/financial-news/market-update",
    "chat_id": 123,
    "message_id": 456,
    "purpose": "Retrieve current market information for client query",
    "status": "completed",
    "web_page": {
      "id": "webpage_abc123xyz",
      "title": "Market Update: Q1 2025 Financial Outlook",
      "description": "Latest analysis of market trends and financial outlook for Q1 2025",
      "content": "The financial markets have shown resilience in Q1 2025...",
      "word_count": 1250,
      "metadata": {
        "author": "Jane Smith",
        "published_date": "2025-03-18T08:00:00Z",
        "publisher": "Financial News Today",
        "language": "en",
        "keywords": ["market", "financial", "Q1", "2025", "outlook"]
      }
    },
    "processing_time": 15.2,
    "created_at": "2025-03-18T14:30:00Z",
    "completed_at": "2025-03-18T14:30:15Z"
  }
}
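
The history filter lists `pending`, `completed`, and `failed` as session statuses. If a session can be observed while still `pending` (an assumption; the documentation does not say whether browsing is asynchronous), a client could poll this endpoint until the status settles. A minimal sketch, where `fetch_json` is a hypothetical callable that performs the authenticated GET and returns the parsed body:

```python
import time
from urllib.parse import quote


def wait_for_session(fetch_json, session_id, timeout=30.0, interval=1.0,
                     sleep=time.sleep):
    """Poll the session details endpoint until the status is terminal."""
    path = "/api/v1/browsing/" + quote(session_id, safe="")
    deadline = time.monotonic() + timeout
    while True:
        session = fetch_json(path)["data"]
        if session["status"] in ("completed", "failed"):
            return session
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session {session_id} still pending")
        sleep(interval)


# Stand-in for a real HTTP call: pending twice, then completed.
statuses = iter(["pending", "pending", "completed"])


def fake_fetch(path):
    return {"success": True,
            "data": {"id": "browse_xyz789abc", "status": next(statuses)}}


result = wait_for_session(fake_fetch, "browse_xyz789abc",
                          sleep=lambda seconds: None)
print(result["status"])
```

At the documented limit of 60 session-detail requests per minute, a polling interval of one second or more keeps a single poller within bounds.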

Refresh Web Page

Re-browse a previously accessed web page to get updated content.

POST /api/v1/browsing/{session_id}/refresh

Refresh the content of a previously browsed web page.

Headers

Authorization: Bearer <your_api_token>

Path Parameters

Parameter  | Type   | Required | Description
session_id | string | Yes      | Unique identifier for the browsing session to refresh

Response

{
  "success": true,
  "data": {
    "browse_session": {
      "id": "browse_xyz789abc_refresh",
      "original_session_id": "browse_xyz789abc",
      "url": "https://example.com/financial-news/market-update",
      "status": "completed",
      "created_at": "2025-03-18T16:45:00Z",
      "completed_at": "2025-03-18T16:45:12Z"
    },
    "changes_detected": true,
    "content_diff": {
      "added_sections": ["New section: Market Predictions for Q2"],
      "removed_sections": [],
      "modified_sections": ["Updated market data in Introduction"]
    }
  },
  "message": "Web page refreshed successfully"
}
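
The `changes_detected` flag and the three lists in `content_diff` can be flattened into a short change log. The sketch below is illustrative only: the function name `describe_refresh` is hypothetical, and it assumes `content_diff` may be missing when nothing changed.

```python
def describe_refresh(payload):
    """Turn a parsed refresh response into a list of change descriptions."""
    data = payload["data"]
    if not data.get("changes_detected"):
        return []
    # Assumption: content_diff may be absent, so default to an empty dict.
    diff = data.get("content_diff", {})
    labels = (
        ("added_sections", "added"),
        ("removed_sections", "removed"),
        ("modified_sections", "modified"),
    )
    return [
        f"{label}: {section}"
        for key, label in labels
        for section in diff.get(key, [])
    ]


# Trimmed copy of the sample response shown above.
sample = {
    "success": True,
    "data": {
        "changes_detected": True,
        "content_diff": {
            "added_sections": ["New section: Market Predictions for Q2"],
            "removed_sections": [],
            "modified_sections": ["Updated market data in Introduction"],
        },
    },
}

changes = describe_refresh(sample)
for line in changes:
    print(line)
```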

Supported Content Types

The Web Browsing API can extract content from various types of web pages:

Content Type      | Supported  | Notes
HTML Pages        | ✅ Yes     | Full content extraction with metadata
News Articles     | ✅ Yes     | Optimized for article structure
Blog Posts        | ✅ Yes     | Extracts main content and metadata
PDF Documents     | ⚠️ Limited | Basic text extraction only
JavaScript SPAs   | ⚠️ Limited | Static content only, no JS execution
Paywalled Content | ❌ No      | Cannot bypass subscription walls
Social Media      | ❌ No      | Blocked by platform policies

Error Handling

The Web Browsing API returns specific error codes for different failure scenarios:

Error Code          | HTTP Status | Description
invalid_url         | 422         | The provided URL is not valid or accessible
url_blocked         | 403         | The URL is blocked by content policy
page_not_found      | 404         | The web page could not be found
content_too_large   | 413         | The web page content exceeds size limits
browse_timeout      | 408         | The browsing operation timed out
rate_limit_exceeded | 429         | Too many browsing requests

Example Error Response

{
  "success": false,
  "error": {
    "code": "url_blocked",
    "message": "The requested URL is blocked by content policy",
    "details": {
      "url": "https://example.com/blocked-content",
      "reason": "Content type not supported for financial services"
    }
  }
}
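
Client code can convert this error envelope into an exception so that failures are not silently ignored. In the sketch below, the class `BrowseError` and the function `raise_for_error` are hypothetical names. Treating `browse_timeout` and `rate_limit_exceeded` as retryable is a judgment call made for this example, not something the documentation specifies.

```python
# Assumption: only these two codes are worth retrying unchanged.
RETRYABLE_CODES = {"browse_timeout", "rate_limit_exceeded"}


class BrowseError(Exception):
    """Raised when a response body reports success: false."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}
        self.retryable = code in RETRYABLE_CODES


def raise_for_error(payload):
    """Return the payload unchanged on success, otherwise raise BrowseError."""
    if payload.get("success"):
        return payload
    error = payload.get("error", {})
    raise BrowseError(
        error.get("code", "unknown"),
        error.get("message", ""),
        error.get("details"),
    )


# The example error response shown above.
blocked = {
    "success": False,
    "error": {
        "code": "url_blocked",
        "message": "The requested URL is blocked by content policy",
        "details": {
            "url": "https://example.com/blocked-content",
            "reason": "Content type not supported for financial services",
        },
    },
}

try:
    raise_for_error(blocked)
except BrowseError as exc:
    print(exc.code, exc.retryable, exc.details["reason"])
```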

Rate Limits

Web Browsing API endpoints have the following rate limits:

  • Browse Page: 10 requests per minute
  • Browsing History: 60 requests per minute
  • Session Details: 60 requests per minute
  • Refresh Page: 5 requests per minute
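
When one of these limits is exceeded, the API responds with HTTP 429 (`rate_limit_exceeded`). One common way to recover is exponential backoff with jitter, sketched below. `send` is a hypothetical callable that performs a single request and returns a `(http_status, parsed_body)` tuple. The delay values are illustrative defaults, not values taken from the documentation.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        # Double the delay on each retry, capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Add up to 50% random jitter so parallel clients do not retry in step.
        sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("rate limit still exceeded after retries")


# Stand-in for a real HTTP call: rate limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, {"success": True})])
waits = []

status, body = call_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, len(waits))
```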

Best Practices

  • Verify URLs: Ensure URLs are valid and accessible before browsing
  • Handle Timeouts: Implement proper timeout handling for slow-loading pages
  • Cache Content: Cache frequently accessed content to reduce API calls
  • Respect Rate Limits: Implement exponential backoff for rate limit errors
  • Content Length: Use max_content_length parameter to control response size
  • Monitor Failures: Track and log browsing failures for debugging

Privacy & Compliance: Web browsing activities are logged for compliance and security purposes. Ensure that browsed content complies with your organization's data usage policies.

Content Freshness: Browsed content is cached for up to 1 hour to improve performance. Use the refresh endpoint to get the latest version of frequently changing pages.
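
Since the service itself caches browsed content for up to 1 hour, a client-side cache with the same lifetime can save browse calls without making results any staler. The class below is a minimal in-memory sketch. `PageCache` is a hypothetical name, and the injectable `clock` argument exists only to make the expiry logic testable.

```python
import time


class PageCache:
    """In-memory cache keyed by URL, with a fixed time-to-live."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # url -> (stored_at, page)

    def get(self, url):
        """Return the cached page, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, page = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[url]  # expired: drop it so it is re-fetched
            return None
        return page

    def put(self, url, page):
        """Store a page along with the time it was cached."""
        self._entries[url] = (self._clock(), page)


cache = PageCache()
url = "https://example.com/financial-news/market-update"
cache.put(url, {"title": "Market Update: Q1 2025 Financial Outlook"})
print(cache.get(url)["title"])
```

On a cache miss the caller would issue a browse request and `put` the result. For pages that change often, the refresh endpoint remains the way to force newer content.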