Web Browsing API
The Web Browsing API allows Oliver to browse and extract content from web pages to provide detailed information in conversations.
Browse Web Page
Browse a web page at a specific URL and extract its content for analysis.
Headers
Authorization: Bearer <your_api_token>
Content-Type: application/json
Request Body
{
  "url": "https://example.com/financial-news/market-update",
  "chat_id": 123,
  "message_id": 456,
  "purpose": "Retrieve current market information for client query",
  "extract_links": true,
  "extract_images": false,
  "max_content_length": 10000
}
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
url | string | Yes | The URL to browse (must be HTTPS) |
chat_id | integer | No | Chat ID to associate browsing session with |
message_id | integer | No | Message ID that triggered the browse |
purpose | string | No | Description of why the page is being browsed |
extract_links | boolean | No | Whether to extract links from the page (default: false) |
extract_images | boolean | No | Whether to extract image URLs (default: false) |
max_content_length | integer | No | Maximum content length to extract (default: 50000) |
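The parameter table above can be sketched as a small request builder. This is a minimal illustration, not an official client: the helper name `build_browse_request` is hypothetical, and the endpoint path is left out because it is not stated here. The helper only assembles and validates the JSON body, and the check on `max_content_length` is an assumption rather than documented behavior.

```python
from urllib.parse import urlparse


def build_browse_request(url, chat_id=None, message_id=None, purpose=None,
                         extract_links=False, extract_images=False,
                         max_content_length=50000):
    """Assemble the JSON body for a browse request (hypothetical helper)."""
    parsed = urlparse(url)
    # The parameter table requires an HTTPS URL.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"url must be an absolute HTTPS URL, got {url!r}")
    # Assumption: a non-positive limit is never meaningful.
    if max_content_length <= 0:
        raise ValueError("max_content_length must be positive")

    body = {
        "url": url,
        "extract_links": extract_links,
        "extract_images": extract_images,
        "max_content_length": max_content_length,
    }
    # Optional fields are omitted unless the caller supplies them.
    optional = (("chat_id", chat_id), ("message_id", message_id),
                ("purpose", purpose))
    for key, value in optional:
        if value is not None:
            body[key] = value
    return body
```

Calling `build_browse_request("https://example.com/financial-news/market-update", chat_id=123, extract_links=True, max_content_length=10000)` returns a dictionary that can be serialized with `json.dumps` and sent with the headers listed above.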
Response
{
  "success": true,
  "data": {
    "browse_session": {
      "id": "browse_xyz789abc",
      "url": "https://example.com/financial-news/market-update",
      "chat_id": 123,
      "message_id": 456,
      "purpose": "Retrieve current market information for client query",
      "status": "completed",
      "created_at": "2025-03-18T14:30:00Z",
      "completed_at": "2025-03-18T14:30:15Z"
    },
    "web_page": {
      "id": "webpage_abc123xyz",
      "url": "https://example.com/financial-news/market-update",
      "title": "Market Update: Q1 2025 Financial Outlook",
      "description": "Latest analysis of market trends and financial outlook for Q1 2025",
      "content": "The financial markets have shown resilience in Q1 2025, with major indices posting gains...",
      "text_content": "Market Update: Q1 2025 Financial Outlook\n\nThe financial markets have shown resilience...",
      "word_count": 1250,
      "metadata": {
        "author": "Jane Smith",
        "published_date": "2025-03-18T08:00:00Z",
        "publisher": "Financial News Today",
        "language": "en",
        "keywords": ["market", "financial", "Q1", "2025", "outlook"],
        "canonical_url": "https://example.com/financial-news/market-update"
      },
      "extracted_links": [
        {
          "url": "https://example.com/sec-filing/xyz-corp",
          "text": "XYZ Corp SEC Filing",
          "type": "internal"
        },
        {
          "url": "https://sec.gov/edgar/search/",
          "text": "SEC EDGAR Database",
          "type": "external"
        }
      ],
      "images": [
        {
          "url": "https://example.com/images/market-chart-q1-2025.png",
          "alt": "Market performance chart Q1 2025",
          "caption": "Market trends for Q1 2025"
        }
      ],
      "last_updated": "2025-03-18T14:30:15Z"
    }
  },
  "message": "Web page browsed successfully"
}
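As an illustration of consuming this response, the sketch below groups the entries of `extracted_links` by their `type` field. The function name `links_by_type` is hypothetical, and treating a missing `extracted_links` key as an empty list is an assumption about what the API returns when `extract_links` is false.

```python
def links_by_type(response):
    """Group extracted link URLs from a browse response by their "type"."""
    if not response.get("success"):
        raise ValueError("response does not indicate success")
    page = response["data"]["web_page"]
    grouped = {}
    # Assumption: the key may be absent when link extraction was not requested.
    for link in page.get("extracted_links", []):
        grouped.setdefault(link["type"], []).append(link["url"])
    return grouped
```

Applied to the example response above, this yields one `internal` URL and one `external` URL.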
Browsing History
Retrieve paginated browsing history for a user or chat, with optional filtering.
Headers
Authorization: Bearer <your_api_token>
Query Parameters
Parameter | Type | Required | Description |
---|---|---|---|
chat_id | integer | No | Filter by specific chat ID |
status | string | No | Filter by status: pending, completed, failed |
date_from | string | No | Start date filter (ISO 8601 format) |
date_to | string | No | End date filter (ISO 8601 format) |
page | integer | No | Page number (default: 1) |
per_page | integer | No | Results per page (default: 20, max: 100) |
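The query parameters above could be assembled into a query string as follows. This is a sketch under stated assumptions: `build_history_query` is a hypothetical helper, the endpoint path is not shown because it is not given here, and silently capping `per_page` at 100 is a client-side choice rather than documented server behavior.

```python
from urllib.parse import urlencode

VALID_STATUSES = {"pending", "completed", "failed"}
MAX_PER_PAGE = 100


def build_history_query(chat_id=None, status=None, date_from=None,
                        date_to=None, page=1, per_page=20):
    """Return a URL-encoded query string for the browsing history request."""
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")

    filters = {"chat_id": chat_id, "status": status,
               "date_from": date_from, "date_to": date_to}
    # Only filters the caller actually set are sent.
    params = {key: value for key, value in filters.items() if value is not None}
    params["page"] = page
    # Client-side cap at the documented maximum (assumption: the server
    # behavior for larger values is unspecified).
    params["per_page"] = min(per_page, MAX_PER_PAGE)
    return urlencode(params)
```

Dates are passed as ISO 8601 strings, for example `date_from="2025-03-18T00:00:00Z"`; `urlencode` percent-encodes the colons.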
Response
{
  "success": true,
  "data": [
    {
      "id": "browse_xyz789abc",
      "url": "https://example.com/financial-news/market-update",
      "title": "Market Update: Q1 2025 Financial Outlook",
      "chat_id": 123,
      "message_id": 456,
      "status": "completed",
      "word_count": 1250,
      "created_at": "2025-03-18T14:30:00Z",
      "completed_at": "2025-03-18T14:30:15Z"
    },
    {
      "id": "browse_abc456def",
      "url": "https://sec.gov/news/press-release/2025-12",
      "title": "SEC Announces New Investment Advisor Rules",
      "chat_id": 124,
      "message_id": 789,
      "status": "completed",
      "word_count": 890,
      "created_at": "2025-03-18T13:15:00Z",
      "completed_at": "2025-03-18T13:15:08Z"
    }
  ],
  "meta": {
    "current_page": 1,
    "per_page": 20,
    "total": 2,
    "total_pages": 1,
    "has_next_page": false,
    "has_previous_page": false
  }
}
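The `meta` block makes it straightforward to walk every page of results. The generator below is a minimal sketch: `iter_history` and the caller-supplied `fetch_page` function are hypothetical names, and `fetch_page` is assumed to take a page number and return the decoded response shown above.

```python
def iter_history(fetch_page):
    """Yield every browsing session across all pages of history.

    fetch_page is a caller-supplied function (hypothetical) that takes a
    page number and returns the decoded JSON response as a dict.
    """
    page = 1
    while True:
        response = fetch_page(page)
        yield from response["data"]
        # Stop as soon as the server reports there is no further page.
        if not response["meta"]["has_next_page"]:
            break
        page += 1
```

Keeping the HTTP call outside the generator means the same loop works with any HTTP library and can be exercised with canned responses.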
Browse Session Details
Retrieve detailed information about a specific browsing session.
Headers
Authorization: Bearer <your_api_token>
Path Parameters
Parameter | Type | Required | Description |
---|---|---|---|
session_id | string | Yes | Unique identifier for the browsing session |
Response
{
  "success": true,
  "data": {
    "id": "browse_xyz789abc",
    "url": "https://example.com/financial-news/market-update",
    "chat_id": 123,
    "message_id": 456,
    "purpose": "Retrieve current market information for client query",
    "status": "completed",
    "web_page": {
      "id": "webpage_abc123xyz",
      "title": "Market Update: Q1 2025 Financial Outlook",
      "description": "Latest analysis of market trends and financial outlook for Q1 2025",
      "content": "The financial markets have shown resilience in Q1 2025...",
      "word_count": 1250,
      "metadata": {
        "author": "Jane Smith",
        "published_date": "2025-03-18T08:00:00Z",
        "publisher": "Financial News Today",
        "language": "en",
        "keywords": ["market", "financial", "Q1", "2025", "outlook"]
      }
    },
    "processing_time": 15.2,
    "created_at": "2025-03-18T14:30:00Z",
    "completed_at": "2025-03-18T14:30:15Z"
  }
}
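The `created_at` and `completed_at` timestamps can be used to compute how long a session took on the client side. The sketch below uses hypothetical helper names and assumes that a session without `completed_at` (for example, one still pending) should yield `None`.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 UTC timestamp such as "2025-03-18T14:30:00Z"."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so it is rewritten as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def session_duration_seconds(session):
    """Return completed_at minus created_at in seconds, or None if unfinished."""
    completed = session.get("completed_at")
    if completed is None:
        return None
    started = parse_timestamp(session["created_at"])
    return (parse_timestamp(completed) - started).total_seconds()
```

For the example session this gives 15.0 seconds, close to the reported `processing_time` of 15.2; the small gap is presumably because the timestamps carry whole-second precision.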
Refresh Web Page
Re-browse a previously accessed web page to retrieve its updated content.
Headers
Authorization: Bearer <your_api_token>
Path Parameters
Parameter | Type | Required | Description |
---|---|---|---|
session_id | string | Yes | Unique identifier for the browsing session to refresh |
Response
{
  "success": true,
  "data": {
    "browse_session": {
      "id": "browse_xyz789abc_refresh",
      "original_session_id": "browse_xyz789abc",
      "url": "https://example.com/financial-news/market-update",
      "status": "completed",
      "created_at": "2025-03-18T16:45:00Z",
      "completed_at": "2025-03-18T16:45:12Z"
    },
    "changes_detected": true,
    "content_diff": {
      "added_sections": ["New section: Market Predictions for Q2"],
      "removed_sections": [],
      "modified_sections": ["Updated market data in Introduction"]
    }
  },
  "message": "Web page refreshed successfully"
}
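A caller will often want a one-line summary of what changed after a refresh. The following sketch condenses `changes_detected` and `content_diff` into a short string; `summarize_refresh` is a hypothetical helper, and the wording of the summary is an arbitrary choice.

```python
def summarize_refresh(response):
    """Return a short human-readable summary of a refresh response."""
    data = response["data"]
    if not data.get("changes_detected"):
        return "no changes"
    diff = data.get("content_diff", {})
    parts = []
    for label, key in (("added", "added_sections"),
                       ("removed", "removed_sections"),
                       ("modified", "modified_sections")):
        count = len(diff.get(key, []))
        # Empty categories are left out of the summary.
        if count:
            parts.append(f"{count} {label}")
    return ", ".join(parts) or "changes detected"
```

For the example response above, the result is `"1 added, 1 modified"`.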
Supported Content Types
The Web Browsing API can extract content from various types of web pages:
Content Type | Supported | Notes |
---|---|---|
HTML Pages | ✅ Yes | Full content extraction with metadata |
News Articles | ✅ Yes | Optimized for article structure |
Blog Posts | ✅ Yes | Extracts main content and metadata |
PDF Documents | ⚠️ Limited | Basic text extraction only |
JavaScript SPAs | ⚠️ Limited | Static content only, no JS execution |
Paywalled Content | ❌ No | Cannot bypass subscription walls |
Social Media | ❌ No | Blocked by platform policies |
Error Handling
The Web Browsing API returns specific error codes for different failure scenarios:
Error Code | HTTP Status | Description |
---|---|---|
invalid_url | 422 | The provided URL is invalid or inaccessible |
url_blocked | 403 | The URL is blocked by content policy |
page_not_found | 404 | The web page could not be found |
content_too_large | 413 | The web page content exceeds size limits |
browse_timeout | 408 | The browsing operation timed out |
rate_limit_exceeded | 429 | Too many browsing requests |
Example Error Response
{
  "success": false,
  "error": {
    "code": "url_blocked",
    "message": "The requested URL is blocked by content policy",
    "details": {
      "url": "https://example.com/blocked-content",
      "reason": "Content type not supported for financial services"
    }
  }
}
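One way to handle the error envelope is to convert it into an exception that carries the error code. The sketch below is illustrative only: `BrowseError` and `raise_for_error` are hypothetical names, and treating `browse_timeout` and `rate_limit_exceeded` as the retryable codes is an assumption, not something this reference states.

```python
# Assumption: only transient failures are worth retrying.
RETRYABLE_CODES = {"browse_timeout", "rate_limit_exceeded"}


class BrowseError(Exception):
    """Raised when a response has "success": false (hypothetical class)."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details or {}

    @property
    def retryable(self):
        return self.code in RETRYABLE_CODES


def raise_for_error(response):
    """Return the response unchanged on success, else raise BrowseError."""
    if response.get("success"):
        return response
    error = response.get("error", {})
    raise BrowseError(error.get("code", "unknown"),
                      error.get("message", ""),
                      error.get("details"))
```

With the example error response, the raised exception has `code == "url_blocked"` and `retryable` is `False`, so the caller can surface the `reason` from `details` instead of retrying.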
Rate Limits
Web Browsing API endpoints have the following rate limits:
- Browse Page: 10 requests per minute
- Browsing History: 60 requests per minute
- Session Details: 60 requests per minute
- Refresh Page: 5 requests per minute
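To stay under these limits, a client can throttle itself before sending requests. The sketch below uses a sliding 60-second window; the class name, the dictionary keys, and the choice of a sliding window are all assumptions, since how the server actually counts requests per minute is not specified here.

```python
import time
from collections import deque

# Per-minute limits copied from the list above; the keys are hypothetical labels.
LIMITS_PER_MINUTE = {"browse": 10, "history": 60,
                     "session_details": 60, "refresh": 5}


class SlidingWindowLimiter:
    """Track recent calls and report how long to wait before the next one."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()

    def wait_time(self):
        """Return seconds to wait before another call is allowed (0.0 if none)."""
        now = self.clock()
        # Drop calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self):
        """Register that a request was just sent."""
        self.calls.append(self.clock())
```

A caller would create one limiter per endpoint, for example `SlidingWindowLimiter(LIMITS_PER_MINUTE["refresh"])`, sleep for `wait_time()` seconds, send the request, and then call `record()`.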
Best Practices
- Verify URLs: Ensure URLs are valid and accessible before browsing
- Handle Timeouts: Implement proper timeout handling for slow-loading pages
- Cache Content: Cache frequently accessed content to reduce API calls
- Respect Rate Limits: Implement exponential backoff for rate limit errors
- Content Length: Use the max_content_length parameter to control response size
- Monitor Failures: Track and log browsing failures for debugging
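The exponential backoff recommended above might look like the following sketch. The function name, the default delays, and the use of full jitter are illustrative choices, not part of the API; `operation` and `is_retryable` are supplied by the caller.

```python
import random
import time


def call_with_backoff(operation, is_retryable, max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying retryable failures with exponential backoff.

    is_retryable is a caller-supplied predicate that receives the raised
    exception and returns True when the call should be attempted again.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Give up on the last attempt or on a non-retryable failure.
            if attempt == max_attempts - 1 or not is_retryable(exc):
                raise
            # The delay ceiling doubles each attempt: 1s, 2s, 4s, ... capped.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time up to the ceiling so that
            # many clients do not retry in lockstep.
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists so the retry logic can be exercised without real delays; in production the default `time.sleep` is used.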