Network Capabilities
Pydoll provides powerful capabilities for monitoring, intercepting, and manipulating network traffic during browser automation. These features give you fine-grained control over how your browser communicates with the web, enabling advanced use cases like request modification, response analysis, and network optimization.
Network Architecture Overview
Pydoll's network capabilities are built on top of the Chrome DevTools Protocol (CDP), which provides a direct interface to the browser's internal networking stack. This architecture eliminates the limitations of traditional proxy-based approaches and enables real-time monitoring and modification of requests and responses.
flowchart TB
subgraph Browser["Chrome/Edge Browser"]
Net["Network Stack"] --> CDP["Chrome DevTools Protocol"]
end
subgraph Pydoll["Pydoll Library"]
CDP --> NetMon["Network Monitoring"]
CDP --> Interception["Request Interception"]
CDP --> Headers["Headers Manipulation"]
CDP --> Body["Body Modification"]
CDP --> Emulation["Network Condition Emulation"]
end
subgraph UserCode["User Automation Code"]
NetMon --> Analysis["Traffic Analysis"]
Interception --> Auth["Authentication Handling"]
Headers --> CustomHeaders["Custom Headers Injection"]
Body --> DataModification["Request/Response Data Modification"]
Emulation --> Testing["Network Condition Testing"]
end
class Browser,Pydoll,UserCode rounded
class Browser blue
class Pydoll green
class UserCode orange
The network capabilities in Pydoll can be organized into two main categories:
- Network Monitoring: Passive observation of network activity
- Request Interception: Active modification of network requests and responses
Network Monitoring
Network monitoring allows you to observe and analyze the network activity of your browser session without modifying it. This is useful for understanding how a website loads resources, detecting API endpoints, or troubleshooting performance issues.
Enabling Network Monitoring
To start monitoring network activity, you need to enable network events:
import asyncio
from pydoll.browser.chromium import Chrome

async def main():
    async with Chrome() as browser:
        tab = await browser.start()
        # Enable network monitoring
        await tab.enable_network_events()
        # Navigate to a page
        await tab.go_to('https://example.com')
        print("Network monitoring enabled and page loaded")

asyncio.run(main())
When you enable network events, Pydoll automatically captures information about all network requests, including:
- URLs
- HTTP methods
- Request headers
- Status codes
- Response sizes
- Content types
- Timing information
Network Event Callbacks
You can register callbacks to be notified about specific network events in real-time:
import asyncio
from functools import partial

from pydoll.browser.chromium import Chrome
from pydoll.protocol.network.events import NetworkEvent

# Define a callback to handle request events
async def on_request(tab, event):
    url = event['params']['request']['url']
    method = event['params']['request']['method']
    print(f"{method} request to: {url}")
    # You can access request headers (header names may vary in case)
    headers = event['params']['request'].get('headers', {})
    content_type = next(
        (value for name, value in headers.items() if name.lower() == 'content-type'),
        None,
    )
    if content_type:
        print(f"Content-Type: {content_type}")

# Define a callback to handle response events
async def on_response(tab, event):
    url = event['params']['response']['url']
    status = event['params']['response']['status']
    print(f"Response from {url}: Status {status}")
    # Extract response timing information
    timing = event['params']['response'].get('timing')
    if timing:
        # requestTime is a baseline in seconds; the other timing fields are
        # millisecond offsets from that baseline
        elapsed_ms = timing['receiveHeadersEnd']
        print(f"Response headers received after {elapsed_ms:.0f} ms")

async def main():
    async with Chrome() as browser:
        tab = await browser.start()
        # Register the callbacks
        await tab.enable_network_events()
        await tab.on(NetworkEvent.REQUEST_WILL_BE_SENT, partial(on_request, tab))
        await tab.on(NetworkEvent.RESPONSE_RECEIVED, partial(on_response, tab))
        # Navigate to trigger network activity
        await tab.go_to('https://example.com')
        # Wait to see network activity
        await asyncio.sleep(5)

asyncio.run(main())
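The `timing` object follows the CDP `ResourceTiming` shape: `requestTime` is a baseline in seconds, and the other fields are millisecond offsets from that baseline. Chrome typically reports -1 for phases that did not occur. The sketch below turns such a dict into per-phase durations. The function name `summarize_timing` and the sample values are illustrative assumptions, not part of Pydoll.

```python
def summarize_timing(timing):
    """Return per-phase durations in milliseconds from a CDP ResourceTiming dict."""

    def span(start_key, end_key):
        start = timing.get(start_key, -1)
        end = timing.get(end_key, -1)
        # A negative value means the phase did not happen (e.g. a reused connection)
        if start < 0 or end < 0:
            return None
        return end - start

    return {
        'dns_ms': span('dnsStart', 'dnsEnd'),
        'connect_ms': span('connectStart', 'connectEnd'),
        'wait_ms': span('sendEnd', 'receiveHeadersEnd'),
        # Already an offset from requestTime, so no subtraction is needed
        'total_ms': timing.get('receiveHeadersEnd'),
    }


# Sample values invented for illustration
sample = {
    'requestTime': 1234.5,
    'dnsStart': -1, 'dnsEnd': -1,
    'connectStart': 2.0, 'connectEnd': 30.0,
    'sendStart': 30.5, 'sendEnd': 31.0,
    'receiveHeadersEnd': 120.0,
}
print(summarize_timing(sample))
```

A response callback could pass `event['params']['response']['timing']` to this helper when the key is present.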
Key Network Events
Pydoll provides access to a wide range of network-related events:
Event Constant | Description | Useful Information Available |
---|---|---|
NetworkEvent.REQUEST_WILL_BE_SENT | Fired when a request is about to be sent | URL, method, headers, POST data |
NetworkEvent.RESPONSE_RECEIVED | Fired when HTTP response is available | Status code, headers, MIME type, timing |
NetworkEvent.LOADING_FAILED | Fired when a request fails | Error information, canceled status |
NetworkEvent.LOADING_FINISHED | Fired when a request completes | Encoding, compressed data size |
NetworkEvent.RESOURCE_CHANGED_PRIORITY | Fired when resource loading priority changes | New priority level |
NetworkEvent.WEBSOCKET_CREATED | Fired when a WebSocket is created | URL, initiator |
NetworkEvent.WEBSOCKET_FRAME_SENT | Fired when a WebSocket frame is sent | Payload data |
NetworkEvent.WEBSOCKET_FRAME_RECEIVED | Fired when a WebSocket frame is received | Response data |
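The request lifecycle events carry the same `requestId` in `event['params']`, so they can be joined into one record per request. The sketch below shows one way to do that on plain event dicts. The class name `RequestLog` and its method names are hypothetical, and the sample events are hand-written.

```python
class RequestLog:
    """Collects one record per request, keyed by the CDP requestId."""

    def __init__(self):
        self.records = {}

    def _record(self, event):
        # Events for unknown IDs (e.g. started before monitoring) yield None
        return self.records.get(event['params']['requestId'])

    def on_request(self, event):
        params = event['params']
        self.records[params['requestId']] = {
            'url': params['request']['url'],
            'method': params['request']['method'],
            'status': None,
            'bytes': None,
            'error': None,
        }

    def on_response(self, event):
        record = self._record(event)
        if record is not None:
            record['status'] = event['params']['response']['status']

    def on_finished(self, event):
        record = self._record(event)
        if record is not None:
            record['bytes'] = event['params'].get('encodedDataLength')

    def on_failed(self, event):
        record = self._record(event)
        if record is not None:
            record['error'] = event['params'].get('errorText')


# Hand-written events that mimic the CDP payload shape
log = RequestLog()
log.on_request({'params': {'requestId': '1',
                           'request': {'url': 'https://example.com/', 'method': 'GET'}}})
log.on_response({'params': {'requestId': '1', 'response': {'status': 200}}})
log.on_finished({'params': {'requestId': '1', 'encodedDataLength': 1256}})
print(log.records['1'])
```

In a Pydoll script, each method could be called from the callback registered for the matching `NetworkEvent` constant.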
Advanced Network Monitoring Example
Here's a more comprehensive example that tracks various network metrics:
import asyncio
import time
from functools import partial
from urllib.parse import urlparse

from pydoll.browser.chromium import Chrome
from pydoll.protocol.network.events import NetworkEvent

async def main():
    # Statistics counters
    stats = {
        'total_requests': 0,
        'completed_requests': 0,
        'failed_requests': 0,
        'bytes_received': 0,
        'request_types': {},
        'status_codes': {},
        'domains': {},
        'start_time': time.time()
    }

    async def update_dashboard():
        while True:
            # Calculate elapsed time
            elapsed = time.time() - stats['start_time']
            # Clear console and print stats
            print("\033c", end="")  # Clear console
            print(f"Network Activity Dashboard - Running for {elapsed:.1f}s")
            print(f"Total Requests: {stats['total_requests']}")
            print(f"Completed: {stats['completed_requests']} | Failed: {stats['failed_requests']}")
            print(f"Data Received: {stats['bytes_received'] / 1024:.1f} KB")
            print("\nRequest Types:")
            for rtype, count in sorted(stats['request_types'].items(), key=lambda x: x[1], reverse=True):
                print(f"  {rtype}: {count}")
            print("\nStatus Codes:")
            for code, count in sorted(stats['status_codes'].items()):
                print(f"  {code}: {count}")
            print("\nTop Domains:")
            top_domains = sorted(stats['domains'].items(), key=lambda x: x[1], reverse=True)[:5]
            for domain, count in top_domains:
                print(f"  {domain}: {count}")
            await asyncio.sleep(1)

    # Start the dashboard updater task
    dashboard_task = asyncio.create_task(update_dashboard())

    async with Chrome() as browser:
        tab = await browser.start()

        # Track request starts
        async def on_request_sent(tab, event):
            stats['total_requests'] += 1
            # Track request type
            resource_type = event['params'].get('type', 'Other')
            stats['request_types'][resource_type] = stats['request_types'].get(resource_type, 0) + 1
            # Track domain
            url = event['params']['request']['url']
            try:
                domain = urlparse(url).netloc
            except ValueError:
                # Skip URLs that cannot be parsed
                return
            stats['domains'][domain] = stats['domains'].get(domain, 0) + 1

        # Track responses
        async def on_response(tab, event):
            status = event['params']['response']['status']
            stats['status_codes'][status] = stats['status_codes'].get(status, 0) + 1

        # Track request completions
        async def on_loading_finished(tab, event):
            stats['completed_requests'] += 1
            if 'encodedDataLength' in event['params']:
                stats['bytes_received'] += event['params']['encodedDataLength']

        # Track failures
        async def on_loading_failed(tab, event):
            stats['failed_requests'] += 1

        # Register callbacks
        await tab.enable_network_events()
        await tab.on(NetworkEvent.REQUEST_WILL_BE_SENT, partial(on_request_sent, tab))
        await tab.on(NetworkEvent.RESPONSE_RECEIVED, partial(on_response, tab))
        await tab.on(NetworkEvent.LOADING_FINISHED, partial(on_loading_finished, tab))
        await tab.on(NetworkEvent.LOADING_FAILED, partial(on_loading_failed, tab))
        # Navigate to a page with lots of requests
        await tab.go_to('https://news.ycombinator.com')
        # Keep the dashboard running for 60 seconds
        await asyncio.sleep(60)

    # Clean up
    dashboard_task.cancel()

asyncio.run(main())
Request Interception and Modification
Request interception is the active side of Pydoll's network capabilities. Beyond observing traffic, Pydoll lets you pause a request, modify it, and decide how it is resolved before it is sent to the server.
The Fetch Domain
The Fetch domain in the Chrome DevTools Protocol provides advanced functionality for intercepting and manipulating network requests. Pydoll exposes this functionality through a clean API that makes it easy to implement complex network manipulation scenarios.
sequenceDiagram
participant App as Application Code
participant Pydoll as Pydoll Library
participant Browser as Browser
participant Server as Web Server
App->>Pydoll: Enable fetch events
Pydoll->>Browser: FetchCommands.enable()
Browser-->>Pydoll: Enabled
App->>Pydoll: Register callback for REQUEST_PAUSED
App->>Pydoll: Navigate to URL
Pydoll->>Browser: Navigate command
Browser->>Browser: Initiates request
Browser->>Pydoll: Fetch.requestPaused event
Pydoll->>App: Execute callback
App->>Pydoll: Modify and continue request
Pydoll->>Browser: browser.continue_request() with modifications
Browser->>Server: Modified request
Server-->>Browser: Response
Browser-->>Pydoll: Complete
Pydoll-->>App: Continue execution
Enabling Request Interception
To intercept requests, you need to enable the Fetch domain:
import asyncio
from functools import partial

from pydoll.browser.chromium import Chrome
from pydoll.protocol.fetch.events import FetchEvent

async def main():
    async with Chrome() as browser:
        tab = await browser.start()

        # Define a request interceptor
        async def intercept_request(tab, event):
            request_id = event['params']['requestId']
            request = event['params']['request']
            url = request['url']
            print(f"Intercepted request to: {url}")
            # You must continue the request to proceed
            await browser.continue_request(request_id)

        # Enable fetch events and register the interceptor
        await tab.enable_fetch_events()
        await tab.on(FetchEvent.REQUEST_PAUSED, partial(intercept_request, tab))
        # Navigate to a page
        await tab.go_to('https://example.com')

asyncio.run(main())
Always Continue Intercepted Requests
When intercepting requests, you must always call browser.continue_request(), browser.fail_request(), or browser.fulfill_request() to resolve the intercepted request. If you don't, the request stays paused and the page load hangs while the browser waits for a resolution.
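One way to make this rule hard to forget is to wrap each handler so that a fallback runs whenever the handler has not resolved the request. The sketch below is an assumption-laden illustration, not a Pydoll feature: `ensure_resolved` and `fallback` are hypothetical names, `fallback` is any async callable that takes the request ID (for example `browser.continue_request` from the example above), and the handler is expected to return `True` once it has resolved the request itself.

```python
import asyncio
import functools


def ensure_resolved(fallback):
    """Decorator: call `fallback(request_id)` unless the handler resolved the request."""

    def decorator(handler):
        @functools.wraps(handler)
        async def wrapper(*args):
            event = args[-1]  # the event is the last positional argument
            request_id = event['params']['requestId']
            resolved = False
            try:
                # By convention the handler returns True after it has
                # continued, failed, or fulfilled the request
                resolved = bool(await handler(*args))
            except Exception as exc:
                print(f"Handler error for {request_id}: {exc}")
            if not resolved:
                await fallback(request_id)

        return wrapper

    return decorator


async def demo():
    continued = []

    async def fake_continue(request_id):
        # Stand-in for browser.continue_request, used only for this demo
        continued.append(request_id)

    @ensure_resolved(fake_continue)
    async def handler(event):
        if 'boom' in event['params']['request']['url']:
            raise ValueError('handler bug')
        return False  # nothing resolved here; let the fallback continue it

    await handler({'params': {'requestId': 'r1', 'request': {'url': 'https://example.com/'}}})
    await handler({'params': {'requestId': 'r2', 'request': {'url': 'https://example.com/boom'}}})
    return continued


print(asyncio.run(demo()))
```

The wrapper cannot tell whether a handler resolved the request and then raised, so handlers should resolve the request as their last step.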
Interception Scope and Resource Types
You can limit the scope of request interception to specific resource types:
from pydoll.constants import ResourceType
# Intercept all requests (could be resource-intensive)
await tab.enable_fetch_events()
# Intercept only document (HTML) requests
await tab.enable_fetch_events(resource_type=ResourceType.DOCUMENT)
# Intercept only XHR/fetch API requests
await tab.enable_fetch_events(resource_type=ResourceType.XHR)
# Intercept only image requests
await tab.enable_fetch_events(resource_type=ResourceType.IMAGE)
Resource types available for interception:
Resource Type | Description | Common Examples |
---|---|---|
ResourceType.DOCUMENT | Main HTML documents | HTML pages, iframes |
ResourceType.STYLESHEET | CSS files | .css files |
ResourceType.IMAGE | Image resources | .jpg, .png, .gif, .webp |
ResourceType.MEDIA | Media files | .mp4, .webm, audio files |
ResourceType.FONT | Font files | .woff, .woff2, .ttf |
ResourceType.SCRIPT | JavaScript files | .js files |
ResourceType.TEXTTRACK | Text track files | .vtt, .srt (captions, subtitles) |
ResourceType.XHR | XMLHttpRequest calls | API calls, AJAX requests |
ResourceType.FETCH | Fetch API requests | Modern API calls |
ResourceType.EVENTSOURCE | Server-sent events | Stream connections |
ResourceType.WEBSOCKET | WebSocket connections | Real-time communications |
ResourceType.MANIFEST | Web app manifests | .webmanifest files |
ResourceType.OTHER | Other resource types | Miscellaneous resources |
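When a handler should act on more than one resource type, another option is to check the type inside the callback. In CDP, the `Fetch.requestPaused` event carries a `resourceType` string such as `'XHR'`, `'Fetch'`, or `'Document'` in its params. The helper below is a minimal sketch of that check. The name `wants_interception` and the `API_TYPES` set are illustrative assumptions.

```python
API_TYPES = frozenset({'XHR', 'Fetch'})


def wants_interception(event, wanted=API_TYPES):
    """Return True if a Fetch.requestPaused event is for a wanted resource type."""
    return event['params'].get('resourceType') in wanted


# Hand-written events that mimic the CDP payload shape
print(wants_interception({'params': {'resourceType': 'Fetch'}}))
print(wants_interception({'params': {'resourceType': 'Image'}}))
```

Requests that fail this check are still paused, so the callback has to continue them unchanged.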
Request Modification Capabilities
When intercepting requests, you can modify various aspects of the request before it's sent to the server:
1. Modifying URL and Method
# `browser` is the Chrome instance from the enclosing scope
async def redirect_request(tab, event):
    request_id = event['params']['requestId']
    request = event['params']['request']
    url = request['url']
    # Redirect requests for one domain to another
    if 'old-domain.com' in url:
        new_url = url.replace('old-domain.com', 'new-domain.com')
        print(f"Redirecting {url} to {new_url}")
        await browser.continue_request(
            request_id=request_id,
            url=new_url
        )
    # Change GET to POST for specific endpoints
    elif '/api/data' in url and request['method'] == 'GET':
        print(f"Converting GET to POST for {url}")
        await browser.continue_request(
            request_id=request_id,
            method='POST'
        )
    else:
        # Continue normally
        await browser.continue_request(request_id)
2. Adding or Modifying Headers
async def inject_headers(tab, event):
    request_id = event['params']['requestId']
    request = event['params']['request']
    # Get existing headers
    headers = request.get('headers', {})
    # Add or modify headers
    custom_headers = [
        {'name': 'X-Custom-Header', 'value': 'CustomValue'},
        {'name': 'Authorization', 'value': 'Bearer your-token-here'},
        {'name': 'User-Agent', 'value': 'Custom User Agent String'},
    ]
    # Keep existing headers, skipping any that are being overridden
    # so the same header is not sent twice
    overridden = {header['name'].lower() for header in custom_headers}
    for name, value in headers.items():
        if name.lower() not in overridden:
            custom_headers.append({'name': name, 'value': value})
    await browser.continue_request(
        request_id=request_id,
        headers=custom_headers
    )
3. Modifying Request Body
import json
import time

async def modify_post_data(tab, event):
    request_id = event['params']['requestId']
    request = event['params']['request']
    url = request['url']
    method = request['method']
    # Only process POST requests to specific endpoints
    if method == 'POST' and '/api/submit' in url:
        # Get the original post data, if any
        original_post_data = request.get('postData', '{}')
        try:
            # Parse the original data
            data = json.loads(original_post_data)
            # Modify the data
            data['additionalField'] = 'injected-value'
            data['timestamp'] = int(time.time())
            # Convert back to string
            modified_post_data = json.dumps(data)
            print(f"Modified POST data for {url}")
            await browser.continue_request(
                request_id=request_id,
                post_data=modified_post_data
            )
        except json.JSONDecodeError:
            # If not JSON, continue normally
            await browser.continue_request(request_id)
    else:
        # Continue normally for all other requests
        await browser.continue_request(request_id)
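The parsing and patching step can be kept separate from the interception call, which makes it easy to test without a browser. The sketch below is one way to do that. `patch_json_body` is a hypothetical helper name, and it returns `None` whenever the body is not a JSON object, so the caller knows to continue the request unchanged.

```python
import json


def patch_json_body(post_data, extra):
    """Merge `extra` into a JSON object body; return None if it cannot be patched."""
    try:
        data = json.loads(post_data or '{}')
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        # Arrays, strings, and numbers have no fields to merge into
        return None
    data.update(extra)
    return json.dumps(data)


print(patch_json_body('{"name": "test"}', {'additionalField': 'injected-value'}))
print(patch_json_body('name=test', {'additionalField': 'injected-value'}))
```

A handler could pass the result as the modified post data when it is not `None` and fall back to a plain continue otherwise.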
Failing and Fulfilling Requests
Besides continuing requests with modifications, you can also fail requests or fulfill them with custom responses:
Failing Requests
from pydoll.constants import NetworkErrorReason

async def block_requests(tab, event):
    request_id = event['params']['requestId']
    request = event['params']['request']
    url = request['url']
    # Block requests to tracking domains
    blocked_domains = ['google-analytics.com', 'facebook.com/tr']
    if any(domain in url for domain in blocked_domains):
        print(f"Blocking request to: {url}")
        await browser.fail_request(request_id, NetworkErrorReason.BLOCKED_BY_CLIENT)
    else:
        await browser.continue_request(request_id)
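Substring matching also blocks URLs that only mention a tracker domain in their path or query string. A stricter variant compares the parsed hostname instead. The sketch below covers host-based rules only, so a path-based rule such as `facebook.com/tr` would need a separate check. `is_blocked` and the `BLOCKED_HOSTS` list are illustrative assumptions.

```python
from urllib.parse import urlparse

BLOCKED_HOSTS = ('google-analytics.com', 'doubleclick.net')  # example list


def is_blocked(url, blocked_hosts=BLOCKED_HOSTS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or '').lower()
    return any(host == domain or host.endswith('.' + domain) for domain in blocked_hosts)


print(is_blocked('https://www.google-analytics.com/collect'))
print(is_blocked('https://example.com/?ref=google-analytics.com'))
```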
Fulfilling Requests with Custom Responses
import json

async def mock_api_response(tab, event):
    request_id = event['params']['requestId']
    request = event['params']['request']
    url = request['url']
    # Mock API responses
    if '/api/user' in url:
        mock_response = {
            'id': 123,
            'name': 'Mock User',
            'email': 'mock@example.com'
        }
        response_headers = [
            {'name': 'Content-Type', 'value': 'application/json'},
            {'name': 'Access-Control-Allow-Origin', 'value': '*'}
        ]
        print(f"Mocking response for: {url}")
        await browser.fulfill_request(
            request_id=request_id,
            response_code=200,
            response_headers=response_headers,
            response_body=json.dumps(mock_response)
        )
    else:
        await browser.continue_request(request_id)
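With more than a couple of mocked endpoints, a route table keyed by URL path keeps the handler short. The sketch below matches the exact path, so `/api/user` does not also match `/api/users` the way a substring test would. `MOCK_ROUTES` and `build_mock` are hypothetical names.

```python
import json
from urllib.parse import urlparse

MOCK_ROUTES = {
    '/api/user': {'id': 123, 'name': 'Mock User', 'email': 'mock@example.com'},
}


def build_mock(url, routes=MOCK_ROUTES):
    """Return (status, headers, body) for a mocked path, or None if not mocked."""
    path = urlparse(url).path
    if path not in routes:
        return None
    headers = [{'name': 'Content-Type', 'value': 'application/json'}]
    return 200, headers, json.dumps(routes[path])


print(build_mock('https://example.com/api/user?verbose=1'))
print(build_mock('https://example.com/api/users'))
```

A handler could fulfill the request with the returned values when the result is not `None` and continue the request otherwise.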
Authentication Handling
The Fetch domain can also intercept authentication challenges, allowing you to automatically handle HTTP authentication:
import asyncio
from functools import partial

from pydoll.browser.chromium import Chrome
from pydoll.protocol.fetch.events import FetchEvent

async def main():
    async with Chrome() as browser:
        tab = await browser.start()

        # Define authentication handler
        async def handle_auth(tab, event):
            request_id = event['params']['requestId']
            auth_challenge = event['params']['authChallenge']
            print(f"Authentication required: {auth_challenge['origin']}")
            # Provide credentials
            await browser.continue_request_with_auth(
                request_id=request_id,
                auth_challenge_response='ProvideCredentials',
                username="username",
                password="password"
            )

        # Enable fetch events with auth handling
        await tab.enable_fetch_events(handle_auth=True)
        await tab.on(FetchEvent.AUTH_REQUIRED, partial(handle_auth, tab))
        # Navigate to a page requiring authentication
        await tab.go_to('https://protected-site.com')

asyncio.run(main())
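Answering every challenge with the same credentials would also send them to third-party origins that ask for authentication. The sketch below picks credentials by the `origin` field of the CDP `authChallenge` object and cancels the challenge for unknown origins. `CREDENTIALS` and `pick_auth_response` are hypothetical names. `'CancelAuth'` is the CDP response value, and it is an assumption here that Pydoll accepts it in the same way as `'ProvideCredentials'`.

```python
CREDENTIALS = {
    # origin -> (username, password); placeholder values
    'https://protected-site.com': ('username', 'password'),
}


def pick_auth_response(auth_challenge, credentials=CREDENTIALS):
    """Map a CDP authChallenge dict to keyword arguments for the auth reply."""
    creds = credentials.get(auth_challenge.get('origin'))
    if creds is None:
        # Unknown origin: cancel instead of sending credentials elsewhere
        return {'auth_challenge_response': 'CancelAuth'}
    username, password = creds
    return {
        'auth_challenge_response': 'ProvideCredentials',
        'username': username,
        'password': password,
    }


print(pick_auth_response({'origin': 'https://protected-site.com', 'scheme': 'basic'}))
print(pick_auth_response({'origin': 'https://ads.example.net', 'scheme': 'basic'}))
```

The returned dict uses the same keyword names as the `continue_request_with_auth` call above, so a handler could unpack it next to `request_id`.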
Advanced Network Patterns
Comprehensive Request Interception Example
Here's a complete example that demonstrates various interception techniques:
import asyncio
import json
from functools import partial

from pydoll.browser.chromium import Chrome
from pydoll.constants import NetworkErrorReason, ResourceType
from pydoll.protocol.fetch.events import FetchEvent

async def main():
    async with Chrome() as browser:
        tab = await browser.start()

        async def comprehensive_interceptor(tab, event):
            request_id = event['params']['requestId']
            request = event['params']['request']
            url = request['url']
            method = request['method']
            print(f"Intercepting {method} request to: {url}")

            # Block tracking requests
            if any(tracker in url for tracker in ['google-analytics', 'facebook.com/tr']):
                print(f"Blocking tracker: {url}")
                await browser.fail_request(request_id, NetworkErrorReason.BLOCKED_BY_CLIENT)
                return

            # Mock API responses
            if '/api/config' in url:
                mock_config = {
                    'feature_flags': {'new_ui': True, 'beta_features': True},
                    'api_version': '2.0'
                }
                await browser.fulfill_request(
                    request_id=request_id,
                    response_code=200,
                    response_headers=[
                        {'name': 'Content-Type', 'value': 'application/json'},
                        {'name': 'Cache-Control', 'value': 'no-cache'}
                    ],
                    response_body=json.dumps(mock_config)
                )
                return

            # Inject custom headers for API requests
            if '/api/' in url:
                headers = [
                    {'name': 'X-Custom-Client', 'value': 'Pydoll-Automation'},
                    {'name': 'X-Request-ID', 'value': f'req-{request_id}'}
                ]
                # Preserve existing headers that are not being overridden
                overridden = {header['name'].lower() for header in headers}
                for name, value in request.get('headers', {}).items():
                    if name.lower() not in overridden:
                        headers.append({'name': name, 'value': value})
                await browser.continue_request(
                    request_id=request_id,
                    headers=headers
                )
                return

            # Continue all other requests normally
            await browser.continue_request(request_id)

        # Enable fetch events for XHR requests only
        await tab.enable_fetch_events(resource_type=ResourceType.XHR)
        await tab.on(FetchEvent.REQUEST_PAUSED, partial(comprehensive_interceptor, tab))
        # Navigate and interact with the page
        await tab.go_to('https://example.com')
        await asyncio.sleep(5)  # Wait for network activity

asyncio.run(main())
Performance Considerations
While Pydoll's network capabilities are powerful, there are some performance considerations to keep in mind:
- Selective Interception: Intercepting all requests can significantly slow down page loading. Be selective about which resource types you intercept.
- Memory Management: Network event callbacks can consume memory if they store large amounts of data. Be mindful of memory usage in long-running automations.
- Callback Efficiency: Keep your event callbacks efficient, especially for high-frequency events like network requests. Inefficient callbacks can slow down the entire automation process.
- Cleanup: Always disable network and fetch events when you're done using them to prevent memory leaks.
# Enable events only when needed
await tab.enable_network_events()
await tab.enable_fetch_events(resource_type=ResourceType.XHR) # Only intercept XHR requests
# Do your automation work...
# Clean up when done
await tab.disable_network_events()
await tab.disable_fetch_events()
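To make sure the cleanup also runs when the automation raises, the enable and disable calls can be paired in an async context manager. The sketch below assumes only the `enable_network_events()` and `disable_network_events()` methods shown above. The helper name `network_events` is hypothetical, and `FakeTab` is a stand-in that exists only to exercise the helper without a browser.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def network_events(tab):
    """Enable network events on entry and always disable them on exit."""
    await tab.enable_network_events()
    try:
        yield tab
    finally:
        await tab.disable_network_events()


class FakeTab:
    """Stand-in for a Pydoll tab that records which methods were called."""

    def __init__(self):
        self.calls = []

    async def enable_network_events(self):
        self.calls.append('enable')

    async def disable_network_events(self):
        self.calls.append('disable')


async def demo():
    tab = FakeTab()
    try:
        async with network_events(tab):
            raise RuntimeError('automation failed')
    except RuntimeError:
        pass
    return tab.calls


print(asyncio.run(demo()))
```

The same pattern would apply to the fetch events pair.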
Best Practices
1. Use Resource Type Filtering Effectively
# Bad: Intercept all requests (performance impact)
await tab.enable_fetch_events()
# Good: Only intercept the specific resource type you need, for example:
await tab.enable_fetch_events(resource_type=ResourceType.XHR)  # For API calls
# ...or:
await tab.enable_fetch_events(resource_type=ResourceType.DOCUMENT)  # For main documents
2. Always Resolve Intercepted Requests
# Always resolve every intercepted request
async def intercept_handler(tab, event):
    request_id = event['params']['requestId']
    try:
        # Make any modifications needed
        custom_headers = [{'name': 'X-Custom', 'value': 'Value'}]
        # Continue the request
        await browser.continue_request(
            request_id=request_id,
            headers=custom_headers
        )
    except Exception as e:
        print(f"Error in request handler: {e}")
        # Always try to continue the request even if there was an error
        try:
            await browser.continue_request(request_id)
        except Exception:
            pass
3. Implement Proper Error Handling
from pydoll.constants import NetworkErrorReason

async def safe_network_handler(tab, event):
    request_id = event['params']['requestId']
    try:
        # Your interception logic here (process_request is a placeholder)
        await process_request(event)
        await browser.continue_request(request_id)
    except Exception as e:
        print(f"Error in request handler: {e}")
        # Try to continue the request even if there was an error
        try:
            await browser.continue_request(request_id)
        except Exception:
            # If we can't continue, try to fail it gracefully
            try:
                await browser.fail_request(request_id, NetworkErrorReason.FAILED)
            except Exception:
                pass
4. Use Partial for Clean Callback Management
from functools import partial

# Define your handler with the tab object as first parameter
async def handle_request(tab, config, event):
    # Now you have access to both tab and custom config
    request_id = event['params']['requestId']
    # is_tracker is a placeholder for your own URL-matching helper
    if config['block_trackers'] and is_tracker(event['params']['request']['url']):
        await browser.fail_request(request_id, NetworkErrorReason.BLOCKED_BY_CLIENT)
    else:
        await browser.continue_request(request_id)

# Register with partial to pre-bind parameters
config = {"block_trackers": True}
await tab.on(
    FetchEvent.REQUEST_PAUSED,
    partial(handle_request, tab, config)
)
Conclusion
Pydoll's network capabilities provide fine-grained control over browser network traffic, enabling use cases that go beyond traditional browser automation. Whether you're monitoring API calls, injecting custom headers, or modifying request data, these features can greatly enhance your automation workflows.
By leveraging the power of the Chrome DevTools Protocol, Pydoll makes it easy to implement sophisticated network monitoring and interception patterns while maintaining high performance and reliability.
Remember to use these capabilities responsibly and consider the performance implications of extensive network monitoring and interception in your automation scripts.