WebElement Domain
The WebElement domain is a cornerstone of Pydoll's architecture, providing a rich representation of DOM elements that allows for intuitive and powerful interactions with web page components. This domain bridges the gap between high-level automation code and the underlying DOM elements rendered by the browser.
graph TB
Client[User Code] --> Page[Page Domain]
Page --> FindElement[FindElementsMixin]
FindElement --> WebElement[WebElement Domain]
WebElement --> DOM[Browser DOM]
WebElement --> Properties[Properties & Attributes]
WebElement --> Interactions[User Interactions]
WebElement --> State[Element State]
WebElement --> TextOperations[Text Operations]
class WebElement stroke:#4CAF50,stroke-width:3px
Understanding WebElement
At its core, a WebElement represents a snapshot of a DOM element within a page. Unlike traditional DOM references in JavaScript, a WebElement in Pydoll is:
- Asynchronous - All interactions follow Python's async/await pattern
- Persistent - Maintains a reference to the element across page changes
- Self-contained - Encapsulates all operations possible on a DOM element
- Intelligent - Implements specialized handling for different element types
Each WebElement instance maintains several crucial pieces of information:
class WebElement(FindElementsMixin):
def __init__(
self,
object_id: str,
connection_handler: ConnectionHandler,
method: str = None,
selector: str = None,
attributes_list: list = [],
):
self._object_id = object_id
self._search_method = method
self._selector = selector
self._connection_handler = connection_handler
self._attributes = {}
self._last_input = ''
self._command_id = 0
self._def_attributes(attributes_list)
The core components include:
- The object_id
provides a remote JavaScript reference to the element
- The connection_handler
enables communication with the browser
- The _search_method
and _selector
track how the element was found
- The _attributes
dictionary stores element attributes
- The _last_input
remembers the last text input (useful for backspacing)
By inheriting from FindElementsMixin
, each WebElement can also function as a starting point for finding child elements.
Technical Architecture
The WebElement domain combines several key design patterns to provide a robust and flexible API:
classDiagram
class WebElement {
-_object_id: str
-_search_method: str
-_selector: str
-_connection_handler: ConnectionHandler
-_attributes: dict
-_last_input: str
-_command_id: int
+click()
+click_using_js()
+type_keys(text: str)
+get_attribute(name: str)
+text
+inner_html
+value
+id
+class_name
+is_enabled
}
class FindElementsMixin {
+find_element(by: By, value: str, raise_exc: bool = True)
+find_elements(by: By, value: str, raise_exc: bool = True)
+wait_element(by: By, value: str, timeout: int, raise_exc: bool = True)
+wait_elements(by: By, value: str, timeout: int, raise_exc: bool = True)
}
class ConnectionHandler {
+execute_command(command: dict)
}
WebElement --|> FindElementsMixin : inherits
WebElement *-- ConnectionHandler : uses
The architectural design follows several key principles:
- Command Pattern - Element interactions are translated into CDP commands
- Property System - Combines synchronous attribute access with asynchronous DOM property retrieval
- Mixin Inheritance - Inherits element finding capabilities through the FindElementsMixin
- Bridge Pattern - Abstracts the CDP protocol details from the user-facing API
Attribute Management
A unique aspect of WebElement's design is how it handles HTML attributes:
def _def_attributes(self, attributes_list: list):
"""
Defines element attributes from a flat list of key-value pairs.
"""
for i in range(0, len(attributes_list), 2):
key = attributes_list[i]
key = key if key != 'class' else 'class_name'
value = attributes_list[i + 1]
self._attributes[key] = value
This approach:
1. Processes attributes during element creation
2. Provides fast, synchronous access to common attributes
3. Handles Python reserved keywords (like class
→ class_name
)
4. Forms the basis for the element's string representation
Attribute vs. Property Access
WebElement provides two complementary ways to access element data:
- Attribute Dictionary: Fast, synchronous access to HTML attributes available at element creation
- Asynchronous Properties: Dynamic access to current DOM state through CDP commands
Core Interaction Patterns
The WebElement domain provides several categories of interactions:
Element Properties
WebElement offers both synchronous and asynchronous property access:
# Synchronous properties (from attributes present at element creation)
element_id = element.id
element_class = element.class_name
is_element_enabled = element.is_enabled
element_value = element.value
# Asynchronous properties (retrieved from live DOM)
element_text = await element.text
element_html = await element.inner_html
element_bounds = await element.bounds
The implementation balances performance and freshness by determining which properties should be synchronous (static HTML attributes) and which should be asynchronous (dynamic DOM state):
@property
async def text(self) -> str:
"""
Retrieves the text of the element.
"""
outer_html = await self.inner_html
soup = BeautifulSoup(outer_html, 'html.parser')
return soup.get_text(strip=True)
@property
def id(self) -> str:
"""
Retrieves the id of the element.
"""
return self._attributes.get('id')
Mouse Interactions
WebElement provides multiple ways to interact with elements through mouse events:
# Standard click at element center
await element.click()
# Click with offset from center
await element.click(x_offset=10, y_offset=5)
# Click with longer hold time (like for long press)
await element.click(hold_time=1.0)
# JavaScript-based click (useful for elements that are difficult to click)
await element.click_using_js()
The implementation intelligently handles different element types and visibility states:
async def click(
self,
x_offset: int = 0,
y_offset: int = 0,
hold_time: float = 0.1,
):
"""
Clicks on the element using mouse events.
"""
if self._is_option_tag():
return await self.click_option_tag()
if not await self._is_element_visible():
raise exceptions.ElementNotVisible(
'Element is not visible on the page.'
)
await self.scroll_into_view()
# Get element position and calculate click point
# ... (position calculation code)
# Send mouse press and release events
press_command = InputCommands.mouse_press(*position_to_click)
release_command = InputCommands.mouse_release(*position_to_click)
await self._connection_handler.execute_command(press_command)
await asyncio.sleep(hold_time)
await self._connection_handler.execute_command(release_command)
Special Element Handling
The WebElement implementation includes specialized handling for different element types:
Keyboard Interactions
WebElement provides multiple ways to input text into form elements:
# Quick text insertion
await element.insert_text("Hello, world!")
# Realistic typing with configurable speed
await element.type_keys("Hello, world!", interval=0.05)
# Raw keyboard events
await element.send_keys("Hello")
# Special key combinations
await element.send_keys(("Control", 65)) # Ctrl+A
# Clearing previous input
await element.backspace()
The implementation tracks input state to enable operations like backspacing:
async def backspace(self, interval: float = 0.1):
"""
Backspaces the key at the element.
"""
for _ in range(len(self._last_input)):
await self.send_keys(('Backspace', 8))
await asyncio.sleep(interval)
self._last_input = self._last_input[:-1]
File Upload Handling
For file input elements, WebElement provides a specialized method:
Visual Capabilities
Element Screenshots
WebElement can capture screenshots of specific elements:
This implementation involves: 1. Getting the element's bounds 2. Creating a clip region for the screenshot 3. Taking a screenshot of just that region 4. Saving the image to the specified path
async def get_screenshot(self, path: str):
"""
Takes a screenshot of the element.
"""
bounds = await self.get_bounds_using_js()
clip = {
'x': bounds['x'],
'y': bounds['y'],
'width': bounds['width'],
'height': bounds['height'],
'scale': 1,
}
screenshot = await self._connection_handler.execute_command(
PageCommands.screenshot(fmt='jpeg', clip=clip)
)
async with aiofiles.open(path, 'wb') as file:
image_bytes = decode_image_to_bytes(screenshot['result']['data'])
await file.write(image_bytes)
Multiple Bounds Methods
WebElement provides two ways to get element bounds:
JavaScript Integration
WebElement provides seamless integration with JavaScript for operations that require direct DOM interaction:
# Execute JavaScript in the context of this element
await element._execute_script("this.style.border = '2px solid red';")
# Get result from JavaScript execution
visibility = await element._is_element_visible()
The implementation uses the CDP Runtime domain to execute JavaScript with the element as the context:
async def _execute_script(
self, script: str, return_by_value: bool = False
):
"""
Executes a JavaScript script in the context of this element.
"""
return await self._execute_command(
RuntimeCommands.call_function_on(
self._object_id, script, return_by_value
)
)
Element State Verification
WebElement provides methods to check the element's visibility and interactability:
# Check if element is visible
is_visible = await element._is_element_visible()
# Check if element is the topmost at its position
is_on_top = await element._is_element_on_top()
These verifications are crucial for reliable automation, ensuring that elements can be interacted with before attempting operations.
Position and Scrolling
The WebElement domain includes methods for positioning and scrolling:
# Scroll element into view
await element.scroll_into_view()
# Get element bounds
bounds = await element.bounds
These capabilities ensure that elements are visible in the viewport before interaction, mimicking how a real user would interact with a page.
Performance and Reliability Considerations
The WebElement domain balances performance and reliability through several key strategies:
Smart Fallbacks
Many methods implement multiple approaches to ensure operations succeed even in challenging scenarios:
async def click(self, ...):
# Try using CDP mouse events first
# If that fails, fallback to JavaScript click
# If that fails, provide a clear error message
Appropriate Context Selection
The implementation chooses the most appropriate context for each operation:
Operation | Approach | Rationale |
---|---|---|
Get Text | Parse HTML with BeautifulSoup | More accurate text extraction |
Click | Mouse events via CDP | Most realistic user simulation |
Select Option | Specialized JavaScript | Required for dropdown elements |
Check Visibility | JavaScript | Most reliable across browser variations |
Command Batching
Where possible, operations are combined to reduce round-trips to the browser:
# Get element bounds in a single operation
bounds = await element.get_bounds_using_js()
# Calculate position in local code without additional browser calls
position_to_click = (
bounds['x'] + bounds['width'] / 2,
bounds['y'] + bounds['height'] / 2,
)
Conclusion
The WebElement domain provides a comprehensive and intuitive interface for interacting with elements in a web page. By encapsulating the complexities of DOM interaction, event handling, and state management, it allows automation code to focus on high-level tasks rather than low-level details.
The domain demonstrates several key design principles:
- Abstraction - Hides the complexity of CDP commands behind a clean API
- Specialization - Provides unique handling for different element types
- Hybrid Access - Balances synchronous and asynchronous operations for optimal performance
- Resilience - Implements fallback strategies for common operations
When used in conjunction with the Page domain and Browser domain, WebElement creates a powerful toolset for web automation that handles the complexities of modern web applications while providing a straightforward and reliable API for developers.