Under the Hood
This section explains the inner workings of our test automation engine for engineers and advanced users, covering its logic, safeguards, and design choices.
When a test step includes a supported command, the system always processes it first, before using any AI-based behavior.
Commands must follow the specific syntax defined in the command reference.
During this phase, the system tries to resolve the target element by:
Searching for an element ID
Or matching visible text from the UI hierarchy
This search lasts up to 5 seconds.
No AI models are involved at this stage.
If the element cannot be found within this initial phase, the system proceeds to AI-based handling (described below).
Before executing a command, the system first attempts to locate the relevant UI element using element ID or visible text, without invoking AI. The detection process works in stages:
Primary Lookup: The system first searches for elements using:
Internal element IDs
Visible text on the screen
This stage lasts for up to 5 seconds.
Fallback to AI: If no matching element is found within that time:
The system triggers AI-based assistance (see next section).
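To make the sequencing concrete, here is a minimal sketch of that two-stage lookup. The helpers find_by_id, find_by_text, and ai_fallback are hypothetical names used for illustration; they are not the engine's actual API.

```python
import time

LOOKUP_TIMEOUT_S = 5  # primary lookup window described above

def locate_element(target, find_by_id, find_by_text, ai_fallback):
    """Try ID and visible-text lookup for up to 5 seconds, then hand off to AI."""
    deadline = time.monotonic() + LOOKUP_TIMEOUT_S
    while time.monotonic() < deadline:
        element = find_by_id(target) or find_by_text(target)
        if element is not None:
            return element          # resolved without any AI involvement
        time.sleep(0.25)            # poll the UI hierarchy again
    return ai_fallback(target)      # stage 2: AI-based assistance
```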
If the traditional lookup fails, AI steps in to attempt the intended action:
Wait & Retry: AI will wait up to 3 seconds, up to 2 times, to give the UI a chance to settle.
Scroll Behavior: If it suspects the needed element is off-screen, it will automatically scroll.
Popup Handling: AI can dismiss overlays, modals, or popups if they appear to block interaction.
Goal-Oriented: The AI's decisions are guided by an understanding of what the test step is trying to achieve, not just what's on screen.
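A rough sketch of what this fallback loop could look like, assuming hypothetical ui and ai helpers (screenshot, dismiss_popup, scroll, sees_blocking_popup, thinks_target_is_offscreen, find_element_for_goal) that stand in for the real components:

```python
import time

def ai_fallback(goal, ui, ai):
    """Illustrative retry loop: wait for the UI to settle (up to 2 times),
    dismiss popups or scroll when the AI suggests it, then act on the goal."""
    for attempt in range(2):                 # up to 2 retries
        time.sleep(3)                        # wait up to 3 s for the UI to settle
        if ai.sees_blocking_popup(ui.screenshot()):
            ui.dismiss_popup()               # clear overlays/modals first
        if ai.thinks_target_is_offscreen(goal, ui.screenshot()):
            ui.scroll()                      # bring the expected element into view
        element = ai.find_element_for_goal(goal, ui.screenshot())
        if element is not None:
            return element
    return None                              # give up; the step is reported as failed
```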
Before taking any further action after a command or AI intervention, the system checks for UI stability:
It waits up to 3 seconds for the screen to stabilize.
A screen is considered unstable if there are ongoing animations, loading spinners, or blinking cursors.
If the screen hasn't stabilized within the 3-second window, the system proceeds anyway, assuming forward progress is better than stalling indefinitely.
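A minimal sketch of such a stability wait, assuming a hypothetical is_stable check that flags animations, spinners, and blinking cursors:

```python
import time

STABILITY_TIMEOUT_S = 3

def wait_for_stable_screen(ui, is_stable):
    """Poll for up to 3 seconds; the engine proceeds regardless once the window expires."""
    deadline = time.monotonic() + STABILITY_TIMEOUT_S
    while time.monotonic() < deadline:
        frame = ui.screenshot()
        if is_stable(frame):        # no animations, spinners, or blinking cursors
            return True
        time.sleep(0.2)
    return False                    # not stable, but the run moves on anyway
```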
To ensure reliable, predictable behavior, the system enforces several hard limits:
Cloud Run Time Limit: Automation runs will fail if they exceed 60 minutes.
Identical Screenshots: If the same screenshot appears 10 times in a row, the test is marked as failed (assumed to be stuck).
Repeated Step Count: If the same test step number executes 10 times, the run is failed to avoid infinite loops.
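These guards could be expressed roughly as follows; the function and counter names are illustrative, not taken from the engine:

```python
MAX_RUN_MINUTES = 60
MAX_IDENTICAL_SCREENSHOTS = 10
MAX_REPEATED_STEPS = 10

def check_hard_limits(elapsed_minutes, identical_screenshots, same_step_repeats):
    """Return a failure reason if any hard limit is exceeded, else None."""
    if elapsed_minutes > MAX_RUN_MINUTES:
        return "run exceeded the 60-minute cloud time limit"
    if identical_screenshots >= MAX_IDENTICAL_SCREENSHOTS:
        return "same screenshot seen 10 times in a row (assumed stuck)"
    if same_step_repeats >= MAX_REPEATED_STEPS:
        return "same step executed 10 times (possible infinite loop)"
    return None
```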
AI plays a key role in automation resilience. The system uses a combination of specialized models:
OCR (Optical Character Recognition): Reads on-screen text
UI Element Detection: Identifies components like buttons, inputs, and toggles
Icon Recognition: Detects familiar UI icons
LLMs (Large Language Models): Understand test prompts and determine next actions
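One plausible way these models could be composed; every name here (ocr, detector, icon_model, llm.decide_next_action) is a placeholder for illustration, not the actual pipeline:

```python
def analyze_screen(screenshot, step_prompt, ocr, detector, icon_model, llm):
    """Illustrative composition of the specialized models listed above."""
    text_regions = ocr(screenshot)        # on-screen text
    elements = detector(screenshot)       # buttons, inputs, toggles
    icons = icon_model(screenshot)        # familiar UI icons
    # The LLM reasons over the structured results to pick the next action.
    return llm.decide_next_action(
        prompt=step_prompt,
        text=text_regions,
        elements=elements,
        icons=icons,
    )
```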
withVision: Mode
This mode uses a single, powerful model that natively understands visuals.
Ideal for visual assertions, such as:
Checking color schemes
Verifying layout alignment
Detecting images or custom graphics
Reduces the need to stitch together results from multiple models.
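As an illustration, a visual assertion in this mode might reduce to a single call against one multimodal model; vision_model.evaluate is a hypothetical interface, not the product's API:

```python
def assert_visually(screenshot, assertion, vision_model):
    """Single multimodal call instead of stitching OCR, element, and icon results."""
    verdict = vision_model.evaluate(
        image=screenshot,
        question=assertion,   # e.g. "Is the primary button rendered in the brand color?"
    )
    return verdict.passed
```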
When the withVision: mode is used: