⚙️ Under the Hood
This section explains the inner workings of our test automation engine for engineers and advanced users, covering its logic, safeguards, and design choices.
🔍 Command Handling
When a test step includes a supported command, the system always processes it first, before using any AI-based behavior.
Commands must follow the specific syntax defined in the Commands Reference.
During this phase, the system tries to resolve the target element by:
Searching for an element ID
Or matching visible text from the UI hierarchy
This search lasts up to 5 seconds.
During this phase, no AI models are used.
If the element cannot be found within this initial phase, the system then proceeds to AI-based handling (described in AI Decision-Making).
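To make this dispatch order concrete, here is a minimal sketch. The command names, the step grammar, and the helper functions are all illustrative assumptions; the real syntax is defined in the Commands Reference.

```python
import re

# Hypothetical grammar: "tap", "type", and "swipe" are illustrative only.
SUPPORTED_COMMANDS = {"tap", "type", "swipe"}
COMMAND_PATTERN = re.compile(r"^(?P<name>\w+)\s*:\s*(?P<target>.+)$")

def run_command(name: str, target: str) -> str:
    return f"command path: {name} -> {target}"  # stub for the real engine

def handle_with_ai(step_text: str) -> str:
    return f"AI path: {step_text}"  # stub for the real engine

def execute_step(step_text: str) -> str:
    """Supported commands are always processed first; AI handles the rest."""
    match = COMMAND_PATTERN.match(step_text.strip())
    if match and match.group("name") in SUPPORTED_COMMANDS:
        return run_command(match.group("name"), match.group("target"))
    return handle_with_ai(step_text)

print(execute_step("tap: Login button"))     # takes the command path
print(execute_step("log in as a new user"))  # no command; AI path
```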
🧩 Element Detection Logic
Before executing a command, the system first attempts to locate the relevant UI element using element ID or visible text, without invoking AI. The detection process works in stages, as sketched after this list:
Primary Lookup. The system first searches for elements using:
Internal element IDs
Visible text on the screen
This stage lasts for up to 5 seconds.
Fallback to AI. If no matching element is found within that time:
The system triggers AI-based assistance (see next section).
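The two-stage lookup can be pictured as a polling loop with a hard deadline. This sketch assumes a `ui` handle exposing `find_by_id` and `find_by_text`; the 0.25-second polling interval is also an assumption.

```python
import time
from typing import Optional

PRIMARY_LOOKUP_BUDGET_S = 5.0  # stage 1 budget; no AI models run here

def locate_element(ui, target: str) -> Optional[object]:
    """Stage 1: poll by internal element ID, then by visible text.

    Returns the element if found within the budget, or None so the
    caller can trigger stage 2, the AI-based fallback.
    """
    deadline = time.monotonic() + PRIMARY_LOOKUP_BUDGET_S
    while time.monotonic() < deadline:
        element = ui.find_by_id(target) or ui.find_by_text(target)
        if element is not None:
            return element  # primary lookup succeeded
        time.sleep(0.25)  # polling interval is an assumption
    return None  # caller falls back to AI-based assistance
```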
🤖 AI Decision-Making
If traditional lookup fails, AI steps in to accomplish the intended action (a sketch follows this list):
Wait & Retry: the AI waits up to 3 seconds for the UI to settle, retrying up to 2 times.
Scroll Behavior: If it suspects the needed element is off-screen, it will automatically scroll.
Popup Handling: AI can dismiss overlays, modals, or popups if they appear to block interaction.
Goal-Oriented: The AI's decisions are guided by an understanding of what the test step is trying to achieve, not just what's on screen.
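Combined, these behaviors might look like the loop below. Every helper here (`ai_find`, `ai_suspects_offscreen`, the `ui` methods) is a stand-in, and the control flow is one plausible reading of the behaviors above, not the engine's exact implementation.

```python
import time
from typing import Optional

AI_SETTLE_WAIT_S = 3.0  # per-attempt wait for the UI to settle
MAX_RETRIES = 2         # wait-and-retry happens at most twice

def ai_find(screen, goal):
    """Placeholder for the AI element search (see AI Model Stack)."""
    return None

def ai_suspects_offscreen(screen, goal):
    """Placeholder heuristic for 'the element is probably off-screen'."""
    return False

def ai_fallback(ui, goal: str) -> Optional[object]:
    """Goal-oriented fallback: dismiss popups, scroll, and retry as needed."""
    for attempt in range(MAX_RETRIES + 1):
        screen = ui.capture()
        if ui.popup_blocks_interaction(screen):
            ui.dismiss_popup(screen)      # clear overlays/modals first
        element = ai_find(screen, goal)   # reasons about the step's goal,
        if element is not None:           # not just what is on screen
            return element
        if ai_suspects_offscreen(screen, goal):
            ui.scroll()                   # bring the element into view
        elif attempt < MAX_RETRIES:
            time.sleep(AI_SETTLE_WAIT_S)  # let the UI settle, then retry
    return None
```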
🧷 Screen Stability Handling
Before taking any further action after a command or AI intervention, the system checks for UI stability (see the sketch after this list):
It waits up to 3 seconds for the screen to stabilize.
A screen is considered unstable if there are ongoing animations, loading spinners, or blinking cursors.
If the screen hasn't stabilized within the 3-second window, the system proceeds anyway, assuming forward progress is better than stalling indefinitely.
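A minimal sketch of the stability gate, assuming captured frames expose flags for animations, spinners, and blinking cursors; the re-check interval is illustrative.

```python
import time

STABILITY_TIMEOUT_S = 3.0  # maximum time to wait for the screen to settle

def wait_for_stable_screen(ui) -> bool:
    """Return True once the screen is stable, False after the 3 s timeout.

    Callers proceed either way: forward progress beats stalling forever.
    """
    deadline = time.monotonic() + STABILITY_TIMEOUT_S
    while time.monotonic() < deadline:
        frame = ui.capture()
        unstable = (frame.has_animation or frame.has_spinner
                    or frame.has_blinking_cursor)
        if not unstable:
            return True
        time.sleep(0.2)  # re-check interval is an assumption
    return False  # timed out; the engine continues anyway
```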
🧱 Fail-Safe Conditions
To ensure reliable, predictable behavior, the system enforces several hard limits, sketched in code after this list:
Cloud Run Time Limit: Automation runs will fail if they exceed 60 minutes.
Identical Screenshots: If the same screenshot appears 10 times in a row, the test is marked as failed (assumed to be stuck).
Repeated Step Count: If the same test step number executes 10 times, the run is failed to avoid infinite loops.
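These limits are easy to picture as a guard consulted on every step. The thresholds come straight from the list above; the hash-based screenshot comparison is an assumption about how "identical" is decided.

```python
import hashlib
import time
from typing import Dict, Optional

MAX_RUN_SECONDS = 60 * 60        # cloud run time limit: 60 minutes
MAX_IDENTICAL_SCREENSHOTS = 10   # same frame 10 times in a row => stuck
MAX_STEP_REPEATS = 10            # same step number 10 times => loop

class FailSafeGuard:
    """Consulted on every step; raises to fail the run at a hard limit."""

    def __init__(self) -> None:
        self.started = time.monotonic()
        self.last_digest: Optional[str] = None
        self.identical_run = 0
        self.step_counts: Dict[int, int] = {}

    def check(self, screenshot: bytes, step_number: int) -> None:
        if time.monotonic() - self.started > MAX_RUN_SECONDS:
            raise RuntimeError("run exceeded the 60-minute limit")

        digest = hashlib.sha256(screenshot).hexdigest()
        self.identical_run = self.identical_run + 1 if digest == self.last_digest else 1
        self.last_digest = digest
        if self.identical_run >= MAX_IDENTICAL_SCREENSHOTS:
            raise RuntimeError("same screenshot 10 times in a row; assumed stuck")

        self.step_counts[step_number] = self.step_counts.get(step_number, 0) + 1
        if self.step_counts[step_number] >= MAX_STEP_REPEATS:
            raise RuntimeError("same step executed 10 times; assumed infinite loop")
```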
🧠 AI Model Stack
AI plays a key role in automation resilience. The system uses a combination of specialized models (how they cooperate is sketched after this list):
OCR (Optical Character Recognition): for reading on-screen text
UI Element Detection: for identifying components like buttons, inputs, and toggles
Icon Recognition: for detecting familiar UI icons
LLMs (Large Language Models): for understanding test prompts and determining next actions
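One way to picture the cooperation: the three vision models each annotate a screenshot, and the LLM reasons over the merged result. Every function here is an illustrative stand-in, not the engine's real API.

```python
from typing import Any, Dict, List

def run_ocr(screenshot: bytes) -> List[Any]: ...          # placeholder
def detect_elements(screenshot: bytes) -> List[Any]: ...  # placeholder
def recognize_icons(screenshot: bytes) -> List[Any]: ...  # placeholder
def llm_complete(prompt: str, context: Dict) -> str: ...  # placeholder

def analyze_screen(screenshot: bytes) -> Dict[str, List[Any]]:
    """Fuse the vision models' outputs into one screen description."""
    return {
        "text": run_ocr(screenshot),              # OCR: on-screen text
        "elements": detect_elements(screenshot),  # buttons, inputs, toggles
        "icons": recognize_icons(screenshot),     # familiar UI icons
    }

def decide_next_action(test_prompt: str, screenshot: bytes) -> str:
    """The LLM reasons over the test prompt plus the fused screen state."""
    return llm_complete(prompt=test_prompt, context=analyze_screen(screenshot))
```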
✅ withVision: Mode
When the withVision: feature is used:
A single, powerful model is used that can natively understand visuals.
Ideal for visual assertions, such as:
Checking color schemes
Verifying layout alignment
Detecting images or custom graphics
Reduces the need to stitch together results from multiple models (see the routing sketch below).
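A hedged sketch of how this routing might work, reusing `decide_next_action` from the model-stack sketch above; `multimodal_assert` and the example step wording are invented for illustration.

```python
def multimodal_assert(screenshot: bytes, claim: str) -> bool: ...  # placeholder

def run_visual_step(step_text: str, screenshot: bytes):
    """Route withVision: steps to a single natively visual model."""
    if step_text.startswith("withVision:"):
        claim = step_text[len("withVision:"):].strip()
        # One model checks the visual claim directly, so nothing has to be
        # stitched together from separate OCR/detection/icon results.
        return multimodal_assert(screenshot, claim)
    return decide_next_action(step_text, screenshot)  # multi-model path

# Illustrative usage (the step wording is invented for this example):
# run_visual_step("withVision: the header uses the dark color scheme", png)
```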