
Test Execution

Command-first automation logic with layered fallbacks: UI elements are resolved by ID or visible text first, with AI used as a fallback for retries, scrolling, and popup handling. Built-in fail-safes and visual-model support are included.

πŸ” Command Handling

When a test step includes a supported command, the system always processes it first, before using any AI-based behavior.

  • Commands must follow the specific syntax defined in the Commands Reference.

  • During this phase, the system tries to resolve the target element by:

    • Searching for an element ID

    • Or matching visible text from the UI hierarchy

  • This search lasts up to 5 seconds.

  • No AI models are used during this phase.

If the element cannot be found within this initial phase, the system then proceeds to AI-based handling (described in AI Decision-Making).
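
As a mental model, the deterministic lookup can be pictured as in the sketch below. This is not GPT Driver's actual implementation; the function names, polling interval, and node fields (`id`, `text`) are illustrative assumptions.

```python
import time
from typing import Callable, Optional

LOOKUP_TIMEOUT_S = 5.0   # the 5-second search window described above
POLL_INTERVAL_S = 0.25   # assumed polling rate, for illustration only

def resolve_element(target: str,
                    get_ui_nodes: Callable[[], list]) -> Optional[dict]:
    """Poll the UI hierarchy for up to 5 seconds, matching the target
    by element ID first, then by visible text. No AI models are used."""
    deadline = time.monotonic() + LOOKUP_TIMEOUT_S
    while time.monotonic() < deadline:
        nodes = get_ui_nodes()
        # Prefer an exact element-ID match over a visible-text match.
        for key in ("id", "text"):
            for node in nodes:
                if node.get(key) == target:
                    return node
        time.sleep(POLL_INTERVAL_S)
    return None  # not found: hand over to AI-based handling
```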


🧩 Element Detection Logic

Before executing a command, the system first attempts to locate the relevant UI element using its element ID or visible text, without invoking AI. The detection process works in stages:

  1. Primary Lookup – The system first searches for elements using:

    1. Internal element IDs

    2. Visible text on the screen

    This stage lasts for up to 5 seconds.

  2. Fallback to AI – If no matching element is found within that time, the system triggers AI-based assistance (see next section). Both stages are sketched below.
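
Building on the lookup sketch above, the two stages can be chained roughly as follows. `perform_command` and `ai_fallback` are hypothetical placeholders; the latter stands in for the behavior described in the next section.

```python
def execute_step(step: dict, get_ui_nodes) -> bool:
    """Stage 1: deterministic lookup by ID or visible text (up to 5 s).
    Stage 2: AI-based assistance, only if stage 1 finds nothing."""
    element = resolve_element(step["target"], get_ui_nodes)
    if element is not None:
        return perform_command(step["command"], element)  # no AI involved
    return ai_fallback(step)  # sketched in the next section

def perform_command(command: str, element: dict) -> bool:
    """Placeholder for the native command executor (tap, type, ...)."""
    print(f"{command} on {element}")
    return True
```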


🤖 AI Decision-Making

If traditional lookup fails, the AI steps in to attempt the intended action (sketched after the list below):

  • Wait & Retry: AI will wait up to 3 seconds, up to 2 times, to give the UI a chance to settle.

  • Scroll Behavior: If it suspects the needed element is off-screen, it will automatically scroll.

  • Popup Handling: AI can dismiss overlays, modals, or popups if they appear to block interaction.

  • Goal-Oriented: The AI's decisions are guided by an understanding of what the test step is trying to achieve, not just what's on screen.
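
A rough sketch of this fallback loop, using the retry limits stated above. All helper names are hypothetical; in practice these decisions are driven by the model stack described further below.

```python
import time

MAX_RETRIES = 2       # retry up to 2 times...
RETRY_WAIT_S = 3.0    # ...waiting up to 3 seconds before each retry

def ai_fallback(step: dict) -> bool:
    """Illustrative only: the AI retries, scrolls, or dismisses popups
    depending on what it believes is blocking the step's goal."""
    for attempt in range(MAX_RETRIES + 1):
        screen = capture_screen()
        if popup_blocks_interaction(screen):
            dismiss_popup(screen)         # overlays, modals, popups
        elif likely_offscreen(step, screen):
            scroll_towards(step, screen)  # target suspected off-screen
        elif try_step(step, screen):
            return True                   # the step's goal was achieved
        if attempt < MAX_RETRIES:
            time.sleep(RETRY_WAIT_S)      # let the UI settle, then retry
    return False

# Placeholder helpers so the sketch runs; in the real system these
# decisions come from the model stack described further below.
def capture_screen(): return {}
def popup_blocks_interaction(screen): return False
def dismiss_popup(screen): pass
def likely_offscreen(step, screen): return False
def scroll_towards(step, screen): pass
def try_step(step, screen): return True
```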


🧷 Screen Stability Handling

Before taking any further action after a command or AI intervention, the system checks for UI stability (see the sketch after this list):

  • It waits up to 3 seconds for the screen to stabilize.

  • A screen is considered unstable if there are ongoing animations, loading spinners, or blinking cursors.

  • If the screen hasn't stabilized within the 3-second window, the system proceeds anyway, assuming forward progress is better than stalling indefinitely.
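
One plausible way to picture the stability check, assuming it compares consecutive screen captures; the real heuristics for detecting animations, spinners, and cursors are internal to GPT Driver.

```python
import time

STABILITY_TIMEOUT_S = 3.0  # the 3-second stabilization window
CHECK_INTERVAL_S = 0.3     # assumed sampling rate, for illustration

def wait_for_stable_screen(capture) -> bool:
    """Treat the screen as stable once two consecutive captures match.
    On timeout, return False but let the run proceed anyway."""
    deadline = time.monotonic() + STABILITY_TIMEOUT_S
    previous = capture()
    while time.monotonic() < deadline:
        time.sleep(CHECK_INTERVAL_S)
        current = capture()
        if current == previous:  # no animation, spinner, or cursor change
            return True
        previous = current
    return False  # unstable, but forward progress beats stalling
```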


🧱 Fail-Safe Conditions

To ensure reliable, predictable behavior, the system enforces several hard limits (sketched after the list):

  • Cloud Run Time Limit: Automation runs will fail if they exceed 60 minutes.

  • Identical Screenshots: If the same screenshot appears 10 times in a row, the test is marked as failed (assumed to be stuck).

  • Repeated Step Count: If the same test step number executes 10 times, the run is failed to avoid infinite loops.
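
These limits could be enforced with simple counters, as in the sketch below; the thresholds mirror the ones listed above, while the class and its interface are illustrative.

```python
import time

MAX_RUN_SECONDS = 60 * 60       # 60-minute cloud run limit
MAX_IDENTICAL_SCREENSHOTS = 10  # same screenshot 10x in a row => stuck
MAX_STEP_REPEATS = 10           # same step number 10x => infinite loop

class FailSafe:
    """Illustrative guard that fails a run when any hard limit trips."""

    def __init__(self) -> None:
        self.start = time.monotonic()
        self.last_screenshot = None
        self.identical_count = 0
        self.step_counts: dict[int, int] = {}

    def check(self, step_number: int, screenshot_hash: str) -> None:
        if time.monotonic() - self.start > MAX_RUN_SECONDS:
            raise RuntimeError("run exceeded the 60-minute limit")
        if screenshot_hash == self.last_screenshot:
            self.identical_count += 1
        else:
            self.identical_count = 1
            self.last_screenshot = screenshot_hash
        if self.identical_count >= MAX_IDENTICAL_SCREENSHOTS:
            raise RuntimeError("same screenshot 10 times in a row; run is stuck")
        self.step_counts[step_number] = self.step_counts.get(step_number, 0) + 1
        if self.step_counts[step_number] >= MAX_STEP_REPEATS:
            raise RuntimeError("step repeated 10 times; aborting to avoid a loop")
```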


🧠 AI Model Stack

AI plays a key role in automation resilience. The system uses a combination of specialized models:

  • OCR (Optical Character Recognition) – For reading on-screen text

  • UI Element Detection – For identifying components like buttons, inputs, toggles

  • Icon Recognition – For detecting familiar UI icons

  • LLMs (Large Language Models) – To understand test prompts and determine next actions

➕ withVision: Mode

When the withVision: feature is used:

  • A single, powerful model that natively understands visuals is used.

  • Ideal for visual assertions, such as:

    • Checking color schemes

    • Verifying layout alignment

    • Detecting images or custom graphics

  • Reduces the need to stitch together results from multiple models.
