βš™οΈUnder the Hood

This section explains the inner workings of our test automation engine for engineers and advanced users, covering its logic, safeguards, and design choices.

πŸ” Command Handling

When a test step includes a supported command, the system always processes it first, before applying any AI-based behavior.

  • Commands must follow the specific syntax defined in the Commands Reference.

  • During this phase, the system tries to resolve the target element by:

    • Searching for an element ID

    • Or matching visible text from the UI hierarchy

  • This search lasts up to 5 seconds.

  • During this phase, no AI models are used.

If the element cannot be found within this initial phase, the system proceeds to AI-based handling (described in AI Decision-Making), as the sketch below illustrates.
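
To make the ordering concrete, here is a minimal sketch of that dispatch logic in Python. Every name in it (`execute_step`, `find_by_id_or_text`, `ai_fallback`, the `Command` type, the poll interval) is a hypothetical stand-in, not the engine's actual API; only the 5-second budget and the command-first ordering come from the behavior described above.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Command:
    action: str  # e.g. "tap"
    target: str  # element ID or visible text to act on


LOOKUP_TIMEOUT_S = 5.0  # deterministic lookup budget before AI takes over
POLL_INTERVAL_S = 0.1   # assumed; the docs only state the 5-second total


def execute_step(
    command: Optional[Command],
    find_by_id_or_text: Callable[[str], Optional[object]],  # hierarchy lookup, no AI
    apply: Callable[[object], bool],                         # performs the action
    ai_fallback: Callable[[], bool],                         # AI-based handling
) -> bool:
    """Commands are always handled first; AI runs only if the
    deterministic lookup cannot resolve the target in time."""
    if command is None:
        return ai_fallback()  # the step is not a supported command

    deadline = time.monotonic() + LOOKUP_TIMEOUT_S
    while time.monotonic() < deadline:
        element = find_by_id_or_text(command.target)  # no AI models in this phase
        if element is not None:
            return apply(element)
        time.sleep(POLL_INTERVAL_S)

    return ai_fallback()  # not found within 5 seconds: hand off to AI
```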


🧩 Element Detection Logic

Before executing a command, the system first attempts to locate the relevant UI element using an element ID or visible text, without invoking AI. The detection process works in stages (a sketch follows the list):

  1. Primary Lookup – The system first searches for elements using:

    1. Internal element IDs

    2. Visible text on the screen

    This stage lasts for up to 5 seconds.

  2. Fallback to AI – If no matching element is found within that time:

    1. The system triggers AI-based assistance (see next section).
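
Assuming IDs are checked before visible text (the order the stages are listed in above), the primary lookup could look like the following sketch; `Node`, `snapshot`, and `find_element` are illustrative names, not the engine's real types, and the re-scan interval is an assumption.

```python
import time
from typing import Callable, Iterable, Optional


class Node:
    """Simplified stand-in for one entry in the UI hierarchy."""

    def __init__(self, element_id: str = "", text: str = "") -> None:
        self.element_id = element_id
        self.text = text


def find_element(
    snapshot: Callable[[], Iterable[Node]],  # returns the current UI hierarchy
    target: str,
    timeout_s: float = 5.0,                  # stage budget described above
) -> Optional[Node]:
    """Primary lookup: match internal element IDs first, then visible
    text, re-scanning the hierarchy until the budget is spent."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        nodes = list(snapshot())
        for node in nodes:  # stage 1: internal element IDs
            if node.element_id == target:
                return node
        for node in nodes:  # stage 2: visible on-screen text
            if node.text == target:
                return node
        time.sleep(0.1)     # assumed re-scan interval
    return None  # caller triggers the AI-based fallback
```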


🤖 AI Decision-Making

If traditional lookup fails, the AI steps in and attempts to accomplish the intended action (a sketch follows this list):

  • Wait & Retry: The AI waits up to 3 seconds per attempt, retrying up to 2 times, to give the UI a chance to settle.

  • Scroll Behavior: If the AI suspects the needed element is off-screen, it scrolls automatically.

  • Popup Handling: The AI can dismiss overlays, modals, or popups if they appear to block interaction.

  • Goal-Oriented: The AI's decisions are guided by an understanding of what the test step is trying to achieve, not just what's on screen.
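
Putting those behaviors together, one plausible shape for the fallback loop is sketched below. The ordering of the checks and all the names (`ai_assist`, `popup_blocks_screen`, and so on) are assumptions; only the 3-second wait, the 2-retry cap, and the scroll/popup behaviors come from the description above.

```python
import time
from typing import Callable, Optional

MAX_RETRIES = 2      # wait-and-retry attempts, per the description above
SETTLE_WAIT_S = 3.0  # per-attempt wait for the UI to settle


def ai_assist(
    find_target: Callable[[], Optional[object]],   # re-runs the element lookup
    popup_blocks_screen: Callable[[], bool],       # AI judgment: overlay in the way?
    dismiss_popup: Callable[[], None],
    target_likely_off_screen: Callable[[], bool],  # AI judgment: need to scroll?
    scroll: Callable[[], None],
) -> Optional[object]:
    """Goal-oriented fallback: clear blockers, scroll toward the target,
    and retry the lookup a bounded number of times."""
    for attempt in range(1 + MAX_RETRIES):  # initial attempt plus up to 2 retries
        if attempt > 0:
            time.sleep(SETTLE_WAIT_S)       # give the UI a chance to settle
        if popup_blocks_screen():
            dismiss_popup()                 # clear overlays, modals, popups
        elif target_likely_off_screen():
            scroll()                        # bring the suspected element into view
        element = find_target()
        if element is not None:
            return element
    return None  # the step could not be completed
```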


🧷 Screen Stability Handling

Before taking any further action after a command or AI intervention, the system checks for UI stability (sketched below):

  • It waits up to 3 seconds for the screen to stabilize.

  • A screen is considered unstable if there are ongoing animations, loading spinners, or blinking cursors.

  • If the screen hasn't stabilized within the 3-second window, the system proceeds anyway, assuming forward progress is better than stalling indefinitely.
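
One simple way to implement such a check is to compare consecutive frames, as in the sketch below. Frame comparison is an assumption; the engine may detect animations, spinners, and cursors by other means, and `wait_for_stable_screen` and the sampling interval are likewise hypothetical.

```python
import time
from typing import Callable

STABILITY_TIMEOUT_S = 3.0  # maximum wait for the screen to settle
SAMPLE_INTERVAL_S = 0.25   # assumed sampling interval


def wait_for_stable_screen(screenshot: Callable[[], bytes]) -> bool:
    """Returns True once two consecutive frames match, or False after
    the 3-second window, in which case the engine proceeds anyway."""
    deadline = time.monotonic() + STABILITY_TIMEOUT_S
    previous = screenshot()
    while time.monotonic() < deadline:
        time.sleep(SAMPLE_INTERVAL_S)
        current = screenshot()
        if current == previous:  # no animation, spinner, or cursor change seen
            return True
        previous = current
    return False  # still unstable; forward progress beats stalling
```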


🧱 Fail-Safe Conditions

To ensure reliable, predictable behavior, the system enforces several hard limits (sketched in code after the list):

  • Cloud Run Time Limit: Automation runs will fail if they exceed 60 minutes.

  • Identical Screenshots: If the same screenshot appears 10 times in a row, the test is marked as failed (assumed to be stuck).

  • Repeated Step Count: If the same test step number executes 10 times, the run is failed to avoid infinite loops.
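
All three limits are simple counters, so a guard object like the hypothetical `RunGuard` below could capture them; the class and its method names are illustrative, while the thresholds are the documented ones.

```python
import time
from typing import Dict, Optional


class RunGuard:
    """Enforces the hard limits; tripping any of them fails the run."""

    MAX_RUNTIME_S = 60 * 60         # cloud run time limit: 60 minutes
    MAX_IDENTICAL_SCREENSHOTS = 10  # same frame 10 times in a row => stuck
    MAX_STEP_REPEATS = 10           # same step number 10 times => infinite loop

    def __init__(self) -> None:
        self.started = time.monotonic()
        self.last_screenshot: Optional[bytes] = None
        self.identical_streak = 0
        self.step_counts: Dict[int, int] = {}

    def check(self, step_number: int, screenshot: bytes) -> None:
        if time.monotonic() - self.started > self.MAX_RUNTIME_S:
            raise RuntimeError("run failed: exceeded the 60-minute limit")

        if screenshot == self.last_screenshot:
            self.identical_streak += 1
        else:
            self.identical_streak = 1
            self.last_screenshot = screenshot
        if self.identical_streak >= self.MAX_IDENTICAL_SCREENSHOTS:
            raise RuntimeError("run failed: screen unchanged, assumed stuck")

        self.step_counts[step_number] = self.step_counts.get(step_number, 0) + 1
        if self.step_counts[step_number] >= self.MAX_STEP_REPEATS:
            raise RuntimeError("run failed: step repeated, assumed infinite loop")
```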


🧠 AI Model Stack

AI plays a key role in automation resilience. The system uses a combination of specialized models (see the sketch after this list):

  • OCR (Optical Character Recognition) – For reading on-screen text

  • UI Element Detection – For identifying components like buttons, inputs, toggles

  • Icon Recognition – For detecting familiar UI icons

  • LLMs (Large Language Models) – To understand test prompts and determine next actions
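
How these models feed into one another is not spelled out above, but a plausible composition is that each specialized model describes one aspect of the screen and the LLM reasons over the combined description plus the test prompt. The sketch below shows that idea; every function name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ScreenAnalysis:
    texts: List[str] = field(default_factory=list)     # OCR results
    elements: List[str] = field(default_factory=list)  # buttons, inputs, toggles, ...
    icons: List[str] = field(default_factory=list)     # recognized icons


def decide_next_action(
    screenshot: bytes,
    test_prompt: str,
    ocr: Callable[[bytes], List[str]],
    detect_elements: Callable[[bytes], List[str]],
    recognize_icons: Callable[[bytes], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Each specialized model contributes one view of the screen; the
    LLM combines them with the test prompt to pick the next action."""
    analysis = ScreenAnalysis(
        texts=ocr(screenshot),
        elements=detect_elements(screenshot),
        icons=recognize_icons(screenshot),
    )
    context = (
        f"Test step: {test_prompt}\n"
        f"On-screen text: {analysis.texts}\n"
        f"UI elements: {analysis.elements}\n"
        f"Icons: {analysis.icons}\n"
        "What should the next action be?"
    )
    return llm(context)
```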

➕ withVision: Mode

When the withVision: feature is used (see the sketch after this list):

  • A single, more powerful model that natively understands visuals is used.

  • Ideal for visual assertions, such as:

    • Checking color schemes

    • Verifying layout alignment

    • Detecting images or custom graphics

  • Reduces the need to stitch together results from multiple models.
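
Under that model, a withVision:-style assertion reduces to a single call against a natively multimodal model, as in the sketch below. The function and its prompt are purely illustrative; the feature's real interface is not documented here.

```python
from typing import Callable


def assert_with_vision(
    screenshot: bytes,
    assertion: str,  # e.g. "the primary buttons use the brand color scheme"
    vision_model: Callable[[bytes, str], str],  # one natively multimodal model
) -> bool:
    """One model inspects the screenshot directly, instead of stitching
    together OCR, element-detection, and icon-recognition results."""
    answer = vision_model(
        screenshot,
        f"Does this screen satisfy the following assertion? "
        f"Answer yes or no. Assertion: {assertion}",
    )
    return answer.strip().lower().startswith("yes")
```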
