GPT Driver User Guide

Under the Hood

This section explains the inner workings of our test automation engine for engineers and advanced users, covering its logic, safeguards, and design choices.



πŸ” Command Handling

When a test step includes a supported command, the system always processes it first, before any AI-based behavior is used.

  • Commands must follow the specific syntax defined in the Commands Reference.

  • During this phase, the system tries to resolve the target element by:

    • Searching for an element ID

    • Or matching visible text from the UI hierarchy

  • This search lasts up to 5 seconds.

  • During this phase, no AI models are used.

If the element cannot be found within this initial phase, the system then proceeds to AI-based handling (described in AI Decision-Making below).
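To make this ordering concrete, here is a minimal Python sketch of the command-first flow. All of the names involved (`parse_command`, `find_by_id_or_text`, `run_ai_fallback`, the `ui` object) are hypothetical stand-ins for internal engine routines, not GPT Driver's actual internals or a public API.

```python
COMMAND_LOOKUP_TIMEOUT_S = 5  # deterministic search window; no AI involved


def execute_step(step, ui):
    # Hypothetical sketch: `parse_command`, `find_by_id_or_text`, and
    # `run_ai_fallback` stand in for internal engine routines.
    command = parse_command(step)  # None if the step has no supported command
    if command is not None:
        # Phase 1: resolve the target by element ID or visible text only.
        element = find_by_id_or_text(command.target, ui,
                                     timeout_s=COMMAND_LOOKUP_TIMEOUT_S)
        if element is not None:
            return command.run(element)
    # Phase 2: not resolvable deterministically -> AI-based handling.
    return run_ai_fallback(step, ui)
```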


🧩 Element Detection Logic

Before executing a command, the system first attempts to locate the relevant UI element using its element ID or visible text, without invoking AI. The detection process works in stages:

  1. Primary Lookup – The system first searches for elements using:

    1. Internal element IDs

    2. Visible text on the screen

    This stage lasts for up to 5 seconds.

  2. Fallback to AI – If no matching element is found within that time:

    1. The system triggers AI-based assistance (see next section).
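As an illustration of this staged lookup, the sketch below polls a UI hierarchy for up to 5 seconds, checking element IDs before visible text. `ui.dump_hierarchy()` and the hierarchy methods are assumed placeholders, not GPT Driver's actual interface.

```python
import time


def find_by_id_or_text(target, ui, timeout_s=5.0, poll_interval_s=0.25):
    """Stage 1: poll the UI hierarchy for up to `timeout_s` seconds,
    matching internal element IDs first, then visible text.
    Returning None signals the caller to fall back to AI (stage 2)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        hierarchy = ui.dump_hierarchy()                # placeholder accessor
        element = (hierarchy.find_by_id(target)        # 1. internal element IDs
                   or hierarchy.find_by_text(target))  # 2. visible on-screen text
        if element is not None:
            return element
        time.sleep(poll_interval_s)
    return None  # not found in time -> AI-based assistance
```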


πŸ€– AI Decision-Making

If traditional lookup fails, AI steps in to try to accomplish the intended action:

  • Wait & Retry: The AI waits up to 3 seconds per attempt, retrying up to 2 times, to give the UI a chance to settle.

  • Scroll Behavior: If it suspects the needed element is off-screen, it will automatically scroll.

  • Popup Handling: AI can dismiss overlays, modals, or popups if they appear to block interaction.

  • Goal-Oriented: The AI's decisions are guided by an understanding of what the test step is trying to achieve, not just what's on screen.
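Put together, the fallback loop might look roughly like the sketch below: wait-and-retry, scroll when the target seems off-screen, and dismiss blocking popups first. The `ai_planner` and `ui` objects, the decision kinds, and `StepFailed` are all invented for illustration.

```python
import time


class StepFailed(Exception):
    pass


def run_ai_fallback(step, ui, max_retries=2, settle_wait_s=3.0):
    """Sketch of the AI fallback: an initial attempt plus up to 2 retries,
    each able to wait up to 3 seconds for the UI to settle."""
    for _ in range(max_retries + 1):
        # `ai_planner` is a placeholder for the internal model stack;
        # planning is goal-oriented, driven by the step's intent.
        decision = ai_planner.plan_action(step.goal, ui.screenshot())
        if decision.kind == "dismiss_popup":
            ui.tap(decision.target)        # clear the blocking overlay first
        elif decision.kind == "scroll":
            ui.scroll(decision.direction)  # target suspected off-screen
        elif decision.kind == "act":
            return ui.perform(decision.action)
        else:                              # "wait": give the UI time to settle
            time.sleep(settle_wait_s)
    raise StepFailed(f"could not complete step: {step}")
```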


🧷 Screen Stability Handling

Before taking any further action after a command or AI intervention, the system checks for UI stability:

  • It waits up to 3 seconds for the screen to stabilize.

  • A screen is considered unstable if there are ongoing animations, loading spinners, or blinking cursors.

  • If the screen hasn't stabilized within the 3-second window, the system proceeds anyway, assuming forward progress is better than stalling indefinitely.
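A simple way to picture this check is the sketch below, which treats two consecutive identical screenshots as "stable". The real engine also watches for animations, spinners, and blinking cursors; all names here are illustrative.

```python
import time


def wait_for_stable_screen(ui, timeout_s=3.0, interval_s=0.5):
    """Wait up to `timeout_s` for the screen to stop changing.
    Returns whether stability was observed; either way the caller
    proceeds, since forward progress beats stalling indefinitely."""
    previous = ui.screenshot()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        current = ui.screenshot()
        if current == previous:  # no visible change between frames
            return True          # stable: safe to continue
        previous = current
    return False                 # still unstable; proceed anyway
```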


🧱 Fail-Safe Conditions

To ensure reliable, predictable behavior, the system enforces several hard limits:

  • Cloud Run Time Limit: Automation runs will fail if they exceed 60 minutes.

  • Identical Screenshots: If the same screenshot appears 10 times in a row, the test is marked as failed (assumed to be stuck).

  • Repeated Step Count: If the same test step number executes 10 times, the run is failed to avoid infinite loops.
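These limits behave like a per-run watchdog. Below is a hedged sketch of the bookkeeping under that assumption; the class, field names, and `RunFailed` exception are invented for illustration.

```python
import time


class RunFailed(Exception):
    pass


class Watchdog:
    MAX_RUN_SECONDS = 60 * 60        # 60-minute cloud run limit
    MAX_IDENTICAL_SCREENSHOTS = 10   # "stuck screen" guard
    MAX_SAME_STEP_EXECUTIONS = 10    # infinite-loop guard

    def __init__(self):
        self.started = time.monotonic()
        self.identical_shots = 0
        self.last_shot_hash = None
        self.step_counts = {}

    def check(self, step_number, screenshot_hash):
        """Called once per executed step with a hash of the latest screenshot."""
        if time.monotonic() - self.started > self.MAX_RUN_SECONDS:
            raise RunFailed("exceeded the 60-minute run limit")
        if screenshot_hash == self.last_shot_hash:
            self.identical_shots += 1
        else:
            self.identical_shots = 1
        self.last_shot_hash = screenshot_hash
        if self.identical_shots >= self.MAX_IDENTICAL_SCREENSHOTS:
            raise RunFailed("same screenshot 10 times in a row (stuck)")
        self.step_counts[step_number] = self.step_counts.get(step_number, 0) + 1
        if self.step_counts[step_number] >= self.MAX_SAME_STEP_EXECUTIONS:
            raise RunFailed("same step executed 10 times (loop)")
```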


🧠 AI Model Stack

AI plays a key role in automation resilience. The system uses a combination of specialized models:

  • OCR (Optical Character Recognition) – For reading on-screen text

  • UI Element Detection – For identifying components like buttons, inputs, toggles

  • Icon Recognition – For detecting familiar UI icons

  • LLMs (Large Language Models) – To understand test prompts and determine next actions
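One way the outputs of these models could be combined is sketched below: the specialized models describe the screen, and the LLM plans the next action from that description plus the step's goal. The model objects passed in are placeholders, not actual GPT Driver components.

```python
def decide_next_action(step_goal, screenshot, ocr, detector, icons, llm):
    """Illustrative pipeline: perception first, then language-based planning."""
    perception = {
        "text": ocr.read(screenshot),             # OCR: on-screen text
        "elements": detector.detect(screenshot),  # buttons, inputs, toggles
        "icons": icons.detect(screenshot),        # familiar UI icons
    }
    # The LLM interprets the test prompt and the perceived screen state,
    # then returns the next action to attempt.
    return llm.plan(goal=step_goal, perception=perception)
```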

βž• withVision: Mode

When the withVision: feature is used:

  • A single, powerful model that can natively understand visuals is used.

  • It is ideal for visual assertions, such as:

    • Checking color schemes

    • Verifying layout alignment

    • Detecting images or custom graphics

  • It reduces the need to stitch together results from multiple models.
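As a rough illustration of the difference, a withVision-style check could hand the raw screenshot and the assertion to one multimodal model, instead of merging OCR, element, and icon outputs. `vision_model` and its `evaluate` method are placeholders, not the product's API.

```python
def with_vision_assert(assertion, screenshot, vision_model):
    """Single multimodal call: the model looks at the screenshot directly
    and judges visual properties (color scheme, alignment, graphics)."""
    verdict = vision_model.evaluate(
        image=screenshot,
        question=f"Does the screen satisfy this assertion: {assertion}?",
    )
    return verdict.passed
```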