
Test Execution

Command-first automation logic with layered fallbacks: UI elements are resolved by ID or visible text first, with AI used as a fallback for retries, scrolling, and popup handling. Built-in fail-safes and visual-model support are included.

πŸ” Command Handling

When a test step includes a supported command, the system always processes it first, before using any AI-based behavior.

  • Commands must follow the specific syntax defined in the Commands Reference.

  • During this phase, the system tries to resolve the target element by:

    • Searching for an element ID

    • Or matching visible text from the UI hierarchy

  • This search lasts up to 5 seconds.

  • No AI models are used during this phase.

If the element cannot be found within this initial phase, the system then proceeds to AI-based handling (described in AI Decision-Making).
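
As a mental model, the deterministic lookup can be pictured as in the sketch below. This is not GPT Driver's actual implementation; the function names, polling interval, and node fields (`id`, `text`) are illustrative assumptions.

```python
import time
from typing import Callable, Optional

LOOKUP_TIMEOUT_S = 5.0   # the 5-second search window described above
POLL_INTERVAL_S = 0.25   # assumed polling rate, for illustration only

def resolve_element(target: str,
                    get_ui_nodes: Callable[[], list]) -> Optional[dict]:
    """Poll the UI hierarchy for up to 5 seconds, matching the target
    by element ID first, then by visible text. No AI models are used."""
    deadline = time.monotonic() + LOOKUP_TIMEOUT_S
    while time.monotonic() < deadline:
        nodes = get_ui_nodes()
        # Prefer an exact element-ID match over a visible-text match.
        for key in ("id", "text"):
            for node in nodes:
                if node.get(key) == target:
                    return node
        time.sleep(POLL_INTERVAL_S)
    return None  # not found: hand over to AI-based handling
```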


🧩 Element Detection Logic

Before executing a command, the system first attempts to locate the relevant UI element using its element ID or visible text, without invoking AI. The detection process works in stages:

  1. Primary Lookup – The system first searches for elements using:

    1. Internal element IDs

    2. Visible text on the screen

    This stage lasts for up to 5 seconds.

  2. Fallback to AI – If no matching element is found within that time, the system triggers AI-based assistance (see next section). Both stages are sketched below.
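
Building on the lookup sketch above, the two stages can be chained roughly as follows. `perform_command` and `ai_fallback` are hypothetical placeholders; the latter stands in for the behavior described in the next section.

```python
def execute_step(step: dict, get_ui_nodes) -> bool:
    """Stage 1: deterministic lookup by ID or visible text (up to 5 s).
    Stage 2: AI-based assistance, only if stage 1 finds nothing."""
    element = resolve_element(step["target"], get_ui_nodes)
    if element is not None:
        return perform_command(step["command"], element)  # no AI involved
    return ai_fallback(step)  # sketched in the next section

def perform_command(command: str, element: dict) -> bool:
    """Placeholder for the native command executor (tap, type, ...)."""
    print(f"{command} on {element}")
    return True
```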


🤖 AI Decision-Making

If traditional lookup fails, the AI steps in to attempt the intended action (sketched after the list below):

  • Wait & Retry: AI will wait up to 3 seconds, up to 2 times, to give the UI a chance to settle.

  • Scroll Behavior: If it suspects the needed element is off-screen, it will automatically scroll.

  • Popup Handling: AI can dismiss overlays, modals, or popups if they appear to block interaction.

  • Goal-Oriented: The AI's decisions are guided by an understanding of what the test step is trying to achieve, not just what's on screen.
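
A rough sketch of this fallback loop, using the retry limits stated above. All helper names are hypothetical; in practice these decisions are driven by the model stack described further below.

```python
import time

MAX_RETRIES = 2       # retry up to 2 times...
RETRY_WAIT_S = 3.0    # ...waiting up to 3 seconds before each retry

def ai_fallback(step: dict) -> bool:
    """Illustrative only: the AI retries, scrolls, or dismisses popups
    depending on what it believes is blocking the step's goal."""
    for attempt in range(MAX_RETRIES + 1):
        screen = capture_screen()
        if popup_blocks_interaction(screen):
            dismiss_popup(screen)         # overlays, modals, popups
        elif likely_offscreen(step, screen):
            scroll_towards(step, screen)  # target suspected off-screen
        elif try_step(step, screen):
            return True                   # the step's goal was achieved
        if attempt < MAX_RETRIES:
            time.sleep(RETRY_WAIT_S)      # let the UI settle, then retry
    return False

# Placeholder helpers so the sketch runs; in the real system these
# decisions come from the model stack described further below.
def capture_screen(): return {}
def popup_blocks_interaction(screen): return False
def dismiss_popup(screen): pass
def likely_offscreen(step, screen): return False
def scroll_towards(step, screen): pass
def try_step(step, screen): return True
```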


🧷 Screen Stability Handling

Before taking any further action after a command or AI intervention, the system checks for UI stability (see the sketch after this list):

  • It waits up to 3 seconds for the screen to stabilize.

  • A screen is considered unstable if there are ongoing animations, loading spinners, or blinking cursors.

  • If the screen hasn't stabilized within the 3-second window, the system proceeds anyway, assuming forward progress is better than stalling indefinitely.
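
One plausible way to picture the stability check, assuming it compares consecutive screen captures; the real heuristics for detecting animations, spinners, and cursors are internal to GPT Driver.

```python
import time

STABILITY_TIMEOUT_S = 3.0  # the 3-second stabilization window
CHECK_INTERVAL_S = 0.3     # assumed sampling rate, for illustration

def wait_for_stable_screen(capture) -> bool:
    """Treat the screen as stable once two consecutive captures match.
    On timeout, return False but let the run proceed anyway."""
    deadline = time.monotonic() + STABILITY_TIMEOUT_S
    previous = capture()
    while time.monotonic() < deadline:
        time.sleep(CHECK_INTERVAL_S)
        current = capture()
        if current == previous:  # no animation, spinner, or cursor change
            return True
        previous = current
    return False  # unstable, but forward progress beats stalling
```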


🧱 Fail-Safe Conditions

To ensure reliable, predictable behavior, the system enforces several hard limits (sketched after the list):

  • Cloud Run Time Limit: Automation runs will fail if they exceed 60 minutes.

  • Identical Screenshots: If the same screenshot appears 10 times in a row, the test is marked as failed (assumed to be stuck).

  • Repeated Step Count: If the same test step number executes 10 times, the run is failed to avoid infinite loops.
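
These limits could be enforced with simple counters, as in the sketch below; the thresholds mirror the ones listed above, while the class and its interface are illustrative.

```python
import time

MAX_RUN_SECONDS = 60 * 60       # 60-minute cloud run limit
MAX_IDENTICAL_SCREENSHOTS = 10  # same screenshot 10x in a row => stuck
MAX_STEP_REPEATS = 10           # same step number 10x => infinite loop

class FailSafe:
    """Illustrative guard that fails a run when any hard limit trips."""

    def __init__(self) -> None:
        self.start = time.monotonic()
        self.last_screenshot = None
        self.identical_count = 0
        self.step_counts: dict[int, int] = {}

    def check(self, step_number: int, screenshot_hash: str) -> None:
        if time.monotonic() - self.start > MAX_RUN_SECONDS:
            raise RuntimeError("run exceeded the 60-minute limit")
        if screenshot_hash == self.last_screenshot:
            self.identical_count += 1
        else:
            self.identical_count = 1
            self.last_screenshot = screenshot_hash
        if self.identical_count >= MAX_IDENTICAL_SCREENSHOTS:
            raise RuntimeError("same screenshot 10 times in a row; run is stuck")
        self.step_counts[step_number] = self.step_counts.get(step_number, 0) + 1
        if self.step_counts[step_number] >= MAX_STEP_REPEATS:
            raise RuntimeError("step repeated 10 times; aborting to avoid a loop")
```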


🧠 AI Model Stack

AI plays a key role in automation resilience. The system uses a combination of specialized models:

  • OCR (Optical Character Recognition) – For reading on-screen text

  • UI Element Detection – For identifying components like buttons, inputs, toggles

  • Icon Recognition – For detecting familiar UI icons

  • LLMs (Large Language Models) – To understand test prompts and determine next actions

➕ withVision: Mode

When the withVision: feature is used:

  • A single, powerful model that natively understands visuals is used.

  • Ideal for visual assertions, such as:

    • Checking color schemes

    • Verifying layout alignment

    • Detecting images or custom graphics

  • Reduces the need to stitch together results from multiple models.
