GPT Driver User Guide

Creating Tests

Learn how to build test prompts using commands, AI instructions, and withVision: steps, each optimized for different UI types and automation needs.


Last updated 1 month ago

After successfully uploading a build file and configuring your device settings, proceed by navigating to the Test Editor. This is where you craft your first test prompt.

Creating a New Test

Step 1: Choose Add a new test from the test overview.

Step 2: Choose the build for which you would like to create the test.

Step 3: When you are ready to begin, select 'Add new test!'

Important: Remember to save your test prompt frequently while editing. Exiting the test editor without saving will result in the loss of any recent changes.


Step Types Overview

GPT Driver supports three types of steps. Each is suited to different UI complexities.

🧱 Command-Based Steps

  • Direct actions on the app; no language model is involved.

  • Use the app’s UI hierarchy (element IDs or text).

  • Fast, deterministic, and free to execute.

  • Ideal for stable, predictable flows.

  • If a command fails (e.g. due to a popup or missing element), GPT Driver automatically falls back to an AI instruction.

Examples

tapOn.id: "com.spotify.music:id/email"
tapOn: "text"
type: "com.spotify.music:id/email"
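
The fallback behavior described above (a failed command is retried as an AI instruction) can be pictured with a small Python sketch. This is not GPT Driver's implementation: `run_step`, `CommandFailed`, and the two runner callables are hypothetical names introduced only to illustrate the control flow.

```python
from typing import Callable


class CommandFailed(Exception):
    """Raised by the hypothetical command runner when a step cannot complete."""


def run_step(
    step: str,
    run_command: Callable[[str], None],
    run_ai_instruction: Callable[[str], None],
) -> str:
    """Try the deterministic command path first; fall back to the AI path on failure.

    Returns "command" or "ai" to show which path handled the step.
    """
    try:
        run_command(step)
        return "command"
    except CommandFailed:
        # For example, a popup covered the element or the element ID was missing.
        run_ai_instruction(step)
        return "ai"


# Stand-in runners for illustration only (assumptions, not real GPT Driver calls).
def flaky_command(step: str) -> None:
    if "email" in step:
        raise CommandFailed("element not found")


handled = []


def fake_ai(step: str) -> None:
    handled.append(step)


print(run_step('tapOn: "text"', flaky_command, fake_ai))
print(run_step('tapOn.id: "com.spotify.music:id/email"', flaky_command, fake_ai))
```

In this sketch the first step succeeds on the command path, while the second raises `CommandFailed` and is handed to the AI stand-in.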

👁️ withVision: Instructions

  • Use vision-language models to analyze the full screen like a human would.

  • Ideal for icon-based, image-heavy, or fast-changing UIs without accessible text or IDs.

  • Interprets layout, color, spatial relationships, and unlabeled icons.

  • Can perform high-level instructions across UI variations.

Examples

  • High-level instruction (community creation):

withVision: tap the plus icon to add a new community, enter the name, and tap save
  • Business logic (wallet app):

withVision: Remember the $ balance on the card, perform a transaction, then verify that the balance has changed 
  • Graphical UI (trading app):

withVision: Verify that the chart displayed on the trading screen is an area chart, characterized by a continuous line graph with shading beneath the line, and confirm that no candlesticks are visible.

🧠 AI Instructions

  • Use large language models (LLMs) to understand test prompts and determine the next actions.

  • Use OCR (Optical Character Recognition) to read on-screen text.

  • Use UI element detection and icon recognition to interpret the screen. This allows GPT Driver to identify elements such as buttons, toggles, and inputs even when element IDs are missing or changing.

  • Best for flexible, text-based UIs where element IDs are missing or text may change.

  • Suitable for:

    • Conditional logic (e.g. handling popups)

      • One test can then cover several variations of the same flow, instead of a separate test for each.

    • Business rule checks

    • Dynamic content flows (e.g. survey or quiz)

Examples

  • Dynamic scenario (education app):

Complete the lesson until you see "Great Job!" or "n-day streak",
then scroll to and tap "Continue".
  • Business logic (news app):

Verify that search results only show articles from the business section.
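
To make the three step types concrete, here is a minimal Python sketch that sorts prompt lines by the prefixes seen in this page's examples. The recognition rule is an assumption for illustration: `classify_step` is a hypothetical helper, and the pattern covers only `tapOn`, `tapOn.id`, and `type`, not the full command set.

```python
import re

# Command prefixes taken from the examples on this page only.
# Assumption: the real command set is larger and may be parsed differently.
COMMAND_PATTERN = re.compile(r"^(tapOn(\.id)?|type):")


def classify_step(line: str) -> str:
    """Return "withVision", "command", or "ai_instruction" for one prompt line."""
    text = line.strip()
    if text.startswith("withVision:"):
        return "withVision"
    if COMMAND_PATTERN.match(text):
        return "command"
    # Anything else is treated as a free-form AI instruction.
    return "ai_instruction"


steps = [
    'tapOn.id: "com.spotify.music:id/email"',
    "withVision: tap the plus icon to add a new community, enter the name, and tap save",
    "Verify that search results only show articles from the business section.",
]
for step in steps:
    print(classify_step(step))
```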


General Guideline:

  • Use Commands for fast, stable, UI-hierarchy-based steps.

  • Use withVision: when relying on visual layout, icons, or dynamic designs, or for high-level instructions.

  • Use AI Instructions for handling frequently changing text elements or conditional steps.
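
The guideline above can be read as a simple decision rule. The Python sketch below is one possible reading, not an official algorithm: `suggest_step_type` and its parameters are hypothetical, and the order in which the checks are applied is an assumption.

```python
def suggest_step_type(
    stable_ids_or_text: bool,
    visual_or_high_level: bool,
    changing_text_or_conditional: bool,
) -> str:
    """Suggest a step type from three yes/no observations about the screen.

    The priority order (vision, then AI, then command) is an assumption.
    """
    if visual_or_high_level:
        # Icons, layout, dynamic designs, or a high-level instruction.
        return "withVision"
    if changing_text_or_conditional:
        # Frequently changing text or conditional steps such as popups.
        return "ai_instruction"
    if stable_ids_or_text:
        # Fast, stable, UI-hierarchy-based step.
        return "command"
    # Nothing stable to target: fall back to the flexible option.
    return "ai_instruction"


print(suggest_step_type(True, False, False))
print(suggest_step_type(False, True, False))
print(suggest_step_type(True, False, True))
```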

To get started with Commands, check out this Quick Start (3 min): Video walkthrough of the Basics

→ Explore command syntax and supported actions

→ Learn how to use withVision:

→ Read more about AI Instructions