🧙‍♂️ Creating Tests
Learn how to build test prompts using commands, AI instructions, and withVision: steps—each optimized for different UI types and automation needs.
After uploading a build file and configuring your device settings, navigate to the Test Editor, as shown in the screenshots below. This is where you craft your first test prompt.
Creating a New Test
Step 1: Choose 'Add a new test' from the test overview.

Step 2: Choose the build for which you would like to create the test.

Step 3: When you are ready to begin, select 'Add new test!'

Important: Remember to save your test prompt frequently while editing. Exiting the test editor without saving will result in the loss of any recent changes.
Step Types Overview
GPT Driver supports three types of steps. Each is suited to different UI complexities.
🧱 Command-Based Steps
To get started with Commands, watch this 3-minute Quick Start video walkthrough of the basics:
Direct actions on the app; no language model involved.
Use the app’s UI hierarchy (element IDs or text).
Fast, deterministic, and free to execute.
Ideal for stable, predictable flows.
If a command fails (e.g. due to a popup or missing element), GPT Driver automatically falls back to an AI instruction.
Examples
tapOn.id: "com.spotify.music:id/email"
tapOn: "text"
type: "com.spotify.music:id/email"
→ Explore command syntax and supported actions
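The automatic fallback described above can be pictured as a try/except around each command. This is an illustrative sketch only; the function names (run_command, run_ai_instruction) are hypothetical and not GPT Driver's real internals:

```python
def execute_step(step, run_command, run_ai_instruction):
    """Run a deterministic command; if it fails (e.g. a popup appears
    or an element is missing), fall back to an AI instruction that
    pursues the same goal."""
    try:
        return run_command(step)  # fast, deterministic, free to execute
    except Exception as err:  # element not found, unexpected popup, ...
        # Fall back to the slower but more resilient AI interpretation.
        return run_ai_instruction(step, reason=str(err))
```

The point of the pattern: stable flows stay on the cheap command path, and the AI path is only paid for when the UI deviates.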
👁️ withVision: Instructions
Use vision-language models to analyze the full screen like a human would.
Ideal for icon-based, image-heavy, or fast-changing UIs without accessible text or IDs.
Interprets layout, color, spatial relationships, and unlabeled icons.
Can perform high-level instructions across UI variations.
Examples
High-level instruction (community creation):
withVision: tap the plus icon to add a new community, enter the name, and tap save
Business logic (wallet app):
withVision: Remember the $ balance on the card, perform a transaction, then verify that the balance has changed
Graphical UI (trading app):
withVision: Verify that the chart displayed on the trading screen is an area chart, characterized by a continuous line graph with shading beneath the line, and confirm that no candlesticks are visible.
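The wallet example above follows a remember-act-verify pattern: capture a value, act, then confirm the value changed. A minimal Python sketch of that pattern (the function names are illustrative, not GPT Driver internals):

```python
def remember_act_verify(read_balance, perform_transaction):
    """Sketch of the wallet example: remember the balance,
    perform a transaction, then verify the balance changed."""
    before = read_balance()      # "Remember the $ balance on the card"
    perform_transaction()        # "perform a transaction"
    after = read_balance()       # re-read the same value
    assert after != before, "balance did not change"
    return before, after
```

In a real withVision: step, the model does the "reading" from pixels; the sketch only shows the logical shape of the check.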
→ Learn how to use withVision:
🧠 AI Instructions
Use large language models (LLMs) to understand test prompts and determine the next actions.
Use OCR (optical character recognition) to read on-screen text.
Use UI element detection and icon recognition to interpret the screen. This allows GPT Driver to identify elements such as buttons, toggles, and inputs even when element IDs are missing or changing.
Best for flexible, text-based UIs where element IDs are missing or text may change.
Suitable for:
Conditional logic (e.g. handling popups), so one test covers several variations of the same flow
Business rule checks
Dynamic content flows (e.g. survey or quiz)
Examples
Dynamic scenario (education app):
Complete the lesson until you see "Great Job!" or "n-day streak",
then scroll to and tap “Continue”.
Business logic (news app):
Verify that search results only show articles from the business section.
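The education-app example above shows why conditional logic matters: one AI instruction absorbs branches (popups, either of two success messages) that would otherwise each need a separate scripted test. A hypothetical sketch of that branching, with screens modeled as plain strings:

```python
def complete_lesson(screens):
    """Walk through screens until a success message appears,
    skipping any popup variants encountered along the way."""
    success_markers = ("Great Job!", "-day streak")
    for screen in screens:
        if any(marker in screen for marker in success_markers):
            return "tap Continue"   # either success message ends the flow
        if "popup" in screen.lower():
            continue                # a branch handled inside one test
    return "keep going"
```

Each `if` here is a variation that, without AI instructions, would multiply into its own test case.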
→ Read more about AI Instructions
General Guideline:
Use Commands for fast, stable, UI-hierarchy-based steps.
Use withVision: when relying on visual layout, icons, or dynamic designs, or for high-level instructions.
Use AI Instructions for handling frequently changing text elements or conditional steps.
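The guideline above can be condensed into a small decision helper. This is purely illustrative (the parameter names are assumptions, not part of GPT Driver):

```python
def choose_step_type(has_stable_ids, icon_or_layout_driven, text_may_change):
    """Pick a step type following the general guideline:
    Commands for stable hierarchies, withVision: for visual UIs,
    AI Instructions for changing text or conditional flows."""
    if has_stable_ids and not text_may_change:
        return "Command"          # fast, deterministic, free
    if icon_or_layout_driven:
        return "withVision:"      # layout, icons, graphical checks
    return "AI Instruction"       # changing text, conditional logic
```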