Creating Tests
Learn how to build test prompts using commands, AI instructions, and withVision: steps—each optimized for different UI types and automation needs.
After successfully and , proceed by navigating to the , as shown in the screenshots below. This is where your journey begins, crafting your first test prompt.
Step 1: Choose 'Add a new test' from the test overview.
Step 2: Choose the build for which you would like to create the test.
Step 3: When you are ready to begin, select 'Add new test!'
Important: Remember to save your test prompt frequently while editing. Exiting the test editor without saving will result in the loss of any recent changes.
GPT Driver supports three types of steps. Each is suited to different UI complexities.
Commands
Direct actions on the app; no language model is involved.
Use the app’s UI hierarchy (element IDs or text).
Fast, deterministic, and free to execute.
Ideal for stable, predictable flows.
If a command fails (e.g. due to a popup or missing element), GPT Driver automatically falls back to an AI instruction.
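The fallback behaviour described above can be pictured with a small Python sketch. It is not GPT Driver's implementation or API: `run_with_fallback`, `CommandFailed`, and the two stand-in functions are hypothetical names chosen for illustration. The sketch only shows the control flow of trying the deterministic command first and handing the step to an AI instruction when the command fails.

```python
from typing import Callable


class CommandFailed(Exception):
    """Hypothetical error raised when a command cannot run (popup, missing element)."""


def run_with_fallback(
    step: str,
    run_command: Callable[[str], str],
    run_ai_instruction: Callable[[str], str],
) -> str:
    """Try the deterministic command first; fall back to the AI path on failure."""
    try:
        return run_command(step)
    except CommandFailed:
        # Assumption: the same step text is handed to the AI instruction path.
        return run_ai_instruction(step)


# Illustrative stand-ins for the two execution paths.
def fake_command(step: str) -> str:
    if "Login" not in step:
        raise CommandFailed(f"element not found for: {step}")
    return f"command ok: {step}"


def fake_ai_instruction(step: str) -> str:
    return f"ai handled: {step}"
```

With these stand-ins, a step that the command path can resolve never reaches the AI path, which is why commands stay fast and free in the common case.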
Examples
withVision: steps
Use vision-language models to analyze the full screen as a human would.
Ideal for icon-based, image-heavy, or fast-changing UIs without accessible text or IDs.
Interprets layout, color, spatial relationships, and unlabeled icons.
Can perform high-level instructions across UI variations.
Examples
High-level instruction (community creation):
Business logic (wallet app):
Graphical UI (trading app):
AI Instructions
Use large language models (LLMs) to understand test prompts and determine the next actions.
Use OCR (optical character recognition) to read on-screen text.
Use UI element detection and icon recognition to interpret the screen. This allows GPT Driver to identify elements such as buttons, toggles, and inputs even when element IDs are missing or changing.
Best for flexible, text-based UIs where element IDs are missing or text may change.
Suitable for:
Conditional logic (e.g. handling popups)
A single test can cover several variations of the same flow, instead of one test per variation.
Business rule checks
Dynamic content flows (e.g. survey or quiz)
Examples
Dynamic scenario (education app):
Business logic (news app):
Use Commands for fast, stable, UI-hierarchy-based steps.
Use withVision: when relying on visual layout, icons, or dynamic designs, or for high-level instructions.
Use AI Instructions for handling frequently changing text elements or conditional steps.
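As a rough illustration, the guidance above can be written down as a small decision helper in Python. Everything here is hypothetical: `StepContext`, its fields, and `recommend_step_type` are not part of GPT Driver. The order in which the checks run, and the default when no signal is present, are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class StepContext:
    """Hypothetical description of the screen a step has to act on."""
    stable_ids_or_text: bool = False   # elements reachable via the UI hierarchy
    relies_on_visuals: bool = False    # icons, layout, colour, unlabeled elements
    high_level: bool = False           # one instruction spanning several actions
    text_changes_often: bool = False   # copy varies between runs or builds
    conditional: bool = False          # e.g. a popup that may or may not appear


def recommend_step_type(ctx: StepContext) -> str:
    """Map the guidance above onto one of the three step types."""
    if ctx.relies_on_visuals or ctx.high_level:
        return "withVision:"
    if ctx.text_changes_often or ctx.conditional:
        return "AI instruction"
    if ctx.stable_ids_or_text:
        return "command"
    # Assumption: with no clear signal, default to an AI instruction.
    return "AI instruction"
```

For example, `recommend_step_type(StepContext(stable_ids_or_text=True))` returns `"command"`, while adding `conditional=True` to the same context shifts the recommendation to an AI instruction.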