🧙‍♂️ Creating Tests

Learn how to build test prompts using commands, AI instructions, and withVision: steps—each optimized for different UI types and automation needs.

After uploading a build file and configuring your device settings, navigate to the Test Editor, as shown in the screenshots below. This is where you craft your first test prompt.

Creating a New Test

Step 1: Choose Add a new test from the test overview.

Step 2: Choose the build for which you would like to create the test.

Step 3: When you are ready to begin, select Add new test.


Step Types Overview

GPT Driver supports three types of steps. Each is suited to different UI complexities.

🧱 Command-Based Steps

To get started with Commands, watch this Quick Start video (3 min) for a walkthrough of the basics:

  • Direct actions on the app; no language model involved.

  • Use the app’s UI hierarchy (element IDs or text).

  • Fast, deterministic, and free to execute.

  • Ideal for stable, predictable flows.

  • If a command fails (e.g. due to a popup or missing element), GPT Driver automatically falls back to an AI instruction.

Examples

tapOn.id: "com.spotify.music:id/email"
tapOn: "text"
type: "com.spotify.music:id/email"
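
Commands can be chained into a short, deterministic flow. The sketch below follows the syntax of the examples above; the element IDs and entered text are assumed for illustration and will differ in your app:

```
tapOn.id: "com.spotify.music:id/email"
type: "user@example.com"
tapOn: "Log In"
```

Because each command targets the UI hierarchy directly, a flow like this runs fast and costs nothing to execute; if any step fails, GPT Driver falls back to an AI instruction as described above.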

Explore command syntax and supported actions


👁️ withVision: Instructions

  • Use vision-language models to analyze the full screen like a human would.

  • Ideal for icon-based, image-heavy, or fast-changing UIs without accessible text or IDs.

  • Interprets layout, color, spatial relationships, and unlabeled icons.

  • Can perform high-level instructions across UI variations.

Examples

  • High-level instruction (community creation):

withVision: tap the plus icon to add a new community, enter the name, and tap save
  • Business logic (wallet app):

withVision: Remember the $ balance on the card, perform a transaction, then verify that the balance has changed
  • Graphical UI (trading app):

withVision: Verify that the chart displayed on the trading screen is an area chart, characterized by a continuous line graph with shading beneath the line, and confirm that no candlesticks are visible.

Learn how to use withVision:


🧠 AI Instructions

  • Use large language models (LLMs) to understand test prompts and determine next actions

  • Use OCR (Optical Character Recognition) – For reading on-screen text

  • Use UI element detection and icon recognition to interpret the screen. This allows GPT Driver to identify elements such as buttons, toggles, and inputs even when element IDs are missing or changing.

  • Best for flexible, text-based UIs where element IDs are missing or text may change.

  • Suitable for:

    • Conditional logic (e.g. handling popups)

      • A single test can then cover multiple variations of the same flow, instead of one test per variation

    • Business rule checks

    • Dynamic content flows (e.g. survey or quiz)

Examples

  • Dynamic scenario (education app):

Complete the lesson until you see "Great Job!" or "n-day streak",
then scroll to and tap "Continue".
  • Business logic (news app):

Verify that search results only show articles from the business section.
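
Conditional logic, such as the popup handling mentioned above, can also be written directly as an AI instruction. A hedged sketch (the popup wording is an assumption; adapt it to your app):

```
If a notification permission popup appears, dismiss it,
then verify that the home feed is visible.
```

Because the LLM evaluates the condition at runtime, the same step passes whether or not the popup is shown.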

Read more about AI Instructions


General Guidelines

  • Use Commands for fast, stable, UI-hierarchy-based steps.

  • Use withVision: when steps rely on visual layout, icons, or fast-changing designs, or for high-level instructions.

  • Use AI Instructions for handling frequently changing text elements or conditional steps.
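
In practice, the three step types are often mixed within a single test. A sketch combining them, with an assumed element ID and flow for illustration:

```
tapOn.id: "com.example.app:id/search"
type: "business"
Verify that search results only show articles from the business section.
withVision: tap the filter icon next to the search bar
```

Here the first two steps are fast, deterministic commands; the third is an AI instruction that tolerates changing result text; and the final step uses withVision: because the filter control is an unlabeled icon.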
