GPT Driver User Guide

Tap Command

What Elements Can You Reference?

  1. Text: Use the tapOn command to interact with text-based elements.

  2. Element IDs:

    • For iOS, use the accessibilityIdentifier.

    • For Android, use the resource-id.

Syntax

Referencing Text

To reference text within your mobile app, use the following syntax:

tapOn: "text"

Example:

tapOn: "Continue"

Referencing Element IDs

To reference element IDs, use the following syntax:

tapOn.id: "element ID"

Example:

tapOn.id: "com.spotify.music:id/email"

Note: Both text and element ID references are case sensitive. Quotation marks may break when copying from external sources such as Google Docs, typically because straight quotes are converted to curly ones. If this occurs, retype the quotation marks manually to restore proper formatting.
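If steps are drafted in an external editor, broken quotation marks can also be repaired programmatically before pasting. The following is a minimal Python sketch, not part of GPT Driver; the helper name normalize_quotes is hypothetical, and it assumes the breakage comes from curly double quotes replacing straight ones.

```python
# Hypothetical helper: replaces typographic double quotes with the straight
# double quote used in the tapOn syntax shown above.
CURLY_DOUBLE_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
}

_TRANSLATION = str.maketrans(CURLY_DOUBLE_QUOTES)


def normalize_quotes(step: str) -> str:
    """Return the step with curly double quotes replaced by straight ones."""
    return step.translate(_TRANSLATION)


if __name__ == "__main__":
    pasted = "tapOn: \u201cContinue\u201d"
    print(normalize_quotes(pasted))
```

Single quotes and apostrophes are left untouched on purpose, because text references are matched case sensitively and may legitimately contain them.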

Fallback to AI Execution: What If a Command Is Non-Executable?

When writing commands, you can add additional context to clarify their purpose in case they aren't initially executable. For example:

tapOn.id: "com.spotify.music:id/email" (to enter your email address)

GPT Driver will always first attempt to execute the command directly using code logic, ignoring the context. Only if the command cannot be executed this way will GPT Driver escalate the execution to the large language model. At that point, the entire line, including both the command and the context, is treated as an AI step execution.

Also, make sure the context is always positive. Negative instructions, like tapOn: "Continue" (unless the button is grey), won't work. This is because GPT Driver will try to tap on "Continue" immediately, without considering the context in parentheses. It won't check whether the button is grey unless the command fails and escalates to AI execution, by which time it may have already attempted an incorrect action.
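The order of operations described above can be sketched in Python. This is an illustration of the documented behavior under stated assumptions, not GPT Driver's actual implementation: the regular expression, the TapStep type, and the tap_directly and run_as_ai_step callbacks are all hypothetical names introduced for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical pattern for the two documented forms, with optional context:
#   tapOn: "text" (context)
#   tapOn.id: "element ID" (context)
STEP_RE = re.compile(
    r'^\s*tapOn(?P<by_id>\.id)?:\s*"(?P<target>[^"]*)"\s*(?:\((?P<context>.*)\))?\s*$'
)


@dataclass
class TapStep:
    by_id: bool             # True for tapOn.id, False for tapOn
    target: str             # the quoted text or element ID
    context: Optional[str]  # text in parentheses, if any


def parse_tap(line: str) -> Optional[TapStep]:
    """Split a tap line into command, target, and optional context."""
    match = STEP_RE.match(line)
    if match is None:
        return None
    return TapStep(
        by_id=match.group("by_id") is not None,
        target=match.group("target"),
        context=match.group("context"),
    )


def run_tap(
    line: str,
    tap_directly: Callable[[TapStep], bool],
    run_as_ai_step: Callable[[str], None],
) -> str:
    """Try the direct tap first; escalate the whole line only on failure."""
    step = parse_tap(line)
    # The context is not consulted here, mirroring the direct execution path.
    if step is not None and tap_directly(step):
        return "direct"
    # Fallback: the entire line, command plus context, goes to the AI step.
    run_as_ai_step(line)
    return "ai"
```

In this sketch a line such as tapOn: "Continue" (unless the button is grey) returns "direct" whenever the tap itself succeeds, which is why a negative condition in the context has no effect on the first attempt.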

⚠️ iOS UI Hierarchy Limitation

On iOS, GPT Driver may not always retrieve the full UI hierarchy. This can cause commands like tapOn.id: "element ID" or tapOn: "text" to fail when the native accessibility layer isn't fully available.

To improve reliability:

  • For tap commands on iOS, always add contextual information in parentheses, e.g., tapOn.id: "com.app.login:id/submit_button" (to log in)

This helps guide AI fallback execution when needed.
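For larger suites, a quick check can flag tap steps that lack parenthesized context. The snippet below is a hypothetical Python lint, not a GPT Driver feature; it assumes each step is available as a plain string in the tapOn syntax shown on this page.

```python
import re
from typing import Iterable, List

# Any tap step, by text or by element ID.
TAP_RE = re.compile(r'^\s*tapOn(?:\.id)?:')
# A tap step that ends with non-empty context in parentheses.
TAP_WITH_CONTEXT_RE = re.compile(r'^\s*tapOn(?:\.id)?:\s*"[^"]*"\s*\(.+\)\s*$')


def taps_missing_context(steps: Iterable[str]) -> List[str]:
    """Return the tap steps that have no parenthesized context."""
    return [
        step
        for step in steps
        if TAP_RE.match(step) and not TAP_WITH_CONTEXT_RE.match(step)
    ]


if __name__ == "__main__":
    steps = [
        'tapOn.id: "com.app.login:id/submit_button" (to log in)',
        'tapOn: "Continue"',
    ]
    for step in taps_missing_context(steps):
        print("Missing context:", step)
```

Steps that are not tap commands are ignored, so the check can be run over a whole test without filtering first.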

🛠 Note: We're actively improving platform support to reduce these issues on iOS.

How to Easily Access Element IDs using the UI Inspector

You can use our UI Inspector to quickly identify and reference element IDs. The inspector simplifies locating the relevant identifiers, allowing you to copy and paste them directly into your test scripts for seamless execution. Follow the steps below to use the UI Inspector:

Step 1: Activate the UI inspector tool by clicking the icon.

  • On the right-hand side of the interface, click on the UI Inspector icon (as shown in the image).

Step 2: Click on the element you wish to capture.

  • Once the UI Inspector is active, hover over the element you wish to reference in your test. The element will be highlighted (see image). Click on it to select.

  • Note: If the element ID is available, GPT Driver will prioritize it. Only if the element ID is not available will GPT Driver fall back to the text attribute.

Step 3: The element ID is automatically copied, ready to be pasted into your test script.

  • After selecting the element, the element ID will be automatically copied to your clipboard. When you paste the copied element ID into your test script, it will be pasted as tapOn.id: "<element ID>". Since tapping is the most common action in tests, this has been set as the default.

Which type of tapping execution works best?

The best execution type depends on the tests you're running. For cross-platform tests (iOS, Android, Web), use platform-agnostic prompts by referencing text in tap commands (e.g., tapOn: "<text>"), as element IDs differ across platforms. If your app supports multiple languages, using element IDs for tap commands (e.g., tapOn.id: "<element ID>") keeps steps language-agnostic and gives more consistent results.


Last updated 12 days ago
