The TypeScript SDK (gpt-driver-node) wraps your existing WebdriverIO + Appium session. You keep writing normal Appium code, and the SDK adds AI commands (aiExecute, assert, extract, …) on top of the same browser instance. Nothing about your existing setup has to change.
This SDK is for engineers who already run Appium tests in code. If you are looking for the no-code recorder and cloud runner, see QA Studio instead.

Prerequisites

A working WebdriverIO + Appium project. If you are starting from scratch, install Appium and the platform drivers:
# Install Appium
npm install -g appium

# Install drivers
appium driver install xcuitest      # iOS
appium driver install uiautomator2  # Android
A typical WebdriverIO project also has these dev dependencies:
npm install -D @wdio/cli @wdio/local-runner @wdio/mocha-framework @wdio/spec-reporter

Installation

Add the SDK to your project:
npm install gpt-driver-node

Configure WebdriverIO

Your wdio.conf.js (or .ts) stays a standard WebdriverIO config. Two things matter for the SDK:
  1. Your Appium capabilities (the app and device under test).
  2. Where you keep your GPT Driver API key. A convenient pattern is a custom top-level field so every spec can read it from browser.options:
wdio.conf.js
export const config = {
    runner: 'local',
    protocol: 'http',
    hostname: '127.0.0.1',
    port: 4723,
    path: '/',

    // Your GPT Driver API key, read from the GPT_DRIVER_API_KEY environment variable
    gptDriverApiKey: process.env.GPT_DRIVER_API_KEY,

    specs: ['./test/**/*.spec.js'],
    maxInstances: 1,

    capabilities: [
        {
            platformName: 'Android',
            'appium:automationName': 'UiAutomator2',
            // 'appium:app': '/path/to/app.apk',
            // 'appium:deviceName': 'Pixel 7',
        },
    ],

    framework: 'mocha',
    mochaOpts: { ui: 'bdd', timeout: 60000 * 10 },
};
The long Mocha timeout (10 minutes in this config) matters: AI fallback steps can take several seconds each, so keep the per-test timeout generous.
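If the API key is missing from the environment, the config above passes undefined to the SDK and the problem only surfaces later, inside a test. One way to fail fast is a small guard like the sketch below. The helper name requireEnv is hypothetical and not part of gpt-driver-node or WebdriverIO.

```javascript
// Hypothetical helper (not part of gpt-driver-node or WebdriverIO):
// returns the value of an environment variable, or throws a descriptive
// error when the variable is missing or blank.
function requireEnv(name, env = process.env) {
    const value = env[name];
    if (typeof value !== "string" || value.trim() === "") {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}

// Possible use in wdio.conf.js:
//     gptDriverApiKey: requireEnv("GPT_DRIVER_API_KEY"),
```

With this in place, a missing key stops the run while WebdriverIO loads the config, before any Appium session is created.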

Wire the SDK into a spec

Construct a GptDriver and hand it the live WebdriverIO browser. The SDK attaches to that session and reuses it. Creating it in a beforeEach hook gives each test its own instance:
test/specs/login.spec.js
import GptDriver from "gpt-driver-node";

describe("Login", () => {
    let gptDriver;

    // A regular function (not an arrow function) so Mocha binds `this` for this.currentTest below
    beforeEach(async function () {
        const { protocol, hostname, port, path } = browser.options;
        const baseUrl = `${protocol}://${hostname}:${port}${path}`;

        gptDriver = new GptDriver({
            apiKey: browser.options.gptDriverApiKey,
            driver: browser,                       // the existing WebdriverIO session
            serverConfig: { url: baseUrl },        // the Appium server URL
            cachingMode: "INTERACTION_REGION",     // "NONE" | "FULL_SCREEN" | "INTERACTION_REGION"
            testId: this.currentTest?.fullTitle(), // shows up on the dashboard
            appId: "com.example.app",              // appPackage (Android) / bundleId (iOS)
        });
    });

    it("logs in", async () => {
        await gptDriver.aiExecute("Tap the login button and wait for the home screen");
        await gptDriver.assert("The home screen is displayed with a welcome message");
        await gptDriver.setSessionSucceeded();
    });
});
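The spec above builds the Appium server URL inline from browser.options. If several specs need it, the same logic could live in one small function, sketched here. The name buildAppiumUrl and the default values are assumptions that mirror the sample config, not something the SDK provides.

```javascript
// Hypothetical helper: assembles the Appium server URL from the same fields
// the spec reads off browser.options. Defaults mirror the sample wdio.conf.js.
function buildAppiumUrl({ protocol = "http", hostname = "127.0.0.1", port = 4723, path = "/" } = {}) {
    // Make sure the path starts with a slash so the URL stays well-formed.
    const normalizedPath = path.startsWith("/") ? path : `/${path}`;
    return `${protocol}://${hostname}:${port}${normalizedPath}`;
}

// Possible use inside beforeEach:
//     serverConfig: { url: buildAppiumUrl(browser.options) },
```

Normalizing the leading slash guards against a config that sets path to something like 'wd/hub' without it.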

What each option does

Option | Required | Description
apiKey | yes | Your GPT Driver API key.
driver | yes* | An existing WebdriverIO / Appium browser. The SDK runs on this session.
serverConfig.url | yes | The Appium server URL (required whenever you pass a driver).
cachingMode | no | Default caching for AI steps. Defaults to "NONE".
testId | no | A label for this run, shown on the dashboard.
appId | no | App identifier (appPackage / bundleId); auto-read from the session when omitted.
additionalUserContext | no | Free-text guidance passed to the AI on every AI step (for example, "When asked about Location Permissions, grant it.").
*If you do not pass an existing driver, provide serverConfig.url and serverConfig.device.platform and the SDK will start its own Appium session. Attaching to your existing browser is the common case and keeps the SDK in your normal WebdriverIO lifecycle.
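For the less common case in the footnote, where the SDK starts its own Appium session, the constructor options might be assembled as in the sketch below. The helper name standaloneOptions is hypothetical, and the exact value format for serverConfig.device.platform (shown as "Android") is an assumption; check the Reference page before relying on it.

```javascript
// Hypothetical helper: builds constructor options for the case where no
// existing WebdriverIO driver is passed. Only option names that appear in the
// table and footnote above are used; the platform value format is an assumption.
function standaloneOptions({ apiKey, appiumUrl, platform, userContext }) {
    const options = {
        apiKey,
        serverConfig: {
            url: appiumUrl,       // the Appium server URL
            device: { platform }, // e.g. "Android" (assumed format)
        },
    };
    // Optional free-text guidance passed to the AI on every AI step.
    if (userContext) {
        options.additionalUserContext = userContext;
    }
    return options;
}

// Possible use:
//     const gptDriver = new GptDriver(standaloneOptions({
//         apiKey: process.env.GPT_DRIVER_API_KEY,
//         appiumUrl: "http://127.0.0.1:4723/",
//         platform: "Android",
//     }));
```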

Session lifecycle

You do not start the session manually; it starts automatically on the first AI command (aiExecute, assert, extract, …). If an AI step fails the test, the test status on the dashboard is set to failed automatically. For successful tests, report the outcome yourself so it is recorded on the dashboard:
await gptDriver.setSessionSucceeded();
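Rather than ending every test body with that call, one option is to report success from a shared afterEach hook. The sketch below relies on Mocha exposing the finished test's state ("passed" or "failed") as this.currentTest.state. The helper name reportIfPassed is hypothetical, and it assumes setSessionSucceeded may be called from a hook after the test body has finished.

```javascript
// Hypothetical helper: reports success to the dashboard only when Mocha
// marked the test as passed. Returns true when the report was sent.
async function reportIfPassed(gptDriver, test) {
    if (test && test.state === "passed") {
        await gptDriver.setSessionSucceeded();
        return true;
    }
    return false;
}

// Possible use in a spec (a regular function, so Mocha binds `this`):
//     afterEach(async function () {
//         await reportIfPassed(gptDriver, this.currentTest);
//     });
```

Failed tests are left alone here, since the dashboard status for failing AI steps is set automatically.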

Run your tests

Start the Appium server (or let the WebdriverIO Appium service start it for you) and run WebdriverIO as usual:
appium                       # skip this if you use @wdio/appium-service
npx wdio run wdio.conf.js
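If you would rather not start Appium by hand, WebdriverIO's Appium service (installed with npm install -D @wdio/appium-service) is enabled by listing 'appium' under services in the config. The sketch below adds that entry to an existing config object. The helper name withAppiumService is hypothetical; writing services: ['appium'] directly in wdio.conf.js has the same effect.

```javascript
// Hypothetical helper: returns a config with the 'appium' service enabled,
// leaving the config unchanged if the service is already listed.
// Service entries may be plain names or [name, options] pairs.
function withAppiumService(baseConfig) {
    const services = baseConfig.services ?? [];
    const alreadyEnabled = services.some(
        (entry) => entry === "appium" || (Array.isArray(entry) && entry[0] === "appium")
    );
    return alreadyEnabled ? baseConfig : { ...baseConfig, services: [...services, "appium"] };
}

// Possible use in wdio.conf.js:
//     export const config = withAppiumService({ runner: 'local', port: 4723, /* ... */ });
```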

Next steps

Deterministic execution

Run native Appium code first and only fall back to AI when it fails. The fastest, most stable way to use the SDK.

Reference

Every constructor option and command.