
Minimal end-to-end (WebdriverIO + Mocha)

A complete spec: the SDK attaches to the WebdriverIO browser, the session starts on the first AI command, and the test marks the session as succeeded after its last assertion passes.
test/specs/basic.spec.js
import GptDriver from "gpt-driver-node";

describe("Chrome smoke test", () => {
    let gptDriver;

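    // A regular function (not an arrow) so Mocha binds `this` to the test context.
    beforeEach(async function () {
        // Rebuild the server URL from the WebdriverIO connection options.
        const { protocol, hostname, port, path } = browser.options;
        gptDriver = new GptDriver({
            // Custom key read from the WebdriverIO options.
            apiKey: browser.options.gptDriverApiKey,
            driver: browser,
            serverConfig: { url: `${protocol}://${hostname}:${port}${path}` },
            cachingMode: "INTERACTION_REGION",
            // Use the full Mocha test title as the test ID.
            testId: this.currentTest?.fullTitle(),
        });
    });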

    it("opens the Chrome app", async () => {
        await gptDriver.aiExecute("Open the Chrome app");
        await gptDriver.assert("A new tab option is shown at the top of the screen");
        await gptDriver.setSessionSucceeded();
    });
});
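
The spec above interpolates four fields from `browser.options` straight into the server URL. If one of them is unset, the template yields a string such as `undefined://...`. A minimal sketch of a guard follows, assuming a hypothetical helper named `buildServerUrl` (it is not part of gpt-driver-node or WebdriverIO) and fallback values that are illustrative only:

```javascript
// Hypothetical helper, not part of gpt-driver-node or WebdriverIO.
// The fallback values are illustrative assumptions; replace them with
// the ones that match your own Appium server.
const FALLBACKS = { protocol: "http", hostname: "127.0.0.1", port: 4723, path: "/" };

function buildServerUrl(options = {}) {
    const merged = { ...FALLBACKS };
    for (const key of Object.keys(FALLBACKS)) {
        const value = options[key];
        // Only override a fallback when the option is actually set.
        if (value !== undefined && value !== null && value !== "") {
            merged[key] = value;
        }
    }
    // Make sure the path starts with exactly one slash.
    const path = `/${String(merged.path).replace(/^\/+/, "")}`;
    return `${merged.protocol}://${merged.hostname}:${merged.port}${path}`;
}
```

With this in place, the constructor call could pass `serverConfig: { url: buildServerUrl(browser.options) }` instead of the inline template.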

Native-first, AI fallback

Run native Appium code first for speed and determinism, and let the SDK fall back to AI only when a native block throws. This is the recommended pattern for real suites.
// Runs the native appiumHandler first; the SDK falls back to aiPrompt only if it throws.
async function step(title, appiumHandler, aiPrompt) {
    console.log(`▶ ${title}`);
    await gptDriver.aiExecute(aiPrompt, { appiumHandler });
}

await step(
    "enter the email",
    async () => { await $("~login-email").setValue("test@example.com"); },
    `Type "test@example.com" into the email field.`
);

await step(
    "submit the form",
    async () => { await $("~login-button").click(); },
    `Tap the Sign In button.`
);
See Deterministic execution for the full pattern and a complete worked example.
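
Conceptually, the fallback described above amounts to a try/catch around the native block. The sketch below illustrates that control flow only. It is not the SDK's implementation, and `runWithFallback`, `nativeHandler`, and `aiFallback` are hypothetical names; in a real suite, `aiExecute` with `appiumHandler` does this for you.

```javascript
// Illustration of the control flow only, not the SDK's implementation.
// `runWithFallback`, `nativeHandler`, and `aiFallback` are hypothetical names.
async function runWithFallback(nativeHandler, aiFallback) {
    try {
        // Fast, deterministic path: plain Appium/WebdriverIO calls.
        await nativeHandler();
        return "native";
    } catch (error) {
        // Reached only when the native block throws.
        await aiFallback(error);
        return "ai";
    }
}
```

Returning which path ran makes it easy to log how often a step needed the fallback, which can point to selectors worth repairing.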

Extract values, then assert on them

Pull data off the screen with extract, then verify it with assertBulk:
const order = await gptDriver.extract([
    "totalPrice",
    "deliveryDate",
]);

await gptDriver.assertBulk([
    `The total price is ${order.totalPrice}`,
    "The VAT is calculated correctly",
    "The delivery date is shown in the format dd.mm.YYYY",
]);
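
One thing to watch in the snippet above: if a key is missing from the extracted result, the template literal silently produces `The total price is undefined`. A minimal sketch of a guard follows, assuming `extract` resolves to a plain object keyed by the requested names (as the `order.totalPrice` access suggests). `requireFields` and `looksLikeDottedDate` are hypothetical helpers, not SDK methods.

```javascript
// Hypothetical helper: fail fast when an expected value was not extracted.
function requireFields(extracted, fields) {
    const missing = fields.filter((field) => {
        const value = extracted?.[field];
        return value === undefined || value === null || String(value).trim() === "";
    });
    if (missing.length > 0) {
        throw new Error(`Missing extracted values: ${missing.join(", ")}`);
    }
    return extracted;
}

// Hypothetical helper: local check for the dd.mm.YYYY shape. It looks at
// digits and dots only and does not validate that the date exists.
function looksLikeDottedDate(text) {
    return /^\d{2}\.\d{2}\.\d{4}$/.test(String(text).trim());
}
```

Wrapping the call as `requireFields(await gptDriver.extract([...]), ["totalPrice", "deliveryDate"])` would surface a failed extraction before any assertion text is built from it.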

Handle non-native elements

The AI works visually, so it can drive WebViews and other elements that selectors struggle with:
await gptDriver.aiExecute("Tap the 'Accept cookies' button in the cookie banner");
await gptDriver.aiExecute("Scroll down and tap on 'View pricing'");
See Non-native elements for details.