Automation API

Android device automation

Image Recognition

Actions

OCR and image analysis capabilities

Access these methods through agent.actions. Extract text and analyze images using ML Kit, or locate icons in screenshots with OpenCV template matching.

recognizeText()

TypeScript
recognizeText(imageBase64: string): Promise<TextJSON>

Performs OCR on an image using ML Kit.

Parameters

Name         Type    Description
imageBase64  string  Base64-encoded image

Returns

TextJSON  Hierarchical text structure with confidence and bounding boxes

Examples

TypeScript
const { screenshot } = await agent.actions.screenshot(1080, 1920, 90);
const result = await agent.actions.recognizeText(screenshot);
console.log("Full text:", result.text);
// Access individual text blocks
for (const block of result.textBlocks) {
  console.log("Block:", block.text, "at", block.boundingBox);
}
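
A common follow-up to OCR is locating a specific label so it can be tapped. The sketch below is one way to do that, not part of the API: findLineCenter is a hypothetical helper, the interfaces are trimmed local copies of the documented return types, and the sample object is hand-written data standing in for a real recognizeText() result.

```typescript
// Trimmed local copies of the documented types (only the fields used here).
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface TextLine { text: string; boundingBox: BoundingBox; }
interface TextBlock { text: string; lines: TextLine[]; }
interface TextJSON { text: string; textBlocks: TextBlock[]; }

// Hypothetical helper: returns the center of the first line whose text
// contains `needle` (case-insensitive), or null when no line matches.
function findLineCenter(
  result: TextJSON,
  needle: string,
): { x: number; y: number } | null {
  const wanted = needle.toLowerCase();
  for (const block of result.textBlocks) {
    for (const line of block.lines) {
      if (line.text.toLowerCase().includes(wanted)) {
        const { left, top, right, bottom } = line.boundingBox;
        return {
          x: Math.round((left + right) / 2),
          y: Math.round((top + bottom) / 2),
        };
      }
    }
  }
  return null;
}

// Hand-written sample data, not real OCR output.
const sample: TextJSON = {
  text: "Welcome\nSign in",
  textBlocks: [
    {
      text: "Welcome\nSign in",
      lines: [
        { text: "Welcome", boundingBox: { left: 100, top: 200, right: 400, bottom: 260 } },
        { text: "Sign in", boundingBox: { left: 100, top: 300, right: 300, bottom: 350 } },
      ],
    },
  ],
};

console.log(findLineCenter(sample, "sign in"));
```

In a script, the object returned by agent.actions.recognizeText(screenshot) could be passed in place of sample, since it carries the same fields.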

findImage()

TypeScript
findImage(image: string, images: string[], threshold?: number): Promise<FindImageResult>

Locates one or more template images within a source image using OpenCV multi-scale template matching. Useful for finding icons or fixed UI elements on a screenshot when accessibility nodes are unreliable. Requires Android 11+ (API 30+) and RemoteMobile app version code 195+.

Parameters

Name        Type      Description
image       string    Base64-encoded source image to search within (e.g. a screenshot)
images      string[]  Array of base64-encoded template images to find
threshold?  number    Confidence threshold between 0.0 and 1.0 (default 0.7)

Returns

FindImageResult  Object whose results array holds one entry per template, at the same index as in the input images array.

Examples

TypeScript
// Take a screenshot and look for an icon (e.g. a delete bin button)
const device = agent.info.getDeviceInfo();
if (device.sdkVersion < 30 || device.appVersionCode < 195) {
  console.log("findImage not supported on this device");
  return;
}
const { screenshot } = await agent.actions.screenshot(device.width, device.height, 90);
const templates = await (await fetch("templates.json")).json();
const result = await agent.actions.findImage(
  screenshot,
  [templates.bin, templates.bin_dark],
  0.7,
);
for (const match of result.results) {
  if (match.found && match.bounds) {
    console.log(`Template ${match.index} found at (${match.x}, ${match.y}) with confidence ${match.confidence?.toFixed(3)}`);
    agent.utils.randomClick(
      match.bounds.left,
      match.bounds.top,
      match.bounds.right,
      match.bounds.bottom,
    );
  }
}

Return Types

TextJSON

Root level OCR result containing all recognized text.

TypeScript
interface TextJSON {
  text: string; // Complete recognized text
  textBlocks: TextBlock[]; // Array of text blocks
}

TextBlock

A block of text, typically a paragraph.

TypeScript
interface TextBlock {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  lines: TextLine[];
}

TextLine

A line of text within a block.

TypeScript
interface TextLine {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  elements: TextElement[];
  confidence: number;
  angle: number;
}

TextElement

Individual text element (usually a word).

TypeScript
interface TextElement {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  recognizedLanguages: string[];
  symbols: TextSymbol[];
  confidence: number;
  angle: number;
}

TextSymbol

Individual character/symbol.

TypeScript
interface TextSymbol {
  text: string;
  boundingBox: BoundingBox;
  cornerPoints: Point[];
  confidence: number;
  angle: number;
}

BoundingBox

TypeScript
interface BoundingBox {
  left: number;
  top: number;
  right: number;
  bottom: number;
}
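
The OCR types above nest as block, line, element, symbol. As an illustration of walking that hierarchy, the sketch below flattens a result into a list of words and drops low-confidence ones. collectWords, the Word type, and the 0.5 default are hypothetical and not part of the API; the code also assumes confidence is reported on a 0.0 to 1.0 scale, which this page states for findImage but not for OCR.

```typescript
// Trimmed local copies of the documented types (only the fields used here).
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface TextElement { text: string; boundingBox: BoundingBox; confidence: number; }
interface TextLine { text: string; elements: TextElement[]; }
interface TextBlock { text: string; lines: TextLine[]; }
interface TextJSON { text: string; textBlocks: TextBlock[]; }

// Hypothetical flat record for one recognized word.
interface Word { text: string; confidence: number; boundingBox: BoundingBox; }

// Hypothetical helper: walks blocks -> lines -> elements and keeps every
// element whose confidence is at least `minConfidence`.
// Assumption: confidence values lie in the range 0.0 to 1.0.
function collectWords(result: TextJSON, minConfidence = 0.5): Word[] {
  const words: Word[] = [];
  for (const block of result.textBlocks) {
    for (const line of block.lines) {
      for (const element of line.elements) {
        if (element.confidence >= minConfidence) {
          words.push({
            text: element.text,
            confidence: element.confidence,
            boundingBox: element.boundingBox,
          });
        }
      }
    }
  }
  return words;
}
```

Filtering at the element level keeps the word-sized bounding boxes, which are usually the most convenient tap targets.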

FindImageResult

Top-level result returned by findImage.

TypeScript
interface FindImageResult {
  results: FindImageMatchResult[]; // One entry per template image (same order as input)
}

FindImageMatchResult

Match information for a single template. Coordinates and bounds are only present when found is true.

TypeScript
interface FindImageMatchResult {
  index: number; // Index of the template in the input images array
  found: boolean; // Whether a match was found above the threshold
  x?: number; // Center x-coordinate of the match (only if found)
  y?: number; // Center y-coordinate of the match (only if found)
  confidence?: number; // Match confidence 0.0 - 1.0 (only if found)
  bounds?: BoundingBox; // Bounding box of the matched region (only if found)
  scale?: number; // Scale at which the template was found (only if found)
}
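
When several variants of the same icon are passed as templates (for example light and dark versions), more than one may match. The sketch below shows one possible way to choose among them: it keeps the found entry with the highest confidence and uses index to look up a caller-supplied label. pickBestMatch and the names array are hypothetical and not part of the API.

```typescript
// Local copies of the documented types.
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface FindImageMatchResult {
  index: number;
  found: boolean;
  x?: number;
  y?: number;
  confidence?: number;
  bounds?: BoundingBox;
  scale?: number;
}
interface FindImageResult { results: FindImageMatchResult[]; }

// Hypothetical helper: returns the found entry with the highest confidence,
// labeled with the caller's name for that template, or null if none matched.
// `names` is expected to be in the same order as the input images array.
function pickBestMatch(
  result: FindImageResult,
  names: string[],
): { name: string; match: FindImageMatchResult } | null {
  let best: FindImageMatchResult | null = null;
  for (const match of result.results) {
    if (!match.found) continue;
    // A missing confidence is treated as 0 so it never outranks a real score.
    if (best === null || (match.confidence ?? 0) > (best.confidence ?? 0)) {
      best = match;
    }
  }
  if (best === null) return null;
  return { name: names[best.index] ?? `template #${best.index}`, match: best };
}
```

Called as pickBestMatch(result, ["bin", "bin_dark"]) on a findImage result, this would report which variant matched best, and the returned match.bounds could then be passed to a click helper.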