Image Recognition
OCR and image analysis capabilities
Access these methods through agent.actions. Extract text and analyze images using ML Kit, or locate icons in screenshots with OpenCV template matching.
recognizeText()
recognizeText(imageBase64: string): Promise<TextJSON>

Performs OCR on an image using ML Kit.
Parameters
| Name | Type | Description |
|---|---|---|
| imageBase64 | string | Base64-encoded image |
Returns
TextJSON: Hierarchical text structure with confidence values and bounding boxes.
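To make the hierarchy concrete, the sketch below walks a result from blocks to lines to elements and collects the center point of every word that matches a query. The local interfaces are trimmed copies of the documented return types. `findWords`, `WordHit`, and the `minConfidence` default of 0.5 are hypothetical names and values introduced for illustration, and the sketch assumes OCR confidence is reported on a 0.0 to 1.0 scale.

```typescript
// Trimmed local copies of the documented return types (only the fields used here).
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface TextElement { text: string; boundingBox: BoundingBox; confidence: number; }
interface TextLine { text: string; boundingBox: BoundingBox; elements: TextElement[]; }
interface TextBlock { text: string; boundingBox: BoundingBox; lines: TextLine[]; }
interface TextJSON { text: string; textBlocks: TextBlock[]; }

// Hypothetical result shape for this sketch; not part of the documented API.
interface WordHit { text: string; confidence: number; centerX: number; centerY: number; }

// Hypothetical helper: collect every word equal to `query` (case-insensitive).
// Assumption: confidence is in the range 0.0 to 1.0; 0.5 is an arbitrary default.
function findWords(result: TextJSON, query: string, minConfidence = 0.5): WordHit[] {
  const needle = query.toLowerCase();
  const hits: WordHit[] = [];
  for (const block of result.textBlocks) {
    for (const line of block.lines) {
      for (const element of line.elements) {
        if (element.text.toLowerCase() !== needle) continue;
        if (element.confidence < minConfidence) continue;
        const box = element.boundingBox;
        hits.push({
          text: element.text,
          confidence: element.confidence,
          centerX: (box.left + box.right) / 2,
          centerY: (box.top + box.bottom) / 2,
        });
      }
    }
  }
  return hits;
}
```

Because TypeScript types are structural, the full object returned by `recognizeText()` can be passed to `findWords` directly; the extra fields are simply ignored.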
Examples

    const { screenshot } = await agent.actions.screenshot(1080, 1920, 90);
    const result = await agent.actions.recognizeText(screenshot);
    console.log("Full text:", result.text);

    // Access individual text blocks
    for (const block of result.textBlocks) {
      console.log("Block:", block.text, "at", block.boundingBox);
    }

findImage()
findImage(image: string, images: string[], threshold?: number): Promise<FindImageResult>

Locates one or more template images within a source image using OpenCV multi-scale template matching. Useful for finding icons or fixed UI elements on a screenshot when accessibility nodes are unreliable. Requires Android 11+ (API 30+) and RemoteMobile app version code 195+.

Since 195
Parameters
| Name | Type | Description |
|---|---|---|
| image | string | Base64-encoded source image to search within (e.g. a screenshot) |
| images | string[] | Array of base64-encoded template images to find |
| threshold? | number | Confidence threshold between 0.0 and 1.0 (default 0.7) |
Returns
FindImageResult: Per-template results array. Each entry corresponds to the same index in the input images array.
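Since each result refers back to the input array by index, it can be convenient to key the results by a readable template name. The following is a minimal sketch of that mapping; `labelMatches` and the `names` array are hypothetical, and the interfaces are local copies of the documented return types.

```typescript
// Local copies of the documented return types.
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface FindImageMatchResult {
  index: number;
  found: boolean;
  x?: number;
  y?: number;
  confidence?: number;
  bounds?: BoundingBox;
  scale?: number;
}
interface FindImageResult { results: FindImageMatchResult[]; }

// Hypothetical helper: `names[i]` is a caller-chosen label for the i-th template
// passed to findImage(). Results whose index has no label are skipped.
function labelMatches(
  names: string[],
  result: FindImageResult,
): Map<string, FindImageMatchResult> {
  const byName = new Map<string, FindImageMatchResult>();
  for (const match of result.results) {
    const name = names[match.index];
    if (name !== undefined) byName.set(name, match);
  }
  return byName;
}
```

With templates passed as `[templates.bin, templates.bin_dark]`, calling `labelMatches(["bin", "bin_dark"], result)` would let later code look up `byName.get("bin_dark")` instead of tracking array positions.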
Examples

    // Take a screenshot and look for an icon (e.g. a delete bin button)
    const device = agent.info.getDeviceInfo();
    if (device.sdkVersion < 30 || device.appVersionCode < 195) {
      console.log("findImage not supported on this device");
      return;
    }

    const { screenshot } = await agent.actions.screenshot(device.width, device.height, 90);
    const templates = await (await fetch("templates.json")).json();

    const result = await agent.actions.findImage(
      screenshot,
      [templates.bin, templates.bin_dark],
      0.7,
    );

    for (const match of result.results) {
      if (match.found && match.bounds) {
        console.log(`Template ${match.index} found at (${match.x}, ${match.y}) with confidence ${match.confidence?.toFixed(3)}`);
        agent.utils.randomClick(
          match.bounds.left,
          match.bounds.top,
          match.bounds.right,
          match.bounds.bottom,
        );
      }
    }

Return Types
TextJSON
Root level OCR result containing all recognized text.

    interface TextJSON {
      text: string;            // Complete recognized text
      textBlocks: TextBlock[]; // Array of text blocks
    }

TextBlock
A block of text, typically a paragraph.

    interface TextBlock {
      text: string;
      boundingBox: BoundingBox;
      cornerPoints: Point[];
      recognizedLanguages: string[];
      lines: TextLine[];
    }

TextLine
A line of text within a block.

    interface TextLine {
      text: string;
      boundingBox: BoundingBox;
      cornerPoints: Point[];
      recognizedLanguages: string[];
      elements: TextElement[];
      confidence: number;
      angle: number;
    }

TextElement
Individual text element (usually a word).

    interface TextElement {
      text: string;
      boundingBox: BoundingBox;
      cornerPoints: Point[];
      recognizedLanguages: string[];
      symbols: TextSymbol[];
      confidence: number;
      angle: number;
    }

TextSymbol
Individual character/symbol.

    interface TextSymbol {
      text: string;
      boundingBox: BoundingBox;
      cornerPoints: Point[];
      confidence: number;
      angle: number;
    }

BoundingBox

    interface BoundingBox {
      left: number;
      top: number;
      right: number;
      bottom: number;
    }

FindImageResult
Top-level result returned by findImage.

    interface FindImageResult {
      results: FindImageMatchResult[]; // One entry per template image (same order as input)
    }

FindImageMatchResult
Match information for a single template. Coordinates and bounds are only present when found is true.

    interface FindImageMatchResult {
      index: number;        // Index of the template in the input images array
      found: boolean;       // Whether a match was found above the threshold
      x?: number;           // Center x-coordinate of the match (only if found)
      y?: number;           // Center y-coordinate of the match (only if found)
      confidence?: number;  // Match confidence 0.0 - 1.0 (only if found)
      bounds?: BoundingBox; // Bounding box of the matched region (only if found)
      scale?: number;       // Scale at which the template was found (only if found)
    }
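
When several templates are variants of the same icon (for example light and dark themes), a caller may only care about the strongest match. The sketch below picks the found entry with the highest confidence. `bestMatch` is a hypothetical helper, not part of the API, and treating a missing `confidence` as 0 is an assumption made for this example.

```typescript
// Local copies of the documented return types.
interface BoundingBox { left: number; top: number; right: number; bottom: number; }
interface FindImageMatchResult {
  index: number;
  found: boolean;
  x?: number;
  y?: number;
  confidence?: number;
  bounds?: BoundingBox;
  scale?: number;
}
interface FindImageResult { results: FindImageMatchResult[]; }

// Hypothetical helper: return the found match with the highest confidence,
// or undefined when no template matched. On ties the earlier entry wins.
function bestMatch(result: FindImageResult): FindImageMatchResult | undefined {
  let best: FindImageMatchResult | undefined;
  for (const match of result.results) {
    if (!match.found) continue;
    // Assumption for this sketch: a missing confidence counts as 0.
    if (best === undefined || (match.confidence ?? 0) > (best.confidence ?? 0)) {
      best = match;
    }
  }
  return best;
}
```

The returned entry still carries its `index`, so the caller can tell which template variant matched before acting on `bounds`.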