
Build AI agents with desktop context

Open source API for AI agents that can see your screen. Access screen recordings, OCR text, and audio transcriptions through simple HTTP endpoints on localhost:3030.

api.ts
async fetchUser(id) {
  const res = await fetch(url);
  return res.json(); // TypeError
}
“I've fixed this before... where was it?”

See it in action

Building desktop AI is hard

You want to build AI features, not screen recording infrastructure.

01

Screen recording across platforms means dealing with different OS APIs and permissions

02

OCR and text extraction needs to handle multiple languages and formats

03

Audio transcription requires speech-to-text pipelines

04

Storing and searching through recordings needs efficient indexing

05

All this infrastructure work before you can build actual features

Desktop context as an API

screenpipe handles the hard parts: cross-platform recording, OCR, transcription, and search. You just call the REST API on localhost:3030.

Search API

Full-text search across OCR text, audio transcriptions, and UI elements. Filter by app, window, or time range.

Frames API

Get recent screen frames with OCR data. Useful for real-time screen understanding.
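The exact shape of the frames endpoint isn't spelled out on this page, so as a grounded alternative, "recent screen content" can be sketched against the documented search API by narrowing the time window. The `recent_ocr_url` helper and the `window_minutes` parameter are illustrative names, not part of screenpipe's API:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

BASE = "http://localhost:3030"

def recent_ocr_url(query: str, window_minutes: int = 5, limit: int = 10) -> str:
    """Build a search URL covering only the last few minutes of OCR text."""
    start = datetime.now(timezone.utc) - timedelta(minutes=window_minutes)
    params = urlencode({
        "q": query,
        "content_type": "ocr",
        "start_time": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "limit": limit,
    })
    return f"{BASE}/search?{params}"

# Fetching the URL requires a running screenpipe instance:
# import requests
# frames = requests.get(recent_ocr_url("error")).json()["data"]
```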

Audio API

Access audio transcriptions with timestamps and speaker detection.
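Transcripts come back through the same search endpoint with `content_type=audio` (see the curl examples on this page). This sketch formats a response into timestamped lines; the field names `data[].content.text` and `data[].content.timestamp` follow the examples here, and anything beyond that is an assumption:

```python
def format_transcript(payload: dict) -> list[str]:
    """Turn a /search response (content_type=audio) into 'timestamp: text' lines."""
    lines = []
    for item in payload.get("data", []):
        content = item["content"]
        lines.append(f'{content.get("timestamp", "?")}: {content["text"]}')
    return lines

# A response shaped like this page's samples:
sample = {"data": [{"content": {"text": "lets ship friday",
                                "timestamp": "2024-01-15T10:30:00Z"}}]}
print(format_transcript(sample))  # ['2024-01-15T10:30:00Z: lets ship friday']
```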

Health API

Check recording status - whether screen, audio, and UI capture are working.
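A client can gate its calls on the health response shown in the code examples further down this page. This minimal sketch checks the documented fields (`status`, `frame_status`, `audio_status`, `ui_status`); the helper name is illustrative:

```python
def all_capture_ok(health: dict) -> bool:
    """Return True when screen, audio, and UI capture all report ok."""
    return (
        health.get("status") == "healthy"
        and health.get("frame_status") == "ok"
        and health.get("audio_status") == "ok"
        and health.get("ui_status") == "ok"
    )

# Field names match the /health sample on this page:
sample = {
    "status": "healthy",
    "frame_status": "ok",
    "audio_status": "ok",
    "ui_status": "ok",
}
print(all_capture_ok(sample))  # True
```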

How it works

1

Install and run screenpipe

Download from screenpi.pe. The API starts automatically on port 3030.

# macOS/Linux
curl -fsSL https://screenpi.pe/install.sh | sh
screenpipe

# Or download the app from screenpi.pe
2

Search your screen history

Query the search endpoint with filters for content, time, and apps.

curl "http://localhost:3030/search?q=error&content_type=ocr&limit=10"
3

Build your features

Use any language that can make HTTP requests. Here's a Python example:

import requests

# Search for content
response = requests.get(
    "http://localhost:3030/search",
    params={
        "q": "meeting notes",
        "content_type": "ocr",
        "start_time": "2024-01-01T00:00:00Z",
        "limit": 20
    }
)
results = response.json()

for item in results["data"]:
    print(item["content"]["text"])

Code examples

Search API

Search across all captured content with filters

# Basic search
curl "http://localhost:3030/search?q=react+hooks&limit=10"

# Filter by content type (ocr, audio, ui)
curl "http://localhost:3030/search?q=meeting&content_type=audio"

# Filter by app name
curl "http://localhost:3030/search?q=error&app_name=Terminal"

# Filter by time range
curl "http://localhost:3030/search?q=bug&start_time=2024-01-01T00:00:00Z&end_time=2024-01-02T00:00:00Z"

Health check

Verify screenpipe is recording correctly

curl "http://localhost:3030/health"

# Response:
# {
#   "status": "healthy",
#   "frame_status": "ok",
#   "audio_status": "ok",
#   "ui_status": "ok",
#   "last_frame_timestamp": "2024-01-15T10:30:00Z",
#   "last_audio_timestamp": "2024-01-15T10:30:00Z"
# }

TypeScript/JavaScript

Query the API with fetch

// Search for content from the last 24 hours
const response = await fetch(
  "http://localhost:3030/search?" + new URLSearchParams({
    q: "api documentation",
    content_type: "ocr",
    limit: "10",
    start_time: new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString()
  })
);

const results = await response.json();

// Process results
results.data.forEach(item => {
  console.log(item.content.text);
  console.log(item.content.app_name);
  console.log(item.content.timestamp);
});

Key benefits

Skip months of infrastructure work
Cross-platform (macOS, Windows, Linux)
MIT licensed - use commercially
Active community on Discord
All data stays local on your machine


Start building

From zero to desktop AI agent in minutes. Open source.