Security Whitepaper · v1.2 · April 2026

Screenpipe Security Architecture

Local-first data architecture. Open source Rust capture engine. On-premise AI. Enterprise-grade access controls. Fully auditable.

verify our security claims

don't trust us — audit the code yourself, or ask AI to review our security architecture against the open source repository.


1. Security Model

Screenpipe is local-first by design. All screen captures, audio transcriptions, and metadata are processed and stored on the user's device. The capture engine is written in Rust (13 crates), providing compile-time memory safety — eliminating buffer overflows, use-after-free, and data races that affect C/C++ recording tools.

Core Principles

Local-First

All data stored on the user's device. SQLite + media files at ~/.screenpipe/. No server-side processing for core functionality.

Open Source

MIT licensed, 18,000+ GitHub stars. Every line of capture, encryption, and access control code is auditable.

Memory-Safe Engine

13 Rust crates. No GC overhead. Compile-time safety eliminates entire vulnerability classes (buffer overflows, use-after-free, data races).

Admin-Controlled

Enterprise admins lock settings, hide UI, push content filters, and control AI providers across all devices via MDM/Intune.

System Architecture

screenpipe-engine

main server, HTTP API on :3030, orchestration

Vision Manager

Audio Manager

Pipes Manager

Sync Service

API Routes

screenpipe-screen

SCK/WGC · OCR/A11y

screenpipe-audio

CPAL/PA · VAD/STT · speaker diarization

screenpipe-core

pipes · sync · crypto · PII · permissions

screenpipe-vault

at-rest encryption · ChaCha20 · Argon2id

screenpipe-db

SQLite, FTS5 · 88 migrations

screenpipe-a11y

tree walker · UI events

screenpipe-events

shared types across crates

All 13 crates compile to a single binary. Everything runs on-device.

2. Vision Pipeline Architecture

The vision pipeline uses an event-driven capture model rather than continuous polling. Captures are triggered by user activity (app switch, click, typing pause, scroll stop, clipboard change) or by visual change detection (histogram diff > 5%). This reduces resource usage while maintaining complete coverage. Source: event_driven_capture.rs

Vision Pipeline — Event-Driven Capture

Trigger Detection

App switch · Window focus · Click · Typing pause · Scroll stop · Clipboard change · Visual change (>5% diff) · Idle (30s). Debounce: min 200ms.
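The debounce and visual-change trigger can be sketched as follows. This is a minimal illustration, not the code in event_driven_capture.rs: the histogram metric (half the L1 distance between two 256-bin histograms, divided by the pixel count) and the names `histogram_diff`, `is_visual_change`, and `Debouncer` are assumptions. Only the 5% threshold and the 200 ms minimum come from the text above.

```rust
use std::time::{Duration, Instant};

/// Minimum spacing between captures ("Debounce: min 200ms").
const DEBOUNCE: Duration = Duration::from_millis(200);
/// Visual-change threshold (">5% diff").
const VISUAL_CHANGE_THRESHOLD: f64 = 0.05;

/// Fraction of pixels that moved between histogram bins, in 0.0..=1.0.
/// Assumed metric: half the L1 distance between the histograms divided by
/// the pixel count. Both frames are assumed to have the same pixel count.
pub fn histogram_diff(prev: &[u32; 256], curr: &[u32; 256]) -> f64 {
    let total: u64 = curr.iter().map(|&c| u64::from(c)).sum();
    if total == 0 {
        return 0.0;
    }
    let moved: u64 = prev
        .iter()
        .zip(curr.iter())
        .map(|(&a, &b)| u64::from(a.abs_diff(b)))
        .sum();
    moved as f64 / (2.0 * total as f64)
}

/// True when the change between two frames exceeds the 5% threshold.
pub fn is_visual_change(prev: &[u32; 256], curr: &[u32; 256]) -> bool {
    histogram_diff(prev, curr) > VISUAL_CHANGE_THRESHOLD
}

/// Collapses bursts of triggers so that at most one capture is accepted
/// per debounce interval.
pub struct Debouncer {
    last_capture: Option<Instant>,
}

impl Debouncer {
    pub fn new() -> Self {
        Self { last_capture: None }
    }

    /// Returns true, and records `now`, if at least DEBOUNCE has elapsed
    /// since the previously accepted trigger.
    pub fn accept(&mut self, now: Instant) -> bool {
        let ok = match self.last_capture {
            Some(prev) => now.duration_since(prev) >= DEBOUNCE,
            None => true,
        };
        if ok {
            self.last_capture = Some(now);
        }
        ok
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside `accept`, keeps the timing logic testable without sleeping.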

Guard Checks

Screen locked? · DRM content? · Outside work-hours? · Ignored window? · Content hash unchanged? → Skip (unless >30s since last write)
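One possible reading of the guard stage, as a sketch: the first four checks always skip the frame, while an unchanged content hash skips it only if a frame was written within the last 30 seconds. That split is an interpretation of the line above, and `GuardInput` and `CaptureGuard` are hypothetical names, not types from the repository.

```rust
use std::time::{Duration, Instant};

/// An unchanged frame is still written once more than 30 s have passed
/// since the last write.
const MAX_WRITE_GAP: Duration = Duration::from_secs(30);

/// Hypothetical snapshot of the conditions the guard stage inspects.
pub struct GuardInput {
    pub screen_locked: bool,
    pub drm_content: bool,
    pub outside_work_hours: bool,
    pub ignored_window: bool,
    pub content_hash: u64,
}

/// Remembers the last written hash and write time between triggers.
pub struct CaptureGuard {
    last_hash: Option<u64>,
    last_write: Option<Instant>,
}

impl CaptureGuard {
    pub fn new() -> Self {
        Self { last_hash: None, last_write: None }
    }

    /// Returns true if the frame should be captured and written.
    pub fn should_capture(&mut self, input: &GuardInput, now: Instant) -> bool {
        // Assumption: these guards are never overridden by the 30 s rule.
        if input.screen_locked
            || input.drm_content
            || input.outside_work_hours
            || input.ignored_window
        {
            return false;
        }
        let unchanged = self.last_hash == Some(input.content_hash);
        let stale = match self.last_write {
            Some(t) => now.duration_since(t) > MAX_WRITE_GAP,
            None => true,
        };
        if unchanged && !stale {
            return false;
        }
        self.last_hash = Some(input.content_hash);
        self.last_write = Some(now);
        true
    }
}
```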

Screenshot Capture

async

macOS: ScreenCaptureKit · Windows: WGC · Linux: xcap

1. Exclude filtered windows

2. Full-screen capture

3. Black frame → skip

4. Write JPEG to ~/.screenpipe/data/

Accessibility Tree Walk

spawn_blocking

macOS: AX APIs (cidre) · Windows: UI Automation · Linux: AT-SPI2 D-Bus

1. Walk focused window

2. Extract all text nodes

3. Browser URL detection

4. Compute content hash

5. Adaptive budget per app

OCR (conditional)

IF no a11y text → run OCR · IF terminal app → always OCR · IF a11y text "thin" (canvas apps, meetings) → OCR + a11y (hybrid) · ELSE → a11y text only

Engines: macOS Vision.framework · Windows OCR API · Linux Tesseract. Semaphore: 1 concurrent. Cache: per window+hash, 5 min.
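The conditional OCR rule reads naturally as a small decision function. The sketch below is illustrative only: the source does not say how "thin" accessibility text is detected, so the 50-character cutoff is a made-up placeholder, and `TextSource` and `choose_text_source` are hypothetical names. Whether accessibility text is also kept for terminal apps is not stated, so the sketch reports only that OCR runs.

```rust
/// Which text extraction path a frame takes.
#[derive(Debug, PartialEq, Eq)]
pub enum TextSource {
    /// OCR runs (no a11y text, or a terminal app).
    Ocr,
    /// OCR and a11y text are both used ("thin" a11y text).
    Hybrid,
    /// a11y text is sufficient; OCR is skipped.
    A11yOnly,
}

/// Placeholder cutoff for "thin" a11y text. The real criterion is not
/// given in the source; this value is an assumption for illustration.
const THIN_TEXT_CHARS: usize = 50;

/// Applies the four rules in the order they are listed.
pub fn choose_text_source(a11y_text: &str, is_terminal: bool) -> TextSource {
    let chars = a11y_text.trim().chars().count();
    if chars == 0 || is_terminal {
        TextSource::Ocr
    } else if chars < THIN_TEXT_CHARS {
        TextSource::Hybrid
    } else {
        TextSource::A11yOnly
    }
}
```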

PII Removal (if enabled) — 27 regex patterns

Database Insert

frames table + ocr_text (if OCR ran) + elements (a11y nodes) + element dedup (ref prev frame) + FTS5 index update → hot_frame_cache (in-memory)

Sources: event_driven_capture.rs · paired_capture.rs · vision_manager · apple.rs (OCR) · a11y/tree

3. Audio Pipeline Architecture

The audio pipeline captures from multiple devices simultaneously, processes 30-second segments with 2-second overlap, and runs voice activity detection, speaker diarization, and transcription, all on-device. Source: audio_manager

Audio Pipeline — Capture to Storage

Audio Capture

OS audio devices (mic + system audio, per-device streams). macOS/Windows: CPAL · Linux: PulseAudio. broadcast::channel (1000 sample capacity).

Recording Loop

30-second segments + 2-second overlap. SourceBuffer: Bluetooth packet loss detection, silence insertion (max 500ms — prevents Whisper hallucinations). Flush to crossbeam::channel (capacity 256).
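To make the segmenting arithmetic concrete, here is a sketch that computes segment boundaries for a buffer of samples. It assumes each segment is 30 seconds long and that consecutive segments share their last and first 2 seconds, so the start advances by 28 seconds. The source may instead mean 30 seconds plus 2 extra, and `segment_ranges` is a hypothetical helper.

```rust
const SEGMENT_SECS: usize = 30;
const OVERLAP_SECS: usize = 2;

/// Returns half-open (start, end) sample ranges covering `total_samples`.
/// Assumption: segments are 30 s long and neighbours overlap by 2 s.
pub fn segment_ranges(total_samples: usize, sample_rate: usize) -> Vec<(usize, usize)> {
    let mut ranges = Vec::new();
    if sample_rate == 0 {
        return ranges;
    }
    let seg = SEGMENT_SECS * sample_rate;
    let step = (SEGMENT_SECS - OVERLAP_SECS) * sample_rate;
    let mut start = 0;
    while start < total_samples {
        // The final segment is truncated to the end of the buffer.
        let end = (start + seg).min(total_samples);
        ranges.push((start, end));
        if end == total_samples {
            break;
        }
        start += step;
    }
    ranges
}
```

With these assumptions, 70 seconds of 16 kHz audio yields three segments starting at 0 s, 28 s, and 56 s.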

Realtime Mode

Transcribe immediately after capture. Default for normal usage.

Batch Mode

Defer during meetings (Zoom, Teams). Persist audio to disk first. Reconcile and transcribe when meeting ends.

Audio Processing

1. Resample to 16 kHz · 2. Normalize + music filtering · 3. VAD (Silero / WebRTC): 512-sample chunks (Windows) / 1600 (macOS). Speech threshold: 0.5 (input), 0.15 (output). Spectral noise subtraction. Min speech ratio: 2%; segments below this are skipped.

Speaker Diarization (on-device)

1. Pyannote v3.0 segmentation (ONNX, 10s windows) → speech/silence boundaries · 2. WeSpeaker CAM++ embedding (ONNX) → 192-dim fingerprint · 3. Cosine similarity matching (threshold: 0.9) → assign or create speaker · 4. Calendar-assisted: seed known speakers, constrain max during meetings.

Transcription

Local (no network): Parakeet MLX · Whisper Large v3 Turbo · Whisper Large v3 · Qwen3 ASR · OpenAI-compatible endpoint. Cloud (opt-in): Deepgram API. RMS energy check: skip if <0.015. Per-device overlap dedup.
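The RMS energy gate can be illustrated in a few lines. This sketch assumes samples are `f32` values normalized to the range -1.0 to 1.0 and that a segment exactly at the threshold is still transcribed. Neither detail is stated in the source, and the function names are hypothetical.

```rust
/// Segments quieter than this are skipped ("skip if <0.015").
const RMS_THRESHOLD: f32 = 0.015;

/// Root-mean-square energy of a block of samples; 0.0 for an empty block.
pub fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// True if the segment carries enough energy to be worth transcribing.
pub fn should_transcribe(samples: &[f32]) -> bool {
    rms(samples) >= RMS_THRESHOLD
}
```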

Database Insert

audio_chunks + audio_transcriptions tables. Speaker ID → speakers table (embedding centroid). PII removal (optional) · FTS5 index update.

Sources: recording loop · VAD · segmentation · embedding · transcription engine

4. Cryptography

All cryptographic primitives use audited libraries. Parameters are from source: crypto.rs and team-crypto.ts.

Component | Algorithm | Parameters
Data Encryption | ChaCha20-Poly1305 | 256-bit key, 96-bit nonce, AEAD auth tag
Key Derivation | Argon2id v0x13 | 64 MB memory, 3 iterations, parallelism 4, 256-bit output, 256-bit salt
Searchable Encryption | HMAC-SHA256 | 256-bit output, normalized keywords (lowercase, trimmed, deduped)
Data Integrity | SHA-256 | Post-decryption verification checksum
At-Rest Vault | ChaCha20-Poly1305 + Argon2id | Password-derived keys, lock/unlock via screenpipe-vault crate
Team Config Encryption | AES-256-GCM | 96-bit random nonce per operation (Web Crypto API)
Team Key Wrapping | PBKDF2 + AES-256-GCM | 600,000 iterations, SHA-256, 128-bit salt
Credential Storage | Tauri Secure Store | OS keychain (macOS Keychain, Windows Credential Manager)
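The keyword normalization that precedes the HMAC step for searchable encryption (lowercase, trimmed, deduped) can be sketched with the standard library alone. The HMAC-SHA256 computation itself is left out because it needs a cryptography crate. `normalize_keywords` is a hypothetical name, and dropping empty strings and returning keywords in sorted order are assumptions of this sketch.

```rust
use std::collections::BTreeSet;

/// Lowercases, trims, and deduplicates keywords before they are turned
/// into search tokens. Empty keywords are dropped (assumption). The
/// BTreeSet also sorts the result, which keeps the output deterministic.
pub fn normalize_keywords<'a, I>(raw: I) -> Vec<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let unique: BTreeSet<String> = raw
        .into_iter()
        .map(|k| k.trim().to_lowercase())
        .filter(|k| !k.is_empty())
        .collect();
    unique.into_iter().collect()
}
```

Normalizing before hashing matters because a keyed hash of "Invoice" and of "invoice" would otherwise produce unrelated tokens and the search would miss one of them.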

Zero-Knowledge Key Hierarchy

Password + Salt

▼ Argon2id (64MB, 3 iter, p=4)

Password Key

▼ Decrypts

Encrypted Master Key

stored on server, never in plaintext

Data Key

ChaCha20-Poly1305

Encrypts blobs (OCR, audio, frames)

Search Key

HMAC-SHA256

Search tokens — server matches without seeing plaintext

Source: crypto.rs · screenpipe-vault

5. Data Controls & PII Protection

Capture Controls

Control | Details
Window Filtering | Exclude specific apps from capture (e.g., 1Password, banking). Admin-pushable include/exclude lists. Incognito window auto-detection (Safari, Chrome, Firefox).
URL Filtering | Exclude specific websites and URL patterns from capture.
Audio Device Selection | Enable/disable per device. Select specific microphones and system audio sources.
Monitor Selection | Choose which displays to record. Exclude specific monitors. Dynamic monitor connect/disconnect handling.
Data Retention | Auto-delete data older than 1–90 days (configurable, min 1 day enforced). Batch deletion in 1-hour windows. Disk reclaim via PRAGMA incremental_vacuum. Runs every 5 minutes.
DRM Content Detection | Pauses all capture when DRM apps are focused. 11 streaming services (including Netflix, Disney+, Hulu, Prime Video, Apple TV+, Peacock, Paramount+, HBO Max, Crunchyroll, DAZN), 10 domains, URL path detection for Amazon Video. Browser detection across 15+ browsers via Accessibility API.

Sources: retention.rs · drm_detector.rs
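The retention behaviour described in the table (a minimum of one day, deletion batched in 1-hour windows) can be sketched as two small functions. Timestamps are Unix seconds here, and `retention_cutoff` and `deletion_windows` are hypothetical names. The actual logic lives in retention.rs and may differ.

```rust
const HOUR_SECS: u64 = 3_600;
const DAY_SECS: u64 = 86_400;

/// Timestamp before which data is eligible for deletion.
/// A retention setting below 1 day is raised to 1 day.
pub fn retention_cutoff(now_unix: u64, retention_days: u64) -> u64 {
    let days = retention_days.max(1);
    now_unix.saturating_sub(days.saturating_mul(DAY_SECS))
}

/// Splits the half-open range [oldest, cutoff) into windows of at most
/// one hour, so each delete statement touches a bounded slice of rows.
pub fn deletion_windows(oldest_unix: u64, cutoff_unix: u64) -> Vec<(u64, u64)> {
    let mut windows = Vec::new();
    let mut start = oldest_unix;
    while start < cutoff_unix {
        let end = start.saturating_add(HOUR_SECS).min(cutoff_unix);
        windows.push((start, end));
        start = end;
    }
    windows
}
```

Batching by hour keeps each transaction short, which matters for a database that is being written to continuously by the capture pipeline.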

PII Detection Engine

Regex-based PII detection engine with 27 pattern categories. Uses RegexSet for single-pass detection. No ML: matching is deterministic and auditable. Source: pii_removal.rs

Category | Patterns
Financial | Credit card numbers (4-digit groups), IBAN (ISO 13616)
Government IDs | US Social Security Numbers (XXX-XX-XXXX)
Contact Info | Email (RFC 5322), formatted phone numbers (with country code), IPv4 addresses
Credentials | JWTs, PEM private keys, database connection strings (7 DB types), Bearer tokens, password fields, env var secrets
Service API Keys | AWS (AKIA + secrets), GCP, Azure, GitHub, OpenAI (sk-proj/sk-), Anthropic (sk-ant-), Stripe (sk_live/sk_test), Slack (xoxb/xoxp), Discord, GitLab, NPM, PyPI, DigitalOcean, Telegram, Twilio, SendGrid, Mailchimp
Secrets | BIP39 seed phrases (12–24 words), 2FA backup codes, password context fields, password UI indicators
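As an illustration of deterministic pattern redaction, the sketch below handles a single category, the US Social Security Number shape XXX-XX-XXXX. It is a dependency-free stand-in: the engine described above uses regex patterns via RegexSet, whereas this code scans bytes by hand. The `[REDACTED]` replacement token and the function names are assumptions.

```rust
/// True if `w` has the shape DDD-DD-DDDD (11 ASCII bytes).
fn is_ssn_shape(w: &[u8]) -> bool {
    w.len() == 11
        && w.iter().enumerate().all(|(i, &b)| match i {
            3 | 6 => b == b'-',
            _ => b.is_ascii_digit(),
        })
}

/// Replaces every SSN-shaped substring with a placeholder token.
/// A match must not be directly preceded or followed by another digit,
/// so longer digit runs are left alone.
pub fn redact_ssns(text: &str) -> String {
    let bytes = text.as_bytes();
    let mut out = String::with_capacity(text.len());
    let mut copied_to = 0; // everything before this index is already in `out`
    let mut i = 0;
    while i + 11 <= bytes.len() {
        let prev_digit = i > 0 && bytes[i - 1].is_ascii_digit();
        let next_digit = i + 11 < bytes.len() && bytes[i + 11].is_ascii_digit();
        if !prev_digit && !next_digit && is_ssn_shape(&bytes[i..i + 11]) {
            // A match starts and ends on ASCII bytes, so both slice
            // indices fall on UTF-8 character boundaries.
            out.push_str(&text[copied_to..i]);
            out.push_str("[REDACTED]");
            i += 11;
            copied_to = i;
        } else {
            i += 1;
        }
    }
    out.push_str(&text[copied_to..]);
    out
}
```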

Pipe Permission System

Each automation pipe runs with configurable access controls. Evaluation order: Deny → Allow → Default → Reject. Deny rules always take precedence. Source: permissions.rs

Rule Type | Description
Api(METHOD /path) | HTTP endpoint access control. Reader preset: 14 safe endpoints. Writer: +7 mutation endpoints.
App(name) | Filter by application name (case-insensitive substring).
Window(glob) | Filter by window title (glob patterns with * and ?).
Content(type) | Restrict to content types: ocr, audio, input, accessibility.
Time & Day | Restrict execution to hours (HH:MM-HH:MM, midnight wrap) and weekdays.
Offline Mode | Blocks all non-localhost outbound network requests from pipes.
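The evaluation order Deny → Allow → Default → Reject can be sketched for the App(name) rule type. This is one interpretation, not the code in permissions.rs: deny rules are checked first, then allow rules, then an optional configured default, and the request is rejected if nothing applies. `AppRules`, `Decision`, and `evaluate` are hypothetical names, and ignoring empty patterns is an assumption.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Decision {
    Allowed,
    Denied,
}

/// Hypothetical per-pipe rule set covering application names only.
pub struct AppRules {
    pub deny: Vec<String>,
    pub allow: Vec<String>,
    /// Used when neither list matches; `None` falls through to reject.
    pub default: Option<Decision>,
}

/// Case-insensitive substring match, as described for App(name).
/// Empty patterns are ignored so they cannot match every app.
fn matches(patterns: &[String], app: &str) -> bool {
    let app = app.to_lowercase();
    patterns
        .iter()
        .filter(|p| !p.is_empty())
        .any(|p| app.contains(&p.to_lowercase()))
}

/// Deny first, then allow, then the default, and finally reject.
pub fn evaluate(rules: &AppRules, app: &str) -> Decision {
    if matches(&rules.deny, app) {
        return Decision::Denied;
    }
    if matches(&rules.allow, app) {
        return Decision::Allowed;
    }
    rules.default.unwrap_or(Decision::Denied)
}
```

Checking deny rules before allow rules means a broad allow pattern such as "password" cannot accidentally re-enable an app that was explicitly denied.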

6. Speaker Identification

Screenpipe includes on-device speaker diarization — all processing runs locally with no cloud dependency:

Component | Details
Segmentation Model | Pyannote v3.0 (segmentation-3.0.onnx): speaker activity detection on 10-second windows, runs via ONNX Runtime on-device
Speaker Embeddings | WeSpeaker CAM++ (wespeaker_en_voxceleb_CAM++.onnx): 192-dimensional voice fingerprint per segment via filterbank features
Matching Algorithm | Cosine similarity with configurable threshold (default: 0.9). Embeddings stored locally in SQLite. At capacity: force-merge to closest speaker.
Calendar Integration | Calendar-assisted diarization seeds known speakers from meeting attendees, constrains max speakers during active meetings for improved accuracy.

Sources: embedding_manager.rs · embedding.rs · calendar_speaker_id.rs
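The matching step can be sketched as a nearest-neighbour search under cosine similarity. This is a simplified illustration: it stores one embedding per speaker rather than a running centroid, omits the at-capacity force-merge and calendar seeding, and treats a similarity exactly equal to the threshold as a match. `assign_speaker` is a hypothetical name, and real embeddings would have 192 dimensions.

```rust
/// Default matching threshold from the table above.
const MATCH_THRESHOLD: f32 = 0.9;

/// Cosine similarity of two vectors; 0.0 if lengths differ or either
/// vector has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns the index of the speaker assigned to `embedding`. If no stored
/// speaker reaches the threshold, the embedding is stored as a new speaker.
pub fn assign_speaker(speakers: &mut Vec<Vec<f32>>, embedding: &[f32]) -> usize {
    let best = speakers
        .iter()
        .enumerate()
        .map(|(i, s)| (i, cosine_similarity(s, embedding)))
        .max_by(|a, b| a.1.total_cmp(&b.1));
    match best {
        Some((i, sim)) if sim >= MATCH_THRESHOLD => i,
        _ => {
            speakers.push(embedding.to_vec());
            speakers.len() - 1
        }
    }
}
```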

7. Enterprise Deployment

Feature | Details
Admin Policy | Lock settings, hide UI sections (chat, timeline, settings). "Managed by [Org]" overlay. Policies sync every 5 min with offline fallback. Enterprise policy via Tauri command layer.
MDM Deployment | Deploy via Kandji, Intune, or any MDM. Reads enterprise.json from managed directory. Auto-updates disabled for IT control.
License Management | Seat-based licensing with feature matrix. Cached 4 hours, 14-day offline grace period.
Team Config Encryption | AES-256-GCM. Key generated on admin device, never sent to server. Shared via passphrase-protected invite (PBKDF2, 600K iterations).
Controlled UI | Hide chat, timeline, settings per device. Control AI models, transcription engines, pipe execution, data types.
Content Filter Push | Push window/URL filters to all devices. Team filters are additive and cannot be removed by members.
API Authentication | Non-localhost requests require Bearer token. Configurable api_auth and api_key per deployment.

Sources: admin-policy.ts · license-validation.ts · enterprise_policy.rs
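The API authentication rule (loopback requests pass, everything else needs a Bearer token) can be sketched as a single check. This is an assumption-laden illustration rather than the server's middleware: `is_authorized` is a hypothetical function, the `api_auth` toggle is not modelled, and the XOR-fold comparison is a defensive choice of this sketch to avoid an early exit on the first mismatching byte. Production code would normally rely on a vetted constant-time comparison.

```rust
use std::net::IpAddr;

/// Compares two byte strings without stopping at the first difference.
fn fold_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Loopback clients are always accepted. Any other client must send
/// `Authorization: Bearer <api_key>`; an empty configured key never matches.
pub fn is_authorized(remote: IpAddr, authorization: Option<&str>, api_key: &str) -> bool {
    if remote.is_loopback() {
        return true;
    }
    match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => !api_key.is_empty() && fold_eq(token.as_bytes(), api_key.as_bytes()),
        None => false,
    }
}
```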

MDM Configuration

// Pushed by MDM to: <app_dir>/enterprise.json
// Or manually at: ~/.screenpipe/enterprise.json
// macOS also checks: ../Resources/enterprise.json
{
  "license_key": "your-enterprise-license-key"
}

8. Network Requests

Core functionality requires zero network requests. Capture, OCR, local transcription, search, and pipes all run offline. Network requests occur only for explicitly enabled optional features:

Feature | Destination | Data Transmitted
Cloud Transcription | Deepgram API | Audio chunks (opt-in, admin-disableable)
Cloud AI (Pipes) | Screenpipe Cloud (ZDR) | Prompts + context (zero data retention; see §9)
OAuth Connections | Google, Notion, etc. | Tokens stored locally, data fetched to device only
Analytics | PostHog (eu.i.posthog.com) | Anonymous usage events; no screen content, no PII
License Validation | screenpi.pe API | License key only. Cached 4h, 14-day offline grace.
Cloud Sync | S3 (encrypted blobs) | ChaCha20-Poly1305 ciphertext only. Server never sees plaintext.

Enterprise deployments can run fully air-gapped with local transcription (Parakeet/Whisper) and local AI (Ollama). In this configuration the only network request is license validation, which supports 14-day offline operation.
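The license caching behaviour (4-hour cache, 14-day offline grace) implies a small decision procedure, sketched below. The state names and the `license_action` function are hypothetical; only the two durations come from this document, and the real logic in license-validation.ts may differ.

```rust
use std::time::Duration;

/// A successful validation is reused for 4 hours.
const CACHE_TTL: Duration = Duration::from_secs(4 * 3_600);
/// Without network access, the last validation is honoured for 14 days.
const OFFLINE_GRACE: Duration = Duration::from_secs(14 * 86_400);

#[derive(Debug, PartialEq, Eq)]
pub enum LicenseAction {
    /// Cached result is fresh; no network request is made.
    UseCache,
    /// Cache is stale and the server is reachable; validate again.
    Revalidate,
    /// Cache is stale and the server is unreachable, but still within grace.
    OfflineGrace,
    /// Offline for longer than the grace period.
    Expired,
}

/// Decides what to do given the time since the last successful validation.
pub fn license_action(since_last_validation: Duration, server_reachable: bool) -> LicenseAction {
    if since_last_validation < CACHE_TTL {
        LicenseAction::UseCache
    } else if server_reachable {
        LicenseAction::Revalidate
    } else if since_last_validation <= OFFLINE_GRACE {
        LicenseAction::OfflineGrace
    } else {
        LicenseAction::Expired
    }
}
```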

Data Flow Summary

On Device — Always Local, No Network

Screen Capture

ScreenCaptureKit (macOS) · WGC (Windows) · xcap (Linux)

Text Extraction

Accessibility Tree + OCR (hybrid, conditional)

Audio Processing

CPAL capture · Silero VAD · Speaker diarization (ONNX)

Local Storage

SQLite DB (88 migrations) · JPEG + audio files · FTS5 full-text index

Optional — Explicit Admin/User Opt-In

Cloud Transcription

Deepgram API (audio only, opt-in)

Cloud AI for Pipes

Screenpipe Cloud (ZDR), or BYOK: OpenAI, Anthropic

Cloud Sync

Zero-knowledge encrypted ChaCha20-Poly1305 blobs. Server never sees plaintext.

Never

Screen recordings sent to any server

Unencrypted data transmitted over network

Data shared with third parties

Screenpipe employees accessing user data

9. AI & Transcription Providers

Screenpipe supports fully on-premise AI for both transcription and pipe execution. Enterprise users are free to use their own AI providers. Cloud AI through Screenpipe is optional and operates under zero data retention (ZDR) policies.

On-Premise Transcription

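The following speech-to-text engines run entirely on-device with no network dependency. The one exception is the OpenAI-compatible option, which calls an endpoint you configure and stays on-premise only if that endpoint does. Source: engine.rs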

Engine | Model | Notes
Parakeet MLX | parakeet-tdt-0.6b-v3-mlx | Metal GPU acceleration on Apple Silicon. 25 languages. Fastest local option.
Parakeet CPU | parakeet-tdt-0.6b-v3 | OpenBLAS. Cross-platform.
Whisper Large v3 Turbo | ggml-large-v3-turbo.bin | Default. 99 languages. Best accuracy/speed tradeoff.
Whisper Large v3 | ggml-large-v3.bin | Highest accuracy. Higher resource usage.
Whisper Tiny | ggml-tiny.bin | Lightweight. For resource-constrained devices.
Qwen3 ASR | qwen3-asr-0.6b-antirez | 0.6B multilingual model.
OpenAI-Compatible | Custom endpoint | Connect any OpenAI-compatible STT API (e.g., on-premise Whisper server).

AI for Pipes (Automations)

Pipe execution supports multiple AI providers. Enterprise users can bring their own API keys or use fully on-premise models. Source: pipes/mod.rs

Provider | Type | Data Retention
Ollama (local) | On-premise | No data leaves device. Runs at localhost:11434.
Custom OpenAI-Compatible | On-premise / BYOK | Your infrastructure, your policies. Configurable endpoint, API key, headers.
OpenAI (BYOK) | Cloud (user's key) | Subject to OpenAI's API data usage policy (not used for training with API keys).
Anthropic (BYOK) | Cloud (user's key) | Subject to Anthropic's API data usage policy (not used for training with API keys).
Screenpipe Cloud | Cloud (managed) | Zero data retention (ZDR). OpenRouter configured with data_collection: 'deny'. Vertex AI for Gemini (better retention terms). No prompts or outputs stored.

Sources: openrouter.ts (ZDR config) · gemini.ts (Vertex routing)

Enterprise recommendation: For maximum data control, deploy with on-premise Ollama or a custom OpenAI-compatible endpoint. No data leaves your network. Cloud AI is entirely optional.

10. Testing & CI/CD Pipeline

Screenpipe maintains a multi-layered testing infrastructure across 14 CI/CD workflows, covering unit tests, integration tests, E2E tests, benchmarks, security audits, and longevity testing.

Testing & Release Pipeline

On Every PR / Push

cargo test

Unit + integration · 12 crates, 3 platforms

cargo clippy + fmt

Linting + formatting · All packages

cargo audit + deny

Supply chain scan · Unused deps (machete)

E2E Tests (WebDriver IO + Mocha)

12 test scripts: app lifecycle, health check, search API, settings, timeline, WebSocket, MCP, onboarding. Platforms: macOS (arm64), Windows (x64), Linux. Video recording for debugging.

PII Removal Tests

27 pattern categories validated. Performance benchmarks for regex engine.

Scheduled

Longevity Test

4-hour stress run. Memory tracking CSV. Windows, every 4h.

Benchmarks (daily)

OCR: Apple / Tesseract / Win. STT: Whisper. DB: search accuracy, FTS perf.

Release

Desktop App: macOS (Intel + Apple Silicon) + Windows + Linux · CLI: cross-platform with LTO, codegen-units=1, strip · MCP Server: npm publish · Code Signing: SSL.com EV (Windows), Apple notarization (macOS)

Sources: ci.yml · style.yml (audit + lint) · e2e-test.yml · longevity-test.yml · benchmark.yml

Security-Specific Testing

Test Category | Details
cargo audit | Dependency vulnerability scanning on every PR. Blocks merge on known CVEs.
cargo deny | License compliance + duplicate dependency detection. Prevents supply chain issues.
PII Redaction Tests | ONNX-based entity detection tests. Pattern matching validated across 27 categories.
Database Integrity | FTS contention tests, heavy read scenarios, FK constraint validation, audio reconciliation.
Longevity Testing | 4-hour continuous run on Windows (every 4h). Memory usage tracked via CSV. Detects leaks and resource exhaustion.
E2E Security | Settings persistence, API health endpoints, MCP integration, WebSocket stability.

11. Compliance

Standard | Status
SOC 2 Type II | Compliant. Continuous monitoring of security controls. Audit trail via enterprise policy system.
GDPR | Compliant. All data processed and stored locally by default. Full user control over collection, retention, and deletion. No cross-border transfers for core functionality. Data minimization via configurable retention (1–90 days).
HIPAA | Compliant. Local-first architecture means PHI never leaves the device. On-premise AI eliminates BAA requirements with third-party processors. Configurable retention and access controls. PII detection engine covers healthcare identifiers.
CCPA | Compliant. No data sold or shared. Full user control. Deletion via retention settings or manual purge.
Open Source Audit | MIT licensed. Full source code available for independent security review. 18,000+ stars on GitHub.

Liability & Data Responsibility

Local-first architecture shifts data liability to the deploying organization. Because all data is processed and stored on the user's device (or the organization's managed devices), Screenpipe does not act as a data processor for core functionality.

Enterprise controls enable compliance ownership: Admin policies, MDM deployment, content filters, data retention, and AI provider selection are all configurable by the organization. The enterprise admin controls what data is captured, how long it is retained, and where (if anywhere) it is transmitted.

Zero data retention for cloud features: When optional cloud features are enabled (cloud AI, cloud sync), data is either encrypted end-to-end (sync) or processed under zero data retention policies (AI). Screenpipe employees cannot access user data at any point in the pipeline.

Open source transparency: All security claims are verifiable against the public source code. Organizations can audit the codebase, fork it, or run modified builds to meet specific compliance requirements.

12. Source Code Audit

Screenpipe is fully open source. The following modules are directly relevant to security review:

Capture Engine | crates/screenpipe-engine
Screen Capture | crates/screenpipe-screen
Audio Pipeline | crates/screenpipe-audio
Accessibility | crates/screenpipe-a11y
Cryptography | crates/screenpipe-core/src/sync
Vault (At-Rest) | crates/screenpipe-vault
PII Detection | crates/screenpipe-core/src/pii_removal.rs
Pipe Permissions | crates/screenpipe-core/src/pipes/permissions.rs
Database | crates/screenpipe-db
AI Gateway | packages/ai-gateway/src/providers
Team Encryption | apps/screenpipe-app-tauri/lib/team-crypto.ts
Enterprise Policy | ee
Full Repository | github.com/screenpipe/screenpipe

For documentation, see the Screenpipe Docs including the Pipes Guide, Pipe Permissions, Teams & Encryption, and Cloud Sync Architecture.

Security Contact

To report a vulnerability, request a security review for your organization, or discuss enterprise deployment: louis@screenpi.pe

Document version 1.2 · April 2026 · Screenpipe v2.3.x · All claims verified against source code