Security Whitepaper · v1.2 · April 2026
Screenpipe Security Architecture
Local-first data architecture. Open source Rust capture engine. On-premise AI. Enterprise-grade access controls. Fully auditable.
Verify Our Security Claims
Don't trust us: audit the code yourself, or ask an AI assistant to review this security architecture against the open source repository. Any LLM with web access (ChatGPT, Claude, Gemini, or others) can run the review.
1. Security Model
Screenpipe is local-first by design. By default, all screen captures, audio transcriptions, and metadata are processed and stored on the user's device. The capture engine is written in Rust (13 crates), whose compile-time checks rule out buffer overflows, use-after-free, and data races in safe code. These vulnerability classes commonly affect C/C++ recording tools.
Core Principles
Local-First
All data stored on the user's device. SQLite + media files at ~/.screenpipe/. No server-side processing for core functionality.
Open Source
MIT licensed, 18,000+ GitHub stars. Every line of capture, encryption, and access control code is auditable.
Memory-Safe Engine
13 Rust crates. No GC overhead. Compile-time checks eliminate entire vulnerability classes in safe Rust code (buffer overflows, use-after-free, data races).
Admin-Controlled
Enterprise admins lock settings, hide UI, push content filters, and control AI providers across all devices via MDM/Intune.
System Architecture
| Crate | Responsibility |
|---|---|
| screenpipe-engine | Main server, HTTP API on :3030, orchestration: Vision Manager, Audio Manager, Pipes Manager, Sync Service, API Routes |
| screenpipe-screen | Screen capture (ScreenCaptureKit / WGC), OCR and accessibility text |
| screenpipe-audio | Audio capture (CPAL / PulseAudio), VAD, STT, speaker diarization |
| screenpipe-core | Pipes, sync, crypto, PII removal, permissions |
| screenpipe-vault | At-rest encryption (ChaCha20-Poly1305, Argon2id) |
| screenpipe-db | SQLite, FTS5, 88 migrations |
| screenpipe-a11y | Accessibility tree walker, UI events |
| screenpipe-events | Shared types across crates |
All 13 crates compile to a single binary. Everything runs on-device.
2. Vision Pipeline Architecture
The vision pipeline uses an event-driven capture model rather than continuous polling. Captures are triggered by user activity (app switch, click, typing pause, scroll stop, clipboard change), by visual change detection (histogram diff > 5%), or by a 30-second idle fallback. Compared with fixed-interval polling, this cuts CPU, storage, and OCR work, while the visual-change and idle triggers catch screen changes that no input event signals. Source: event_driven_capture.rs.
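A minimal sketch of the trigger stage, assuming 8-bit grayscale frames and a total-variation histogram distance. The whitepaper specifies only "histogram diff > 5%", not the metric, so the distance function is an assumption. `Trigger`, `CaptureScheduler`, and every function name here are hypothetical, not the API of event_driven_capture.rs.

```rust
use std::time::{Duration, Instant};

/// Capture triggers listed in the diagram below (illustrative enum, not the engine's type).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Trigger {
    AppSwitch,
    WindowFocus,
    Click,
    TypingPause,
    ScrollStop,
    ClipboardChange,
    VisualChange,
    Idle,
}

const MIN_GAP: Duration = Duration::from_millis(200); // debounce between captures
const IDLE_AFTER: Duration = Duration::from_secs(30); // idle fallback
const VISUAL_CHANGE_THRESHOLD: f64 = 0.05; // "histogram diff > 5%"

/// Normalized 256-bin histogram of 8-bit grayscale pixels.
fn histogram(pixels: &[u8]) -> [f64; 256] {
    let mut h = [0.0f64; 256];
    for &p in pixels {
        h[p as usize] += 1.0;
    }
    let n = pixels.len().max(1) as f64;
    h.map(|count| count / n)
}

/// Total variation distance between two frames' histograms, in 0.0..=1.0.
/// Assumption: the engine's actual histogram metric may differ.
pub fn histogram_diff(prev: &[u8], cur: &[u8]) -> f64 {
    let (a, b) = (histogram(prev), histogram(cur));
    a.iter().zip(b.iter()).map(|(x, y)| (x - y).abs()).sum::<f64>() / 2.0
}

pub fn is_visual_change(prev: &[u8], cur: &[u8]) -> bool {
    histogram_diff(prev, cur) > VISUAL_CHANGE_THRESHOLD
}

/// Hypothetical scheduler that debounces triggers and reports idle fallbacks.
pub struct CaptureScheduler {
    last_capture: Option<Instant>,
}

impl CaptureScheduler {
    pub fn new() -> Self {
        Self { last_capture: None }
    }

    /// Accepts a trigger unless the previous capture was less than 200 ms ago.
    pub fn should_capture(&mut self, _trigger: Trigger, now: Instant) -> bool {
        if let Some(last) = self.last_capture {
            if now.duration_since(last) < MIN_GAP {
                return false;
            }
        }
        self.last_capture = Some(now);
        true
    }

    /// True when nothing has been captured for 30 s, so an Idle trigger is due.
    pub fn idle_due(&self, now: Instant) -> bool {
        match self.last_capture {
            Some(last) => now.duration_since(last) >= IDLE_AFTER,
            None => true,
        }
    }
}
```

Total variation distance reads directly as "the fraction of pixels whose brightness bin changed", which is one natural reading of a 5% threshold.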
Vision Pipeline — Event-Driven Capture
Trigger Detection
App switch · Window focus · Click · Typing pause · Scroll stop · Clipboard change · Visual change (>5% diff) · Idle (30s). Debounce: min 200ms.
↓
Guard Checks
Screen locked? · DRM content? · Outside work-hours? · Ignored window? · Content hash unchanged? → Skip (unless >30s since last write)
↓
Screenshot Capture (async)
macOS: ScreenCaptureKit · Windows: WGC · Linux: xcap
1. Exclude filtered windows
2. Full-screen capture
3. Black frame → skip
4. Write JPEG to ~/.screenpipe/data/
Accessibility Tree Walk (spawn_blocking)
macOS: AX APIs (cidre) · Windows: UI Automation · Linux: AT-SPI2 D-Bus
1. Walk focused window
2. Extract all text nodes
3. Browser URL detection
4. Compute content hash
5. Adaptive budget per app
↓
OCR (conditional)
IF no a11y text → run OCR · IF terminal app → always OCR · IF a11y text "thin" (canvas apps, meetings) → OCR + a11y (hybrid) · ELSE → a11y text only
Engines: macOS Vision.framework · Windows OCR API · Linux Tesseract. Semaphore: 1 concurrent. Cache: per window+hash, 5 min.
↓
PII Removal (if enabled) — 27 regex patterns
↓
Database Insert
frames table + ocr_text (if OCR ran) + elements (a11y nodes) + element dedup (ref prev frame) + FTS5 index update → hot_frame_cache (in-memory)
Sources: event_driven_capture.rs · paired_capture.rs · vision_manager · apple.rs (OCR) · a11y/tree
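The conditional OCR step and its 5-minute cache could be sketched as follows. The 50-character cutoff for "thin" accessibility text and the choice to drop a11y text entirely for terminal apps are assumptions, since the document gives neither; the semaphore limiting OCR to one concurrent run is omitted. `TextPlan`, `OcrCache`, and the function names are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Which text sources to keep for a frame.
#[derive(Debug, PartialEq)]
pub struct TextPlan {
    pub run_ocr: bool,
    pub use_a11y: bool,
}

/// Hypothetical cutoff for "thin" accessibility text; the document does not give one.
const THIN_A11Y_CHARS: usize = 50;

pub fn plan_text_extraction(a11y_text: &str, is_terminal: bool) -> TextPlan {
    let chars = a11y_text.trim().chars().count();
    if chars == 0 || is_terminal {
        // No a11y text, or a terminal app: OCR only (assumed for terminals).
        TextPlan { run_ocr: true, use_a11y: false }
    } else if chars < THIN_A11Y_CHARS {
        // Thin a11y text (canvas apps, meetings): hybrid OCR + a11y.
        TextPlan { run_ocr: true, use_a11y: true }
    } else {
        TextPlan { run_ocr: false, use_a11y: true }
    }
}

/// OCR results cached per (window, content hash) for 5 minutes.
pub struct OcrCache {
    ttl: Duration,
    entries: HashMap<(String, u64), (Instant, String)>,
}

impl OcrCache {
    pub fn new() -> Self {
        Self { ttl: Duration::from_secs(5 * 60), entries: HashMap::new() }
    }

    /// Returns cached text only if it was stored less than 5 minutes before `now`.
    pub fn get(&self, window: &str, hash: u64, now: Instant) -> Option<&str> {
        self.entries
            .get(&(window.to_string(), hash))
            .filter(|(stored_at, _)| now.duration_since(*stored_at) < self.ttl)
            .map(|(_, text)| text.as_str())
    }

    pub fn put(&mut self, window: &str, hash: u64, text: String, now: Instant) {
        self.entries.insert((window.to_string(), hash), (now, text));
    }
}
```

Keying the cache on the content hash means an unchanged window reuses its OCR text, while any change to the accessibility content forces a fresh OCR run.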
3. Audio Pipeline Architecture
The audio pipeline captures from multiple devices simultaneously, processes 30-second segments with a 2-second overlap, and runs voice activity detection, speaker diarization, and transcription on-device (cloud transcription via Deepgram is opt-in). Source: audio_manager.
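To make the segmenting concrete, here is a sketch assuming each segment carries 30 s of new audio with the previous 2 s prepended, and that overlap dedup drops the longest run of words repeated across a segment boundary. Both the layout and the dedup strategy are illustrative assumptions, not the engine's recording loop; the function names are hypothetical.

```rust
/// (start, end) sample ranges per segment: 30 s of new audio, with the
/// previous 2 s prepended as overlap (assumed layout).
pub fn segment_bounds(total_samples: usize, sample_rate: usize) -> Vec<(usize, usize)> {
    assert!(sample_rate > 0, "sample_rate must be positive");
    let seg = 30 * sample_rate;
    let overlap = 2 * sample_rate;
    let mut bounds = Vec::new();
    let mut start = 0;
    while start < total_samples {
        let end = (start + seg).min(total_samples);
        bounds.push((start.saturating_sub(overlap), end));
        start += seg;
    }
    bounds
}

/// Drops words at the start of `next` that repeat the end of `prev`
/// (longest word-level suffix/prefix overlap).
pub fn dedup_overlap(prev: &str, next: &str) -> String {
    let p: Vec<&str> = prev.split_whitespace().collect();
    let n: Vec<&str> = next.split_whitespace().collect();
    let max = p.len().min(n.len());
    let k = (1..=max)
        .rev()
        .find(|&k| p[p.len() - k..] == n[..k])
        .unwrap_or(0);
    n[k..].join(" ")
}
```

The overlap ensures a word cut at a segment boundary appears whole in at least one segment; dedup then removes the duplicate copy from the transcript.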
Audio Pipeline — Capture to Storage
Audio Capture
OS audio devices (mic + system audio, per-device streams). macOS/Windows: CPAL · Linux: PulseAudio. broadcast::channel (1000 sample capacity).
↓
Recording Loop
30-second segments + 2-second overlap. SourceBuffer: Bluetooth packet loss detection, silence insertion (max 500ms — prevents Whisper hallucinations). Flush to crossbeam::channel (capacity 256).
↓
Realtime Mode
Transcribe immediately after capture. Default for normal usage.
Batch Mode
Defer during meetings (Zoom, Teams). Persist audio to disk first. Reconcile and transcribe when meeting ends.
↓
Audio Processing
1. Resample to 16kHz · 2. Normalize + music filtering · 3. VAD (Silero / WebRtc) — 512-sample chunks (Win) / 1600 (macOS). Speech threshold: 0.5 (input), 0.15 (output). Spectral noise subtraction. Min speech ratio: 2% → skip if below.
↓
Speaker Diarization (on-device)
1. Pyannote v3.0 segmentation (ONNX, 10s windows) → speech/silence boundaries · 2. WeSpeaker CAM++ embedding (ONNX) → 192-dim fingerprint · 3. Cosine similarity matching (threshold: 0.9) → assign or create speaker · 4. Calendar-assisted: seed known speakers, constrain max during meetings.
↓
Transcription
Local (no network): Parakeet MLX · Whisper Large v3 Turbo · Whisper Large v3 · Qwen3 ASR. Self-hosted: OpenAI-compatible endpoint. Cloud (opt-in): Deepgram API. RMS energy check: skip if <0.015. Per-device overlap dedup.
↓
Database Insert
audio_chunks + audio_transcriptions tables. Speaker ID → speakers table (embedding centroid). PII removal (optional) · FTS5 index update.
Sources: recording loop · VAD · segmentation · embedding · transcription engine
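The skip conditions in this pipeline reduce to a few comparisons. This sketch uses the thresholds quoted above (VAD 0.5 for input and 0.15 for output devices, a 2% minimum speech ratio, RMS 0.015). How the engine combines these checks, and whether its comparisons are inclusive, are assumptions; the function names are hypothetical.

```rust
/// Thresholds quoted in the pipeline above.
const MIN_RMS: f32 = 0.015;
const MIN_SPEECH_RATIO: f32 = 0.02;
const VAD_THRESHOLD_INPUT: f32 = 0.5;
const VAD_THRESHOLD_OUTPUT: f32 = 0.15;

/// Root-mean-square energy of a chunk of normalized samples (-1.0..=1.0).
pub fn rms(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Turns per-frame VAD speech probabilities into speech/non-speech flags.
pub fn speech_frames(probs: &[f32], is_output_device: bool) -> Vec<bool> {
    let threshold = if is_output_device { VAD_THRESHOLD_OUTPUT } else { VAD_THRESHOLD_INPUT };
    probs.iter().map(|&p| p >= threshold).collect()
}

/// Skips transcription when too few frames contain speech or the chunk is too quiet.
pub fn should_transcribe(samples: &[f32], speech: &[bool]) -> bool {
    if speech.is_empty() {
        return false;
    }
    let ratio = speech.iter().filter(|&&s| s).count() as f32 / speech.len() as f32;
    ratio >= MIN_SPEECH_RATIO && rms(samples) >= MIN_RMS
}
```

The lower threshold for output devices reflects that system audio is already mixed and leveled, so speech there tends to score lower on VAD than a close microphone.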
4. Cryptography
All cryptographic primitives use audited libraries. Parameters are from source: crypto.rs and team-crypto.ts.
| Component | Algorithm | Parameters |
|---|---|---|
| Data Encryption | ChaCha20-Poly1305 | 256-bit key, 96-bit nonce, AEAD auth tag |
| Key Derivation | Argon2id v0x13 | 64 MB memory, 3 iterations, parallelism 4, 256-bit output, 256-bit salt |
| Searchable Encryption | HMAC-SHA256 | 256-bit output, normalized keywords (lowercase, trimmed, deduped) |
| Data Integrity | SHA-256 | Post-decryption verification checksum |
| At-Rest Vault | ChaCha20-Poly1305 + Argon2id | Password-derived keys, lock/unlock via screenpipe-vault crate |
| Team Config Encryption | AES-256-GCM | 96-bit random nonce per operation (Web Crypto API) |
| Team Key Wrapping | PBKDF2 + AES-256-GCM | 600,000 iterations, SHA-256, 128-bit salt |
| Credential Storage | Tauri Secure Store | OS keychain (macOS Keychain, Windows Credential Manager) |
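The searchable-encryption row implies keywords are normalized before they are MACed. This sketch shows that normalization, with the MAC passed in as a closure so it runs without external crates. A real build would key HMAC-SHA256 with the Search Key, for example via the RustCrypto `hmac` and `sha2` crates; that wiring, and the function names, are assumptions.

```rust
use std::collections::BTreeSet;

/// Lowercases, trims, drops empties, and dedupes keywords. BTreeSet also
/// sorts them, so token order does not reveal the original word order.
pub fn normalize_keywords(keywords: &[&str]) -> BTreeSet<String> {
    keywords
        .iter()
        .map(|k| k.trim().to_lowercase())
        .filter(|k| !k.is_empty())
        .collect()
}

/// Produces one search token per normalized keyword. `mac` stands in for
/// HMAC-SHA256 keyed with the Search Key; it is a parameter here so the
/// sketch needs no external crates.
pub fn search_tokens<F>(keywords: &[&str], mac: F) -> Vec<[u8; 32]>
where
    F: Fn(&[u8]) -> [u8; 32],
{
    normalize_keywords(keywords)
        .iter()
        .map(|k| mac(k.as_bytes()))
        .collect()
}
```

Because the same normalized keyword always yields the same token under one key, the server can match a query token against stored tokens without learning the keyword itself.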
Zero-Knowledge Key Hierarchy
Password + Salt
▼ Argon2id (64MB, 3 iter, p=4)
Password Key
▼ Decrypts
Encrypted Master Key
stored on server, never in plaintext
▼
Data Key
ChaCha20-Poly1305
Encrypts blobs (OCR, audio, frames)
Search Key
HMAC-SHA256
Search tokens — server matches without seeing plaintext
Source: crypto.rs · screenpipe-vault
5. Data Controls & PII Protection
Capture Controls
| Control | Details |
|---|---|
| Window Filtering | Exclude specific apps from capture (e.g., 1Password, banking). Admin-pushable include/exclude lists. Incognito window auto-detection (Safari, Chrome, Firefox). |
| URL Filtering | Exclude specific websites and URL patterns from capture. |
| Audio Device Selection | Enable/disable per device. Select specific microphones and system audio sources. |
| Monitor Selection | Choose which displays to record. Exclude specific monitors. Dynamic monitor connect/disconnect handling. |
| Data Retention | Auto-deletes data older than a configurable period of 1–90 days (minimum of 1 day enforced). Deletes in 1-hour batches and reclaims disk space via PRAGMA incremental_vacuum. Runs every 5 minutes. |
| DRM Content Detection | Pauses all capture when a DRM app is focused. Covers 11 streaming services (including Netflix, Disney+, Hulu, Prime Video, Apple TV+, Peacock, Paramount+, HBO Max, Crunchyroll, and DAZN), 10 domains, and URL path detection for Amazon Video. Browser detection works across 15+ browsers via the Accessibility API. |
Sources: retention.rs · drm_detector.rs
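The retention job's batching can be sketched as splitting everything older than the cutoff into 1-hour windows, each of which would become one delete, with `PRAGMA incremental_vacuum` reclaiming space afterwards. Unix-second timestamps and the function names are assumptions; only the 1-day minimum and the 1-hour batch size come from the table.

```rust
const HOUR: u64 = 60 * 60;
const DAY: u64 = 24 * HOUR;

/// Enforces the documented 1-day minimum retention.
pub fn retention_days(requested: u32) -> u32 {
    requested.max(1)
}

/// Splits [oldest, cutoff) into deletion batches of at most one hour,
/// oldest first. Timestamps are Unix seconds.
pub fn deletion_windows(oldest: u64, now: u64, requested_days: u32) -> Vec<(u64, u64)> {
    let cutoff = now.saturating_sub(u64::from(retention_days(requested_days)) * DAY);
    let mut windows = Vec::new();
    let mut start = oldest;
    while start < cutoff {
        let end = (start + HOUR).min(cutoff);
        windows.push((start, end));
        start = end;
    }
    windows
}
```

Small batches keep each write transaction short, so capture inserts are not blocked behind one large delete.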
PII Detection Engine
Regex-based PII detection engine with 27 pattern categories. Uses RegexSet for single-pass detection. No ML: deterministic, auditable pattern matching. Source: pii_removal.rs.
| Category | Patterns |
|---|---|
| Financial | Credit card numbers (4-digit groups), IBAN (ISO 13616) |
| Government IDs | US Social Security Numbers (XXX-XX-XXXX) |
| Contact Info | Email (RFC 5322), formatted phone numbers (with country code), IPv4 addresses |
| Credentials | JWTs, PEM private keys, database connection strings (7 DB types), Bearer tokens, password fields, env var secrets |
| Service API Keys | AWS (AKIA + secrets), GCP, Azure, GitHub, OpenAI (sk-proj/sk-), Anthropic (sk-ant-), Stripe (sk_live/sk_test), Slack (xoxb/xoxp), Discord, GitLab, NPM, PyPI, DigitalOcean, Telegram, Twilio, SendGrid, Mailchimp |
| Secrets | BIP39 seed phrases (12–24 words), 2FA backup codes, password context fields, password UI indicators |
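To illustrate a single category, the stdlib-only sketch below redacts US SSN-shaped strings (XXX-XX-XXXX) that are not embedded in longer digit runs. It is a hand-written stand-in for one regex; the engine itself uses the `regex` crate's RegexSet across all categories. The `[SSN]` placeholder and the function names are assumptions.

```rust
/// True if `window` has the shape ddd-dd-dddd.
fn matches_ssn_shape(window: &[u8]) -> bool {
    const SHAPE: &[u8] = b"ddd-dd-dddd";
    window.len() == SHAPE.len()
        && window
            .iter()
            .zip(SHAPE)
            .all(|(&c, &s)| if s == b'd' { c.is_ascii_digit() } else { c == s })
}

/// Replaces SSN-shaped substrings with "[SSN]". Matches start and end on
/// ASCII bytes, so slicing `text` at those offsets is always UTF-8 safe.
pub fn redact_ssn(text: &str) -> String {
    let b = text.as_bytes();
    let mut out = String::with_capacity(text.len());
    let (mut i, mut last) = (0, 0);
    while i + 11 <= b.len() {
        let before_ok = i == 0 || !b[i - 1].is_ascii_digit();
        let after_ok = i + 11 == b.len() || !b[i + 11].is_ascii_digit();
        if before_ok && after_ok && matches_ssn_shape(&b[i..i + 11]) {
            out.push_str(&text[last..i]);
            out.push_str("[SSN]");
            i += 11;
            last = i;
        } else {
            i += 1;
        }
    }
    out.push_str(&text[last..]);
    out
}
```

The digit-boundary checks stop the pattern from firing inside longer numbers such as account IDs, the same role `\b` plays in a regex.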
Pipe Permission System
Each automation pipe runs with configurable access controls. Evaluation order: Deny → Allow → Default → Reject, so a matching deny rule always takes precedence. Source: permissions.rs.
| Rule Type | Description |
|---|---|
| Api(METHOD /path) | HTTP endpoint access control. Reader preset: 14 safe endpoints. Writer preset: adds 7 mutation endpoints. |
| App(name) | Filter by application name (case-insensitive substring match). |
| Window(glob) | Filter by window title (glob patterns with * and ?). |
| Content(type) | Restrict to content types: ocr, audio, input, accessibility. |
| Time & Day | Restrict execution to time ranges (HH:MM-HH:MM; ranges may wrap past midnight) and weekdays. |
| Offline Mode | Blocks all non-localhost outbound network requests from pipes. |
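A sketch of the Deny → Allow → Default → Reject evaluation for the App and Window rule types, plus the midnight-wrapping time check. `Policy`, `Rule`, `Request`, and `Decision` are hypothetical names. Reading "Default" as a configured fallback decision, with rejection when none is set, is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Wildcard match: `*` matches any run of characters, `?` exactly one.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    let mut star: Option<usize> = None; // position of the last `*` seen
    let mut mark = 0; // text position that `*` currently extends to
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Trailing `*`s match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

pub enum Rule {
    App(String),    // case-insensitive substring of the app name
    Window(String), // glob over the window title
}

pub struct Request<'a> {
    pub app: &'a str,
    pub window: &'a str,
}

impl Rule {
    fn matches(&self, req: &Request) -> bool {
        match self {
            Rule::App(name) => req.app.to_lowercase().contains(&name.to_lowercase()),
            Rule::Window(pattern) => glob_match(pattern, req.window),
        }
    }
}

pub struct Policy {
    pub deny: Vec<Rule>,
    pub allow: Vec<Rule>,
    pub default: Option<Decision>,
}

impl Policy {
    pub fn evaluate(&self, req: &Request) -> Decision {
        if self.deny.iter().any(|r| r.matches(req)) {
            return Decision::Deny; // deny always wins
        }
        if self.allow.iter().any(|r| r.matches(req)) {
            return Decision::Allow;
        }
        self.default.unwrap_or(Decision::Deny) // no default configured: reject
    }
}

/// Minutes since midnight; `start > end` means the window wraps past midnight.
pub fn in_time_window(now: u16, start: u16, end: u16) -> bool {
    if start <= end {
        now >= start && now < end
    } else {
        now >= start || now < end
    }
}
```

Checking deny rules first means an allow rule can never re-open something an admin has explicitly blocked.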
6. Speaker Identification
Screenpipe includes on-device speaker diarization — all processing runs locally with no cloud dependency:
| Component | Details |
|---|---|
| Segmentation Model | Pyannote v3.0 (segmentation-3.0.onnx) — speaker activity detection on 10-second windows, runs via ONNX Runtime on-device |
| Speaker Embeddings | WeSpeaker CAM++ (wespeaker_en_voxceleb_CAM++.onnx) — 192-dimensional voice fingerprint per segment via filterbank features |
| Matching Algorithm | Cosine similarity with configurable threshold (default: 0.9). Embeddings stored locally in SQLite. Once the speaker limit is reached, new segments are force-merged into the closest existing speaker. |
| Calendar Integration | Calendar-assisted diarization seeds known speakers from meeting attendees, constrains max speakers during active meetings for improved accuracy. |
Sources: embedding_manager.rs · embedding.rs · calendar_speaker_id.rs
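The matching step can be sketched as nearest-centroid assignment by cosine similarity. This sketch does not update centroids after a match, and its tests use 3-dimensional vectors in place of 192-dimensional CAM++ embeddings. `SpeakerBank` and `max_speakers` are hypothetical names.

```rust
/// Cosine similarity of two embeddings; 0.0 if either is all zeros.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

pub struct SpeakerBank {
    pub centroids: Vec<Vec<f32>>,
    pub threshold: f32,      // 0.9 by default per the table above
    pub max_speakers: usize, // hypothetical capacity limit
}

impl SpeakerBank {
    /// Returns the speaker index for `embedding`, creating a new speaker
    /// when nothing is similar enough and there is still room.
    pub fn assign(&mut self, embedding: &[f32]) -> usize {
        let best = self
            .centroids
            .iter()
            .enumerate()
            .map(|(i, c)| (i, cosine(c, embedding)))
            .max_by(|a, b| a.1.total_cmp(&b.1));
        match best {
            Some((i, sim)) if sim >= self.threshold => i,
            // At capacity: force-merge into the closest existing speaker.
            Some((i, _)) if self.centroids.len() >= self.max_speakers => i,
            _ => {
                self.centroids.push(embedding.to_vec());
                self.centroids.len() - 1
            }
        }
    }
}
```

A high threshold such as 0.9 favors creating a new speaker over wrongly merging two people; the capacity cap, which calendar attendee counts can constrain, bounds how far that bias can go.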
7. Enterprise Deployment
| Feature | Details |
|---|---|
| Admin Policy | Lock settings, hide UI sections (chat, timeline, settings). "Managed by [Org]" overlay. Policies sync every 5 min with offline fallback. Enterprise policy via Tauri command layer. |
| MDM Deployment | Deploy via Kandji, Intune, or any MDM. Reads enterprise.json from managed directory. Auto-updates disabled for IT control. |
| License Management | Seat-based licensing with feature matrix. Cached 4 hours, 14-day offline grace period. |
| Team Config Encryption | AES-256-GCM. Key generated on admin device, never sent to server. Shared via passphrase-protected invite (PBKDF2, 600K iterations). |
| Controlled UI | Hide chat, timeline, settings per device. Control AI models, transcription engines, pipe execution, data types. |
| Content Filter Push | Push window/URL filters to all devices. Team filters are additive and cannot be removed by members. |
| API Authentication | Non-localhost requests require Bearer token. Configurable api_auth and api_key per deployment. |
Sources: admin-policy.ts · license-validation.ts · enterprise_policy.rs
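The API authentication rule can be sketched as follows: loopback peers pass, and every other peer must present `Authorization: Bearer <api_key>`. The function signature is hypothetical. The comparison is best-effort constant time (a crate such as `subtle` gives stronger guarantees), and treating IPv4-mapped IPv6 loopback addresses as local is an assumption.

```rust
use std::net::IpAddr;

/// Compares secrets without early exit (best effort; the optimizer gives no hard guarantee).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

fn is_loopback(peer: IpAddr) -> bool {
    match peer {
        IpAddr::V4(v4) => v4.is_loopback(),
        // Also treat IPv4-mapped loopback (::ffff:127.0.0.1) as local.
        IpAddr::V6(v6) => {
            v6.is_loopback() || v6.to_ipv4_mapped().map_or(false, |v4| v4.is_loopback())
        }
    }
}

/// Localhost passes; any other peer must send `Authorization: Bearer <api_key>`.
pub fn is_authorized(peer: IpAddr, authorization: Option<&str>, api_key: &str) -> bool {
    if is_loopback(peer) {
        return true;
    }
    match authorization.and_then(|h| h.strip_prefix("Bearer ")) {
        Some(token) => {
            !api_key.is_empty() && constant_time_eq(token.as_bytes(), api_key.as_bytes())
        }
        None => false,
    }
}
```

Rejecting an empty configured key ensures that a deployment which forgot to set api_key fails closed for remote callers.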
MDM Configuration
MDM pushes enterprise.json to <app_dir>/enterprise.json. It can also be placed manually at ~/.screenpipe/enterprise.json; on macOS, the app also checks ../Resources/enterprise.json.
```json
{
  "license_key": "your-enterprise-license-key"
}
```
8. Network Requests
Core functionality requires zero network requests. Capture, OCR, local transcription, search, and pipes all run offline. Network requests occur only for explicitly enabled optional features:
| Feature | Destination | Data Transmitted |
|---|---|---|
| Cloud Transcription | Deepgram API | Audio chunks (opt-in, admin-disableable) |
| Cloud AI (Pipes) | Screenpipe Cloud (ZDR) | Prompts + context (zero data retention — see §9) |
| OAuth Connections | Google, Notion, etc. | Tokens stored locally, data fetched to device only |
| Analytics | PostHog (eu.i.posthog.com) | Anonymous usage events — no screen content, no PII |
| License Validation | screenpi.pe API | License key only. Cached 4h, 14-day offline grace. |
| Cloud Sync | S3 (encrypted blobs) | ChaCha20-Poly1305 ciphertext only. Server never sees plaintext. |
Enterprise deployments can run fully air-gapped with local transcription (Parakeet/Whisper) and local AI (Ollama). In this configuration the only network request is license validation, which supports 14-day offline operation.
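The offline behavior of license validation can be sketched as a small state function over the time since the last successful check, using the 4-hour cache and 14-day grace period above. Measuring the grace period from the last success, and the state names, are assumptions.

```rust
use std::time::Duration;

const CACHE_TTL: Duration = Duration::from_secs(4 * 60 * 60);
const OFFLINE_GRACE: Duration = Duration::from_secs(14 * 24 * 60 * 60);

#[derive(Debug, PartialEq)]
pub enum LicenseState {
    CachedValid, // within the 4 h cache: no network request
    Revalidate,  // cache expired and the server is reachable
    Grace,       // offline, but within 14 days of the last success
    Expired,     // offline for longer than the grace period
}

/// `since_last_success`: time since the license server last confirmed the key.
pub fn license_state(since_last_success: Duration, server_reachable: bool) -> LicenseState {
    if since_last_success < CACHE_TTL {
        LicenseState::CachedValid
    } else if server_reachable {
        LicenseState::Revalidate
    } else if since_last_success < OFFLINE_GRACE {
        LicenseState::Grace
    } else {
        LicenseState::Expired
    }
}
```

In an air-gapped deployment the server is never reachable, so the key must be revalidated at least once every 14 days.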
Data Flow Summary
On Device — Always Local, No Network
Screen Capture
ScreenCaptureKit (macOS) · WGC (Win) · xcap (Linux)
Text Extraction
Accessibility Tree + OCR (hybrid, conditional)
Audio Processing
CPAL capture · Silero VAD · Speaker diarization (ONNX)
Local Storage
SQLite DB (88 migrations) · JPEG + audio files · FTS5 full-text index
Optional — Explicit Admin/User Opt-In
Cloud Transcription
Deepgram API (audio only, opt-in)
Cloud AI for Pipes
Screenpipe Cloud (ZDR) · or BYOK: OpenAI, Anthropic
Cloud Sync
Zero-knowledge encrypted ChaCha20-Poly1305 blobs · Server never sees plaintext
Never
✕ Screen recordings sent to any server
✕ Unencrypted data transmitted over network
✕ Data shared with third parties
✕ Screenpipe employees accessing user data
9. AI & Transcription Providers
Screenpipe supports fully on-premise AI for both transcription and pipe execution. Enterprise users are free to use their own AI providers. Cloud AI through Screenpipe is optional and operates under zero data retention (ZDR) policies.
On-Premise Transcription
The following speech-to-text engines run entirely on-device with no network dependency, except the OpenAI-compatible option, which sends audio to an endpoint you host. Source: engine.rs.
| Engine | Model | Notes |
|---|---|---|
| Parakeet MLX | parakeet-tdt-0.6b-v3-mlx | Metal GPU acceleration on Apple Silicon. 25 languages. Fastest local option. |
| Parakeet CPU | parakeet-tdt-0.6b-v3 | OpenBLAS. Cross-platform. |
| Whisper Large v3 Turbo | ggml-large-v3-turbo.bin | Default. 99 languages. Best accuracy/speed tradeoff. |
| Whisper Large v3 | ggml-large-v3.bin | Highest accuracy. Higher resource usage. |
| Whisper Tiny | ggml-tiny.bin | Lightweight. For resource-constrained devices. |
| Qwen3 ASR | qwen3-asr-0.6b-antirez | 0.6B multilingual model. |
| OpenAI-Compatible | Custom endpoint | Connect any OpenAI-compatible STT API (e.g., on-premise Whisper server). |
AI for Pipes (Automations)
Pipe execution supports multiple AI providers. Enterprise users can bring their own API keys or use fully on-premise models. Source: pipes/mod.rs.
| Provider | Type | Data Retention |
|---|---|---|
| Ollama (local) | On-premise | No data leaves device. Runs at localhost:11434. |
| Custom OpenAI-Compatible | On-premise / BYOK | Your infrastructure, your policies. Configurable endpoint, API key, headers. |
| OpenAI (BYOK) | Cloud — user's key | Subject to OpenAI's API data usage policy (not used for training with API keys). |
| Anthropic (BYOK) | Cloud — user's key | Subject to Anthropic's API data usage policy (not used for training with API keys). |
| Screenpipe Cloud | Cloud — managed | Zero data retention (ZDR). OpenRouter configured with data_collection: 'deny'. Vertex AI for Gemini (better retention terms). No prompts or outputs stored. |
Sources: openrouter.ts (ZDR config) · gemini.ts (Vertex routing)
Enterprise recommendation: For maximum data control, deploy with on-premise Ollama or a custom OpenAI-compatible endpoint. No data leaves your network. Cloud AI is entirely optional.
10. Testing & CI/CD Pipeline
Screenpipe maintains a multi-layered testing infrastructure across 14 CI/CD workflows, covering unit tests, integration tests, E2E tests, benchmarks, security audits, and longevity testing.
Testing & Release Pipeline
On Every PR / Push
cargo test
Unit + integration · 12 crates, 3 platforms
cargo clippy + fmt
Linting + formatting · all packages
cargo audit + deny
Supply chain scan · unused deps (machete)
E2E Tests (WebDriver IO + Mocha)
12 test scripts: app lifecycle, health check, search API, settings, timeline, WebSocket, MCP, onboarding. Platforms: macOS (arm64), Windows (x64), Linux. Video recording for debugging.
PII Removal Tests
27 pattern categories validated. Performance benchmarks for regex engine.
Scheduled
Longevity Test
4-hour stress run. Memory tracking CSV. Windows, every 4h.
Benchmarks (daily)
OCR: Apple / Tesseract / Win. STT: Whisper. DB: search accuracy, FTS perf.
Release
Desktop App: macOS (Intel + Apple Silicon) + Windows + Linux · CLI: cross-platform with LTO, codegen-units=1, strip · MCP Server: npm publish · Code Signing: SSL.com EV (Windows), Apple notarization (macOS)
Sources: ci.yml · style.yml (audit + lint) · e2e-test.yml · longevity-test.yml · benchmark.yml
Security-Specific Testing
| Test Category | Details |
|---|---|
| cargo audit | Dependency vulnerability scanning on every PR. Blocks merge on known CVEs. |
| cargo deny | License compliance + duplicate dependency detection. Helps catch supply chain issues. |
| PII Redaction Tests | ONNX-based entity detection tests. Pattern matching validated across 27 categories. |
| Database Integrity | FTS contention tests, heavy read scenarios, FK constraint validation, audio reconciliation. |
| Longevity Testing | 4-hour continuous run on Windows (every 4h). Memory usage tracked via CSV. Detects leaks and resource exhaustion. |
| E2E Security | Settings persistence, API health endpoints, MCP integration, WebSocket stability. |
11. Compliance
| Standard | Status |
|---|---|
| SOC 2 Type II | Compliant. Continuous monitoring of security controls. Audit trail via enterprise policy system. |
| GDPR | Compliant — all data processed and stored locally by default. Full user control over collection, retention, and deletion. No cross-border transfers for core functionality. Data minimization via configurable retention (1–90 days). |
| HIPAA | Compliant — local-first architecture means PHI never leaves the device. On-premise AI eliminates BAA requirements with third-party processors. Configurable retention and access controls. PII detection engine covers healthcare identifiers. |
| CCPA | Compliant — no data sold or shared. Full user control. Deletion via retention settings or manual purge. |
| Open Source Audit | MIT licensed — full source code available for independent security review. 18,000+ stars on GitHub. |
Liability & Data Responsibility
Local-first architecture shifts data liability to the deploying organization. Because all data is processed and stored on the user's device (or the organization's managed devices), Screenpipe does not act as a data processor for core functionality.
Enterprise controls enable compliance ownership: Admin policies, MDM deployment, content filters, data retention, and AI provider selection are all configurable by the organization. The enterprise admin controls what data is captured, how long it is retained, and where (if anywhere) it is transmitted.
Zero data retention for cloud features: When optional cloud features are enabled (cloud AI, cloud sync), data is either encrypted end-to-end (sync) or processed under zero data retention policies (AI). Screenpipe employees cannot access user data at any point in the pipeline.
Open source transparency: All security claims are verifiable against the public source code. Organizations can audit the codebase, fork it, or run modified builds to meet specific compliance requirements.
12. Source Code Audit
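Screenpipe is fully open source. The modules most directly relevant to security review are cited throughout this document: crypto.rs and screenpipe-vault (encryption), pii_removal.rs (PII detection), permissions.rs (pipe permissions), retention.rs and drm_detector.rs (data controls), and enterprise_policy.rs, admin-policy.ts, and license-validation.ts (enterprise controls).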
For documentation, see the Screenpipe Docs including the Pipes Guide, Pipe Permissions, Teams & Encryption, and Cloud Sync Architecture.
Security Contact
To report a vulnerability, request a security review for your organization, or discuss enterprise deployment: louis@screenpi.pe
Document version 1.2 · April 2026 · Screenpipe v2.3.x · All claims verified against source code