The problem
Legal professionals handle some of the most sensitive information in existence — medical records, financial statements, personal identifiers. The standard approach in most software is to send this data to the cloud for processing. We think that's the wrong model for this kind of work.
Qvault takes a different approach: every byte of every document stays on your computer. No cloud processing, no uploads, no server-side analysis. The entire pipeline — from document ingestion to PII detection to redacted export — runs locally.
This article explains how that works under the hood.
Key numbers
0 bytes sent to the cloud. 5 jurisdictions covered. 18 IPC commands in the bridge layer. <100ms typical scan time.
Local-first architecture
Qvault is built on Tauri v2, with a Rust backend and a React 18 frontend rendered in the platform's native webview. Unlike Electron, Tauri does not bundle Chromium; the app uses the operating system's own webview, which keeps the installed size to roughly 15-20 MB.
The Rust backend handles everything security-critical: document parsing, encryption, PII detection, and storage. The frontend is purely a presentation layer. Communication between the two happens through 18 IPC commands that cover:
- Document management (import, list, delete)
- PII scanning and detection
- Redaction review and approval
- Cross-document entity tracking
- Redacted document export
- License management
These calls travel between the webview and the Rust core on the same machine; none of them is a network request. For small payloads, the serialization overhead is measured in microseconds.
Dual-layer PII detection engine
The detection system runs two independent passes over every document. Each layer catches different types of sensitive information, and their results are merged with overlap prevention.
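One way to picture the merge step is the Python sketch below. It is illustrative only (Qvault's backend is Rust), and the tie-breaking policy is an assumption: when two spans overlap, the more confident one wins, with the longer span as a fallback. The `Detection` type and `merge_detections` name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    start: int         # inclusive character offset
    end: int           # exclusive character offset
    kind: str          # e.g. "email", "person"
    confidence: float


def merge_detections(regex_hits, heuristic_hits):
    """Combine both layers, dropping any span that overlaps a stronger one."""
    # Assumed policy: consider the most confident, then longest, spans first.
    candidates = sorted(
        list(regex_hits) + list(heuristic_hits),
        key=lambda d: (-d.confidence, -(d.end - d.start), d.start),
    )
    kept = []
    for cand in candidates:
        overlaps = any(cand.start < k.end and k.start < cand.end for k in kept)
        if not overlaps:
            kept.append(cand)
    # Return the survivors in document order.
    return sorted(kept, key=lambda d: d.start)
```

With this policy, a 0.95 regex hit on characters 0-10 suppresses a 0.80 heuristic hit on characters 5-12, while a non-overlapping heuristic hit elsewhere in the text is kept.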
Layer 1: Pattern-based regex scanner
The first layer uses compiled regular expressions to detect structured data with known formats:
- Email addresses and phone numbers
- Credit card numbers (with Luhn validation)
- IBAN numbers
- Dates in various formats
- Region-specific identifiers: US Social Security numbers, EU VAT numbers, Brazilian CPF/CNPJ, German tax IDs
The regex patterns are compiled to deterministic finite automata, which guarantees matching time linear in the length of the input, with no catastrophic backtracking on adversarial text. These detections get a 0.95 confidence score, which is high because structured patterns have very low false-positive rates.
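A minimal sketch of this layer, in Python for brevity, might look like the following. The two patterns are simplified stand-ins rather than Qvault's actual expressions, and `scan_structured` is a hypothetical name. Note also that Python's `re` module is a backtracking engine, so this sketch shows the pattern-plus-validation idea but does not carry the linear-time guarantee of an automaton-based matcher.

```python
import re

# Simplified, illustrative patterns (not Qvault's real ones).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

REGEX_CONFIDENCE = 0.95  # score assigned to structured matches


def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def scan_structured(text: str):
    """Return (start, end, kind, confidence) tuples in document order."""
    hits = []
    for m in EMAIL.finditer(text):
        hits.append((m.start(), m.end(), "email", REGEX_CONFIDENCE))
    for m in CARD.finditer(text):
        # A digit run only counts as a card number if the checksum passes.
        if luhn_valid(m.group()):
            hits.append((m.start(), m.end(), "credit_card", REGEX_CONFIDENCE))
    return sorted(hits)
```

The Luhn step is what keeps arbitrary 16-digit reference numbers from being flagged as cards: the pattern proposes candidates and the checksum rejects most of the accidental ones.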
Layer 2: Heuristic context scanner
The second layer handles unstructured data, primarily names and company entities, which have no fixed format. It runs six detection passes that draw on signals such as:
- Company legal suffixes (LLC, GmbH, S.A., Ltd, etc.)
- Name-like patterns using capitalization and word-boundary analysis
- Distribution tables for common first and last names
- Contextual signals from surrounding text
To minimize false positives, the heuristic layer maintains 25 stop phrases and 71 stop words — common terms that look like names but aren't (e.g., "General Court", "Supreme Court"). Confidence scores for heuristic detections range from 0.75 to 0.92 depending on the strength of the contextual signal.
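The Python sketch below shows how such signals could combine into a score. Everything specific in it is an assumption made for illustration: the shortened stop lists, the tiny name table, the title cues, and the individual score increments. Only the 0.75 to 0.92 range and the two stop-phrase examples come from the description above.

```python
import re

STOP_PHRASES = {"General Court", "Supreme Court"}      # the two examples given above
STOP_WORDS = {"The", "Exhibit", "Section", "Article"}  # hypothetical sample
COMMON_FIRST_NAMES = {"Maria", "John", "Anna"}         # stand-in for a name table
TITLES = ("Mr.", "Ms.", "Mrs.", "Dr.")                 # hypothetical context cues

# One or more capitalized words followed by a legal suffix.
COMPANY = re.compile(r"\b(?:[A-Z][A-Za-z&]*\s)+(?:LLC|GmbH|Ltd|S\.A\.)(?!\w)")
# Two or three consecutive capitalized words.
NAME = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+){1,2}\b")


def scan_unstructured(text: str):
    """Return (start, end, kind, confidence) tuples in document order."""
    hits = [(m.start(), m.end(), "company", 0.92) for m in COMPANY.finditer(text)]
    for m in NAME.finditer(text):
        phrase = m.group()
        words = phrase.split()
        if any(p in phrase for p in STOP_PHRASES) or any(w in STOP_WORDS for w in words):
            continue  # looks like a name, but is a known false positive
        if any(m.start() < e and s < m.end() for s, e, _, _ in hits):
            continue  # already covered by a company hit
        score = 0.75                                    # weakest signal
        if words[0] in COMMON_FIRST_NAMES:
            score += 0.10                               # assumed increment
        if text[:m.start()].rstrip().endswith(TITLES):
            score += 0.07                               # assumed increment
        hits.append((m.start(), m.end(), "person", round(min(score, 0.92), 2)))
    return sorted(hits)
```

On the sentence "The Supreme Court heard Dr. Maria Lopez on behalf of Acme Holdings GmbH.", this flags "Maria Lopez" as a person and "Acme Holdings GmbH" as a company, while the stop lists suppress "The Supreme Court".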
Document processing pipeline
When a document enters Qvault, it goes through a fixed pipeline:
- Import: the file is read from local disk into memory (nothing leaves the machine)
- Encrypt: the file is encrypted at rest immediately, before any further processing
- Extract text: the file is parsed to extract searchable text with position coordinates
- Scan PII: both detection layers run against the extracted text
- Review: the user approves, rejects, or edits each detection
- Export: a redacted copy is generated with black-box redactions
Text extraction
PDF parsing uses lopdf to walk the PDF object tree, while PDF.js provides coordinate-mapped text spans for precise overlay positioning. DOCX files are handled as ZIP archives with XML parsing to extract text runs and paragraph structure.
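The DOCX half of this is easy to illustrate, because the container format is standard: a DOCX file is a ZIP archive whose main body lives in `word/document.xml`, with paragraphs as `w:p` elements and text runs as `w:t` elements. The Python sketch below uses only the standard library and is not Qvault's Rust parser; `extract_docx_paragraphs` and the in-memory demo file are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_paragraphs(data: bytes) -> list[str]:
    """Return the text of each paragraph in the main document part."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W}p"):
        # A paragraph's text is the concatenation of its <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(f"{W}t")))
    return paragraphs


def make_demo_docx() -> bytes:
    """Build a minimal archive with just the part this sketch reads."""
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body>"
        "<w:p><w:r><w:t>Jane </w:t></w:r><w:r><w:t>Doe</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buf.getvalue()
```

Joining runs before scanning matters for detection: a name split across two runs by a formatting change, as "Jane " and "Doe" are here, only becomes matchable once the runs are concatenated.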
Encryption
All documents are encrypted at rest using AES-256-GCM. The encryption keys are machine-bound, derived from SHA-256 hashing of machine-specific identifiers. Each encryption operation uses a random 96-bit nonce generated from the OS cryptographic RNG, and a 16-byte authentication tag detects any tampering.
The consequence: a database file copied to another machine cannot be decrypted there, because that machine's identifiers derive a different key.
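The key-derivation and nonce steps can be sketched with Python's standard library. Which machine identifiers Qvault hashes, and how it combines them, is not specified here, so the identifier strings and the length-prefixed encoding below are assumptions. The AES-256-GCM step itself is left out because Python's standard library does not include it.

```python
import hashlib
import secrets


def derive_machine_key(identifiers: list[str]) -> bytes:
    """Hash machine-specific identifiers into a 256-bit key."""
    h = hashlib.sha256()
    for ident in identifiers:
        data = ident.encode("utf-8")
        # Length-prefix each value so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()  # 32 bytes, the size of an AES-256 key


def fresh_nonce() -> bytes:
    """A random 96-bit nonce drawn from the OS cryptographic RNG."""
    return secrets.token_bytes(12)


# Hypothetical identifiers; a real build would read them from the OS.
key_here = derive_machine_key(["machine-id-1234", "disk-serial-5678"])
key_elsewhere = derive_machine_key(["machine-id-9999", "disk-serial-5678"])
```

The two derived keys differ, which is the property the design relies on: ciphertext written under `key_here` fails GCM authentication under `key_elsewhere`.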
Storage
Everything is stored in a local SQLite database running in WAL (write-ahead logging) mode, which lets readers proceed while a write is in progress. Seven tables track documents, redactions, entities, page text, licenses, audit logs, and credits.
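Enabling WAL is a single pragma on a file-backed database. The sketch below uses Python's built-in `sqlite3` module rather than Qvault's Rust code, and the `documents` schema is a hypothetical simplification of one of the seven tables.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, journal mode is {mode}")
    # Hypothetical schema: one row per encrypted document.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY, name TEXT NOT NULL, ciphertext BLOB NOT NULL)"
    )
    conn.commit()
    return conn


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_store(os.path.join(tmp, "qvault-demo.db"))
        conn.execute(
            "INSERT INTO documents (name, ciphertext) VALUES (?, ?)",
            ("contract.pdf", b"\x00\x01"),
        )
        conn.commit()
        mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
        count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
        conn.close()
    return mode, count
```

The journal mode is stored in the database file, so it only needs to be set once, but issuing the pragma on every open is harmless and guards against a file that was created in a different mode.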
One particularly useful feature: the cross-document entity knowledge base. As you process more documents, Qvault builds a database of known entities (people, companies) across your document corpus. This means detection accuracy improves over time — if a name was confirmed in one document, it gets flagged with higher confidence in subsequent documents.
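As a rough illustration of that feedback loop, the Python sketch below keeps a count of user confirmations per entity and raises the confidence of later sightings. The class name, the 0.05 step, and the 0.99 cap are all assumptions; the actual scoring rule is not described here.

```python
from collections import Counter


class EntityKnowledgeBase:
    """Tracks how often each entity has been confirmed by the user."""

    def __init__(self, step: float = 0.05, cap: float = 0.99):
        self._confirmations = Counter()
        self.step = step   # assumed boost per prior confirmation
        self.cap = cap     # assumed upper bound on boosted confidence

    @staticmethod
    def _key(name: str) -> str:
        # Normalize case and whitespace so variants map to one entity.
        return " ".join(name.split()).casefold()

    def confirm(self, name: str) -> None:
        self._confirmations[self._key(name)] += 1

    def adjust(self, name: str, base_confidence: float) -> float:
        seen = self._confirmations[self._key(name)]
        return round(min(base_confidence + self.step * seen, self.cap), 2)
```

A name detected at 0.80 stays at 0.80 the first time it appears, rises to 0.85 after one confirmation, and levels off at the cap after many.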
Frontend rendering
The frontend manages a multi-layer rendering system for the document viewer:
- PDF canvas rendering at 1.5x viewport scale for crisp display
- Text span extraction with dual coordinate systems (PDF space and screen space)
- Color-coded redaction overlays that the user can accept, reject, or manually adjust
- Coordinate mapping back to PDF space for accurate export
The challenge here is keeping two coordinate systems in sync. PDF coordinates have a bottom-left origin and are measured in points (1/72 inch); screen coordinates have a top-left origin and are measured in pixels. Every overlay position requires a transformation between these systems, accounting for zoom level, page offset, and DPI scaling.
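For an unrotated page, the transformation amounts to a scale, a vertical flip, and an offset. The Python sketch below is a simplified model: `PageView` and its field names are hypothetical, zoom and DPI scaling are folded into a single `scale` factor, and page rotation and crop-box origins are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    page_height_pt: float   # page height in PDF points
    scale: float            # screen pixels per PDF point (zoom included)
    offset_x: float = 0.0   # page's left edge in the viewer, in pixels
    offset_y: float = 0.0   # page's top edge in the viewer, in pixels

    def pdf_to_screen(self, x_pt: float, y_pt: float):
        # Flip the y axis: PDF measures up from the bottom, screens down from the top.
        return (
            self.offset_x + x_pt * self.scale,
            self.offset_y + (self.page_height_pt - y_pt) * self.scale,
        )

    def screen_to_pdf(self, x_px: float, y_px: float):
        # Exact inverse of pdf_to_screen, used when exporting redactions.
        return (
            (x_px - self.offset_x) / self.scale,
            self.page_height_pt - (y_px - self.offset_y) / self.scale,
        )
```

For a US Letter page (792 points tall) shown at 1.5x with its top-left corner at pixel (10, 20), the PDF point (72, 720) lands at screen position (118, 128), and mapping that position back returns (72, 720).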
What Qvault does not do
This is as important as what it does:
- No cloud uploads of any kind
- No telemetry or usage tracking
- No third-party processing APIs
- No external model downloads
- No temporary cloud storage
The trust model is simple: trust the machine, distrust the network. By eliminating all network-based processing, we eliminate an entire category of threat vectors. There are no API keys to leak, no cloud buckets to misconfigure, no data-in-transit to intercept.
Cross-platform distribution
Qvault ships native binaries for all major platforms:
- macOS: .dmg installers for both Apple Silicon and Intel
- Windows: .exe and .msi installers with multilingual support
- Linux: .deb packages and AppImage
Jurisdictional coverage
The PII detection engine covers five jurisdictions out of the box: global patterns (email, phone, credit cards), US-specific (SSN, state IDs), EU-specific (VAT, IBAN), Brazilian (CPF, CNPJ), and German (Steuer-ID, tax numbers). This enables firms with international practices to use a single tool across their entire document corpus.
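Several of these identifiers carry check digits, so a detector can verify a match after the pattern fires, in the same spirit as Luhn validation for credit cards. Whether Qvault applies checksums to IBAN or CPF values is not stated, so the Python sketch below is an assumption about how per-jurisdiction validators could be organized. The two algorithms themselves are the standard ones: mod-97 for IBAN and the two weighted check digits for CPF.

```python
def iban_valid(value: str) -> bool:
    """ISO 13616 check: move the first four characters to the end, then mod 97."""
    s = "".join(value.split()).upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # Letters become two-digit numbers: A=10, B=11, ..., Z=35.
    as_number = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(as_number) % 97 == 1


def cpf_valid(value: str) -> bool:
    """Brazilian CPF: 9 base digits followed by 2 weighted check digits."""
    digits = [int(c) for c in value if c.isdigit()]
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    for n in (9, 10):
        # Weights run from n+1 down to 2 over the first n digits.
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        if (total * 10) % 11 % 10 != digits[n]:
            return False
    return True


# Hypothetical registry keyed by jurisdiction and identifier type.
CHECKSUM_VALIDATORS = {
    "EU": {"iban": iban_valid},
    "BR": {"cpf": cpf_valid},
}


def passes_checksum(jurisdiction: str, kind: str, value: str) -> bool:
    validator = CHECKSUM_VALIDATORS.get(jurisdiction, {}).get(kind)
    # Identifier types without a registered checksum are accepted as matched.
    return validator(value) if validator else True
```

Keeping validators in a table keyed by jurisdiction means a new region can be added by registering its patterns and checks, without touching the scanning loop.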
About Qvault
Qvault is built by Santacroce SL in Madrid. For more information, visit qvault.tech.