Open Source · Apache 2.0

PII never reaches your LLM

Drop-in Python SDK that detects and redacts sensitive data before it hits any AI model. Token-direct pipeline with full rehydration.

$ pip install veil-phantom
7 Detection Layers · 19 PII Token Types · 0 PII Leaked · <6ms Overhead
Three lines to privacy

Redact PII, call your LLM with safe tokens, then rehydrate the response with original values. Real data never leaves your machine.

Token-direct mode replaces sensitive values with trackable tokens like [PERSON_1] so the AI understands structure without seeing real data.

app.py
from veil_phantom import VeilClient

veil = VeilClient()

transcript = """
Sarah Chen from Goldman Sachs discussed
the $25M acquisition.
Contact: sarah.chen@gs.com
"""

result = veil.redact(transcript)
# "[PERSON_1] from [ORG_1] discussed
#  the [AMOUNT_1] acquisition.
#  Contact: [EMAIL_1]"

# Send safe tokens to any LLM
ai_response = call_llm(result.sanitized)

# Restore original values
final = result.rehydrate(ai_response)
Built for production AI pipelines

The same privacy engine that powers Veil's meeting intelligence, packaged as a standalone SDK.

7-Layer Detection

Shade V7 NER, gazetteers, regex patterns, NLP entities, and contextual sensitivity analysis working in concert.

Token-Direct Pipeline

Tokens like [PERSON_1] preserve structure while hiding real values. AI understands context, never sees PII.

Full Rehydration

Restore original values in AI responses with one call. Token maps handle multi-turn conversations automatically.

Cross-Cultural Names

Detects Western, African, Asian, Arabic, and South African name patterns, including ASR-mangled transcriptions.

Zero Dependencies Mode

Regex-only mode works with no model downloads and no external dependencies. Add Shade NER when you need maximum coverage.

LLM Integrations

Drop-in wrappers for OpenAI, LangChain, and any LLM. One-liner privacy for existing AI pipelines.
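The wrap-any-LLM idea can be sketched in plain Python. This is a conceptual illustration of the redact → call → rehydrate pattern, not the SDK's implementation: `make_private`, `toy_redact`, and `toy_rehydrate` are hypothetical names, and the toy redactor handles emails only.

```python
import re

def make_private(llm_fn, redact, rehydrate):
    """Wrap any LLM callable so it only ever sees redacted text."""
    def wrapped(text):
        sanitized, token_map = redact(text)
        response = llm_fn(sanitized)           # model sees tokens only
        return rehydrate(response, token_map)  # restore originals
    return wrapped

def toy_redact(text):
    """Replace each email with a numbered [EMAIL_n] token."""
    token_map, counter = {}, 0
    def sub(m):
        nonlocal counter
        counter += 1
        token = f"[EMAIL_{counter}]"
        token_map[token] = m.group(0)
        return token
    sanitized = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", sub, text)
    return sanitized, token_map

def toy_rehydrate(text, token_map):
    """Swap tokens in the response back to their original values."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text

# A fake "LLM" that just echoes its prompt
echo_llm = make_private(lambda p: f"Reply to: {p}", toy_redact, toy_rehydrate)
print(echo_llm("Email sarah.chen@gs.com today"))
# Reply to: Email sarah.chen@gs.com today
```

The fake model never receives the real address; it sees `[EMAIL_1]` and the wrapper restores the original on the way out, which is the same shape the SDK's one-liner wrappers follow.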

The 7-layer privacy pipeline
Input: Raw Text (transcript, email, chat)
Layer 0: Shade V7 NER (PhoneticDeBERTa)
Layer 1: Gazetteers (873 org entries)
Layers 2-3: NLP + Regex (35 patterns)
Layer 5: Contextual (sensitivity analysis)
Output: Safe Tokens ([PERSON_1], [ORG_1])
19 PII token types
[PERSON_1] Person Names: cross-cultural Western, African, Asian, Arabic, and South African names
[ORG_1] Organizations: companies, institutions, financial firms, compound names
[EMAIL_1] Emails: standard and spoken formats ("john at gmail dot com")
[PHONE_1] Phone Numbers: international formats, spoken digits
[AMOUNT_1] Monetary Values: USD, ZAR, verbal amounts ("twelve million")
[DATE_1] Dates & Times: all formats, relative dates, spoken ordinals
[ADDRESS_1] Addresses: street addresses, locations, safe-domain filtering
[GOVID_1] Government IDs: SSN, SA ID, passport, driver's license
[BANKACCT_1] Bank Accounts: account numbers, IBAN, routing numbers
[CARD_1] Payment Cards: credit/debit card numbers, partial matches
[IPADDR_1] IP Addresses: IPv4 addresses
[CASE_1] Legal Cases: case numbers, docket references
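The "spoken format" coverage noted for emails above can be illustrated with a single normalization pass. This is a hypothetical sketch, not the SDK's actual patterns: `SPOKEN_EMAIL` and `normalize_spoken_emails` are names invented here, and a real implementation would handle far more variants.

```python
import re

# Illustrative only: one way to catch "spoken" email addresses,
# e.g. from an ASR transcript, before a standard email pattern runs.
SPOKEN_EMAIL = re.compile(
    r"\b([a-z0-9.]+)\s+at\s+([a-z0-9]+)\s+dot\s+([a-z]{2,})\b",
    re.IGNORECASE,
)

def normalize_spoken_emails(text):
    """Rewrite 'user at host dot tld' into user@host.tld so an
    ordinary email regex (and the [EMAIL_n] token type) can match."""
    return SPOKEN_EMAIL.sub(r"\1@\2.\3", text)

print(normalize_spoken_emails("reach john at gmail dot com tomorrow"))
# reach john@gmail.com tomorrow
```

Once normalized, the address falls through to the same regex layer as a typed email, so both surface forms map to one token.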
98 scenarios. 8 verticals. Zero accuracy loss.

Benchmarked with Claude Haiku 4.5 across financial, healthcare, legal, HR, sales, support, communications, and multi-step agentic workflows. Averaged over 2 runs (784 API calls).

93.3% Tool Accuracy (vs 91.5% without Veil, +1.9%)
885 PII Detected (across 13 entity types)
95.8% PII Contained (37.5 of 885 leaked, 4.2%)
6ms Redaction Overhead (avg per scenario)
Without Veil (raw PII sent to LLM):
Tool Accuracy 91.5% · Args Quality 85.0% · Avg Latency 3.79s · PII Exposed: all PII sent to model

With VeilPhantom (token-direct pipeline):
Tool Accuracy 93.3% (+1.9%) · Args Quality 84.8% · Avg Latency 3.93s (+6ms redact) · PII Exposed: tokens only, 95.8% contained

VeilPhantom improves tool accuracy by 1.9%: token-structured input helps the model parse arguments more reliably. Args quality is virtually identical (84.8% vs 85.0%), with only 6ms of redaction overhead.

Vertical        Scenarios   Raw Accuracy   Veil Accuracy   PII Found   Leaked
Financial              13          84.6%           84.6%          97        2
Healthcare             12          91.7%           97.9%          79        1
Legal                  12          95.8%          100.0%         130        2
HR                     13         100.0%           92.3%         115        3
Sales                  10         100.0%          100.0%          79        0
Support                 8         100.0%          100.0%          64       10
Communications         12          87.5%           91.7%         115        5
Multi-Step             18          71.9%           80.9%         206       13
[PERSON] 285 (32.2% of all PII)
[EMAIL] 157 (17.7%)
[ORG] 114 (12.9%)
[AMOUNT] 114 (12.9%)
[DATE] 107 (12.1%)
[PHONE] 58 (6.6%)
[BANKACCT] 17 (1.9%)
[GOVID] +5 more: 33 (SSN, card, IP, address, case)

Powered by Shade V7

Layer 0 of the pipeline uses Shade V7, a PhoneticDeBERTa model that learns name patterns, not just spellings. Handles ASR-mangled names across Western, African, Asian, and Arabic naming conventions.
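Shade V7's phonetic representation is learned end to end, but classic Soundex gives a rough intuition for why phonetic keys help with ASR-mangled names: differently spelled transcriptions of the same spoken name often collapse to one code. The `soundex` function below is a standard textbook sketch and says nothing about the model's internals.

```python
def soundex(name):
    """Classic Soundex: a crude phonetic key. Names that sound
    alike tend to share a code even when spelled differently."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    key, prev = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # skip vowels and repeats
            key += code
        if ch not in "hw":          # h/w are transparent to adjacency
            prev = code
    return (key + "000")[:4]        # pad/truncate to letter + 3 digits

# An ASR system might transcribe "Siobhan" as "Shivon";
# both collapse to the same phonetic key.
print(soundex("Siobhan"), soundex("Shivon"))
# S150 S150
```

A learned phonetic model goes much further (subword context, cross-lingual patterns), but the underlying bet is the same: match names by how they sound, not how they are spelled.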

Read the research →
97.1% F1 Score · 22M Parameters · <50ms Latency
Works with your stack
basic.py
from veil_phantom import VeilClient, VeilConfig

# Regex-only: no model download needed
veil = VeilClient(VeilConfig.regex_only())

result = veil.redact("Sarah sent $12.5M to sarah@gs.com")
print(result.sanitized)
# "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"

# Wrap any function: redact → call → rehydrate
output = veil.wrap(text, llm_fn=my_llm)

Ship AI that respects privacy

Add PII protection to your pipeline in under 5 minutes. Open source, Apache 2.0.

View on GitHub