Open Source · Apache 2.0

PII never reaches your LLM

Drop-in Python SDK that detects and redacts sensitive data before it hits any AI model. Token-direct pipeline with full rehydration.

$ pip install veil-phantom
7 Detection Layers · 19 PII Token Types · 0 PII Leaked · <6ms Overhead
Three lines to privacy

Redact PII, call your LLM with safe tokens, then rehydrate the response with original values. Real data never leaves your machine.

Token-direct mode replaces sensitive values with trackable tokens like [PERSON_1] so the AI understands structure without seeing real data.

app.py
from veil_phantom import VeilClient

veil = VeilClient()

transcript = """
Sarah Chen from Goldman Sachs discussed
the $25M acquisition.
Contact: sarah.chen@gs.com
"""

result = veil.redact(transcript)
# "[PERSON_1] from [ORG_1] discussed
#  the [AMOUNT_1] acquisition.
#  Contact: [EMAIL_1]"

# Send safe tokens to any LLM
ai_response = call_llm(result.sanitized)

# Restore original values
final = result.rehydrate(ai_response)
Built for production AI pipelines

The same privacy engine that powers Veil's meeting intelligence, packaged as a standalone SDK.

7-Layer Detection

Shade V7 NER, gazetteers, regex patterns, NLP entities, and contextual sensitivity analysis working in concert.

Token-Direct Pipeline

Tokens like [PERSON_1] preserve structure while hiding real values. AI understands context, never sees PII.

Full Rehydration

Restore original values in AI responses with one call. Token maps handle multi-turn conversations automatically.

Cross-Cultural Names

Detects Western, African, Asian, Arabic, and South African name patterns, including ASR-mangled transcriptions.

Zero Dependencies Mode

Regex-only mode works with no model downloads and no external dependencies. Add Shade NER when you need maximum coverage.

LLM Integrations

Drop-in wrappers for OpenAI, LangChain, and any LLM. One-liner privacy for existing AI pipelines.
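The wrap-any-LLM idea can be sketched in plain Python. This is a conceptual illustration of the redact → call → rehydrate pattern, not the SDK's implementation: `make_private`, `toy_redact`, and `toy_rehydrate` are hypothetical names, and the toy redactor handles emails only.

```python
import re

def make_private(llm_fn, redact, rehydrate):
    """Wrap any LLM callable so it only ever sees redacted text."""
    def wrapped(text):
        sanitized, token_map = redact(text)
        response = llm_fn(sanitized)           # model sees tokens only
        return rehydrate(response, token_map)  # restore originals
    return wrapped

def toy_redact(text):
    """Replace each email with a numbered [EMAIL_n] token."""
    token_map, counter = {}, 0
    def sub(m):
        nonlocal counter
        counter += 1
        token = f"[EMAIL_{counter}]"
        token_map[token] = m.group(0)
        return token
    sanitized = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", sub, text)
    return sanitized, token_map

def toy_rehydrate(text, token_map):
    """Swap tokens in the response back to their original values."""
    for token, original in token_map.items():
        text = text.replace(token, original)
    return text

# A fake "LLM" that just echoes its prompt
echo_llm = make_private(lambda p: f"Reply to: {p}", toy_redact, toy_rehydrate)
print(echo_llm("Email sarah.chen@gs.com today"))
# Reply to: Email sarah.chen@gs.com today
```

The fake model never receives the real address; it sees `[EMAIL_1]` and the wrapper restores the original on the way out, which is the same shape the SDK's one-liner wrappers follow.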

The 7-layer privacy pipeline
Input: Raw Text (transcript, email, chat)
Layer 0: Shade V7 NER (PhoneticDeBERTa)
Layer 1: Gazetteers (873 org entries)
Layers 2-3: NLP + Regex (35 patterns)
Layer 5: Contextual (sensitivity analysis)
Output: Safe Tokens ([PERSON_1], [ORG_1])
19 PII token types
[PERSON_1] Person Names: cross-cultural Western, African, Asian, Arabic, and South African names
[ORG_1] Organizations: companies, institutions, financial firms, compound names
[EMAIL_1] Emails: standard and spoken formats ("john at gmail dot com")
[PHONE_1] Phone Numbers: international formats, spoken digits
[AMOUNT_1] Monetary Values: USD, ZAR, verbal amounts ("twelve million")
[DATE_1] Dates & Times: all formats, relative dates, spoken ordinals
[ADDRESS_1] Addresses: street addresses, locations, safe-domain filtering
[GOVID_1] Government IDs: SSN, SA ID, passport, driver's license
[BANKACCT_1] Bank Accounts: account numbers, IBAN, routing numbers
[CARD_1] Payment Cards: credit/debit card numbers, partial matches
[IPADDR_1] IP Addresses: IPv4 addresses
[CASE_1] Legal Cases: case numbers, docket references
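The "spoken format" coverage noted for emails above can be illustrated with a single normalization pass. This is a hypothetical sketch, not the SDK's actual patterns: `SPOKEN_EMAIL` and `normalize_spoken_emails` are names invented here, and a real implementation would handle far more variants.

```python
import re

# Illustrative only: one way to catch "spoken" email addresses,
# e.g. from an ASR transcript, before a standard email pattern runs.
SPOKEN_EMAIL = re.compile(
    r"\b([a-z0-9.]+)\s+at\s+([a-z0-9]+)\s+dot\s+([a-z]{2,})\b",
    re.IGNORECASE,
)

def normalize_spoken_emails(text):
    """Rewrite 'user at host dot tld' into user@host.tld so an
    ordinary email regex (and the [EMAIL_n] token type) can match."""
    return SPOKEN_EMAIL.sub(r"\1@\2.\3", text)

print(normalize_spoken_emails("reach john at gmail dot com tomorrow"))
# reach john@gmail.com tomorrow
```

Once normalized, the address falls through to the same regex layer as a typed email, so both surface forms map to one token.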
98 scenarios. 8 verticals. Zero accuracy loss.

Benchmarked with Claude Haiku 4.5 across financial, healthcare, legal, HR, sales, support, communications, and multi-step agentic workflows. Averaged over 2 runs (784 API calls).

93.3% Tool Accuracy (vs 91.5% without Veil, +1.9%)
885 PII Detected (across 13 entity types)
95.8% PII Contained (37.5 of 885 leaked, 4.2%)
6ms Redaction Overhead (avg per scenario)
Without Veil (raw PII sent to LLM):
Tool Accuracy 91.5% · Args Quality 85.0% · Avg Latency 3.79s · PII Exposed: all PII sent to model

With VeilPhantom (token-direct pipeline):
Tool Accuracy 93.3% (+1.9%) · Args Quality 84.8% · Avg Latency 3.93s (+6ms redact) · PII Exposed: tokens only, 95.8% contained

VeilPhantom improves tool accuracy by 1.9%: token-structured input helps the model parse arguments more reliably. Args quality is virtually identical (84.8% vs 85.0%), with only 6ms of redaction overhead.

Vertical        Scenarios   Raw Accuracy   Veil Accuracy   PII Found   Leaked
Financial              13          84.6%           84.6%          97        2
Healthcare             12          91.7%           97.9%          79        1
Legal                  12          95.8%          100.0%         130        2
HR                     13         100.0%           92.3%         115        3
Sales                  10         100.0%          100.0%          79        0
Support                 8         100.0%          100.0%          64       10
Communications         12          87.5%           91.7%         115        5
Multi-Step             18          71.9%           80.9%         206       13
[PERSON] 285 (32.2% of all PII)
[EMAIL] 157 (17.7%)
[ORG] 114 (12.9%)
[AMOUNT] 114 (12.9%)
[DATE] 107 (12.1%)
[PHONE] 58 (6.6%)
[BANKACCT] 17 (1.9%)
[GOVID] +5 more: 33 (SSN, card, IP, address, case)

Powered by Shade V7

Layer 0 of the pipeline uses Shade V7, a PhoneticDeBERTa model that learns name patterns, not just spellings. Handles ASR-mangled names across Western, African, Asian, and Arabic naming conventions.
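Shade V7's phonetic representation is learned end to end, but classic Soundex gives a rough intuition for why phonetic keys help with ASR-mangled names: differently spelled transcriptions of the same spoken name often collapse to one code. The `soundex` function below is a standard textbook sketch and says nothing about the model's internals.

```python
def soundex(name):
    """Classic Soundex: a crude phonetic key. Names that sound
    alike tend to share a code even when spelled differently."""
    codes = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
             **dict.fromkeys("dt", "3"), "l": "4",
             **dict.fromkeys("mn", "5"), "r": "6"}
    name = name.lower()
    key, prev = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # skip vowels and repeats
            key += code
        if ch not in "hw":          # h/w are transparent to adjacency
            prev = code
    return (key + "000")[:4]        # pad/truncate to letter + 3 digits

# An ASR system might transcribe "Siobhan" as "Shivon";
# both collapse to the same phonetic key.
print(soundex("Siobhan"), soundex("Shivon"))
# S150 S150
```

A learned phonetic model goes much further (subword context, cross-lingual patterns), but the underlying bet is the same: match names by how they sound, not how they are spelled.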

Read the research →
97.1% F1 Score · 22M Parameters · <50ms Latency
Works with your stack
basic.py
from veil_phantom import VeilClient, VeilConfig

# Regex-only: no model download needed
veil = VeilClient(VeilConfig.regex_only())

result = veil.redact("Sarah sent $12.5M to sarah@gs.com")
print(result.sanitized)
# "[PERSON_1] sent [AMOUNT_1] to [EMAIL_1]"

# Wrap any function: redact → call → rehydrate
output = veil.wrap(text, llm_fn=my_llm)

Ship AI that respects privacy

Add PII protection to your pipeline in under 5 minutes. Open source, Apache 2.0.

View on GitHub