Identify, Redact, and Replace Personally Identifiable Information in Unstructured Text

Your Data, Your Way

We apply the latest advancements in transformer architectures to pick up PII based entirely on context, which makes us particularly effective on semi-structured and unstructured data.

With Private AI, data, security, and machine learning teams can:

Accurately redact data like ASR transcripts, chat logs, and electronic health records with less than half the error rate compared to alternatives.
Retain full ownership of your data; it’s never shared with us, and never leaves your infrastructure.
Identify and redact text in 52 languages (and growing).
Configure output extensively, including turning entities on or off, adding block or allow lists, and complying with privacy regulations like GDPR, LGPD, HIPAA, and more.
Add custom entity types, developed with the few-shot learning techniques Private AI uses.

Download our whitepaper

How It Works

Private AI is deployed via a single container on-prem so you can easily add our powerful redaction capabilities to any data workflow. The container is accessed via a REST API and can be easily customized to fit your team’s needs.
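As a rough illustration of what calling a locally deployed container over REST could look like, here is a minimal Python sketch using only the standard library. The host, port, route, and the "text" payload field are all assumptions made for this example and are not taken from the Private AI documentation; check the docs for the real values.

```python
import json
import urllib.request

# Hypothetical values: host, port, route, and payload field name are
# placeholders for illustration, not the documented Private AI API.
BASE_URL = "http://localhost:8080"
ROUTE = "/redact"


def build_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the local container."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + ROUTE,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_for_redaction(text: str, timeout: float = 10.0) -> dict:
    """Send the request to the container and decode the JSON reply."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because the request targets a container inside your own infrastructure, the text in this sketch never has to leave your network.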

Try our web demo

PII IDENTIFICATION

Privacy is More Than an ML Model

Private AI detects more than 50 different entity types of personally identifiable information (PII) across 52 languages. Using our contextually aware ML models, we go beyond traditional entity detection to recognize many different kinds of direct and quasi-identifiers.

What is and isn’t PII gets complicated, and Private AI’s team of privacy experts ensures our system works in compliance with major legislation like GDPR, CPRA, and HIPAA.

Private AI can be easily implemented as a filter to screen for PII in any data flow or database.

Book a demo


{
  "result": "Hi [NAME_1], [NAME_2] this side. It's been a while since we last met in [LOCATION_CITY_1].",
  "result_fake": null,
  "pii": [
    {
      "marker": "NAME_1",
      "text": "John",
      "best_label": "NAME",
      "stt_idx": 3,
      "end_idx": 7,
      "labels": {
        "NAME": 0.8446
      }
    },
    {
      "marker": "NAME_2",
      "text": "Grace",
      "best_label": "NAME",
      "stt_idx": 9,
      "end_idx": 14,
      "labels": {
        "NAME": 0.8399
      }
    },
    {
      "marker": "LOCATION_CITY_1",
      "text": "Berlin",
      "best_label": "LOCATION_CITY",
      "stt_idx": 63,
      "end_idx": 69,
      "labels": {
        "LOCATION_CITY": 0.8778,
        "LOCATION": 0.8512
      }
    }
  ],
  "api_calls_used": 1,
  "output_checks_passed": true
}
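To show how a response like the one above might be consumed, the following Python sketch lists each detected entity with its best label. It uses only field names that appear in the sample ("pii", "marker", "best_label", "labels"); treating the numbers under "labels" as confidence scores is an assumption, and the `summarize` helper and its `min_score` parameter are hypothetical.

```python
import json

# Trimmed copy of the sample response: index fields are omitted,
# the remaining values are unchanged.
SAMPLE = """
{
  "result": "Hi [NAME_1], [NAME_2] this side. It's been a while since we last met in [LOCATION_CITY_1].",
  "pii": [
    {"marker": "NAME_1", "text": "John", "best_label": "NAME",
     "labels": {"NAME": 0.8446}},
    {"marker": "NAME_2", "text": "Grace", "best_label": "NAME",
     "labels": {"NAME": 0.8399}},
    {"marker": "LOCATION_CITY_1", "text": "Berlin", "best_label": "LOCATION_CITY",
     "labels": {"LOCATION_CITY": 0.8778, "LOCATION": 0.8512}}
  ]
}
"""


def summarize(response: dict, min_score: float = 0.0) -> list:
    """Return (marker, best_label, score) for entities scoring at least min_score."""
    rows = []
    for entity in response.get("pii", []):
        label = entity["best_label"]
        score = entity["labels"][label]  # assumed to be a confidence score
        if score >= min_score:
            rows.append((entity["marker"], label, score))
    return rows


response = json.loads(SAMPLE)
for marker, label, score in summarize(response):
    print(f"{marker}: {label} ({score:.2f})")
```

A threshold such as `summarize(response, min_score=0.85)` could be used to route lower-scoring detections to manual review.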

TEXT DE-IDENTIFICATION

Redact at Higher Than Human Accuracy

Private AI can replace all the PII detected with unique identifiers (e.g., NAME_1, CVV_3, CREDIT_CARD_2) to produce redacted transcripts or de-identified data. Alternatively, replace PII with a mask character. See our docs to learn more.
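The two output styles, numbered markers and mask characters, can be sketched in a few lines of Python. This is an illustration of the idea only, not Private AI's implementation: the `redact` function, its span format, and the per-span numbering are assumptions (a real system might, for example, reuse one marker for repeated mentions of the same entity).

```python
from collections import defaultdict


def redact(text, spans, mask_char=None):
    """Replace each (start, end, label) span, end exclusive and non-overlapping.

    With mask_char=None, spans become numbered markers such as [NAME_1];
    otherwise each character in a span is replaced by mask_char.
    """
    counters = defaultdict(int)
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        if mask_char is None:
            counters[label] += 1
            pieces.append(f"[{label}_{counters[label]}]")
        else:
            pieces.append(mask_char * (end - start))
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last span
    return "".join(pieces)


text = "Hi John, Grace this side."
spans = [(3, 7, "NAME"), (9, 14, "NAME")]

print(redact(text, spans))       # Hi [NAME_1], [NAME_2] this side.
print(redact(text, spans, "#"))  # Hi ####, ##### this side.
```

Markers keep the entity type visible for downstream analysis, while masking preserves the original text length.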

Unrivalled Accuracy

50+ entity types
52 languages (and growing)
Processes 70,000 words/second
Less than half the error rate compared to alternatives

No third party access
No regexes
Real-time redaction
Complies with GDPR, HIPAA, CPRA, and more

Book a demo

SYNTHETIC PII GENERATION

Never Use Transformers Without Privacy Mitigation

After PII is removed, Private AI can generate synthetic PII to replace all the PII found with fake data that fits the surrounding context.

The synthetic PII generator never sees the original data, eliminating sensitive data leakage. The resulting text further reduces re-identification risk, as an adversary must first identify which PII is real. Good luck finding a piece of straw in a pile of hay!

Taking production data and replacing all PII with synthetic data also minimizes data shift from the original data, which is highly beneficial when creating ML models.
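The key property described above, that the generator works only from redacted text, can be illustrated with a small Python sketch. It fills markers such as [NAME_1] from a fixed lookup table, which is a deliberate simplification: Private AI's generator is described as producing values that fit the surrounding context, and this sketch does not attempt that. The `FAKE_VALUES` table and `fill_markers` function are hypothetical.

```python
import random
import re

# Hypothetical pools of fake values, keyed by entity label.
FAKE_VALUES = {
    "NAME": ["Alex", "Priya", "Tomas"],
    "LOCATION_CITY": ["Lisbon", "Toronto", "Osaka"],
}

# Matches markers such as [NAME_1] or [LOCATION_CITY_2].
MARKER = re.compile(r"\[([A-Z_]+)_(\d+)\]")


def fill_markers(redacted: str, seed: int = 0) -> str:
    """Replace each marker with a fake value; the same marker gets the same value."""
    rng = random.Random(seed)
    chosen = {}

    def substitute(match):
        marker = match.group(0)
        if marker not in chosen:
            pool = FAKE_VALUES.get(match.group(1))
            # Labels without a pool are left as markers.
            chosen[marker] = rng.choice(pool) if pool else marker
        return chosen[marker]

    return MARKER.sub(substitute, redacted)


redacted = "Hi [NAME_1], [NAME_2] this side. We last met in [LOCATION_CITY_1]."
print(fill_markers(redacted))
```

Note that `fill_markers` receives only the redacted string, so the original values are never available to it.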

Book a demo

TOKENIZATION & PSEUDONYMIZATION

Reverse PII Removal As Needed

Replace PII with encrypted tokens using Private AI’s tokenization feature. Sometimes referred to as pseudonymization, tokenization preserves the utility of the data while still protecting what’s sensitive.

Tokenization is reversible, allowing you to easily recover the original data. Contact us for documentation and access.
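To make the idea of reversibility concrete, here is a minimal Python sketch. It swaps in a different technique from the one described above: instead of encrypted tokens, it uses random tokens with a lookup table (a "vault") held by the data owner. The `TokenVault` class and its token format are hypothetical and say nothing about how Private AI's tokenization feature is built.

```python
import secrets


class TokenVault:
    """Vault-based pseudonymization: random tokens plus a private lookup table."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str, label: str) -> str:
        """Return a stable token for (label, value), creating one if needed."""
        key = (label, value)
        if key not in self._value_to_token:
            token = f"[{label}_{secrets.token_hex(4)}]"
            while token in self._token_to_value:  # retry on the rare collision
                token = f"[{label}_{secrets.token_hex(4)}]"
            self._value_to_token[key] = token
            self._token_to_value[token] = value
        return self._value_to_token[key]

    def detokenize(self, text: str) -> str:
        """Restore original values for every known token found in text."""
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


vault = TokenVault()
token = vault.tokenize("John", "NAME")
message = f"Hi {token}, welcome back."
print(message)
print(vault.detokenize(message))  # Hi John, welcome back.
```

Whoever holds the vault can recover the original data; anyone else sees only tokens that still preserve which mentions refer to the same value.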

Book a demo

Address
Age
Bank Account
Blood Type
City
Condition
Coordinates
Country
Credit Card
Credit Card Expiration
CVV
Date
Date Interval

Date of Birth
Dose
Driver's License
Drug
Email Address
Event
Family Name
Filename
Gender Sexuality
Given Name
Healthcare Number
Injury
IP Address

Language
Marital Status
Medical Process
Money
Name
Numerical PII
Occupation
Organization
Origin
Passport Number
Password
Phone Number
Physical Attribute

Political Affiliation
Religion
Routing Number
SSN
State
Statistics
Time
URL
Username
Vehicle ID
ZIP
Zodiac Sign

For a detailed listing of entities, complete with descriptions, examples, and regulatory compliance information, visit our docs.

English
French
German
Hindi
Italian
Korean
Portuguese
Russian
Spanish
Tagalog
Ukrainian
Arabic
Bengali

Belarusian
Bulgarian
Burmese
Catalan
Croatian
Czech
Danish
Dutch
Estonian
Finnish
Greek
Hebrew
Hungarian

Icelandic
Indonesian
Khmer
Latvian
Lithuanian
Luxembourgish
Malay
Moldovan
Norwegian (Bokmål)
Persian (Farsi)
Polish
Punjabi
Romanian

Slovak
Slovenian
Swahili
Swedish
Tamil
Thai
Turkish
Vietnamese
Mandarin (Simplified)
Japanese
Cantonese (Traditional)*
Haitian Creole*
Mandarin (Traditional)*

For a detailed listing of languages, complete with support levels, ISO codes, and release details, visit our docs.
* Coming soon

Try It Free Today

Get started now
