Train Your ML Models Without Compromising Privacy

The Problem:

Using Production Data to Train ML Models Puts Customer Data at Risk

Machine learning is all about the data: any ML model is only as good as the data it's trained on, hence the voracious need for production data.

Unfortunately, data protection regulators frown on using production data to train chatbots and other ML models, because doing so can expose users' personally identifiable information (PII) to a broad audience, as one Korean "lovebot" chatbot started doing, or can even create murderous toasters.

Enter Private AI:

Preventing Downstream Accuracy Loss with Synthetic PII Generation

Private AI can generate synthetic PII that fits the context of the surrounding text. Taking production data and replacing all PII with contextually relevant synthetic data is an excellent way to get the data needed to train your models without compromising the privacy of all the user data within those datasets.
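
To make the idea concrete, here is a minimal Python sketch of consistent synthetic replacement. It is not Private AI's method or API: the `Entity` spans are hard-coded stand-ins for a detector's output, and the `SURROGATES` pools are a hypothetical lookup table rather than a context-aware generative model. What it does illustrate is one property that matters for training data: the same real value always maps to the same synthetic value, so references within a conversation stay consistent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A detected PII span (hypothetical detector output)."""
    start: int  # character offset where the span begins
    end: int    # character offset just past the span
    label: str  # entity type, e.g. "NAME" or "CITY"


# Hypothetical pools of synthetic values per entity type. A real system would
# generate values that fit the surrounding context instead of sampling a list.
SURROGATES = {
    "NAME": ["Alex Morgan", "Priya Shah", "Daniel Okafor", "Mei Lin"],
    "CITY": ["Halifax", "Leeds", "Porto", "Tucson"],
}


def replace_with_synthetic(text, entities, mapping, rng):
    """Replace each non-overlapping PII span with a synthetic value of the same type.

    `mapping` can be shared across documents so that a given real value is
    always replaced by the same synthetic value.
    """
    pieces, cursor = [], 0
    for ent in sorted(entities, key=lambda e: e.start):
        key = (ent.label, text[ent.start:ent.end])
        if key not in mapping:
            # Pick an unused surrogate of the same type that differs from the
            # real value; rng.choice raises IndexError if the toy pool runs out.
            used = {v for (label, _), v in mapping.items() if label == ent.label}
            choices = [v for v in SURROGATES[ent.label] if v not in used and v != key[1]]
            mapping[key] = rng.choice(choices)
        pieces.append(text[cursor:ent.start])  # text before the span
        pieces.append(mapping[key])            # synthetic replacement
        cursor = ent.end
    pieces.append(text[cursor:])               # text after the last span
    return "".join(pieces)


if __name__ == "__main__":
    text = "Hi, this is John Smith calling from Boston. John Smith again."
    entities = [
        Entity(12, 22, "NAME"),
        Entity(36, 42, "CITY"),
        Entity(44, 54, "NAME"),
    ]
    print(replace_with_synthetic(text, entities, {}, random.Random(0)))
```

Passing one shared `mapping` dictionary across a whole dataset keeps "John Smith" mapped to the same fake name everywhere, which preserves the conversational structure a model learns from.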

And it’s highly secure 

In the event of a breach, it's nearly impossible to distinguish synthetic PII from real PII, so an attacker has little chance of identifying any real PII that was accidentally exposed. Additionally, the ML-powered PII generator never sees the original PII, which provides a simple privacy guarantee without a lot of math.
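
One way to structure a pipeline so that the generator never sees the original PII is to split it into two stages: first swap detected PII for typed placeholders, then hand only the placeholder text to the generator. The Python sketch below assumes that two-stage design for illustration; `redact`, `fill_placeholders`, the `[LABEL_N]` placeholder format, and `toy_generator` are all hypothetical names, and the spans are hard-coded rather than produced by a real detector.

```python
import re

# Matches typed placeholders such as [NAME_1] or [PHONE_NUMBER_2].
PLACEHOLDER = re.compile(r"\[([A-Z_]+)_(\d+)\]")


def redact(text, spans):
    """Stage 1: swap each (start, end, label) span for a typed placeholder.

    Only the redacted string is returned; the original values are not
    passed on to the next stage.
    """
    counters, tags = {}, {}
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        value = text[start:end]
        if (label, value) not in tags:
            # Repeated values reuse the same placeholder number.
            counters[label] = counters.get(label, 0) + 1
            tags[(label, value)] = f"[{label}_{counters[label]}]"
        pieces.append(text[cursor:start])
        pieces.append(tags[(label, value)])
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


def fill_placeholders(redacted, generate):
    """Stage 2: ask generate(context, label) for a value per placeholder.

    The generator receives only the redacted text and the entity label, so it
    cannot reproduce the original PII.
    """
    cache = {}

    def substitute(match):
        tag = match.group(0)
        if tag not in cache:  # same placeholder -> same synthetic value
            cache[tag] = generate(redacted, match.group(1))
        return cache[tag]

    return PLACEHOLDER.sub(substitute, redacted)


def toy_generator(context, label):
    """Hypothetical stand-in for an ML generator: a fixed value per label."""
    return {"NAME": "Jordan Lee", "PHONE": "555-0100"}[label]


if __name__ == "__main__":
    text = "Call Maria Lopez at 613-555-0199."
    redacted = redact(text, [(5, 16, "NAME"), (20, 32, "PHONE")])
    print(redacted)                                    # Call [NAME_1] at [PHONE_1].
    print(fill_placeholders(redacted, toy_generator))  # Call Jordan Lee at 555-0100.
```

The guarantee comes from the interface rather than from the model: because `fill_placeholders` never passes the original string to `generate`, no generator plugged into it can leak values it was never given.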


Designed for Developers

Our system is packaged as a single Docker container that deploys in your environment with just a few lines of code, so you can quickly add privacy protection to your data pipeline. Read more about installation in our docs.
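
Once a container is running locally, calling it from a pipeline typically comes down to an HTTP request. The Python sketch below uses only the standard library and is illustrative only: the port (`8080`), the path (`/deidentify`), and the `"text"` JSON field are hypothetical placeholders, not Private AI's documented API, so check the installation docs for the real endpoint and request schema.

```python
import json
import urllib.request

# Hypothetical values: the port, path, and JSON field name below are
# placeholders, not the product's documented API.
BASE_URL = "http://localhost:8080"
ENDPOINT = "/deidentify"


def build_request(texts, base_url=BASE_URL):
    """Build a JSON POST request for a locally running de-identification container."""
    body = json.dumps({"text": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def deidentify(texts, base_url=BASE_URL, timeout=10):
    """Send the request to the container and return the decoded JSON response."""
    req = build_request(texts, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the client easy to test without a running container; with the container up, `deidentify(["Hi, I'm John"])` would return whatever JSON the service responds with.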

Private AI plugs seamlessly into your existing infrastructure.

Why Private AI

Unrivalled Accuracy

Private AI uses the latest advancements in machine learning to achieve remarkable accuracy out of the box. See how we stack up against our competitors in our technical whitepaper.

[Chart: recall of Private AI, Major Cloud Provider 2, Open Source Software 2, Open Source Software 1, Major Cloud Provider 1 and Major Cloud Provider 3, on a recall axis from 0.80 to 1]


From all of the PII redaction products we’ve seen out there (and believe me, we’ve seen all of them), Private AI is the best one by far in terms of accuracy, types of data that can be redacted, and flexibility of their models. After doing a side by side comparison it quickly became clear to us that we couldn’t go back to using something like AWS Comprehend.

Sebastian Jimenez
Founder, Rilla Voice

Recall

Tested on a dataset of messy conversational data containing sensitive health information. Download our whitepaper for further details, including how we perform on precision and F1 score, or contact us for a copy of the evaluation code.
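
Recall, precision, and F1 follow their standard definitions. The Python sketch below computes them over sets of flagged token positions; token-level scoring is an assumption made for illustration, and the whitepaper's evaluation protocol (for example, entity-level matching) may differ. For PII redaction, recall is the headline number because every false negative is a piece of PII left in the data.

```python
def precision_recall_f1(gold, predicted):
    """Standard precision, recall, and F1 over sets of PII token positions.

    gold: positions that truly contain PII.
    predicted: positions the system flagged as PII.
    """
    tp = len(gold & predicted)   # PII correctly flagged
    fp = len(predicted - gold)   # non-PII flagged by mistake
    fn = len(gold - predicted)   # PII missed: the costly error for privacy
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {1, 2, 5, 8}        # toy labels: tokens 1, 2, 5 and 8 are PII
    predicted = {1, 2, 5, 9}   # one PII token missed, one false alarm
    print(precision_recall_f1(gold, predicted))  # (0.75, 0.75, 0.75)
```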


Request an API Key

Fill out the request form and we'll send you a free API key for 500 calls (approximately 50k words). No commitment, no credit card required!

Language Packs

The languages included in each language pack are listed below.
Note: English capabilities are automatically included in the Enterprise pricing tier.

French
Spanish
Portuguese

Arabic
Hebrew
Persian (Farsi)
Swahili

French
German
Italian
Portuguese
Russian
Spanish
Ukrainian
Belarusian
Bulgarian
Catalan
Croatian
Czech
Danish
Dutch
Estonian
Finnish
Greek
Hungarian
Icelandic
Latvian
Lithuanian
Luxembourgish
Polish
Romanian
Slovak
Slovenian
Swedish
Turkish

Hindi
Korean
Tagalog
Bengali
Burmese
Indonesian
Khmer
Japanese
Malay
Moldovan
Norwegian (Bokmål)
Punjabi
Tamil
Thai
Vietnamese
Mandarin (simplified)

Arabic
Belarusian
Bengali
Bulgarian
Burmese
Catalan
Croatian
Czech
Danish
Dutch
Estonian
Finnish
French
German
Greek
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Italian
Japanese
Khmer
Korean
Latvian
Lithuanian
Luxembourgish
Malay
Mandarin (simplified)
Moldovan
Norwegian (Bokmål)
Persian (Farsi)
Polish
Portuguese
Punjabi
Romanian
Russian
Slovak
Slovenian
Spanish
Swahili
Swedish
Tagalog
Tamil
Thai
Turkish
Ukrainian
Vietnamese


99.5%+ Accuracy

The quoted accuracy is one minus the number of missed PII words as a fraction of the total number of words. It was computed on a 268,000-word internal test dataset comprising data from over 50 different sources, including web scrapes, emails, and ASR transcripts.
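
Read literally, missed PII words divided by total words is an error rate, and the 99.5%+ figure is its complement. Assuming the metric is computed that way, the small Python function below reproduces it; the name `pii_word_accuracy` is hypothetical. On a 268,000-word test set, 99.5% accuracy corresponds to at most 268,000 × 0.005 = 1,340 missed PII words. Because the denominator counts every word, including the many non-PII words, this figure is not the same as recall.

```python
def pii_word_accuracy(missed_pii_words, total_words):
    """Accuracy = 1 - missed PII words / total words (all words, not just PII)."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= missed_pii_words <= total_words:
        raise ValueError("missed_pii_words must be between 0 and total_words")
    return 1 - missed_pii_words / total_words


if __name__ == "__main__":
    # At the 99.5% threshold on a 268,000-word test set:
    print(round(pii_word_accuracy(1_340, 268_000), 4))  # 0.995
```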

Please contact us for a copy of the code used to compute these metrics, try Private AI on your own data, or download our whitepaper.