What’s New in Version 3.8

Hello, dear community!

We have shipped a number of improvements since 3.7. Here is a synopsis of the highlights from the 3.8 release.

Translated Redaction Labels

Private AI supports text processing in multiple languages, and redaction markers are now also available in multiple languages. See the Supported Language documentation for more information on which languages support translated labels.

WebSocket Context

The Private AI model now maintains context between WebSocket messages. Similar to `link_batch`, text inputs are processed together as a single input. This helps deliver more consistent redaction markers across a series of interactions, like an SMS chat conversation.
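
To illustrate why shared context matters, here is a toy Python sketch. It is not Private AI's implementation or API: the `MarkerContext` class, the `[NAME_1]` marker format, and the pre-detected entity lists are all assumptions made for illustration. It only shows the idea that state kept across messages lets the same entity receive the same marker each time.

```python
from collections import defaultdict


class MarkerContext:
    """Toy stand-in for per-connection context (hypothetical, for illustration only)."""

    def __init__(self):
        self._markers = {}                 # (label, normalized text) -> marker
        self._counters = defaultdict(int)  # label -> last number handed out

    def marker_for(self, label, text):
        # Reuse the marker if this entity was seen earlier in the conversation.
        key = (label, text.strip().lower())
        if key not in self._markers:
            self._counters[label] += 1
            self._markers[key] = f"[{label}_{self._counters[label]}]"
        return self._markers[key]

    def redact(self, message, entities):
        # `entities` is a list of (label, text) pairs assumed to be detected already.
        for label, text in entities:
            message = message.replace(text, self.marker_for(label, text))
        return message


ctx = MarkerContext()
first = ctx.redact("Hi, this is Anna.", [("NAME", "Anna")])
second = ctx.redact("Anna, can you call Bob?", [("NAME", "Anna"), ("NAME", "Bob")])
print(first)   # Hi, this is [NAME_1].
print(second)  # [NAME_1], can you call [NAME_2]?
```

Without the shared `ctx`, each message would start numbering from scratch, so "Bob" in a later message could end up with the same marker that "Anna" received earlier.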

Black Box Image Redaction

Private AI offers the ability to perform black-box redaction on images. For more information, please visit the image processing documentation.

Examples, guides and more

We continue to improve our docs site. We’ve introduced a new Getting Started guide, additional documentation around API fundamentals, and additional guides.

New Language Support

Extended support for the Georgian language has been introduced, expanding the model’s linguistic capabilities to 53 languages.

New Entity Type

A new entity, `LOCATION_ADDRESS_STREET`, has been added. This entity captures street names and numbers, including unit numbers, providing more granularity than the existing `LOCATION_ADDRESS` entity, which captures complete addresses.

New RAM Requirements

The container now requires 64 GB of RAM to ensure reliable and performant operation when processing files. A warning message will be presented if the system does not have sufficient memory.
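
If you want to check a host before deploying, a minimal pre-flight sketch in Python might look like the following. It is not part of the Private AI container. The function names are hypothetical, and it assumes 1 GB means 10^9 bytes, which may differ from how the container measures memory. It also reads host-level physical memory, so a lower container memory limit would not be detected.

```python
import os

REQUIRED_GB = 64  # figure taken from the release note above


def meets_ram_requirement(total_bytes, required_gb=REQUIRED_GB):
    """Return True if total_bytes is at least required_gb (assuming 1 GB = 10**9 bytes)."""
    return total_bytes >= required_gb * 10**9


def total_physical_memory():
    """Total physical memory in bytes on Linux and most Unix-likes, or None if unavailable."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; some platforms lack these names.
        return None


total = total_physical_memory()
if total is None:
    print("Could not determine physical memory on this platform.")
elif meets_ram_requirement(total):
    print(f"OK: {total / 10**9:.1f} GB of RAM detected.")
else:
    print(f"Warning: {total / 10**9:.1f} GB detected; {REQUIRED_GB} GB required.")
```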

Model Improvements

These updates signify a substantial step forward in the model’s functionality and security, broadening its applicability and accuracy across various languages and data types.

Model updates since 3.7 include:

  • Enhanced detection of numerical entities in multiple languages (French, Spanish, English, Japanese, Portuguese, and Dutch), including specific improvements for social security numbers (SSNs) and credit card numbers.
  • Improved identification of partial credit card numbers and SSNs, especially when spoken or transcribed, across several languages (Spanish, Dutch, Korean, German, and Italian).
  • Better detection of `BANK_ACCOUNT`, `MEDICAL_PROCESS`, `TIME`, and `MONEY` entities in various languages, with specific enhancements for PII detection in Mandarin (simplified script) and Spanish, focusing on regional equivalents of identifiers.
  • General improvements in detecting numerical entities written as words, benefiting multilingual text processing.

Cheers,

The Private AI Product Team
