At Private AI, we pride ourselves on making scalable solutions for large companies, small businesses, and government institutions that need to securely handle the potentially sensitive data and information that is flowing through their infrastructure. We understand the importance of maintaining data privacy and meeting regulatory compliance standards, and our goal is to ensure that migrating to our platform will be a smooth and straightforward process.
This guide will cover the following topics:
- Purpose and Intended Audience
- Prerequisites
- Useful Links
- Terms Used in This Guide
- Why Migrate from Amazon AWS Comprehend to Private AI?
- Examining a Real-World Scenario: Customer Service Chat Messages
- Identifying Sensitive Data with AWS Comprehend
- Identifying Sensitive Data with the Private AI Process Text Service
- Locating Sensitive Data with AWS Comprehend
- Locating Sensitive Data with the Private AI Process Text Service
- Redacting Sensitive Data with AWS Comprehend
- Redacting Sensitive Data with the Private AI Process Text Service
- Conclusion
Purpose and Intended Audience
This guide is intended to help managers, CIOs, and CISOs understand the benefits of using Private AI instead of AWS Comprehend. Several features and capabilities set us apart from other solutions, and they are the reason Private AI was recognized by Gartner as a Cool Vendor in its “Cool Vendors in Privacy, 2023” report.
This guide is also for application developers who maintain applications and services already deployed with Amazon AWS Comprehend. It walks through the steps needed to migrate your existing workflows to Private AI for privacy, regulatory compliance, and smart redaction of potentially sensitive data, while maintaining the integrity and utility of your information.
Prerequisites
To complete all the steps in this guide, you need the following:
- Knowledge of an existing application or service that utilizes AWS Comprehend
- A free account on the Private AI Portal
- A free Private AI API key (log in to the portal to create one)
- Access to the Private AI Cloud API, or a local deployment set up by following the Installation Guide
Useful Links
In this guide we’re going to cover the steps necessary to migrate your workflows from AWS Comprehend to Private AI. Therefore, the following resource will be useful for anyone following the steps in this guide:
The Private AI Process Text API documentation
Terms Used in This Guide
Below is a list of terms that will be used in this guide.
Why Migrate from Amazon AWS Comprehend to Private AI?
Compliance with regulations like HIPAA, PCI DSS, and GDPR makes robust tools for data privacy and security non-negotiable. Although AWS Comprehend offers basic capabilities for detecting and redacting PII, below is a short list of reasons why customers are migrating their privacy workflows to Private AI.
— Private AI supports both on-prem and cloud deployments. Simply stated, this means that Private AI gives you the option to keep your data totally within your network infrastructure. This greatly enhances your control over data security and privacy, ensuring that sensitive information remains within your secure environment. However, if you are currently using cloud providers to host your infrastructure (such as Amazon AWS, Google, or others), you can easily migrate your privacy workflows to Private AI with a few lines of code.
— Private AI can accomplish in a single HTTP request what AWS Comprehend needs three separate requests to do. With Private AI you can label, locate, and redact sensitive information in a single HTTP request. Period. With AWS Comprehend, each of these operations is a separate HTTP request, which means more CPU time, network bandwidth, and expense compared to Private AI’s platform.
— Private AI doesn’t have rights to your data. Your data is your data. Private AI doesn’t store, access, or use your data for any purpose other than processing your sensitive information. This is the default with our on-prem solution, and it also applies to our cloud service.
— Private AI can process your data in over 50 languages. The world is a big place and PII can be hidden in other languages besides English. Private AI’s solution supports over 50 languages and enables you to support regulatory compliance requirements in multiple global regions. See the full list of supported languages here.
— Private AI supports over 50 entity types. Across its supported languages, Private AI can identify over 50 entity types of PII, including names, addresses, credit card data, and much more. See the full list of supported entity types here.
— Private AI doesn’t try to lock you in to other services that you don’t need. Other services require you to pay for cloud storage in order to process your sensitive information. Private AI lets you deploy a Docker container within your own infrastructure, eliminating unnecessary monthly storage costs and payments for services that you don’t need.
— Private AI can process data within text and binary file formats such as PDF, Word, and Excel. If you send Private AI a supported file format, then our API can send you the same file type back. This greatly reduces the complexity of integrating our services into your existing workflows and saves you time in processing document formats. See the full list of supported document types here.
— Private AI can process data within images. We can identify PII as text in an image. This means that even visual data is not beyond the scope of privacy protection, extending the capabilities of PII detection and redaction to a broader range of media formats. See the full list of supported image types here.
— Private AI can process data within audio files. If you have customer support calls with PII, then Private AI can analyze and redact sensitive information, ensuring that audio recordings comply with privacy regulations while maintaining the quality and utility of the data for customer service and analysis. See the full list of supported audio file types here.
Migrating From AWS Comprehend to Private AI
Now that we’ve seen a list of the major benefits of Private AI compared to other services, let’s use a real world example of PII in order to show how to migrate the three following capabilities of AWS Comprehend to Private AI:
- Identifying Sensitive Data
- Locating Sensitive Data
- Redacting Sensitive Data
Examining a Real-World Scenario: Customer Service Chat Messages
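Listing 1 below contains an excerpt of a fictitious text-chat interaction between a customer and a bank representative. Paulo, the customer, is chatting with a customer service representative to inquire about his recent credit card statement. In this situation, the customer service agent typed the following response.
Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109.
Listing 1. The Customer Service Chat Message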
Now let’s compare and contrast how the same input is processed using AWS Comprehend versus Private AI.
Identifying Sensitive Data with AWS Comprehend
Using AWS Comprehend with the example text in Listing 1 above, you will be provided with a JSON response as shown in Listing 2 below.
{
"Labels": [
{
"Name": "NAME",
"Score": 0.9149109721183777
},
{
"Name": "CREDIT_DEBIT_NUMBER",
"Score": 0.5698626637458801
},
{
"Name": "ADDRESS",
"Score": 0.9951046109199524
}
]
}
Listing 2. The AWS Comprehend JSON Result of Identifying Sensitive Data From Listing 1
For each PII entity type it finds, AWS Comprehend returns the type name and a score that estimates its confidence in that identification. Now let’s see how to process the same text in Listing 1 above using Private AI.
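Before moving on, here is a minimal sketch of how an application might consume the scores in Listing 2. The helper name `confident_labels` and the 0.6 threshold are illustrative assumptions, not part of any AWS SDK; the sample data is copied from Listing 2.

```python
import json

# Sample response copied from Listing 2 (AWS Comprehend label detection).
aws_response = json.loads("""
{
  "Labels": [
    {"Name": "NAME", "Score": 0.9149109721183777},
    {"Name": "CREDIT_DEBIT_NUMBER", "Score": 0.5698626637458801},
    {"Name": "ADDRESS", "Score": 0.9951046109199524}
  ]
}
""")


def confident_labels(response, min_score=0.6):
    """Return label names scoring at least min_score (hypothetical helper;
    the threshold is an arbitrary value chosen for illustration)."""
    return [
        label["Name"]
        for label in response["Labels"]
        if label["Score"] >= min_score
    ]


print(confident_labels(aws_response))
```

With this particular threshold the credit card label would be filtered out, which shows why any cutoff needs to be chosen with care when working with low scores.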
Identifying Sensitive Data with the Private AI Process Text Service
The Private AI Process Text Service provides 3 capabilities in a single call to the service: identifying, locating, and smart redaction. For more information about how to invoke the Process Text Service (including all the options and parameters available), please refer to the Process Text API documentation.
Listing 3 below is the cURL command necessary to invoke the Process Text Service.
curl -i -X POST \
--location 'https://api.private-ai.com/deid/v3/process/text' \
--header 'Content-Type: application/json' \
--header "x-api-key: $API_KEY" \
--data '{
"text": [
"Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
]
}'
Listing 3. The cURL Command to Invoke the Process Text Service
As you can see, all you need to provide is a valid API key and the text that you want to be labeled. Optional parameters, such as link_batch or entity_detection, can also be specified in the request.
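For applications that call the service from code rather than from a shell, the same request could be sketched in Python using only the standard library. The endpoint, headers, and body shape come from Listing 3; the function names and the `YOUR_API_KEY` placeholder are assumptions made for illustration.

```python
import json
import urllib.request

# Endpoint taken from Listing 3.
PROCESS_TEXT_URL = "https://api.private-ai.com/deid/v3/process/text"


def build_process_text_request(texts, api_key):
    """Build (but do not send) the POST request shown in Listing 3."""
    body = json.dumps({"text": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        PROCESS_TEXT_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


def process_text(texts, api_key):
    """Send the request and return the decoded JSON response.

    Requires network access and a valid API key, so it is not called here.
    """
    request = build_process_text_request(texts, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build a request without sending it; YOUR_API_KEY is a placeholder.
request = build_process_text_request(["Hello Paulo Santos."], "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test without touching the network.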
After executing the request to the service, the JSON response is shown in Listing 4 below.
[
{
"processed_text": "Hello [NAME_1]. The latest statement for your credit card account [CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1].",
"entities": [
{
"processed_text": "NAME_1",
"text": "Paulo Santos",
"location": {
"stt_idx": 6,
"end_idx": 18,
"stt_idx_processed": 6,
"end_idx_processed": 14
},
"best_label": "NAME",
"labels": {
"NAME": 0.9226,
"NAME_GIVEN": 0.4508,
"NAME_FAMILY": 0.4557
}
},
{
"processed_text": "CREDIT_CARD_1",
"text": "1111-0000-1111-0000",
"location": {
"stt_idx": 70,
"end_idx": 89,
"stt_idx_processed": 66,
"end_idx_processed": 81
},
"best_label": "CREDIT_CARD",
"labels": {
"CREDIT_CARD": 0.9176
}
},
{
"processed_text": "LOCATION_ADDRESS_1",
"text": "123 Any Street, Seattle, WA 98109",
"location": {
"stt_idx": 104,
"end_idx": 137,
"stt_idx_processed": 96,
"end_idx_processed": 116
},
"best_label": "LOCATION_ADDRESS",
"labels": {
"LOCATION_ADDRESS": 0.9415,
"LOCATION_ADDRESS_STREET": 0.309,
"LOCATION": 0.9024,
"LOCATION_CITY": 0.1033,
"LOCATION_STATE": 0.1048,
"LOCATION_ZIP": 0.211
}
}
],
"entities_present": true,
"characters_processed": 138,
"languages_detected": {
"en": 0.8629347681999207
}
}
]
Listing 4. The Private AI Process Text JSON Result of the Sensitive Data From Listing 1
Let’s delve deeper and analyze Listing 4 above. The Private AI Process Text Service returns the results of identifying, locating, and smart redaction in a single JSON response.
The entities array contains all PII entities found in the original text. Note that the entities[].best_label property contains the PII label, such as “NAME”, “LOCATION_ADDRESS”, or “CREDIT_CARD”.
On closer inspection, the entities[].labels property not only provides a confidence score for each identified label, it also provides more detail about the PII entity itself. Specifically, in this case:
"text": "Paulo Santos",
...
"best_label": "NAME",
"labels": {
"NAME": 0.9226,
"NAME_GIVEN": 0.4508,
"NAME_FAMILY": 0.4557
}
The Private AI Process Text Service identifies the text, “Paulo Santos” as a first name and a last name.
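As a small sketch of how the labels property might be read in code, the snippet below ranks the labels of one entity by score. The entity is copied from Listing 4; the helper name `ranked_labels` is an assumption for illustration.

```python
# One entity copied from Listing 4 (only the fields used here).
entity = {
    "text": "Paulo Santos",
    "best_label": "NAME",
    "labels": {"NAME": 0.9226, "NAME_GIVEN": 0.4508, "NAME_FAMILY": 0.4557},
}


def ranked_labels(entity):
    """Return (label, score) pairs sorted from highest to lowest score."""
    return sorted(
        entity["labels"].items(), key=lambda pair: pair[1], reverse=True
    )


for label, score in ranked_labels(entity):
    print(f"{label:12s} {score:.4f}")
```

In this sample the top-ranked label matches best_label, and the finer-grained name labels follow with lower scores.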
Now let’s compare and contrast how to locate sensitive information with AWS Comprehend vs Private AI.
Locating Sensitive Data Entities with AWS Comprehend
Let’s use the same example scenario of Paulo Santos interacting with a customer service agent shown in Listing 1, above. Listing 5 below shows the JSON result when using AWS Comprehend to locate sensitive information.
{
"Entities": [
{
"Score": 0.9999669790267944,
"Type": "NAME",
"BeginOffset": 6,
"EndOffset": 18
},
{
"Score": 0.8905550241470337,
"Type": "CREDIT_DEBIT_NUMBER",
"BeginOffset": 69,
"EndOffset": 88
},
{
"Score": 0.9999889731407166,
"Type": "ADDRESS",
"BeginOffset": 103,
"EndOffset": 138
}
]
}
Listing 5. The AWS Comprehend JSON Result of Locating Sensitive Data From Listing 1
Within the JSON response, each item in the Entities array has Type, BeginOffset, and EndOffset properties, which tell you what kind of sensitive information was found and where it sits in your original text.
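If existing code already consumes this Entities shape, one possible migration path is a thin adapter that reshapes the Private AI entities from Listing 4 into it. The sketch below is an assumption-laden illustration: the type mapping is inferred only from the three entity types that appear in this guide's listings, and the function name is hypothetical. Note also that the offsets in Listing 5 and Listing 4 do not agree for every entity, so offset conventions should be verified against your own data.

```python
# Mapping inferred only from the entity types shown in this guide's listings.
# It is an illustrative assumption, not an official correspondence table.
PRIVATE_AI_TO_AWS_TYPE = {
    "NAME": "NAME",
    "CREDIT_CARD": "CREDIT_DEBIT_NUMBER",
    "LOCATION_ADDRESS": "ADDRESS",
}


def to_aws_entities(private_ai_result):
    """Reshape one Private AI result object into an AWS-style Entities payload."""
    entities = []
    for entity in private_ai_result["entities"]:
        label = entity["best_label"]
        entities.append({
            "Score": entity["labels"][label],
            # Unmapped labels are passed through unchanged.
            "Type": PRIVATE_AI_TO_AWS_TYPE.get(label, label),
            "BeginOffset": entity["location"]["stt_idx"],
            "EndOffset": entity["location"]["end_idx"],
        })
    return {"Entities": entities}


# Trimmed copy of the first two entities from Listing 4.
sample = {
    "entities": [
        {
            "best_label": "NAME",
            "labels": {"NAME": 0.9226},
            "location": {"stt_idx": 6, "end_idx": 18},
        },
        {
            "best_label": "CREDIT_CARD",
            "labels": {"CREDIT_CARD": 0.9176},
            "location": {"stt_idx": 70, "end_idx": 89},
        },
    ]
}

print(to_aws_entities(sample))
```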
Locating Sensitive Data with the Private AI Process Text Service
As previously stated, the Private AI Process Text Service provides identifying, locating, and smart redaction as a single HTTP request. For more information about how to invoke the Process Text Service (including all the options and parameters available), please refer to the Process Text API documentation.
Refer to Listing 3 above for the cURL command necessary to invoke the Process Text Service, as well as Listing 4 for the full JSON response from the service.
Again, the entities array contains all PII entities found in the original text. Note that the entities[].location object contains the starting and ending indices of the PII in its stt_idx and end_idx properties, respectively.
"entities": [
{
"processed_text": "NAME_1",
"text": "Paulo Santos",
"location": {
"stt_idx": 6,
"end_idx": 18,
"stt_idx_processed": 6,
"end_idx_processed": 14
}
...
So, in our example, the name “Paulo Santos” starts at position 6 and ends at position 18 in the original text. As the snippet above shows, the same object also provides the starting and ending indices in the anonymized text, as stt_idx_processed and end_idx_processed.
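The values in Listing 4 are consistent with zero-based indices and an exclusive end index, which is the same convention Python slicing uses. Under that assumption, a short sketch can recover each entity from both the original and the processed text. All strings and indices are copied from Listing 4.

```python
original = (
    "Hello Paulo Santos. The latest statement for your credit card account "
    "1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
)
processed = (
    "Hello [NAME_1]. The latest statement for your credit card account "
    "[CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1]."
)

# Location objects copied from the three entities in Listing 4.
locations = [
    {"stt_idx": 6, "end_idx": 18, "stt_idx_processed": 6, "end_idx_processed": 14},
    {"stt_idx": 70, "end_idx": 89, "stt_idx_processed": 66, "end_idx_processed": 81},
    {"stt_idx": 104, "end_idx": 137, "stt_idx_processed": 96, "end_idx_processed": 116},
]


def slice_pair(loc):
    """Return (original span, processed span) for one location object."""
    return (
        original[loc["stt_idx"]:loc["end_idx"]],
        processed[loc["stt_idx_processed"]:loc["end_idx_processed"]],
    )


for loc in locations:
    source_span, redacted_span = slice_pair(loc)
    print(f"{source_span!r} -> {redacted_span!r}")
```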
Now let’s wrap things up and compare and contrast how to redact sensitive information with AWS Comprehend vs Private AI.
Redacting Sensitive Data with AWS Comprehend
Again, we’re going to use the same example scenario of Paulo Santos interacting with a customer service agent shown in Listing 1, above. Listing 6 below shows the JSON result when using AWS Comprehend to redact sensitive information.
{
Hello ***** ******. The latest statement for your credit card account ******************* was mailed to *** *** ******* ******** ** *****
}
Listing 6. The AWS Comprehend JSON Result of Redacting Sensitive Data From Listing 1
Redacting Sensitive Data with the Private AI Process Text Service
Let’s now turn our attention to redacting using the Private AI Process Text Service. Refer to Listing 3 above for the cURL command necessary to invoke the Process Text Service, as well as Listing 4 for the full JSON response from the service.
In this particular case, we’re only interested in the redacted text, which can be found in the processed_text property of the JSON result, as shown in the code snippet below:
{
"processed_text": "Hello [NAME_1]. The latest statement for your credit card account [CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1].",
...
}
Sanitizing Data, Without Sterilizing the Valuable Information
As you can see from the response above, the Private AI Process Text Service not only fully redacted the sensitive information, but also provided the types of the PII in the response.
Our offering allows you to sanitize any data that is flowing through your infrastructure, without the need to completely sterilize the information which you can use at a later stage for analytical purposes.
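To illustrate the kind of later-stage analysis this makes possible, the sketch below counts entity types directly from redacted text. It assumes markers always look like the ones in Listing 4, that is, an uppercase label followed by an underscore and a number inside square brackets; the pattern and function name are illustrative assumptions.

```python
import re
from collections import Counter

# Assumed marker shape, based only on Listing 4: [LABEL_N]
MARKER_PATTERN = re.compile(r"\[([A-Z_]+)_\d+\]")


def count_entity_types(processed_text):
    """Count how often each entity type appears in redacted text."""
    return Counter(MARKER_PATTERN.findall(processed_text))


processed = (
    "Hello [NAME_1]. The latest statement for your credit card account "
    "[CREDIT_CARD_1] was mailed to [LOCATION_ADDRESS_1]."
)

print(count_entity_types(processed))
```

Because only the markers are read, this kind of reporting never needs access to the original sensitive values.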
Redacting with Masking the Sensitive Information
We also give our customers the option to redact sensitive information by completely masking any PII. To do so, add the "type": "MASK" parameter and the mask character that you prefer. In this example, we want “#” to replace any PII, so we also add "mask_character": "#" to the request. The updated cURL command for the Process Text Service is shown in Listing 7, below.
curl -i -X POST \
--location 'https://api.private-ai.com/deid/v3/process/text' \
--header 'Content-Type: application/json' \
--header "x-api-key: $API_KEY" \
--data '{
"text": [
"Hello Paulo Santos. The latest statement for your credit card account 1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
],
"processed_text": {
"type": "MASK",
"mask_character": "#"
}
}'
Listing 7. The Updated cURL Command to Invoke the Process Text Service to Specify a Mask Character
After invoking the Process Text Service, Listing 8 below shows all PII replaced with the desired mask character.
[
{
"processed_text": "Hello ############. The latest statement for your credit card account ################### was mailed to #################################.",
"entities": [
{
"processed_text": "############",
"text": "Paulo Santos",
"location": {
"stt_idx": 6,
"end_idx": 18,
"stt_idx_processed": 6,
"end_idx_processed": 18
},
"best_label": "NAME",
"labels": {
"NAME": 0.9226,
"NAME_GIVEN": 0.4508,
"NAME_FAMILY": 0.4557
}
},
{
"processed_text": "###################",
"text": "1111-0000-1111-0000",
"location": {
"stt_idx": 70,
"end_idx": 89,
"stt_idx_processed": 70,
"end_idx_processed": 89
},
"best_label": "CREDIT_CARD",
"labels": {
"CREDIT_CARD": 0.9176
}
},
{
"processed_text": "#################################",
"text": "123 Any Street, Seattle, WA 98109",
"location": {
"stt_idx": 104,
"end_idx": 137,
"stt_idx_processed": 104,
"end_idx_processed": 137
},
"best_label": "LOCATION_ADDRESS",
"labels": {
"LOCATION_ADDRESS": 0.9415,
"LOCATION_ADDRESS_STREET": 0.309,
"LOCATION": 0.9024,
"LOCATION_CITY": 0.1033,
"LOCATION_STATE": 0.1048,
"LOCATION_ZIP": 0.211
}
}
],
"entities_present": true,
"characters_processed": 138,
"languages_detected": {
"en": 0.8629347681999207
}
}
]
Listing 8. The Private AI Process Text JSON Result of the Sensitive Data From Listing 1 Using a Character Mask
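One detail worth noticing in Listing 8 is that the processed indices equal the original indices, because each entity is replaced by a mask of the same length. The sketch below reproduces that effect locally, purely as an illustration of the index arithmetic; it is not the service's implementation, and the function name is an assumption.

```python
def mask_spans(text, spans, mask_character="#"):
    """Replace each (start, end) span with mask characters of equal length."""
    chars = list(text)
    for start, end in spans:
        # Same-length replacement keeps every later index unchanged.
        chars[start:end] = mask_character * (end - start)
    return "".join(chars)


original = (
    "Hello Paulo Santos. The latest statement for your credit card account "
    "1111-0000-1111-0000 was mailed to 123 Any Street, Seattle, WA 98109."
)

# (stt_idx, end_idx) pairs copied from Listing 8.
spans = [(6, 18), (70, 89), (104, 137)]

masked = mask_spans(original, spans)
print(masked)
```

Since the text length is preserved, any offsets stored against the original text remain valid for the masked text as well.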
Conclusion
The Private AI Process Text Service is a straightforward, easy-to-use service that provides multiple privacy-protecting features in a single HTTP request (or API call). If you already have privacy workflows that utilize AWS Comprehend, this guide has shown you how to simplify your development effort when processing sensitive data, all while keeping the value of the information itself. With support for over 50 languages and on-prem deployments, we offer a robust and scalable solution that makes it easy for your organization to comply with global privacy regulations.