The CIO’s and CISO’s Guide for Proactive Reporting and DLP with Private AI and Elastic


Managing the data and information within a company’s infrastructure is critical, both for identifying when sensitive information is being mismanaged and for reporting an “all clear” when company policies are being followed as intended.

As you may already be aware, Private AI provides PII detection and redaction services that enable companies to protect their sensitive information. When combined with the ELK Stack (a suite of tools for searching, managing, analyzing, and visualizing data within the corporate infrastructure), Private AI gives CIOs and CISOs complete situational awareness of their company’s infrastructure with regard to policy compliance and security practices.

Here is a list of topics covered in this guide:

  • – Purposes and Intended Audience
  • – Useful Links
  • – Terms Used in This Guide
  • – Quick Refresher: What is Elastic and the “ELK Stack”?
  • – Adequate DLP Does Not Need to Be Overly Complicated
  • – Example Scenarios Where You Need PII Detection/Redaction with Powerful Indexing 
    • – Scenario #1 – Company A: Full Credit Card Data is Stored Within a Spreadsheet
    • – Scenario #2 – Company B: Sensitive Information Being Shared in Customer Support Chat Windows is Not Being Redacted
  • – Conclusion

Purposes and Intended Audience

This guide helps managers, CIOs, and CISOs understand the benefits of using Private AI with Elastic integrations. Used together, the two help you visualize your datasets and index sensitive information within your organization so you can understand three critical things:

  • – What is being stored?
  • – Is it being stored properly according to company policies and compliance?
  • – Where is it located?

Useful Links

This guide explains how Private AI integrates with Elastic for various proactive reporting use cases. For details on the integration process and a list of the configurable parameters, please refer to our ELK Reporting Integration Guide on Elastic.

Terms Used in This Guide

The following terms are used in this guide.

| Term | Definition |
|------|------------|
| DLP | Data Loss Prevention |
| ELK | Elasticsearch, Logstash, Kibana |
| Entity | Any specific piece of information within a document or dataset that can be classified as sensitive data, such as PII and PCI |
| On-prem | Application software or services that run within a network on infrastructure controlled by an organization |
| PCI | Payment Card Industry |
| PII | Personally Identifiable Information |

Quick Refresher: What is Elastic and the “ELK Stack”?

The “ELK Stack” refers to a powerful combination of three products from Elastic: Elasticsearch, Logstash, and Kibana. Together, they provide an integrated solution for searching, analyzing, and visualizing data.

Elasticsearch is the core component. It works as a highly scalable search and analytics engine. Although it excels at data indexing and storage, it cannot, on its own, import the data to be indexed.

Logstash handles data ingestion and processing. It acts as a dynamic data processing pipeline that can transform and transport data from various sources before it reaches Elasticsearch.

Kibana, the final piece of the stack, allows managers, CIOs, and CISOs to visualize the data indexed by Elasticsearch. It features interactive dashboards that let users intuitively explore and interpret complex datasets and gain a clear understanding of data distribution and status within an organization’s infrastructure.

Combined with Private AI, the ELK Stack offers extensive capabilities for real-time security monitoring, and it can help streamline compliance by centralizing logging and auditing functions. Together, we can assist in threat detection and incident response, helping to keep the organization’s technology infrastructure both secure and efficient.

Adequate DLP Does Not Need to Be Overly Complicated

As a CIO or CISO, you need a DLP (data loss prevention) plan for your organization. If an ounce of prevention is worth a pound of cure, then you understand the value of a DLP plan. In order to provide PII (Personally Identifiable Information) detection that’s flexible for various corporate infrastructures, we give you the option to deploy our solution as a Docker container or use us in the cloud – whichever is your preference. 

We believe that an adequate DLP plan does not need to be overly complicated. Your locally deployed Private AI container can be configured to send reporting metrics to a Logstash server simply by adding a few environment variables to the container’s run command, as shown below.

				
    docker run --rm -p 8080:8080 \
      --mount type=bind,src=$PWD/tests/fixtures/licenses/license.json,dst=/app/license/license.json \
      -e PAI_ENABLE_REPORTING=true \
      -e LOGSTASH_HOST=http://hostname.org \
      -e LOGSTASH_PORT=50000 \
      -e PAI_REPORT_ENTITY_COUNTS=true \
      -it deid:image-name

				
			

This makes it simple for your DevOps and server teams to write the scripts needed to configure your instance of Private AI to work with the ELK Stack.

For more information, see our ELK Reporting Integration Guide on Elastic.
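As an illustration of how such a script might look, the sketch below assembles the `docker run` command shown above from a few parameters. The function name `build_docker_command` and the example license path are hypothetical; the environment variable names, port mapping, mount target, and image placeholder are copied from the command above. The interactive `-it` flag is left out on the assumption that the script runs unattended.

```python
import shlex


def build_docker_command(license_path, logstash_host, logstash_port,
                         image="deid:image-name", report_entity_counts=True):
    """Assemble the argument list for the `docker run` command shown above.

    `build_docker_command` is a hypothetical helper; only the flags and
    environment variables from the original command are used.
    """
    env = {
        "PAI_ENABLE_REPORTING": "true",
        "LOGSTASH_HOST": logstash_host,
        "LOGSTASH_PORT": str(logstash_port),
        "PAI_REPORT_ENTITY_COUNTS": "true" if report_entity_counts else "false",
    }
    cmd = [
        "docker", "run", "--rm",
        "-p", "8080:8080",
        "--mount",
        f"type=bind,src={license_path},dst=/app/license/license.json",
    ]
    # One `-e NAME=value` pair per reporting setting.
    for name, value in env.items():
        cmd += ["-e", f"{name}={value}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    # Example license path is a placeholder, not a real location.
    command = build_docker_command(
        "/opt/privateai/license.json", "http://hostname.org", 50000
    )
    # Print a shell-safe version; pass `command` to subprocess.run to execute.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell-quoting problems when paths contain spaces.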

Example Scenarios Where You Need PII Detection/Redaction with Powerful Indexing 

Let’s delve into some real-world examples of how IT and security managers can use Private AI integrated with the ELK Stack for common DLP-related situations.

Scenario #1 – Company A: Full Credit Card Data is Stored Within a Spreadsheet

Company A is a tech startup specializing in software development for the insurance industry. As a part of doing business, the sales team travels frequently to their customers’ offices and events to demonstrate the new capabilities of their software tools. In order to make expense reports easy to manage, each sales team member is provided with a company credit card in the company’s name. 

Company A has a DLP plan to conduct monthly internal security audits. The IT manager uses a custom script to scan computers on the corporate infrastructure to ensure that all employees are following cybersecurity best practices. The script simply takes a batch of files and submits them to the company’s locally installed Private AI instance running in a Docker container. Figure 1 below shows how the integration works.

Figure 1. Private AI Data Discovery with ELK
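A custom audit script of the kind described above could be sketched roughly as follows. This is a minimal illustration, not Company A’s actual script: the endpoint path in `PAI_URL`, the `{"text": [...]}` payload shape, the batch size, and all function names are assumptions, so check them against the Private AI API documentation for your container version. The sketch handles plain-text files only; spreadsheets and other binary formats would need a file-processing route instead.

```python
import json
from pathlib import Path
from urllib import request

# Assumptions: endpoint path and batch size are illustrative, not confirmed.
PAI_URL = "http://localhost:8080/v3/process/text"
BATCH_SIZE = 20


def iter_batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_payload(paths):
    """Read each file as text and wrap the contents in the assumed payload shape."""
    texts = [Path(p).read_text(encoding="utf-8", errors="replace") for p in paths]
    return {"text": texts}


def submit_batch(paths, url=PAI_URL):
    """POST one batch of file contents to the local container and return its JSON reply."""
    body = json.dumps(build_payload(paths)).encode("utf-8")
    req = request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scan_directory(root, pattern="*.txt", url=PAI_URL):
    """Submit every matching file under `root` in batches; return the number of files sent."""
    paths = sorted(Path(root).rglob(pattern))
    for batch in iter_batches(paths, BATCH_SIZE):
        submit_batch(batch, url)
    return len(paths)
```

With reporting enabled on the container, the script itself does not need to talk to Logstash; it only submits files, and the container forwards the metrics.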

The IT manager and the CIO can now log in to Elastic and view a Kibana dashboard that shows all the PII found by the on-prem Private AI container, as shown in Figure 2 below.

Figure 2. An Elastic Stack Kibana Dashboard After Company Data Has Been Indexed by Elasticsearch

As the cards in the Kibana dashboard above show, the on-prem Private AI container scanned nearly 12k files and found over 25M occurrences of identifiable PII.

Because the Private AI container is installed on-prem, company files can be scanned safely and securely within the company’s corporate infrastructure, which provides the following benefits:

  • – Company files with intellectual property are not scanned outside the corporate network infrastructure, which eliminates the possibility of a data breach due to the scanning process itself
  • – Scanning time and CPU usage are greatly reduced by running the Private AI container on-prem rather than using a cloud-based solution
  • – External bandwidth costs are eliminated because all scanning is performed within the corporate infrastructure

Root Cause Analysis

In the example above, within the nearly 12k files scanned, the IT manager has found that sales team members are storing their full credit card information in unencrypted spreadsheets so that they can easily fill in card details when booking travel online.

Potential Next Steps

The IT manager, CIO, and CISO have quantifiable information about the extent of PII stored within the computers on the corporate infrastructure. Potential next steps include:

  • – Providing the proper training and tools to company employees to allow for secure/encrypted data storage. Many password managers allow for the secure storage of credit card information with copy/paste features for online sites
  • – Providing training to company employees on the company policies for secure data storage
  • – Performing subsequent monthly audits to confirm that credit card numbers (or other PII) are no longer stored in an insecure manner
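To make the follow-up audits concrete, the sketch below compares per-entity counts from two audits and flags any entity type whose count went up. It is illustrative only: the dictionary format, the entity label names, and the function names are hypothetical, and in practice the counts would come from whatever your Kibana dashboard or Elasticsearch index reports.

```python
def compare_audits(previous, current):
    """Return the change in count for every entity type seen in either audit."""
    labels = sorted(set(previous) | set(current))
    return {label: current.get(label, 0) - previous.get(label, 0) for label in labels}


def regressions(previous, current):
    """Return only the entity types whose count increased since the previous audit."""
    changes = compare_audits(previous, current)
    return {label: delta for label, delta in changes.items() if delta > 0}


if __name__ == "__main__":
    # Hypothetical counts for illustration; not real audit results.
    last_month = {"CREDIT_CARD": 40, "SSN": 3}
    this_month = {"CREDIT_CARD": 5, "SSN": 4}
    print(compare_audits(last_month, this_month))
    print(regressions(last_month, this_month))
```

A non-empty result from `regressions` would be the signal to revisit training or tooling before the next audit.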

Scenario #2 – Company B: Sensitive Information Being Shared in Customer Support Chat Windows is Not Being Redacted

Company B is in the financial services industry and has recently implemented an AI-enabled chatbot on its consumer-facing website as a cost-saving measure to provide Level-1 tech support for its customers.

Banking customers commonly feel safe sharing their name, financial information, and/or social security number with their financial institution, because the company has already established a level of trust with them around this type of information.

Like Company A, Company B has a DLP plan to conduct monthly internal security audits. In this case, the CISO uses a custom script to send all log files from public-facing websites to the Private AI container in the company’s corporate infrastructure. The integration of the Private AI container and the ELK Stack is the same as in Figure 1 above. Additionally, any PII stored within server logs can be visualized in a Kibana dashboard very similar to the one shown in Figure 2 above.

Root Cause Analysis

The CISO has found that customers are entering both PII and PCI (Payment Card Industry data) in the chatbot window. This information is stored in plain-text format in the log files for the chatbot and is readable by anyone with access (authorized or unauthorized) to the files.

Potential Next Steps

In this scenario, armed with this information, the CISO has taken the proactive step to enhance their custom service to use Private AI for Smart Redaction, as shown in Figure 3 below.

Figure 3. Using Private AI for Smart Redaction to Process Sensitive Documents

The custom service now has two major functions:

  • – All log files from public websites are fed into Logstash for analysis of any sensitive information (same as before)
  • – Before any chat message with a customer is stored to disk, it is processed by the local Private AI container to redact any sensitive information. The redacted chat messages are then sent to Elastic so that the conversation can be searched by the tech support team.
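The redact-before-store step described above can be sketched as follows. This is a minimal illustration under stated assumptions: `store_chat_message` and its parameters are hypothetical names, `redact` stands in for a call to the local Private AI container, and `index` stands in for whatever client sends documents to Elastic. Both are passed in as callables so the ordering logic can be shown without assuming any particular API.

```python
import json
from typing import Callable, TextIO


def store_chat_message(
    message: str,
    redact: Callable[[str], str],
    log_file: TextIO,
    index: Callable[[dict], None],
) -> dict:
    """Redact a chat message, then persist and index only the redacted text.

    The raw message is never written anywhere: `redact` runs first, and
    every later step sees only its output.
    """
    record = {"message": redact(message)}
    log_file.write(json.dumps(record) + "\n")  # one JSON object per line
    index(record)  # e.g. forward the redacted record to the search index
    return record


if __name__ == "__main__":
    import io

    # Stand-in redactor for demonstration; a real deployment would call
    # the Private AI container here instead.
    def fake_redact(text: str) -> str:
        return text.replace("123-45-6789", "[SSN]")

    buffer = io.StringIO()
    store_chat_message("My SSN is 123-45-6789", fake_redact, buffer, print)
```

Keeping redaction as the first step in a single function makes it harder for a later code change to log the raw message by accident.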

Conclusion

By combining ELK Stack’s robust tools for data management, analytics, and visualization with Private AI’s expertise in privacy-preserving techniques, organizations can unlock actionable insights from their data while safeguarding sensitive information and ensuring compliance with regulatory requirements. This guide provided a few practical case examples to show how easily PII and PCI can be found anywhere in your corporate infrastructure. 

Using the ELK Stack with Private AI, CIOs and CISOs have the tools to create a robust DLP plan and perform root-cause analysis to strengthen the integrity of the corporate infrastructure. 
