Adding Privacy to LangChain with Private AI

LangChain is a powerful tool that lets you set up a conversation with your favourite LLMs with ease. If you’re worried about what that LLM is doing with your information, Private AI makes it easy to add a privacy layer to the conversation, keeping your sensitive information safe.

Getting Started

If you don’t already have access to the Private AI de-identification service, see our guide on getting set up with AWS or request a free API key.

We’ll be coding this solution in Python. If you don’t have a Python environment set up, see the official Python for Beginners guide to get set up quickly and easily.

Once your environment is set up, LangChain can be installed with pip, along with the openai package that ChatOpenAI depends on:

				
					pip install langchain openai
				
			

Now the coding can begin. First, add a LangChain agent to manage the conversation. This means creating the object for the LLM we want to chat with and a memory buffer to store the conversation’s history.

				
					import os
from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory



OPENAI_API_KEY = os.environ['OPENAI_API_KEY']

def main():
    # Create our LLM object
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)
    # Add a memory buffer
    history = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    # Create a new agent to chat with
    agent_chain = initialize_agent([], llm, agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION, memory=history, verbose=True)
    while True:
        message = input("Add message to send: ")
        if message == "quit":
            break
        agent_chain.run(input=message)


if __name__ == "__main__":
    main()
				
			
				
					# Sample output
Add message to send: hey!


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? No
AI: Hello! How can I assist you today?

> Finished chain.
				
			

Now that the base conversation flow is in place, it’s time to add a privacy layer!

Adding De-identification

NOTE: Make sure to grab Private AI’s client for seamless integration:

				
					pip install privateai_client

				
			

Let’s add a Private AI client to the flow and make sure it’s accessible for future de-identification requests.

				
					...
from privateai_client import PAIClient
from privateai_client import request_objects as rq

def main():
    
    # Initialize the private-ai client
    pai_client = PAIClient(url="http://localhost:8080")
    # Test the client is responsive
    pai_client.ping()
....
				
			

Output

				
					True
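
Because ping() reports whether the service is reachable, it can also be used to stop early instead of failing on the first chat message. The following is a minimal sketch, not part of the Private AI client: ensure_service_is_up is a hypothetical helper, StubClient is a stand-in used only so the example runs without a live service, and the sketch assumes ping() returns a falsy value when the service does not respond.

```python
class StubClient:
    """Hypothetical stand-in for PAIClient, used only to make this sketch runnable."""

    def __init__(self, alive: bool):
        self.alive = alive

    def ping(self) -> bool:
        return self.alive


def ensure_service_is_up(client) -> None:
    # Assumption: ping() returns a truthy value only when the service responds
    if not client.ping():
        raise RuntimeError("The de-identification service did not respond to ping()")


# With a responsive client the check passes silently
ensure_service_is_up(StubClient(alive=True))
```

In main(), the same helper could be called with the real pai_client right after it is created.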
				
			

Once the client can reach the service, text prompts can be de-identified before they are sent to the LLM. The conversation loop needs an update to add this step.

				
					...
from langchain.agents import AgentExecutor

def chat_privately(message: str, agent_chain: AgentExecutor, pai_client: PAIClient):
    # De-identify the message before it is sent to the LLM
    deid_request = rq.process_text_obj(text=[message])
    deid_response = pai_client.process_text(deid_request)
    deid_message = deid_response.processed_text[0]
    # Send the de-identified message
    agent_chain.run(input=deid_message)

def main():
...
    while True:
        message = input("Add message to send: ")
        if message == "quit":
            break
        chat_privately(message, agent_chain, pai_client)
				
			

Now the LLM is working with anonymized data!

Output

				
					Add message to send: Hey! My name is Adam and I work at Private AI.


> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? No

AI: Hi [NAME_GIVEN_1], it's nice to meet you! What kind of work do you do at [ORGANIZATION_1]?

> Finished chain.
				
			

While this conversation flow works, it isn’t great for human readers. Keeping track of who [NAME_GIVEN_1] is as the conversation progresses can become tedious.

To solve this, re-identification needs to be added to the conversation flow. Re-identifying the stored conversation history restores the original text, so the whole conversation (including the new prompt) can then be run through de-identification as a single request.

Re-identification needs some extra information to work properly:

  • The conversation history
  • A collection of the de-identified entities, so the service can restore them to their original values

A couple of extra functions need to be added to handle this.

				
					def get_conversation_history(history: ConversationBufferMemory):
    return [row.content for row in history.chat_memory.messages]
    
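def get_entities(processed_entities: List[List[Dict[str, str]]]) -> Dict[str, str]:
    # Map each marker back to the original text it replaced
    # (the loops below expect one list of entity dicts per input text)
    entities = {}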
    for entity in processed_entities:
        for row in entity:
            entities[row["processed_text"]] = row["text"]
    return entities
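
To make the shape of the entity map concrete, here is a small sketch that runs get_entities on a hand-written sample. The sample is an assumption for illustration only: it mimics one list of entity dicts per input text and uses just the "processed_text" and "text" keys that the function reads. The exact marker format and any additional fields in a real response are not shown here.

```python
from typing import Dict, List


def get_entities(processed_entities: List[List[Dict[str, str]]]) -> Dict[str, str]:
    # Same logic as above: map each marker back to the text it replaced
    entities = {}
    for entity in processed_entities:
        for row in entity:
            entities[row["processed_text"]] = row["text"]
    return entities


# Hypothetical sample, hand-written for illustration (not real service output)
sample_entities = [
    [
        {"processed_text": "NAME_GIVEN_1", "text": "Adam"},
        {"processed_text": "ORGANIZATION_1", "text": "Private AI"},
    ],
    [],  # a message in which nothing was detected
]

entity_map = get_entities(sample_entities)
print(entity_map)
```

The resulting dictionary is keyed by marker, which is the form reidentify_conversation expects further down.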

				
			

Now re-identification can be added! Let’s add a function to handle the request. The de-identification request should move to its own function as well, for readability, and the chat function needs updating so the entity list and conversation history can be managed.

				
					def deidentify_conversation(pai_client: PAIClient, message: str, convo_history: List[str]=[]):
    # Add the current message to the conversation history
    full_conversation = convo_history + [message]
    # Create a process_text_request object
    req = rq.process_text_obj(text=full_conversation, link_batch=True)
    # Return the response
    return pai_client.process_text(req)

def reidentify_conversation(pai_client: PAIClient, convo_history: List[str], entities: Dict[str, str]):
    if not convo_history:
        return []
    # Create a reidentify_text object
    formatted_entities = [rq.entity(key, value) for key, value in entities.items()]
    req = rq.reidentify_text_obj(convo_history, formatted_entities)
    return pai_client.reidentify_text(req).body
    
def chat_privately(message: str, agent_chain: AgentExecutor, pai_client: PAIClient, history: ConversationBufferMemory, entities: Dict[str, str]):
    # Restore the stored (de-identified) history to its original text
    convo_history = get_conversation_history(history)
    original_convo = reidentify_conversation(pai_client, convo_history, entities)
    # De-identify the whole conversation, including the new message, in one request
    response = deidentify_conversation(pai_client, message, original_convo)
    # Grab only the latest message from the de-identified text
    deid_message = response.processed_text[-1]
    agent_chain.run(input=deid_message)
    # Return the updated entity map for the next turn
    return get_entities(response.entities)
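
To see why the history is re-identified before every request, the round trip can be imitated locally. The sketch below is a toy and does not call Private AI: toy_deidentify and toy_reidentify are hypothetical stand-ins, KNOWN_TERMS is a hard-coded lookup rather than real entity detection, and the marker format is chosen only to resemble the output shown earlier. It illustrates one idea: when the full conversation is processed as a single batch, a name that reappears in a later message receives the same marker it had before.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup standing in for real entity detection
KNOWN_TERMS = {"Adam": "NAME_GIVEN", "Private AI": "ORGANIZATION"}


def toy_deidentify(texts: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Replace known terms with numbered markers shared across the whole batch."""
    markers: Dict[str, str] = {}   # marker -> original text
    counters: Dict[str, int] = {}  # label -> how many markers issued so far
    processed = []
    for text in texts:
        for term, label in KNOWN_TERMS.items():
            if term not in text:
                continue
            # Reuse the marker if this term was already seen in the batch
            marker = next((m for m, t in markers.items() if t == term), None)
            if marker is None:
                counters[label] = counters.get(label, 0) + 1
                marker = f"[{label}_{counters[label]}]"
                markers[marker] = term
            text = text.replace(term, marker)
        processed.append(text)
    return processed, markers


def toy_reidentify(texts: List[str], entities: Dict[str, str]) -> List[str]:
    """Put the original text back in place of each marker."""
    restored = []
    for text in texts:
        for marker, original in entities.items():
            text = text.replace(marker, original)
        restored.append(text)
    return restored


# Turn 1: only the de-identified text is stored as history
stored, entities = toy_deidentify(["Hey! My name is Adam and I work at Private AI."])

# Turn 2: restore the history, append the new message, process everything together
original_history = toy_reidentify(stored, entities)
processed, entities = toy_deidentify(original_history + ["Does Adam like it there?"])
latest = processed[-1]
print(latest)
```

Here the last element of processed plays the role of response.processed_text[-1] in chat_privately: it is the only part sent to the LLM, and its markers line up with those already in the stored history.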
				
			

Putting It All Together

Now we have a conversation flow that will de-identify any sensitive information, allowing us to safely pass the conversation history to the LLM! Let’s tie it all together with a nicer output so we can see what the LLM sees, as well as our re-identified version of the conversation:

				
					import os
from typing import Dict, List

from langchain.agents import AgentExecutor, AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from privateai_client import PAIClient
from privateai_client import request_objects as rq

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']


def get_conversation_history(history: ConversationBufferMemory):
    return [row.content for row in history.chat_memory.messages]
    
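def get_entities(processed_entities: List[List[Dict[str, str]]]) -> Dict[str, str]:
    # Map each marker back to the original text it replaced
    # (the loops below expect one list of entity dicts per input text)
    entities = {}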
    for entity in processed_entities:
        for row in entity:
            entities[row["processed_text"]] = row["text"]
    return entities

def deidentify_conversation(pai_client: PAIClient, message: str, convo_history: List[str]=[]):
    # Add the current message to the conversation history
    full_conversation = convo_history + [message]
    # Create a process_text_request object
    req = rq.process_text_obj(text=full_conversation, link_batch=True)
    # Return the response
    return pai_client.process_text(req)

def reidentify_conversation(pai_client: PAIClient, convo_history: List[str], entities: Dict[str, str]):
    if not convo_history:
        return []
    # Create a reidentify_text object
    formatted_entities = [rq.entity(key, value) for key, value in entities.items()]
    req = rq.reidentify_text_obj(convo_history, formatted_entities)
    return pai_client.reidentify_text(req).body
    
def chat_privately(message: str, agent_chain: AgentExecutor, pai_client: PAIClient, history: ConversationBufferMemory, entities: Dict[str, str]):
    # Restore the stored (de-identified) history to its original text
    convo_history = get_conversation_history(history)
    original_convo = reidentify_conversation(pai_client, convo_history, entities)
    # De-identify the whole conversation, including the new message, in one request
    response = deidentify_conversation(pai_client, message, original_convo)
    # Grab only the latest message from the de-identified text
    deid_message = response.processed_text[-1]
    agent_chain.run(input=deid_message)
    # Return the updated entity map for the next turn
    return get_entities(response.entities)

def main():
    # Initialize the private-ai client
    pai_client = PAIClient(url="http://localhost:8080")
    # Test the client is responsive
    pai_client.ping()

    # Create a memory buffer for the conversation history
    deid_history = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True
    )
    # Add the llm to converse with
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)
    # Setup a conversation chain
    agent_chain = initialize_agent([], llm, agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION, memory=deid_history)

    entity_list = {}
    while True:
        # Now start chatting away with your LLM!
        message = input("Add message to send: ")
        if message == "quit":
            break
        entity_list = chat_privately(
            message, agent_chain, pai_client, deid_history, entity_list
        )
        # Get the llm's response
        history = get_conversation_history(deid_history)
        print("--Actual conversation --")
        print(f"Human: {history[-2]}")
        print(f"AI: {history[-1]}\n")
        # Print it in a readable format
        print("--Readable conversation --")
        print(f"Human: {message}")
        print(
            f"AI: {reidentify_conversation(pai_client, [history[-1]], entity_list)[0]}\n"
        )

if __name__ == "__main__":
    main()
				
			

Output:

				
					Add message to send: Hey! My name is Adam and I work at Private AI.
--Actual conversation --
Human: Hey! My name is [NAME_GIVEN_1] and I work at [ORGANIZATION_1].
AI: Hi [NAME_GIVEN_1], it's nice to meet you! What kind of work do you do at [ORGANIZATION_1]?

--Readable conversation --
Human: Hey! My name is Adam and I work at Private AI.
AI: Hi Adam, it's nice to meet you! What kind of work do you do at Private AI?

Add message to send: Can you tell me where I work?
--Actual conversation --
Human: Can you tell me where I work?
AI: Of course! You work at [ORGANIZATION_1].

--Readable conversation --
Human: Can you tell me where I work?
AI: Of course! You work at Private AI.

Add message to send: And my name?
--Actual conversation --
Human: And my name?
AI: Your name is [NAME_GIVEN_1].

--Readable conversation --
Human: And my name?
AI: Your name is Adam.
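
As an extra safeguard, the message can be checked against the entity map before it is sent. The sketch below is an illustrative assumption rather than a feature of the Private AI client: find_leaks is a hypothetical helper that takes the marker-to-original map returned by get_entities and reports any original value that still appears in the de-identified text. It relies on a plain substring match, so short or common values may produce false positives.

```python
from typing import Dict, List


def find_leaks(deid_message: str, entities: Dict[str, str]) -> List[str]:
    """Return the original values that still appear in the de-identified message."""
    lowered = deid_message.lower()
    # Case-insensitive substring match against every original value
    return [original for original in entities.values() if original.lower() in lowered]


# Hypothetical entity map in the marker -> original form built by get_entities
entity_map = {"NAME_GIVEN_1": "Adam", "ORGANIZATION_1": "Private AI"}

clean = find_leaks("Hi [NAME_GIVEN_1], how is [ORGANIZATION_1]?", entity_map)
leaky = find_leaks("Hi Adam, how is [ORGANIZATION_1]?", entity_map)
print(clean, leaky)
```

An empty result means none of the known original values were found in the text about to be sent.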
				
			

And that’s it! We can rest easy that our sensitive information is hidden, thanks to Private AI. 
