Adding Privacy to LangChain with Private AI

Oct 19, 2023

LangChain is a powerful tool that lets you set up a conversation with your favourite LLMs with ease. If you’re worried about what that LLM is doing with your information, Private AI makes it easy to integrate a privacy layer into your conversation, keeping your sensitive information safe.

Getting Started

If you don’t already have access to the Private AI de-identification service, see our guide on getting set up with AWS or request a free API key.

We’ll be coding this solution in Python. If you don’t have a Python environment set up, see the official Python for Beginners guide to get set up quickly and easily.

Once your environment is set up, LangChain can be installed with pip, along with the openai package that ChatOpenAI depends on:

pip install langchain openai

Now the coding can begin. First, a LangChain agent should be added to manage the conversation. This includes creating the object related to the LLM we want to chat with and a memory buffer to store the conversation’s history.

import os

from langchain.agents import AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']


def main():
    # create our llm object
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)
    # add a memory buffer
    history = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    # create a new agent to chat with
    agent_chain = initialize_agent(
        [],
        llm,
        agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
        memory=history,
        verbose=True,
    )
    while True:
        message = input("Add message to send: ")
        if message == "quit":
            break
        agent_chain.run(input=message)


if __name__ == "__main__":
    main()

Sample output

Add message to send: hey!
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? No
AI: Hello! How can I assist you today?
> Finished chain.
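The script above reads OPENAI_API_KEY straight from os.environ, so a missing key surfaces as a bare KeyError before the chat even starts. A minimal sketch of a friendlier check follows; require_env is a hypothetical helper introduced here for illustration and is not part of LangChain or the original script.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    `require_env` is a hypothetical helper, used here for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set the {name} environment variable before starting the chat."
        )
    return value


if __name__ == "__main__":
    # Demo with a throwaway variable so the sketch runs without a real key.
    os.environ["DEMO_API_KEY"] = "sk-demo"
    print(require_env("DEMO_API_KEY"))
```

In the chat script, OPENAI_API_KEY = require_env("OPENAI_API_KEY") could stand in for the direct dictionary lookup.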

Now that the base conversation flow is in place, it’s time to add a privacy layer!

Adding De-identification

NOTE: Make sure to grab Private AI’s client for seamless integration:

pip install privateai_client 

Let’s add a Private AI client to the flow and make sure it’s accessible for future de-identification requests.

...
from privateai_client import PAIClient
from privateai_client import request_objects as rq


def main():
    # Initialize the private-ai client
    pai_client = PAIClient(url="http://localhost:8080")
    # Test the client is responsive
    pai_client.ping()
    ....

Output

True

Once a connection to the client is established, text prompts can be de-identified before they are sent to the LLM. The conversation needs an update to add this new feature.

from langchain.agents import AgentExecutor


def chat_privately(message: str, agent_chain: AgentExecutor, pai_client: PAIClient):
    # De-identify the prompt before it is sent to the LLM
    deid_request = rq.process_text_obj(text=[message])
    deid_response = pai_client.process_text(deid_request)
    deid_message = deid_response.processed_text[0]
    # Send the message
    agent_chain.run(input=deid_message)


def main():
    ...
    while True:
        message = input("Add message to send: ")
        if message == "quit":
            break
        chat_privately(message, agent_chain, pai_client)

Now the LLM is working with anonymized data!

Output

Add message to send: Hey! My name is Adam and I work at Private AI.
> Entering new AgentExecutor chain...
Thought: Do I need to use a tool? No
AI: Hi [NAME_GIVEN_1], it's nice to meet you! What kind of work do you do at [ORGANIZATION_1]?
> Finished chain.

While this conversation flow works, it isn’t great for human interpretation. Keeping tabs on who [NAME_GIVEN_1] is as the conversation progresses can become tedious.

To solve this, re-identification needs to be added to the conversation flow. By re-identifying the conversation history, the whole conversation (including the new prompt) can be run through de-identification as a single request.
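One way to see why sending the whole conversation in a single request helps is a toy de-identifier that numbers markers per call. This is only a sketch under stated assumptions: KNOWN_ENTITIES and toy_deidentify are hypothetical stand-ins, and the real Private AI service detects entities with a model and may number them differently.

```python
from typing import Dict, List

# Hypothetical lookup table of sensitive strings and their labels.
# A real service detects these itself; this table exists only for the demo.
KNOWN_ENTITIES = {
    "Adam": "NAME_GIVEN",
    "Eve": "NAME_GIVEN",
    "Private AI": "ORGANIZATION",
}


def toy_deidentify(batch: List[str]) -> List[str]:
    """Replace known entities with numbered markers.

    Numbering starts over on every call, so markers are only consistent
    within one batch (an assumption of this toy, not a documented fact).
    """
    markers: Dict[str, str] = {}
    counters: Dict[str, int] = {}
    processed = []
    for text in batch:
        # Visit entities in order of first appearance in this message.
        found = sorted((text.find(v), v) for v in KNOWN_ENTITIES if v in text)
        for _, value in found:
            if value not in markers:
                label = KNOWN_ENTITIES[value]
                counters[label] = counters.get(label, 0) + 1
                markers[value] = f"[{label}_{counters[label]}]"
        for value, marker in markers.items():
            text = text.replace(value, marker)
        processed.append(text)
    return processed


conversation = ["My name is Adam.", "My colleague is Eve."]

# One request for the whole conversation: each person keeps a distinct marker.
print(toy_deidentify(conversation))

# One request per message: numbering restarts, so both people collide.
print([toy_deidentify([message])[0] for message in conversation])
```

In the toy, the batched call yields [NAME_GIVEN_1] and [NAME_GIVEN_2], while the per-message calls label both people [NAME_GIVEN_1], which is the kind of ambiguity that a single request over the full history avoids.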

Re-identification needs some extra information to work properly:

- The conversation history
- A collection of the de-identified entities, so the service can restore them to their original values

A couple of extra functions need to be added to handle this.

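from typing import Dict, List


def get_conversation_history(history: ConversationBufferMemory):
    # Pull the raw text out of each stored message
    return [row.content for row in history.chat_memory.messages]


def get_entities(processed_entities: List[List[dict]]):
    # Each message in the batch has its own list of entities;
    # flatten them into a single marker -> original text mapping
    entities = {}
    for entity in processed_entities:
        for row in entity:
            entities[row["processed_text"]] = row["text"]
    return entities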
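To make the entity mapping concrete, here is a small sketch that runs the same get_entities logic on hand-written data. The shape of sample_entities (one list of entity dicts per message, with marker names stored without brackets) is an assumption for illustration, and restore_locally is a hypothetical local stand-in, not the service's re-identification endpoint.

```python
from typing import Dict, List


def get_entities(processed_entities: List[List[dict]]) -> Dict[str, str]:
    """Same logic as the helper above: map each marker to its original text."""
    entities = {}
    for entity in processed_entities:
        for row in entity:
            entities[row["processed_text"]] = row["text"]
    return entities


def restore_locally(text: str, entities: Dict[str, str]) -> str:
    """Toy stand-in for re-identification: swap bracketed markers back."""
    for marker, original in entities.items():
        text = text.replace(f"[{marker}]", original)
    return text


# Assumed response shape, written by hand for this demo.
sample_entities = [
    [
        {"processed_text": "NAME_GIVEN_1", "text": "Adam"},
        {"processed_text": "ORGANIZATION_1", "text": "Private AI"},
    ],
    [],  # a message in which nothing was detected
]

mapping = get_entities(sample_entities)
print(mapping)
print(restore_locally("You work at [ORGANIZATION_1], [NAME_GIVEN_1].", mapping))
```

The mapping is keyed by marker, so a marker that shows up in several messages simply overwrites itself with the same original value.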

Now re-identification can be added! Let’s add a function to handle the request. The de-identification request should move to its own function as well, for readability, and the chat function needs updating so the entity list and conversation history can be managed.

def deidentify_conversation(pai_client: PAIClient, message: str, convo_history: List[str]=[]):
    # Add the current message to the conversation history
    full_conversation = convo_history + [message]
    # Create a process_text_request object
    req = rq.process_text_obj(text=full_conversation, link_batch=True)
    # Return the response
    return pai_client.process_text(req)


def reidentify_conversation(pai_client: PAIClient, convo_history: List[str], entities: Dict[str, str]):
    if not convo_history:
        return []
    # Create a reidentify_text object
    formatted_entities = [rq.entity(key, value) for key, value in entities.items()]
    req = rq.reidentify_text_obj(convo_history, formatted_entities)
    return pai_client.reidentify_text(req).body


def chat_privately(message: str, agent_chain: AgentExecutor, pai_client: PAIClient, history: ConversationBufferMemory, entities: Dict[str, str]):
    # Restore the stored (de-identified) history to its original text
    convo_history = get_conversation_history(history)
    original_convo = reidentify_conversation(pai_client, convo_history, entities)
    # De-identify the history and the new message together
    response = deidentify_conversation(pai_client, message, original_convo)
    # Grab only the latest message from the deidentified text
    deid_message = response.processed_text[-1]
    agent_chain.run(input=deid_message)
    return get_entities(response.entities)

Putting It All Together

Now we have a conversation flow that will de-identify any sensitive information, allowing us to safely pass the conversation history to the LLM! Let’s tie it all together with a nicer output so we can see what the LLM sees, as well as our re-identified version of the conversation:

import os
from typing import Dict, List

from langchain.agents import AgentExecutor, AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from privateai_client import PAIClient
from privateai_client import request_objects as rq

OPENAI_API_KEY = os.environ['OPENAI_API_KEY']


def get_conversation_history(history: ConversationBufferMemory):
    return [row.content for row in history.chat_memory.messages]


def get_entities(processed_entities: List[List[dict]]):
    entities = {}
    for entity in processed_entities:
        for row in entity:
            entities[row["processed_text"]] = row["text"]
    return entities


def deidentify_conversation(pai_client: PAIClient, message: str, convo_history: List[str]=[]):
    # Add the current message to the conversation history
    full_conversation = convo_history + [message]
    # Create a process_text_request object
    req = rq.process_text_obj(text=full_conversation, link_batch=True)
    # Return the response
    return pai_client.process_text(req)


def reidentify_conversation(pai_client: PAIClient, convo_history: List[str], entities: Dict[str, str]):
    if not convo_history:
        return []
    # Create a reidentify_text object
    formatted_entities = [rq.entity(key, value) for key, value in entities.items()]
    req = rq.reidentify_text_obj(convo_history, formatted_entities)
    return pai_client.reidentify_text(req).body


def chat_privately(message: str, agent_chain: AgentExecutor, pai_client: PAIClient, history: ConversationBufferMemory, entities: Dict[str, str]):
    convo_history = get_conversation_history(history)
    original_convo = reidentify_conversation(pai_client, convo_history, entities)
    response = deidentify_conversation(pai_client, message, original_convo)
    # Grab only the latest message from the deidentified text
    deid_message = response.processed_text[-1]
    agent_chain.run(input=deid_message)
    return get_entities(response.entities)


def main():
    # Initialize the private-ai client
    pai_client = PAIClient(url="http://localhost:8080")
    # Test the client is responsive
    pai_client.ping()

    # Create a memory buffer for the conversation history
    deid_history = ConversationBufferMemory(
        memory_key="chat_history", return_messages=True
    )
    # Add the llm to converse with
    llm = ChatOpenAI(openai_api_key=OPENAI_API_KEY, temperature=0)
    # Setup a conversation chain
    agent_chain = initialize_agent(
        [],
        llm,
        agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
        memory=deid_history,
    )
    entity_list = {}
    while True:
        # Now start chatting away with your llm!
        message = input("Add message to send: ")
        if message == "quit":
            break
        entity_list = chat_privately(
            message, agent_chain, pai_client, deid_history, entity_list
        )
        # Get the llm's response
        history = get_conversation_history(deid_history)
        print("--Actual conversation --")
        print(f"Human: {history[-2]}")
        print(f"AI: {history[-1]}\n")
        # Print it in a readable format
        print("--Readable conversation --")
        print(f"Human: {message}")
        print(
            f"AI: {reidentify_conversation(pai_client, [history[-1]], entity_list)[0]}\n"
        )


if __name__ == "__main__":
    main()

Output:

Add message to send: Hey! My name is Adam and I work at Private AI.
--Actual conversation --
Human: Hey! My name is [NAME_GIVEN_1] and I work at [ORGANIZATION_1].
AI: Hi [NAME_GIVEN_1], it's nice to meet you! What kind of work do you do at [ORGANIZATION_1]?

--Readable conversation --
Human: Hey! My name is Adam and I work at Private AI.
AI: Hi Adam, it's nice to meet you! What kind of work do you do at Private AI?

Add message to send: Can you tell me where I work?
--Actual conversation --
Human: Can you tell me where I work?
AI: Of course! You work at [ORGANIZATION_1].

--Readable conversation --
Human: Can you tell me where I work?
AI: Of course! You work at Private AI.

Add message to send: And my name?
--Actual conversation --
Human: And my name?
AI: Your name is [NAME_GIVEN_1].

--Readable conversation --
Human: And my name?
AI: Your name is Adam.
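For extra reassurance, the text headed for the LLM can be checked against the values that were de-identified. The sketch below is a crude substring check; find_leaks is a hypothetical helper, and it assumes a marker-to-original mapping like the one get_entities returns. It can only flag values the service detected at some point, and short values may produce false positives.

```python
from typing import Dict, List


def find_leaks(outgoing_text: str, entities: Dict[str, str]) -> List[str]:
    """Return the original values from `entities` that still appear in the text.

    `find_leaks` is a hypothetical helper; matching is a case-insensitive
    substring test, so it is a sanity check rather than a guarantee.
    """
    lowered = outgoing_text.lower()
    return [
        original
        for original in entities.values()
        if original.lower() in lowered
    ]


# Hand-written mapping in the same marker -> original form used above.
entities = {"NAME_GIVEN_1": "Adam", "ORGANIZATION_1": "Private AI"}

safe = "Hey! My name is [NAME_GIVEN_1] and I work at [ORGANIZATION_1]."
unsafe = "Hey! My name is Adam and I work at [ORGANIZATION_1]."

print(find_leaks(safe, entities))
print(find_leaks(unsafe, entities))
```

A check like this could run on the de-identified message just before it is handed to the agent, raising an error if the returned list is not empty.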

And that’s it! We can rest easy that our sensitive information is hidden, thanks to Private AI.