Risks of Noncompliance and Challenges around Privacy-Preserving Techniques (Part 2)


Despite the promising applications of AI in healthcare that we explored in Part 1 of this series, ethical concerns persist regarding the potential misuse of these innovations and the safeguarding of health data. For instance, a drug discovery AI system demonstrated how easily that efficiency can be turned toward harm, generating 40,000 potentially lethal molecules, including some of the most potent known nerve agents, in just six hours.

The first part of our blog series on safeguarding health data used for machine learning set the stage by looking at the promise of AI in healthcare and the different sources of health data. In this second part, we cover the risks of noncompliance with data protection laws and regulations, potential misuses of health data facilitated by AI, and the challenges around privacy-preserving techniques. Part 3 dives into various attacks launched against AI models and the data used to train them.

Risks Associated with Noncompliance

For AI to be accurate and helpful, the current wisdom is that vast amounts of data are required to train the model on relevant examples from which it can learn. Data protection laws such as the General Data Protection Regulation (GDPR) in the EU and the Health Insurance Portability and Accountability Act (HIPAA) in the US, to name two prominent examples, erect notable hurdles for AI developers seeking to collect and use the required training data insofar as it contains personally identifiable information (PII) or protected health information (PHI).

Noncompliance can lead to significant fines for businesses. Consider Google DeepMind, which recently had a class action lawsuit dismissed in a case where the UK data protection authority had found that DeepMind’s partner lacked a legal basis for providing it with patient data used to develop an app for detecting acute kidney injury. Notably, the lawsuit was dismissed not because the data protection concern was invalid, but because a procedural requirement for class actions was not met.

Such procedural hurdles do not exist everywhere, though. The EU’s Collective Redress Directive, which Member States have been required to apply since June 2023, facilitates collective lawsuits comparable to the class action brought against Google.

In addition to noncompliance costs for businesses, the World Health Organization (WHO) explains in its AI Ethics and Governance Guidance for Large Multi-Modal Models operating in the Health Sector that irresponsible data handling practices could erode trust in the healthcare system, with highly undesirable consequences for the industry as well as for individuals. Many more risks are set out in this Guidance.

Potential Misuses of Health Data Facilitated by AI

There are many risks surrounding the use of AI in healthcare, as the WHO sets out comprehensively in its Guidance. We provide a brief list of the key risks here:

  • Unequal access to healthcare, insurance, employment, social services, funds, etc., as a result of biases contained in AI systems
  • Amplification of system-wide biases through broad reliance on AI systems
  • Inaccurate health-related diagnoses and treatment advice
  • Cybersecurity risks that could undermine trust and broadly impact the healthcare sector by rendering key systems unavailable
  • Overestimation of and overreliance on AI, leading to skill degradation
  • Privacy concerns

When looking at the risks that arise from the use of AI in healthcare, it is important to assess them against the risks that would exist even in the absence of this new technology. Otherwise, a distorted picture emerges that exaggerates the negative impact of AI.

With regard to privacy, many of the concerns are not new. However, AI may exacerbate existing risks related to safeguarding health data privacy and security. The integration of AI systems into healthcare workflows increases the volume, velocity, and variety of data processed, creating new opportunities for unauthorized access, data breaches, and privacy violations such as disregard of consent and transparency requirements or of data subjects’ access and deletion rights. Without adequate safeguards, such as robust encryption, access controls, and data governance frameworks, AI can amplify the risks of data misuse, potentially compromising patient privacy and confidentiality.

Challenges around Privacy-Preserving Techniques

To combat the privacy risks we encountered in the previous section, various techniques are being developed. However, the deployment of privacy-preserving machine learning (PPML) techniques, though crucial, encounters several challenges. This section addresses adaptability, scalability, the trade-off between accuracy and ethical considerations such as fairness and transparency, data integrity, and the trade-off between privacy and utility in the context of AI development.

Adaptability: One Size Doesn’t Fit All

One major hurdle is that privacy-preserving methods are often designed with specific AI algorithms in mind, making them hard to apply universally. As AI technology grows and new algorithms emerge, there’s a continuous need to develop privacy approaches that can keep up. While techniques like local differential privacy have shown promise, their application is limited by this need for customization.
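To make the idea of local differential privacy concrete, below is a minimal sketch of binary randomized response, one of the simplest local DP mechanisms: each individual perturbs their own answer before it leaves their device, and the aggregator debiases the noisy counts afterwards. The epsilon value and the patient-survey scenario are illustrative assumptions, not from the original post.

```python
import math
import random

def randomized_response(true_value: bool, epsilon: float) -> bool:
    """Report the true bit with probability p = e^eps / (1 + e^eps),
    otherwise report its negation (binary randomized response)."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    return true_value if random.random() < p else not true_value

def debiased_estimate(reports: list[bool], epsilon: float) -> float:
    """Recover an unbiased estimate of the true proportion of 1s from noisy reports."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

# Hypothetical survey: 10,000 patients, 30% truly have the sensitive attribute.
random.seed(0)
truth = [random.random() < 0.3 for _ in range(10_000)]
reports = [randomized_response(t, epsilon=1.0) for t in truth]
print(f"true rate: {sum(truth)/len(truth):.3f}, "
      f"estimated from noisy reports: {debiased_estimate(reports, 1.0):.3f}")
```

Note how tightly the mechanism is tied to one simple statistic (a proportion); extending the same guarantee to an arbitrary model-training pipeline is exactly the customization burden described above.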

Balancing Efficiency and Privacy

As AI models handle larger data sets, the computational demand of privacy techniques can become a bottleneck. Methods like homomorphic encryption, which offer strong data protection, also require significant processing power. Finding scalable solutions that ensure privacy without compromising processing efficiency is a critical challenge in AI applications.
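As a rough illustration of that overhead, the sketch below uses the open-source `phe` (python-paillier) package to sum a handful of values under additively homomorphic encryption; the key length, the sample readings, and the timing comparison are illustrative assumptions. Even this tiny workload typically runs orders of magnitude slower than the plaintext equivalent.

```python
import time
from phe import paillier

# Generate a Paillier key pair (additively homomorphic encryption).
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

blood_glucose_readings = [5.4, 6.1, 7.8, 5.9]  # hypothetical patient values

# Plaintext sum, for comparison.
start = time.perf_counter()
plain_sum = sum(blood_glucose_readings)
plain_time = time.perf_counter() - start

# Encrypted sum: ciphertexts can be added without ever decrypting the values.
start = time.perf_counter()
encrypted = [public_key.encrypt(x) for x in blood_glucose_readings]
encrypted_sum = sum(encrypted[1:], encrypted[0])
enc_time = time.perf_counter() - start

decrypted = private_key.decrypt(encrypted_sum)
print(f"plaintext sum: {plain_sum:.2f} ({plain_time * 1e6:.1f} microseconds)")
print(f"decrypted sum: {decrypted:.2f} ({enc_time * 1e3:.1f} ms to encrypt and add)")
```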

The Accuracy-Ethics Trade-off

Ethical considerations in AI, such as fairness and transparency, often conflict with the goal of achieving the highest accuracy.

AI algorithms learn from historical data. If this data contains biases, the AI models will likely replicate or even amplify them. Striving for high accuracy without addressing these biases means the models may perform well according to their training data but do so unfairly, discriminating against certain groups. Ethical considerations require actively correcting for these biases, which might reduce the model’s performance on biased historical data but increase fairness. Recent research has shown that de-biased data can also increase accuracy.
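One widely cited way to correct for such biases is reweighing (Kamiran & Calders), which assigns each training record a weight so that membership in a group becomes statistically independent of the outcome label in the weighted data. The sketch below is a minimal illustration with hypothetical column names and data, not a production de-biasing pipeline.

```python
import pandas as pd

# Hypothetical training data: a group attribute and a binary outcome label.
df = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "label": [ 1,   0,   0,   1,   1,   1,   0,   0 ],
})

weights = []
for _, row in df.iterrows():
    p_group = (df["group"] == row["group"]).mean()
    p_label = (df["label"] == row["label"]).mean()
    p_joint = ((df["group"] == row["group"]) & (df["label"] == row["label"])).mean()
    # Expected joint probability under independence divided by observed joint probability.
    weights.append(p_group * p_label / p_joint)

df["sample_weight"] = weights
# These weights can be passed to most scikit-learn estimators via `sample_weight`.
print(df)
```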

Highly accurate AI models, especially those using complex techniques like deep learning, can be difficult to interpret. While these models might achieve high accuracy, their “black box” nature makes it hard to understand how decisions are made, conflicting with the ethical requirement for transparency. In contrast, simpler models might offer less accuracy but are more transparent and understandable, making it easier to ensure they’re making decisions for the right reasons.
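The contrast can be illustrated with two scikit-learn models on a public dataset: a shallow decision tree whose entire rule set can be printed, versus a random forest that typically fits the data more closely but offers no comparable trace. This is a minimal sketch on a generic dataset, not a healthcare deployment, and the depth and scores shown are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# A shallow decision tree: every prediction can be traced through a few rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# A random forest fits the training data more closely but yields no rule trace.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print("tree accuracy (training data):  ", round(tree.score(X, y), 3))
print("forest accuracy (training data):", round(forest.score(X, y), 3))
```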

In fields like healthcare, where patient data is sensitive, this trade-off is particularly pronounced. Ensuring AI decisions are both ethically sound and effective requires careful navigation of these ethical dilemmas.

Guarding Against Tampering

The reliability of decisions made by AI systems is fundamentally linked to the integrity of their data. Shielding data against unauthorized alterations or poisoning is therefore vital, so that the information remains true to its original form. This calls for rigorous protections that detect and prevent unauthorized changes to data, preserving its accuracy and trustworthiness for decision-making.
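A common building block for such protections is a keyed integrity check on the training data: record a fingerprint when the dataset is approved, and verify it again before every training run. The sketch below uses Python’s standard hmac and hashlib modules; the file name and secret key are hypothetical placeholders.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-from-a-secrets-manager"  # hypothetical placeholder

def fingerprint(path: str) -> str:
    """Compute an HMAC-SHA256 over the file contents, streamed in chunks."""
    mac = hmac.new(SECRET_KEY, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            mac.update(chunk)
    return mac.hexdigest()

# Record the fingerprint when the dataset is approved for training ...
baseline = fingerprint("training_data.csv")

# ... and verify it just before each training run.
if not hmac.compare_digest(baseline, fingerprint("training_data.csv")):
    raise RuntimeError("Dataset changed since approval; investigate before training.")
```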

Privacy vs. Utility

A key challenge in privacy-preserving AI is balancing the need to protect user data with the desire to maintain the usefulness of that data. Developing strategies that minimize data exposure while preserving the value of the information is crucial for the ethical use of AI. For example, studies on human mobility patterns using location data from smartphones or GPS devices require balancing privacy with the granularity of data needed for accurate analysis. Aggregating data to protect individual locations reduces the risk of re-identification but can also smooth out important patterns or anomalies in the mobility data, impacting the study’s outcomes.
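The mobility example can be made concrete with a toy sketch: rounding GPS coordinates to a coarser grid is a simple form of spatial aggregation, and the grid size is precisely the privacy-utility dial. The coordinates and grid sizes below are illustrative assumptions.

```python
from collections import Counter

# Hypothetical visit locations (latitude, longitude).
visits = [
    (43.6532, -79.3832), (43.6534, -79.3830), (43.6605, -79.3950),
    (43.6510, -79.3470), (43.6512, -79.3468), (43.6606, -79.3948),
]

def aggregate(points, decimals):
    """Snap each point to a grid by rounding; fewer decimals means more privacy."""
    return Counter((round(lat, decimals), round(lon, decimals)) for lat, lon in points)

# Four decimal places (~10 m cells): detailed hotspots, near-exact individual locations.
print(aggregate(visits, decimals=4))
# Two decimal places (~1 km cells): re-identification is harder, but hotspots merge.
print(aggregate(visits, decimals=2))
```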

Conclusion

As we conclude this part of our series on safeguarding health data in AI, it’s evident that the intersection of AI’s potential with privacy concerns presents complex challenges, from a regulatory compliance perspective and as a result of the technology itself. The exploration into these challenges underlines the importance of transparency, robust ethical practices, and the development of scalable, effective privacy measures. Moving forward, addressing these issues will be critical in nurturing trust and maximizing the benefits of AI in healthcare. Our journey into understanding the full scope of these concerns and the solutions to them continues in Part 3.
