Japan’s Health Data Anonymization Act: Enabling Large-Scale Health Research


Anonymized and pseudonymized medical data are at the heart of cutting-edge research and innovation in healthcare. By stripping away direct identifiers and applying further privacy-preserving measures, these data enable advanced studies while protecting the privacy of individuals.

In Japan, as elsewhere, the path to leveraging this valuable resource has been complex because large-scale data use must be balanced against privacy protection. Under the Act on the Protection of Personal Information (APPI), healthcare providers have faced challenges in sharing and processing medical data for research and innovation, primarily because of strict consent requirements.

This article explores how Japan’s Next-Generation Medical Infrastructure Law addresses these challenges and how Private AI can help organizations navigate compliance while enabling secure health data use for research.

Sharing Health Data under APPI

Under APPI, sharing health data with third parties generally requires explicit opt-in consent from each individual patient, although an exception exists for research purposes. Commercial development of healthcare products and services, however, does not fall under this exception. Building comprehensive medical databases, linking patient data across different institutions, and conducting broad-based medical research have been hampered by the need to obtain individual consent from every patient involved. 

One way to avoid these onerous consent requirements is to anonymize the data. However, this can be a complex endeavor as well, especially when the data are to be linked across institutions to build large-scale databases for medical researchers and innovators. Such linkage may increase the risk of re-identifying individuals. It is also difficult to combine data meaningfully once they have been anonymized, because it can no longer be determined whether two pieces of health information pertain to the same individual. Yet before anonymization, sharing the data is restricted.
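To make the linkage problem concrete, here is a minimal Python sketch, assuming a hypothetical intermediary that holds a secret key and matches patients on name plus date of birth. It shows why linkage has to happen while identifiers are still available: the identifiers are used to derive a keyed hash (HMAC-SHA256) and are then discarded. The secret, field names, and matching rule are illustrative assumptions, not procedures prescribed by the APPI or the NGMIL, and the resulting linkage key is itself a pseudonym, so the output is not yet anonymized.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret held only by the linking intermediary.
LINKAGE_SECRET = b"replace-with-a-securely-stored-secret"


def linkage_key(name: str, birth_date: str, secret: bytes = LINKAGE_SECRET) -> str:
    """Derive a keyed hash (HMAC-SHA256) from direct identifiers."""
    # NFKC folds full-width characters, so "ＹＡＭＡＤＡ" and "Yamada" match after lowercasing.
    normalized_name = unicodedata.normalize("NFKC", name).strip().lower()
    message = f"{normalized_name}|{birth_date}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def link_and_strip(*sources: list[dict]) -> dict[str, list[dict]]:
    """Group records from several sources by linkage key, dropping name and birth date."""
    linked: dict[str, list[dict]] = {}
    for records in sources:
        for rec in records:
            key = linkage_key(rec["name"], rec["birth_date"])
            clinical = {k: v for k, v in rec.items() if k not in ("name", "birth_date")}
            linked.setdefault(key, []).append(clinical)
    return linked


hospital_a = [{"name": "Yamada Taro", "birth_date": "1960-04-01", "diagnosis": "E11"}]
clinic_b = [{"name": "ＹＡＭＡＤＡ Taro", "birth_date": "1960-04-01", "drug": "metformin"}]
linked = link_and_strip(hospital_a, clinic_b)
print(list(linked.values()))  # [[{'diagnosis': 'E11'}, {'drug': 'metformin'}]]
```

If the name and birth date had been removed first, there would be nothing left to match on, which is exactly the dilemma described above.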

Next-Generation Medical Infrastructure Law

To address these challenges while maintaining robust privacy protections, Japan enacted the Act on Anonymized Medical Data That Contributes to Research and Development in the Medical Field, commonly known as the “Next-Generation Medical Infrastructure Law” (“NGMIL”). This legislation creates a novel framework that balances the imperatives of medical research with individual privacy rights.

At the heart of the NGMIL is the creation of “Authorized De-identified Medical Information Preparers”: certified entities that serve as trusted intermediaries in the medical data ecosystem. These entities play a crucial role in transforming sensitive medical information into valuable research assets. They receive identifiable medical data from healthcare providers, standardize data formats across institutions, link patient records where appropriate, and ultimately create anonymized datasets that can be used for research purposes.

New Consent Mechanism

A significant innovation of the NGMIL lies in its consent mechanism. While the APPI typically requires opt-in consent for sharing health data, the NGMIL establishes a carefully structured opt-out framework: healthcare providers can share identifiable medical data with certified entities after notifying patients and giving them the opportunity to opt out. This seemingly subtle shift from opt-in to opt-out consent has profound implications for medical research, making it feasible to build the large-scale datasets that meaningful healthcare innovation requires.
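The practical difference between the two consent models can be expressed as two filtering rules. The sketch below is a deliberate simplification: the `Patient` fields are hypothetical, and it does not model the content or timing of the notice or any other procedural requirement of the NGMIL.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical fields for illustration only.
    patient_id: str
    notified: bool = False   # received the provider's notice
    opted_in: bool = False   # gave explicit (opt-in) consent
    opted_out: bool = False  # declined after being notified


def shareable_opt_in(patients: list[Patient]) -> list[str]:
    """APPI-style rule: share only records with explicit consent."""
    return [p.patient_id for p in patients if p.opted_in]


def shareable_opt_out(patients: list[Patient]) -> list[str]:
    """NGMIL-style rule: share after notice unless the patient opted out."""
    return [p.patient_id for p in patients if p.notified and not p.opted_out]


cohort = [
    Patient("p1", notified=True),
    Patient("p2", notified=True, opted_out=True),
    Patient("p3", notified=True, opted_in=True),
    Patient("p4"),
]
print(shareable_opt_in(cohort))   # ['p3']
print(shareable_opt_out(cohort))  # ['p1', 'p3']
```

Patient `p1` never responded; under the opt-in rule that silence excludes the record, while under the opt-out rule it does not. Across a large population of non-responders, this is what makes large datasets feasible.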

Regulatory Improvements and Real-World Application

The recent amendment to the NGMIL, which passed in May 2023 and took effect in 2024, introduces a new category of data known as pseudonymized medical information. Unlike fully anonymized data, which is irreversibly processed to prevent identification, pseudonymized medical data can be re-identified if matched with other information. This change addresses a problem in the medical and pharmaceutical industries, where the strict limitations on anonymized data have hindered its use in regulatory submissions. Under the revised framework, certified users of pseudonymized medical data can submit such information to the Pharmaceuticals and Medical Devices Agency (PMDA) when seeking regulatory approval, without having to remove outliers or rare disease identifiers. Additionally, the PMDA can request access to the original data from certified providers, enhancing the reliability of medical data used in research and drug development.
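The difference between the two categories can be illustrated with a toy Python example, assuming records with hypothetical `patient_id`, `age`, and `diagnosis` fields. Pseudonymization swaps the patient ID for a random token and keeps a separate key table, so whoever holds that table can re-identify the data; anonymization drops the ID, coarsens age into ten-year bands, and suppresses diagnoses seen fewer than `min_count` times, the kind of rare-value removal that pseudonymized data no longer requires. Neither function implements the legal standards; they only illustrate the concepts.

```python
import secrets
from collections import Counter


def pseudonymize(records: list[dict], id_field: str = "patient_id"):
    """Replace the ID with a random token; return the data and a separate key table."""
    key_table: dict[str, str] = {}  # patient ID -> token; must be stored separately
    output = []
    for rec in records:
        pid = rec[id_field]
        if pid not in key_table:
            key_table[pid] = secrets.token_hex(8)
        row = {k: v for k, v in rec.items() if k != id_field}
        row["token"] = key_table[pid]
        output.append(row)  # rare diagnoses and exact ages are kept as-is
    return output, key_table


def anonymize(records: list[dict], id_field: str = "patient_id", min_count: int = 2):
    """Drop the ID, coarsen age, and suppress rare diagnoses; no key table is kept."""
    counts = Counter(rec["diagnosis"] for rec in records)
    output = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k not in (id_field, "age")}
        decade = rec["age"] // 10 * 10
        row["age_band"] = f"{decade}-{decade + 9}"
        if counts[rec["diagnosis"]] < min_count:
            row["diagnosis"] = "SUPPRESSED"
        output.append(row)
    return output


records = [
    {"patient_id": "A1", "age": 47, "diagnosis": "E11"},
    {"patient_id": "A2", "age": 52, "diagnosis": "E11"},
    {"patient_id": "A3", "age": 8, "diagnosis": "E75.22"},  # diagnosis seen only once
]
pseudo, key_table = pseudonymize(records)
anon = anonymize(records)
print(anon)
# [{'diagnosis': 'E11', 'age_band': '40-49'}, {'diagnosis': 'E11', 'age_band': '50-59'},
#  {'diagnosis': 'SUPPRESSED', 'age_band': '0-9'}]
```

The pseudonymized output still carries the rare diagnosis and exact ages, which is what makes it more useful for regulatory submissions and also why it remains re-identifiable.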

In practice, the NGMIL has made strides in enabling the use of anonymized medical data for research while addressing privacy concerns, but progress has been slow overall. Although certified entities now facilitate data collection and anonymization, the number of participating medical institutions remains insufficient, which limits the richness of available datasets. The recent introduction of pseudonymized medical information aims to address some of the original law’s shortcomings, but since both the preparation and the handling of pseudonymized medical information require certification, a barrier to entry may persist.

In addition, Japan has multiple medical information databases, but data from medical insurers and government-held medical records remain separate. While the universal health insurance system allows for nationwide data collection, insurers maintain independent databases, making it difficult for researchers to access comprehensive information. A system to integrate and link these disparate data sources would enhance research and data analysis.

While more than 20 studies have been initiated using NGMIL-authorized datasets, high costs and limited funding for academic research hinder broader adoption. Further cooperation among medical institutions, regulatory bodies, and industry stakeholders is essential for the law to fully achieve its goal of advancing medical R&D while safeguarding patient privacy.

Reduced Compliance Obligations for Anonymized Data

Aside from its benefits for medical research and innovation, processing anonymized data has further advantages from a compliance perspective. Although neither the APPI nor the NGMIL says so explicitly, the definitions of personal information and anonymized information seem to imply that anonymized information does not fall under the definition of personal information. This would generally mean that provisions pertaining to personal information do not apply to anonymized information. Nevertheless, the NGMIL specifically lists only a select few APPI provisions that do not apply to anonymized information, namely the following:

  1. Data subject requests
    Businesses handling anonymized medical data are exempt from Article 37 APPI, meaning they are not required to process disclosure or other requests from individuals regarding such data. Specifically, they do not have to establish procedures for individuals to request access, correction, deletion, or cessation of third-party provision of anonymized medical data, nor do they have to facilitate these requests through specific methods, ensure ease of submission, or allow requests via an agent. The provision on fees for responding to such requests also does not apply, and lawsuits and other legal proceedings cannot be brought on the same grounds as those applicable to personal information.
  2. Explaining decisions
    Under Article 36 of the APPI, businesses must “endeavor to explain” their reasons when they refuse requests related to personal data, such as disclosure, correction, or deletion. However, when handling anonymized medical data, this obligation is removed.

Notably, Articles 43 through 46 of the APPI, which set out the obligations listed below, are not among the excluded provisions, even though most of these remaining obligations make little sense when the anonymization process is outsourced:

  1. Anonymization Standards: Businesses must process personal information according to Personal Information Protection Commission standards to ensure individuals cannot be identified or data restored.
  2. Security Measures: They must implement safeguards to prevent leaks of deleted identifiers and processing methods used in anonymization.
  3. Public Disclosure: The categories of anonymized data must be publicly disclosed after anonymization.
  4. Third-Party Provision: Before sharing anonymized data, businesses must declare it as anonymized, disclose the information categories and provision methods, and inform recipients explicitly.
  5. Re-Identification Ban: Businesses are prohibited from cross-referencing anonymized data with other information to identify individuals.
  6. Proper Handling: Companies must take necessary steps for security, complaint resolution, and compliance, striving to publicly disclose these measures.
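These obligations are legal standards, not algorithms, and the Personal Information Protection Commission's criteria are not reproduced here. As one common technical heuristic, a team might check whether a dataset is k-anonymous, i.e. whether every combination of quasi-identifiers (such as age band, sex, and prefecture) is shared by at least k records. The field names and threshold below are hypothetical, and passing this check does not by itself establish compliance with the anonymization standards.

```python
from collections import Counter


def smallest_group(rows: list[dict], quasi_identifiers: list[str]) -> int:
    """Return the size of the smallest group sharing identical quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values(), default=0)  # 0 for an empty dataset


def is_k_anonymous(rows: list[dict], quasi_identifiers: list[str], k: int) -> bool:
    """True if every quasi-identifier combination appears at least k times."""
    return smallest_group(rows, quasi_identifiers) >= k


rows = [
    {"age_band": "40-49", "sex": "F", "prefecture": "Tokyo", "diagnosis": "E11"},
    {"age_band": "40-49", "sex": "F", "prefecture": "Tokyo", "diagnosis": "I10"},
    {"age_band": "80-89", "sex": "M", "prefecture": "Tottori", "diagnosis": "E11"},
]
qi = ["age_band", "sex", "prefecture"]
print(smallest_group(rows, qi))     # 1: the third record is unique
print(is_k_anonymous(rows, qi, 2))  # False
```

A unique combination like the third record is exactly the kind of entry that makes cross-referencing with other information, which the re-identification ban prohibits, technically possible in the first place.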

How Private AI Can Help with Compliance Under Both Frameworks

As we have seen, businesses in Japan can anonymize health data in two ways: on their own under the APPI, or through the NGMIL. The NGMIL adds a further layer of privacy protection for medical data that is to be combined into large datasets drawing on several institutions. Entrusting anonymization to certified experts streamlines this complex task, yet slow uptake and certification regimes remain obstacles.

To support organizations with anonymization or pseudonymization, Private AI’s advanced privacy-enhancing technology automates a crucial step of the process, the detection and redaction of direct and indirect identifiers.

With machine learning models trained to identify over 50 types of personal information across 53 languages, including Japanese, Private AI reduces an onerous process to a few lines of code. Specializing in unstructured data across various file formats, Private AI helps unlock the hidden value of health data for secure innovation. Importantly, the solutions can be deployed on premises, which avoids concerns about cross-border data transfers.
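Private AI's own API is not reproduced here. To illustrate what detection and redaction mean in practice, the following toy Python sketch replaces a few structured identifiers in a Japanese clinical note with entity labels using regular expressions. The patterns and labels are assumptions chosen for the example, not Private AI's entity types or method.

```python
import re

# Hypothetical patterns for a few structured identifiers only. Dates run first so
# their digits are not picked up by the phone pattern.
PATTERNS = {
    "DATE": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{2}-\d{2}"),
    "PHONE_NUMBER": re.compile(r"0\d{1,4}-\d{1,4}-\d{4}"),
    # re.ASCII keeps \w from absorbing adjacent Japanese characters.
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+", re.ASCII),
}


def redact(text: str) -> str:
    """Replace every match with its entity label, e.g. [DATE]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# "Mr. Taro Yamada visited on 1 April 2024. Contact: ..."
note = "山田太郎さんは2024年4月1日に来院。連絡先: 03-1234-5678, taro@example.com"
print(redact(note))
# 山田太郎さんは[DATE]に来院。連絡先: [PHONE_NUMBER], [EMAIL_ADDRESS]
```

The patient's name passes through untouched: names, addresses, and other free-text identifiers have no fixed shape, which is why detection of direct and indirect identifiers in unstructured text calls for trained models rather than hand-written patterns.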

Conclusion

Japan’s Next-Generation Medical Infrastructure Law represents a significant step forward in balancing medical research with individual privacy. By introducing certified entities to manage data anonymization and adopting an opt-out consent framework, the law facilitates large-scale health research while maintaining robust privacy protections. However, challenges remain, including slow adoption, certification barriers, and fragmented medical databases. The recent introduction of pseudonymized data aims to enhance data utility, particularly for regulatory submissions, but further integration and institutional participation are needed to fully realize the law’s potential. 

As Japan continues refining its approach, privacy-enhancing technologies like Private AI can play a key role in streamlining compliance, ensuring secure data processing, and unlocking the full value of health data for innovation.
