Artificial intelligence holds huge promise for healthcare, from predicting diagnoses to personalizing treatments. But as AI systems grow more powerful, so do concerns about patient privacy — especially when clinical models inadvertently memorize sensitive health data.

Image: Alex Ouyang/MIT Jameel Clinic, with Adobe Stock
Today, scientists at the Massachusetts Institute of Technology (MIT) are pushing the envelope on how researchers and regulators evaluate these risks. Their latest work, recently presented at NeurIPS 2025 in San Diego and featured in a new MIT News release, introduces a rigorous approach to detect and quantify memorization in clinical foundation models — a key step toward safer AI in medicine.
Why Memorization Matters in Clinical AI
Large AI models trained on electronic health records (EHRs) are designed to generalize: learning broad patterns across thousands of patient records to make useful predictions. But these same systems can also memorize — unintentionally recalling specific, patient-level information that was part of their training data.
“When a model memorizes individual patient details instead of underlying trends,” explains MIT postdoc Sana Tonekaboni, “it risks exposing sensitive information through targeted prompts.” In other words, even “de-identified” training data might be at risk if an attacker can coax a model into revealing unique health facts about a specific person.
Because patient privacy is protected by both medical ethics and law, this is not a theoretical worry; it is a real-world privacy challenge with clinical consequences.
New Evaluation Framework for Privacy Risk
To tackle this, the MIT team developed a suite of black-box evaluation tests that probe for memorization in foundation models trained on structured EHR data. The tests operate without access to the model’s internal parameters, mimicking realistic attacker scenarios where only query access is available.
Key features of the framework include:
- Embedding- and generation-level probes: Tests that check whether specific patient information can be recovered from the model's internal embeddings or from its generated outputs.
- Distinguishing generalization from harmful memorization: A clear analytical line between safe population-level learning and cases where sensitive, patient-specific details can leak.
- Open-source toolkit: To promote reproducibility and broader adoption, the team is releasing their testing tools so other researchers and clinicians can apply the same assessments.
This work builds on growing evidence that AI, especially foundation models, may inadvertently reveal parts of their training data if not carefully evaluated — an area of active concern in the ML community.
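To make the black-box setting concrete, here is a minimal, hypothetical sketch of a perturbation-based memorization probe in Python. It is illustrative only and not the team's released toolkit: the `model_loglik` query function, the jittering scheme, and the scoring rule are all assumptions standing in for whatever query access a deployed clinical model actually exposes.

```python
"""Illustrative sketch of a black-box memorization probe (not the MIT toolkit).

Assumes only query access: `model_loglik(record)` is a hypothetical function
returning the model's log-likelihood (or any confidence score) for a
structured EHR record. Intuition: if the model scores a real patient record
far above plausible perturbed variants of it, the record may be memorized
rather than merely consistent with learned population-level patterns.
"""
import random
from typing import Callable, Dict, List

Record = Dict[str, float]  # simplified structured-EHR row: feature -> value


def perturbed_variants(record: Record, n: int = 50, noise: float = 0.05) -> List[Record]:
    """Generate nearby counterfactual records by jittering numeric features."""
    return [
        {k: v * (1 + random.uniform(-noise, noise)) for k, v in record.items()}
        for _ in range(n)
    ]


def memorization_score(record: Record, model_loglik: Callable[[Record], float]) -> float:
    """Rank the real record's score against its perturbed neighbors.

    Returns the fraction of perturbed variants the real record outscores.
    Values near 1.0 suggest the model singles out this exact record, a
    possible sign of memorization; values near 0.5 look more like ordinary
    generalization.
    """
    real = model_loglik(record)
    neighbors = [model_loglik(v) for v in perturbed_variants(record)]
    return sum(real > s for s in neighbors) / len(neighbors)
```

Note that a probe of this kind mirrors the attacker scenario the researchers describe: it needs only a scoring or query interface, never the model's weights.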
Patient Privacy: Practical Risk, Not Just Theory
A key insight from the research is that risk isn’t binary. Not all data leaks are equally harmful: revealing a patient’s general age group might be less severe than exposing highly sensitive details like HIV status or mental health conditions.
To measure this, the team’s evaluation setup ranks potential attacks by how much auxiliary information an attacker would need to succeed — helping to prioritize defenses where they matter most.
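As a rough illustration of what such a ranking might look like, the sketch below scores hypothetical leak scenarios by how sensitive the exposed attribute is and how much auxiliary information an attacker would already need. The scenarios, numbers, and weighting heuristic are assumptions for illustration, not the paper's actual metric.

```python
"""Illustrative ranking of leak scenarios (assumed heuristic, not the paper's metric)."""
from dataclasses import dataclass


@dataclass
class LeakScenario:
    name: str
    sensitivity: float            # 0 (benign, e.g. broad age group) .. 1 (e.g. HIV status)
    auxiliary_info_needed: float  # 0 (attacker needs nothing) .. 1 (needs near-complete record)

    @property
    def priority(self) -> float:
        # Illustrative rule: sensitive leaks that require little outside
        # knowledge to mount deserve the most defensive attention.
        return self.sensitivity * (1.0 - self.auxiliary_info_needed)


scenarios = [
    LeakScenario("age bracket", sensitivity=0.2, auxiliary_info_needed=0.1),
    LeakScenario("HIV status", sensitivity=1.0, auxiliary_info_needed=0.6),
    LeakScenario("mental health diagnosis", sensitivity=0.9, auxiliary_info_needed=0.3),
]

for s in sorted(scenarios, key=lambda s: s.priority, reverse=True):
    print(f"{s.name}: priority {s.priority:.2f}")
```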
Researchers also emphasize that memorization risk is higher for vulnerable or unique patient subgroups, where distinctive patterns make records easier to “pick out.”
Towards Trustworthy Clinical AI
Beyond technical innovation, this work is part of a larger effort at MIT’s Abdul Latif Jameel Clinic for Machine Learning in Health to make AI both powerful and trustworthy in healthcare contexts. By laying a foundation for standardized privacy evaluations, the team hopes to influence how future clinical AI systems are vetted before deployment.
As AI continues its rapid integration into medical workflows, rigorous tools like this will be essential to ensuring patient trust and maintaining confidentiality — core principles of ethical care.
