
Synthetic Patient Data for Better ML in Healthcare

Published on 14/11/2025. Reading time: 5 minutes

 

Generating Synthetic Patient Data to Overcome Machine Learning Limitations in Healthcare Research

Authors: M. Swital, T. Porte, N. Sedmak, C. Bouvard, A. Gougeon, F. Roux, F. Mistretta, A. Lajoinie

Affiliations: RCTs, Lyon, France & Laboratory of Biometry and Evolutionary Biology, UMR 5558, CNRS, University of Lyon 1

Our poster was presented at the ISPOR Europe 2025 congress in Glasgow. This work explores how synthetic patient data can enhance machine learning applications in healthcare, offering promising solutions for data privacy, model robustness, and scalability in research.

 

Study Objective

Primary Objective

To explore current methods for generating synthetic patient data to enhance the performance of machine learning (ML) models in healthcare research.

Specific Focus

To analyze scenarios where datasets are small or imbalanced, which often limits the robustness of traditional ML approaches.

Methodology

Design: Systematic literature review
Database: MEDLINE search
Period: Studies published since 2020
Selection: 176 studies initially selected → 6 studies included after full-text review

Key Findings

Key Analysis Points
  • Synthetic data improves ML model robustness
  • Preserves patient privacy while enriching datasets
  • Mitigates limitations due to small or biased datasets
  • Validated techniques: GANs, SMOTE, CTGAN, and Bayesian simulation

Data Sources

Electronic Health Records (n=3)
Clinical Registries (n=2)
Medical Imaging Datasets (n=1)

Identified Techniques

GANs (Generative Adversarial Networks)
SMOTE (Synthetic Minority Over-sampling Technique)
CTGAN (Conditional Tabular GAN)
Bayesian Simulation
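
To make the idea behind SMOTE more concrete, here is a minimal sketch of its core step: a synthetic minority-class row is created by interpolating between an existing minority row and one of its nearest minority neighbours. This is an illustration only, not the method used in any of the reviewed studies; the function name `smote_sketch`, its parameters, and the toy data are all hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority-class neighbours.

    Hypothetical helper for illustration; assumes purely numeric features.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base row (Euclidean distance).
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        # Random position on the segment between the base row and the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric features for four minority-class patients.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote_sketch(minority_rows, n_new=5)
```

Because every synthetic row lies on a segment between two real minority rows, the approach rebalances classes without leaving the observed feature range. It assumes numeric features; categorical variables would need a different treatment.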

Demonstrated Benefits

• Enriching datasets
• Improving model training
• Enhancing model evaluation
• Preserving patient privacy

Robustness Assessment

• Cross-validation
• Comparison with real-world data
• Sensitivity analyses
• Performance testing
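
As an illustration of the cross-validation item above, the sketch below splits row indices into k folds so that each real row is used exactly once for testing. It is a minimal example under stated assumptions, not the procedure of the reviewed studies; the name `k_fold_indices` and its parameters are hypothetical. One common precaution, assumed here rather than taken from the poster, is to add synthetic rows to the training portion only, so that performance is always measured on real held-out data.

```python
import random


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every row appears in exactly one test fold.

    Hypothetical helper for illustration.
    """
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and n_rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        # Synthetic rows, if any, would be appended to the training data here,
        # never to test_idx (an assumption of this sketch).
        yield train_idx, test_idx


# Hypothetical example: 10 real patient rows, 5 folds.
splits = list(k_fold_indices(n_rows=10, k=5))
```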

Conclusion and Perspectives

Synthetic patient data generation is a promising strategy to enhance the reliability and performance of machine learning models in healthcare. It supports privacy-preserving model development and addresses data limitations. However, standardized evaluation frameworks and real-world implementation are essential to fully realize its potential in clinical decision-making and health technology assessment.
