WeSearch

An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness


Artificial Intelligence and Machine Learning (AI/ML) models are increasingly deployed to support clinical decision-making. However, when training data become stale due to changes in demographics, environment, or patient behaviors, model performance can degrade substantially. While updating models with new training data is necessary, such updates may also introduce new risks. Using the prediction of severe hyperglycemia events in children with type 1 diabetes as a case study, we examine how different model update strategies can adversely affect model stability (e.g., by causing predictions to "flip" for a large number of cases after an update), increase arbitrariness in predictions, or worsen accuracy equity and the balance of error rates across subpopulations. We evaluate our proposed monitoring framework on four publicly available U.S.-based Type 1 Diabetes datasets containing high-resolution continuous glucose monitoring (CGM) data, comprising approximately 11,300 weekly observations from 496 participants under 20 years of age; all datasets include structured sociodemographic information. We propose multiple dimensions for continuous monitoring to detect these issues and argue that such monitoring is essential for the development of trustworthy clinical decision support systems.
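The three monitored dimensions in the abstract (stability, arbitrariness, and error-rate balance) can be made concrete with simple post-update metrics. The sketch below is not from the paper; the function names, the synthetic labels, and the subgroup encoding are all illustrative assumptions. It measures the fraction of predictions that flip between the old and updated model, the expected disagreement across repeated retrains, and per-subgroup false-positive/false-negative rates.

```python
import numpy as np

def flip_rate(preds_old, preds_new):
    """Fraction of cases whose binary prediction changes after a model
    update (a simple stability / churn measure)."""
    return float(np.mean(np.asarray(preds_old) != np.asarray(preds_new)))

def arbitrariness(pred_matrix):
    """Expected pairwise disagreement across K retrained models
    (rows = models, columns = cases); one simple proxy for arbitrariness."""
    p = np.mean(np.asarray(pred_matrix), axis=0)  # per-case positive-prediction rate
    return float(np.mean(2 * p * (1 - p)))        # chance two retrains disagree

def subgroup_error_rates(y_true, y_pred, groups):
    """Per-subgroup false-positive and false-negative rates, to check whether
    an update worsens the balance of errors across subpopulations."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        neg, pos = y_true[m] == 0, y_true[m] == 1
        fpr = float(np.mean(y_pred[m][neg] == 1)) if neg.any() else float("nan")
        fnr = float(np.mean(y_pred[m][pos] == 0)) if pos.any() else float("nan")
        rates[str(g)] = {"FPR": round(fpr, 3), "FNR": round(fnr, 3)}
    return rates

# Synthetic illustration only: an update that improves aggregate accuracy
# can still flip many individual predictions.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)                          # hypothetical event labels
grp = rng.choice(["A", "B"], 1000)                    # illustrative subgroups
p_old = np.where(rng.random(1000) < 0.85, y, 1 - y)   # old model, ~85% accurate
p_new = np.where(rng.random(1000) < 0.88, y, 1 - y)   # updated model, ~88% accurate
retrains = np.stack([np.where(rng.random(1000) < 0.85, y, 1 - y)
                     for _ in range(20)])             # 20 simulated retrains

print("flip rate after update:", flip_rate(p_old, p_new))
print("arbitrariness across retrains:", arbitrariness(retrains))
print("old model error rates:", subgroup_error_rates(y, p_old, grp))
print("new model error rates:", subgroup_error_rates(y, p_new, grp))
```

On this synthetic data the flip rate is roughly 23% even though accuracy improves, which is precisely the stability risk the abstract highlights: aggregate metrics can mask large case-level churn.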

Original article
arXiv.org
Read full at arXiv.org →

Computer Science > Artificial Intelligence
arXiv:2604.23954 (cs.AI), submitted 27 Apr 2026
Title: An empirical evaluation of the risks of AI model updates using clinical data: stability, arbitrariness, and fairness
Authors: Ioannis Bilionis, Ricardo C. Berrios, Luis Fernandez-Luque, Carlos Castillo
Comments: Accepted to IEEE EMBC 2026. 4 pages, 3 figures.
DOI: https://doi.org/10.48550/arXiv.2604.23954
Submission history: [v1] Mon, 27 Apr 2026 01:59:04 UTC (3,417 KB)

This excerpt is published under fair use for community discussion. Read the full article at arXiv.org.


