Spotting C. diff Risks with a 'Hospital-Specific Approach' to Big Data
Researchers say the key is a facility-specific model rather than a one-size-fits-all approach.
A team of medical researchers trying to predict which hospital patients face the highest risk of contracting Clostridium difficile (C. diff) reviewed electronic health record (EHR) data from more than a quarter-million hospital admissions with a simple hypothesis in mind.
Perhaps the key to understanding C. diff risk factors is context, they suggested. So the team of researchers from the University of Michigan, Massachusetts General Hospital, and the Massachusetts Institute of Technology (MIT) devised a project to test whether risk factors vary from one facility to the next.
Jenna Wiens, PhD, a senior author on the paper and an assistant professor of computer science and engineering at the University of Michigan in Ann Arbor, said the project threw out some of the overly generalized assumptions that inhibited past efforts to predict which patients would face the highest C. diff risk.
“When data are simply pooled into a one-size-fits-all model, institutional differences in patient populations, hospital layouts, testing and treatment protocols, or even in the way staff interact with the EHR can lead to differences in the underlying data distributions and ultimately to poor performance of such a model,” Wiens told the university’s Michigan Health Lab publication.
“To mitigate these issues, we take a hospital-specific approach, training a model tailored to each institution,” she added.
Wiens and her colleagues used big data techniques to analyze EHR entries from more than 190,000 adult admissions to the University of Michigan Hospitals (UM) and more than 65,000 adult admissions to Massachusetts General Hospital (MGH), according to the abstract.
“We extracted patient demographics, admission details, patient history, and daily hospitalization details, resulting in 4,836 features from patients at UM and 1,837 from patients at MGH,” the researchers wrote.
They used machine learning to train two models, one for each facility. Although the two models shared some of the factors that predicted higher C. diff risk, many of the top-ranked factors differed between the two facilities.
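The article does not describe the researchers' actual model or how they ranked risk factors, so what follows is only a minimal sketch of the general hospital-specific idea, built on assumptions of its own. It swaps in plain logistic regression trained by gradient descent, ranks features by the absolute size of their learned weights, and uses a tiny hypothetical data set in which two made-up sites, `hospital_A` and `hospital_B`, have different features driving the outcome. The function names, feature names, and data are all illustrative and do not come from the study.

```python
import itertools
import math

# Hypothetical, generic feature names; not the study's actual EHR features.
FEATURE_NAMES = ["feature_0", "feature_1", "feature_2"]


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights with full-batch gradient descent."""
    n_rows, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: predicted probability minus label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient of the log loss.
        w = [wj - lr * gj / n_rows for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_rows
    return w, b


def top_features(w, names, k):
    """Rank features by the absolute size of their learned weight."""
    ranked = sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def train_per_hospital(data_by_hospital, names, k=1):
    """Hospital-specific approach: fit one model per site instead of pooling."""
    rankings = {}
    for hospital, (X, y) in data_by_hospital.items():
        w, _ = train_logistic(X, y)
        rankings[hospital] = top_features(w, names, k)
    return rankings


# Hypothetical toy data: every combination of three binary features,
# with a different feature driving the outcome at each made-up site.
ROWS = [list(bits) for bits in itertools.product([0, 1], repeat=3)]
TOY_DATA = {
    "hospital_A": (ROWS, [row[0] for row in ROWS]),  # outcome follows feature_0
    "hospital_B": (ROWS, [row[2] for row in ROWS]),  # outcome follows feature_2
}

if __name__ == "__main__":
    print(train_per_hospital(TOY_DATA, FEATURE_NAMES, k=1))
```

Because the toy labels are constructed that way, the sketch should rank `feature_0` first at `hospital_A` and `feature_2` first at `hospital_B`, a miniature version of top-ranked factors differing between sites. A real analysis would add held-out evaluation and a far richer feature set; this example only shows the per-site structure.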