Prediction of Inpatient Pressure Ulcers Based on Routine Healthcare Data Using Machine Learning Methodology
We conducted a single-center cross-sectional study in a tertiary care setting. The study followed STROBE as the general guideline for observational studies26 and, in particular, STROSA for studies analyzing secondary data27.
We included all adult cases (≥ 19 years old) admitted and discharged between 2014 and 2018 at Carl Gustav Carus University Hospital Dresden. We excluded children/adolescents, cases with prevalent pressure ulcers, psychiatric treatment and length of stay
Outcome and covariates
The outcome/dependent variable was case-specific incident PU. To correctly identify prevalent and incident pressure ulcers, consistent assessment on admission is essential. Especially among nursing home residents, it is not always clear whether a pressure ulcer was already present on admission. Our internal standard requires a pressure ulcer assessment for high-risk cases (internal treatment, intensive care and surgery) within 24 hours of admission. Each PU detected within this time frame was marked as prevalent and excluded from our analysis.
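The 24-hour rule can be illustrated with a short sketch. The helper name `classify_pu`, the example timestamps, and the choice to treat a detection at exactly 24 hours as prevalent are assumptions made for illustration; they are not taken from the hospital's internal standard.

```python
from datetime import datetime, timedelta

# Screening window from the rule described above (within 24 hours of admission).
PREVALENCE_WINDOW = timedelta(hours=24)


def classify_pu(admitted_at, detected_at):
    """Label a pressure ulcer as 'prevalent' or 'incident'.

    Hypothetical helper: a PU detected within 24 hours of admission is
    treated as prevalent (and would be excluded); a later one is incident.
    The inclusive boundary at exactly 24 hours is an assumption.
    """
    if detected_at < admitted_at:
        raise ValueError("detection time precedes admission")
    if detected_at - admitted_at <= PREVALENCE_WINDOW:
        return "prevalent"
    return "incident"


# Made-up timestamps, for illustration only.
admission = datetime(2016, 3, 1, 8, 0)
print(classify_pu(admission, datetime(2016, 3, 1, 20, 0)))  # 12 h after admission
print(classify_pu(admission, datetime(2016, 3, 4, 9, 30)))  # about 3 days after admission
```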
We grouped the independent variables into case- and care-related characteristics.
Case characteristics include age, sex (male), and comorbidities. To define comorbidities (based on ICD-10) appropriately, we followed the German Quality Assurance Program for Inpatients. The German Inpatient Quality Assurance Indicator for PU considers type 2 diabetes mellitus, BMI ≥ 40, underweight and/or malnutrition, dementia and/or disturbances of alertness, infections, other serious illnesses, mobility and incontinence. The ICD-10-based definitions28 are provided in Supplement S8.
Care-related characteristics include reasons for admission (emergency, transfer from another hospital), (duration of) surgical anesthesia, number of wards involved in care, and intensive care with or without ventilation.
We did not include the Braden score as a predictor in the models because it was used for preventive screening of pressure ulcers in the hospital. This implies that pressure ulcers flagged as probable by the Braden score may have been prevented and therefore do not occur in our data. Consequently, estimating the relationship between the observed PUs and the Braden score would lead to misleading results. Some publications also include length of hospital stay in risk-adjusted analyses of pressure ulcers29,30. However, several studies have shown that pressure ulcers prolong hospitalization31,32,33. This feedback effect makes length of hospital stay an endogenous predictor of pressure ulcers and could seriously bias the results of our risk factor analysis. Therefore, we decided not to consider length of stay in the main analysis. However, we included length of stay as a predictor in the sensitivity analysis (Supplement S9). In the main analysis, case complexity was captured by a broad set of variables such as comorbidities, anesthesia, reason for admission, intensive care treatment, and ventilation.
We used four data sources:
internally standardized and routinely collected PU screening for the detection of incident PUs,
legally required (§21 Krankenhausentgeltgesetz) and predefined accounting data for age, sex, comorbidities, treatment in intensive care, ventilation and reasons for admission,
case-based surgical protocols for the duration of surgical anesthesia (from induction to awakening),
case-based ward stays for the number of hospital wards involved per case.
Participation in the study, confidentiality and ethics
We analyzed pseudonymized routine datasets in a single-center setting. Where reasonably justified, the legislation of the federal state of Saxony (§35(1–3) “Sächsisches Krankenhausgesetz”) does not require individual consent for large, pseudonymized, single-center routine datasets. The legal justification in Saxony is based on the principle of internal research by specific service providers. We incorporated these data confidentiality conditions and justifications into our study protocol. The Institutional Review Board (IRB00001473 and IORG0001076) of TU Dresden Medical School reviewed and approved the study protocol.
Patient and public involvement
It was not appropriate to involve patients or the public in the design, conduct, reporting or dissemination plans of our research. It is a non-interventional cross-sectional analysis based on observational data, predefined outcomes and covariates.
Descriptive statistics for categorical variables were reported as absolute and relative frequencies. Continuous variables were described by the median and the 1st and 3rd quartiles. We used Bayesian additive regression trees (BART) to predict pressure ulcers and to estimate predictive relationships between pressure ulcers and risk factors10. BART is based on regression trees, which can be used when associations between independent and dependent variables cannot be described in a linear fashion. The advantage of regression trees over, for example, logistic regression is the ability to handle non-linear associations and interactions. Regression trees form homogeneous groups to identify relationships between outcome and covariates. Once heterogeneity within a group reaches a certain degree, the group is split to achieve greater homogeneity (splitting). BART combines multiple trees into a “sum of trees” model, which facilitates more accurate and stable out-of-sample predictions than single regression trees. This ability led us to predict incident pressure ulcers prospectively, in addition to estimating associations between dependent and independent variables. In this regard, it should be noted that high or low predictive power of a model does not necessarily imply accurate or inaccurate estimation of the relationships between the outcome and the covariates34.
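For orientation, the “sum of trees” idea can be written compactly. The expression below is the usual BART formulation for a binary outcome with a probit link. The symbols and the link function are introduced here as assumptions and are not taken from the study itself.

```latex
P(Y = 1 \mid x) = \Phi\!\left( \sum_{j=1}^{m} g(x;\, T_j, M_j) \right)
```

Here $Y = 1$ denotes an incident PU, $x$ is the vector of case- and care-related covariates, $m$ is the number of trees, $T_j$ is the structure (split rules) of the $j$-th tree, $M_j$ is its set of terminal-node values, $g(x; T_j, M_j)$ returns the terminal-node value assigned to $x$, and $\Phi$ is the standard normal distribution function. Each tree contributes only a small part of the overall fit, which is one reason the sum tends to predict more stably than a single tree.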
We used data from 2014 to 2017 to fit the BART model. The number of trees (50, 75, 100) served as the tuning parameter in tenfold cross-validation. We assessed the predictive performance of the selected model based on a confusion matrix and the area under the curve (AUC) using 2018 data. An AUC of 0.5 suggests no discrimination (i.e. no ability to distinguish cases with and without incident PU), 0.7-0.8 is considered fair, 0.8-0.9 is considered excellent, and more than 0.9 is considered exceptional35. In addition to the confusion matrices, we analyzed the performance indicators sensitivity, specificity, positive predictive value, negative predictive value, precision, recall, F1, prevalence, detection rate, detection prevalence, balanced accuracy (in case of high class imbalance) and accuracy. Subgroup analyses were performed for the full dataset, intensive care (yes/no), anesthesia (yes/no), ventilation (yes/no), and different grades of PU. To assess the predictive relevance of specific risk factors, we calculated variable importance as the proportion of times each risk factor was chosen for a split rule, i.e. to define a node in the sum-of-trees model. We calculated partial dependencies to explore the influence of risk factors (e.g. age) on the predicted probability of pressure ulcers. We used 95% credible intervals to assess the precision of the partial dependence estimates. Statistical analysis was performed using R 3.6.3 and the bartMachine package36. To ensure methodological rigor, the accuracy of BART predictions was compared with that of multiple logistic regression, random forest and LASSO (see Supplement S10 for a more detailed description).
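The performance indicators listed above all derive from the four cells of a confusion matrix, and the AUC can be read as the probability that a randomly chosen case with incident PU receives a higher predicted probability than a randomly chosen case without. The following Python sketch illustrates these definitions. It is not the R code used in the study. The function names, the toy data, and the 0.5 classification threshold are assumptions, and the sketch assumes that both outcome classes and both predicted classes are non-empty.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for labels coded 1 = incident PU, 0 = no PU."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def performance(tp, fp, tn, fn):
    """Compute the indicators named in the text from confusion-matrix cells."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # identical to precision
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "prevalence": (tp + fn) / n,
        "detection_rate": tp / n,
        "detection_prevalence": (tp + fp) / n,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "accuracy": (tp + tn) / n,
    }


def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, for illustration only (not study data).
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = performance(*confusion_counts(y_true, y_pred))
print(metrics["balanced_accuracy"], metrics["accuracy"], auc(y_true, scores))
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated by the majority class the way plain accuracy can be when incident PUs are rare.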