Adjusting for biases in electronic health record (EHR) data – Healthcare Economist
Let’s say you are interested in measuring the relationship between type 2 diabetes mellitus (T2DM) and depression. In many cases, one would use electronic health record data and run a logistic regression with depression as the dependent variable and T2DM (potentially along with demographics and other comorbidities) as the independent variables. However, using EHR data is potentially problematic. As noted in Goldstein et al. (2016), there is a risk of “informed presence” bias: the sample of patients in an EHR likely differs from the general population, since individuals only appear when they have a medical encounter.
Specifically, Goldstein and co-authors note that more frequent visits increase the chance of being diagnosed with a disease:
Quan et al. assessed sensitivities based on International Classification of Diseases, Ninth Revision, codes across 32 common conditions. They found that sensitivities for prevalence of a condition ranged from 9.3% (weight loss) to 83.1% (metastatic cancer). Diabetes with complications, for example, has a sensitivity of 63.6%. Therefore, the more medical encounters someone has, the more likely that the presence of diabetes will be detected.
At the same time, while more encounters reduce the likelihood of a false negative, they also increase the likelihood of a false positive due to rule-out diagnoses.
Since phenotype algorithms are typically designed to detect the prevalence of a condition via ever/never algorithms (you either have the condition or you don’t), the more health-care encounters someone has the higher the likelihood of a false-positive diagnosis.
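To make the ever/never logic concrete, here is a minimal sketch of how both detection and false-positive rates climb with the number of encounters. It assumes encounters are independent and share a constant per-encounter probability, which is a simplification. The function name and the two per-encounter rates are hypothetical values chosen for illustration; they are not taken from Quan et al. or Goldstein et al.

```python
def prob_ever_flagged(per_encounter_prob: float, n_encounters: int) -> float:
    """Chance that an ever/never rule fires at least once across n encounters.

    Assumes independent encounters with a constant per-encounter probability.
    """
    if not 0.0 <= per_encounter_prob <= 1.0:
        raise ValueError("per_encounter_prob must be between 0 and 1")
    if n_encounters < 0:
        raise ValueError("n_encounters must be non-negative")
    return 1.0 - (1.0 - per_encounter_prob) ** n_encounters


# Hypothetical per-encounter rates, for illustration only.
CAPTURE_IF_DISEASED = 0.30    # a true case gets coded at a given visit
FALSE_CODE_IF_HEALTHY = 0.02  # a rule-out code is recorded at a given visit

if __name__ == "__main__":
    for n in (1, 5, 20):
        sensitivity = prob_ever_flagged(CAPTURE_IF_DISEASED, n)
        false_positive = prob_ever_flagged(FALSE_CODE_IF_HEALTHY, n)
        print(f"{n:>2} encounters: sensitivity={sensitivity:.3f}, "
              f"false-positive rate={false_positive:.3f}")
```

Under these assumptions both quantities rise monotonically with the number of encounters, which is the mechanism the quoted passages describe.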
Two types of bias may arise:
- Bias due to number of physician visits. Figure 1A of the paper shows that the number of encounters may be a confounding factor. It is not evident, however, whether there is potential for M bias, that is, bias from conditioning on a collider. A collider is a variable that is an outcome of two other variables.
- Bias due to general illness. The authors note that a general illness may be the cause of both diabetes and depression. For instance, perhaps someone sustained an injury that led them to exercise less and eat less healthily (causing T2DM), and the injury itself also increased depression. While my example gives a specific injury, the “general illness” in the Goldstein paper may or may not be fully captured or known. Thus, the authors claim that the number of encounters may be able to serve as a proxy for general illness.

In short, the authors argue that controlling for the number of visits can be useful because (i) diagnosis is correlated with the number of encounters and (ii) the number of encounters may be a proxy for general illness.
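The first point can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the authors’ simulation: true T2DM and true depression are drawn independently, encounter counts are drawn independently of both, and a condition is recorded only if it is captured at one or more encounters. All prevalences, the capture rate, and the function names are hypothetical. Instead of a logistic regression, the sketch conditions on encounters by stratifying and pooling with a Mantel-Haenszel odds ratio, which keeps it dependency-free.

```python
import random


def simulate_ehr(n_patients=200_000, per_visit_capture=0.2, seed=0):
    """Return (encounters, dx_t2dm, dx_depression) tuples.

    The true conditions are independent, so any association between the
    recorded diagnoses is an artifact of how they are captured.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_patients):
        encounters = rng.randint(1, 10)   # hypothetical utilization
        has_t2dm = rng.random() < 0.20    # hypothetical prevalence
        has_dep = rng.random() < 0.20     # hypothetical prevalence
        # Chance a true condition is coded at least once across all visits.
        p_captured = 1.0 - (1.0 - per_visit_capture) ** encounters
        dx_t2dm = has_t2dm and rng.random() < p_captured
        dx_dep = has_dep and rng.random() < p_captured
        rows.append((encounters, dx_t2dm, dx_dep))
    return rows


def two_by_two(rows):
    """Counts as [both, T2DM only, depression only, neither]."""
    counts = [0, 0, 0, 0]
    for _, dx_t2dm, dx_dep in rows:
        counts[(0 if dx_t2dm else 2) + (0 if dx_dep else 1)] += 1
    return counts


def crude_odds_ratio(rows):
    """Odds ratio ignoring the number of encounters."""
    a, b, c, d = two_by_two(rows)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(rows):
    """Odds ratio pooled across strata defined by encounter count."""
    strata = {}
    for row in rows:
        strata.setdefault(row[0], []).append(row)
    numerator = denominator = 0.0
    for stratum_rows in strata.values():
        a, b, c, d = two_by_two(stratum_rows)
        total = a + b + c + d
        numerator += a * d / total
        denominator += b * c / total
    return numerator / denominator


if __name__ == "__main__":
    rows = simulate_ehr()
    print(f"crude OR:      {crude_odds_ratio(rows):.2f}")
    print(f"stratified OR: {mantel_haenszel_odds_ratio(rows):.2f}")
```

In this setup the crude odds ratio should come out above 1 even though the conditions are unrelated, while the encounter-stratified estimate should sit close to 1. The toy model deliberately leaves out the case where the conditions themselves drive encounters, which is where the collider concern enters.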
The authors then conduct a simulation exercise using EHR data from the Duke University Health System. They run four analyses examining the relationship between outcome and exposure, controlling for: (i) demographics only, (ii) medical encounters, (iii) Charlson Comorbidity Index (CCI), and (iv) medical encounters and CCI.
The authors summarize their findings as follows:
If the presence of a medical condition is not captured with high probability (i.e., high sensitivity), there is the potential for inflation of the effect estimate for association with another such condition. This potential for bias is exacerbated when the medical condition also leads to more patient encounters…Theory suggests, and our simulations confirm, that conditioning on the number of health-care encounters can remove this bias. The impact of conditioning is greatest for diagnoses captured with low sensitivity.
The authors note that while there is some concern about M bias, since the number of encounters is likely a collider, M bias is likely considerably less problematic than confounding bias in general. Other studies (Liu et al. 2012) have confirmed that M bias is often smaller than confounder bias.
An aside: Berkson’s Bias
The problem of sicker patients appearing in EHR data is a manifestation of Berkson’s bias:
Because samples are taken from a hospital in-patient population, rather than from the general public, this can result in a spurious negative association between the disease and the risk factor. For example, if the risk factor is diabetes and the disease is cholecystitis, a hospital patient without diabetes is more likely to have cholecystitis than a member of the general population, since the patient must have had some non-diabetes (possibly cholecystitis-causing) reason to enter the hospital in the first place. That result will be obtained regardless of whether there is any association between diabetes and cholecystitis in the general population.
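The diabetes and cholecystitis example can be reproduced in a small simulation. This is a sketch, assuming the two conditions are independent in the general population and that each one, along with an unrelated "other reason", can independently trigger a hospital admission. Every prevalence and admission probability below is hypothetical, as are the function names.

```python
import random


def simulate_population(n_people=200_000, seed=1):
    """Return (diabetes, cholecystitis, admitted) tuples.

    Diabetes and cholecystitis are drawn independently of each other.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        diabetes = rng.random() < 0.10        # hypothetical prevalence
        cholecystitis = rng.random() < 0.05   # hypothetical prevalence
        # Admission happens if any one of three independent causes fires.
        admitted = (
            (diabetes and rng.random() < 0.30)
            or (cholecystitis and rng.random() < 0.50)
            or rng.random() < 0.05            # unrelated reason
        )
        people.append((diabetes, cholecystitis, admitted))
    return people


def odds_ratio(pairs):
    """Odds ratio from an iterable of (exposed, diseased) booleans."""
    a = b = c = d = 0
    for exposed, diseased in pairs:
        if exposed and diseased:
            a += 1
        elif exposed:
            b += 1
        elif diseased:
            c += 1
        else:
            d += 1
    return (a * d) / (b * c)


if __name__ == "__main__":
    people = simulate_population()
    everyone = odds_ratio((dm, ch) for dm, ch, _ in people)
    inpatients = odds_ratio((dm, ch) for dm, ch, adm in people if adm)
    print(f"general population OR: {everyone:.2f}")
    print(f"hospital in-patient OR: {inpatients:.2f}")
```

With these inputs the odds ratio in the full population should be close to 1, while the odds ratio among admitted patients should fall well below 1, matching the spurious negative association described in the quote.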