Using machine learning to unravel long Covid
Long Covid, with its constellation of symptoms, is proving a moving target for researchers trying to conduct large studies of the syndrome.
As they take aim, they’re debating how to responsibly use growing piles of real-world data — drawing from the full experiences of long Covid patients, not just their participation in stewarded clinical trials.
“People have to really think carefully about what does this mean,” said Zack Strasser, an internist at Massachusetts General Hospital who has used existing patient records to study the characteristics of long Covid. “Is this true? Is this not some artifact that’s just happening because of the people that we’re looking at within the electronic health record? Because there are biases.”
One of the largest sources of real-world data on long Covid is a first-of-its-kind centralized federal database of electronic health records called the National Covid Cohort Collaborative, or N3C. Kickstarted as part of a $25 million National Institutes of Health award early in the pandemic, N3C now includes deidentified patient data from 72 sites around the country, representing 13 million patients and nearly 5 million Covid cases.