Not long ago, Isha Berry was collecting Canadian COVID-19 data by herself with the aid of a single spreadsheet. Now, she is leading the COVID-19 Canada Open Data Working Group, a team of about a dozen people who are collecting, organizing, and publishing public data about the spread of COVID-19.
Composed of students and researchers based at the University of Toronto, her team supplies Canada’s data to a global research collaboration led by researchers from Oxford University and the University of Washington. Berry’s team maintains an online spreadsheet and a public dashboard on GitHub to record the spread of the virus across Canada.
Berry, a graduate student at the Dalla Lana School of Public Health, has been organizing the team’s data collection efforts, and fellow Dalla Lana graduate student Jean-Paul Soucy is in charge of the public dashboard. They spoke with The Varsity about the importance of maintaining easily accessible public data about the outbreak — and the challenges they face while doing so.
How the team collects data
Berry’s data collection team is limited by the data that provinces make public. Almost all provinces report the progress of the virus at least daily, but the detail and information in their reports vary.
Smaller provinces and provinces with lower case counts have the resources to publish more data for individual cases. When available, information about traits such as the age and gender of infected people can offer valuable insight into provinces’ testing practices, but this data can be time-consuming to gather and publish. There have also been testing backlogs at times, including in Ontario, which can affect the accuracy of the data.
To improve the quality of its data, Berry’s team is shifting its data-gathering efforts to regional reports. Public health regions, called public health units in Ontario, provide official data on the COVID-19 pandemic on a smaller, more local scale, and can fill in the gaps left by provincial data.
There is also plenty of data that would help to model the spread of the virus that is not being reported. For example, epidemiologists — experts in the spread and control of diseases — would benefit from information on how long it takes patients to notice symptoms of the virus after infection, but this is often difficult to measure directly.
“That’s probably the most crucial piece of information, but probably the hardest to get in a timely manner,” said Berry. Luckily, epidemiologists have models to retroactively reconstruct information like this from the available data.
Insights from data collection
The U of T team’s data is being used by an American and British-led international research group to track the spread of the infection. Here in Canada, many people, including members of the public, researchers, and reporters, are likewise using the team’s data to study the outbreak, tracking everything from provincial testing rates to the effects of specific anti-COVID-19 measures.
But the data itself isn’t cut-and-dried. “If there’s one lesson we can take from epidemiology, it’s that the numbers very rarely speak for themselves,” said Soucy. “All numbers have to be contextualized, or else the conclusions we draw can be quite erroneous.”
Soucy often runs into this conundrum first-hand when designing the public dashboard. The team’s information is heavily influenced by the way each province collects and reports its own data, which can create misunderstandings with the public.
A notable example is Quebec’s jump in cases on March 23. The sudden spike is mostly attributable to a change in Quebec’s reporting process, rather than a sudden dramatic rise in infections. The team is trying to contextualize information like this on the public dashboard so the public can understand the limitations of the data.
The impact on the public
“[Scientists] collect a lot of information… that, if delivered [and]… packaged in a certain way, can be quite useful to members of the public,” said Soucy. “Given that it is the public who’s generally supporting our work… it does behoove us to spend some more time and effort in order to give back to [them].”
In other words, the data could show us the results of our physical distancing policies. Efforts to control the virus, like physical distancing measures, depend largely on public buy-in. This is why it is important, now more than ever, to help the public access information about the virus. The U of T team is doing its best to provide those tools.
“It’s an evolving outbreak, and we’re an evolving research working group,” said Berry. “We’ll try to respond as best we can and shift our priorities and needs depending on what’s available and what’s useful.”