The data generated by the virus sequencing initiative (VirusSeq) of the Canadian COVID-19 Genomics Network (CanCOGeN) will help track the transmission of SARS-CoV-2 (the virus that causes COVID-19), identify and monitor mutations that may influence disease severity or infectivity, and inform medical countermeasures. Given the importance of data sharing for public health decision-making and vaccine development, VirusSeq has assembled a team of Canadian experts to develop a roadmap for sharing virus sequence data and other associated information that can help researchers interpret it.
We asked Dr. Yann Joly, Research Director of the Centre of Genomics and Policy at McGill University, and Chair of VirusSeq’s Ethics and Governance Working Group, to discuss the importance of data sharing and how a new memo by VirusSeq’s Ethics and Governance Working Group documenting the privacy implications raised by data sharing is clarifying key questions on the subject.
For a copy of the new memo by VirusSeq’s Ethics and Governance Working Group, please contact Catalina Lopez-Correa, Executive Director, CanCOGeN.
“The memo really clarified things for provincial authorities by confirming that, in general, sharing the type of information required for the VirusSeq minimal metadata list does not raise significant privacy issues.” — Dr. Yann Joly, Research Director of the Centre of Genomics and Policy at McGill University, and Chair of VirusSeq’s Ethics and Governance Working Group
Why is the sharing of virus sequence data and associated information important?
Genomics has become a really important surveillance tool for public health, in the sense that we can track how the virus is evolving and link it with information from the human host of the virus. It also allows you to look at different strains of the virus and how they affect different individuals, and how one variable might affect the other.
You’re basically looking at the virus and seeing in real time how it can evolve, which is key to preventing the evolution of a pandemic and to developing vaccines. Of the vaccines we have, some of the better ones are genomics-based.
Are there already policies and frameworks in place facilitating such data sharing efforts?
There were some attempts in the field of public health to establish a pan-Canadian data-sharing framework. Some policies were developed, but with public health laboratories there has always been a challenge of implementation. Some interesting model sharing agreements have been developed, but it was very hard to get consensus within the community and to get the various provinces to agree to a common framework.
I’m thinking about MLISA (Multi-Lateral Information Sharing Agreement). It’s a data-sharing agreement proposed by the Public Health Agency of Canada. But it has remained sort of a model agreement rather than being implemented by the provincial public health laboratories.
Genome Canada has adopted some data-sharing norms for the organizations and scientists that it funds for research. This means we do have some very good recommendations on the sharing of genomic data. But they were developed for the context of research, and public health stakeholders are not necessarily informed of their content or ready to apply them. So while we have some important pieces, we don’t have a complete framework yet.
How has limited data sharing within Canada hurt our progress in fighting the current pandemic?
There are several limits. Lack of knowledge about, for example, the laws and policies that surround data sharing leads to people not knowing what can and cannot be shared. Also, sometimes it’s simply a lack of trust or knowledge about what there is to be gained. There is a need to actively engage public health stakeholders about the benefits that data sharing can bring.
Sometimes the barriers are purely technical. It takes a lot of people to collect genomic data and metadata, and then prepare it in a way that’s going to be sufficiently standardized that it can be shared with other groups. It takes a proper infrastructure to do that. Smaller labs may lack the people or financial resources to do it.
How have the CanCOGeN network and this memo helped increase data sharing within Canada?
The idea behind the memo and the larger VirusSeq Ethics and Governance Working Group I chair is that we seek to identify what the impediments are to data sharing and address these hurdles systematically.
One of the first hurdles we identified, thanks to some of our members who are really on the front lines, involved questions of privacy and how we share virus sequencing data and a minimal amount of accompanying metadata, such as which province samples were collected in or which strain of the virus we were working with.
Based on our review of all the types of virus sequencing information being collected, we were able to clarify for provincial authorities that sharing this minimal metadata set along with the virus sequence generally does not raise significant privacy concerns. We were also able to flag the few areas where there are potential privacy concerns, such as data collection from a province or territory with a very small number of cases. If you wanted to share data from the Northwest Territories, from a specific point in time, for example, it might be possible to identify a COVID patient.
Overall, the conclusion of the memo was that in the majority of cases it doesn’t raise issues to share the data. Clarifying some of these challenges through the memo really helped the provincial labs commit to virus data sharing, through the signing of a memorandum of understanding.
There’s still a lot to be built to ensure a robust Canadian data sharing infrastructure, and we’re working hard at this through CanCOGeN. For example, the Protocol Standardization Workgroup develops consensus standards for validating data before it can be shared. With CanCOGeN, we are able to reach out to a critical mass of public health stakeholders and are no longer working in isolation. We’re a network, and all the provincial players are here at the table.
Are other countries contributing to this data-sharing effort?
There have been a lot of contributions at the international level. For example, Professor Zhang Yongzhen, at the Shanghai Public Health Clinical Centre in China, was the first person to share a whole genome sequence of the virus completely openly and very quickly. It was so important to put this piece of information out. It allowed different countries to basically use that model sequence of the virus to confirm the first few cases they had in their respective countries.
Another very good example is from the COG-UK (COVID-19 Genomics UK) consortium, which is a partner of CanCOGeN’s VirusSeq initiative. COG-UK is collecting genomic information and sharing it quickly through the Global Initiative on Sharing All Influenza Data (GISAID) database. They are managing to do this data sharing more quickly than many other countries. Thus, their data is used as a standard right now by the international community. They’re also looked at as a model by the international community for how we share data, what format and standards to use and what processes we need. GISAID is also a good example of an international project doing advocacy work and developing tools for pathogen data sharing.
The WHO (World Health Organization) has been a big advocate of sharing pathogen sequencing data and metadata as rapidly as possible for surveillance purposes. They’ve produced a lot of documentation on that. The Global Alliance for Genomics and Health is another important organization.
The growing international consensus on pathogen data sharing sends a strong signal to Canadian bodies that this is what ought to be done and is becoming a scientific and public health norm. These initiatives also ensure broad agreement on minimal standards for how the data is collected and shared.
Are the challenges of sharing virus sequencing data similar to those involving genetic sequencing data from individuals diagnosed with COVID-19 (the “hosts”)?
In both cases privacy can be an issue, but the host information is much more “identifying” of a person than the virus information. What I mean by that is that if I give you access to the genomic sequence of the virus, it’s unlikely that you’ll be able to re-identify the human host of the virus. But if I give you the genomic sequence of the host, that’s a very identifying piece of information. Hence, it needs to be adequately protected.
Another key difference is who is sharing the data. For VirusSeq, it’s the public health labs that are collecting the information on the virus sequences, and they are also going to be responsible for sharing it. In the case of CanCOGeN HostSeq, it’s mostly clinician researchers at various hospitals who are collecting this information and who will do the sharing. The familiarity with data sharing norms is quite different between these two groups.
How will all this work on data sharing help us in the future?
While one hopes this will be the last pandemic we will ever see, it’s not likely. We have fewer excuses each time not to address these public health challenges as quickly and effectively as possible. Genomic data sharing is key to preparation for the next pandemic. It’s getting easier and easier to gather data, analyze it and protect it. We must use the means at our disposal to promote data-driven science, policy and international collaboration.
What’s next for your research?
One big question that’s on my mind is this question of inclusion and diversity in CanCOGeN. We are collecting a lot of data, which is great. But we want to make sure that the data we collect reflects the diversity of the Canadian population, including ethnic minorities and marginalized population groups who tend to be underrepresented in genomics research. If we only collect data from specific groups, not everyone will benefit from our research. We need to be working in collaboration with diverse communities to ensure their vision, data and input also inform our research.
How has being part of CanCOGeN helped your work?
It’s very demanding work. From a financial perspective alone, CanCOGeN’s support is key. We need this support for research teams to do the legal and ethical research extremely quickly and provide results for the memos. It also requires the collaboration of individuals possessing different expertise in the network given that promoting data sharing requires a multidisciplinary approach. This network connects me with the public health community, and this gives me the reach to make a difference—quickly. We have all the provincial health labs and data producing organizations at the table. We also have politicians on board.
The Canadian COVID-19 Genomics Network (CanCOGeN) is on a mission to respond to COVID-19 by generating accessible and usable data from viral and host genomes to inform public health and policy decisions, and guide treatment and vaccine development. This pan-Canadian consortium is led by Genome Canada, in partnership with six regional Genome Centres, the National Microbiology Lab and provincial public health labs, genome sequencing centres (through CGEn), hospitals, academia and industry across the country.