CanCOGeN HostSeq on a mission to understand variability in COVID-19 outcomes

Monday, May 31, 2021

Q&A with Dr. Naveed Aziz

The Canadian COVID-19 Genomics Network (CanCOGeN)’s HostSeq initiative is sequencing the genomes of up to 10,000 Canadians affected by COVID-19 (“the hosts”) – informing our understanding of how COVID-19 impacts different populations, as well as public health decision-making. Led by CGEn, the initiative has established the national CGEn HostSeq Databank where sequencing data are deposited and accessible to researchers across Canada and the world.

Host genome sequencing is crucial to Canada’s pandemic response and preparation for future public health challenges. It helps us answer key questions, such as why the disease is more detrimental to certain individuals than others. By identifying new biomarkers of disease severity risk in host samples, HostSeq-supported research can, for example, help inform new therapeutic strategies.

HostSeq-supported researchers have access to free DNA extraction and whole genome sequencing, with a copy of the data deposited into the HostSeq databank. This allows researchers from across Canada to bolster each other’s work through data sharing.


“When the pandemic hit, it left many healthcare researchers with a pressing need for scientific data that might inform mechanisms of disease manifestation. Specifically, they needed data to guide treatment decisions, support therapeutics and drive vaccine development.” - Dr. Naveed Aziz


What makes the CGEn HostSeq Databank the first of its kind in Canada?

Historically, the Canadian genomics ecosystem did not have a population-wide data set, and we did not have a working pipeline to actually pivot and focus on the data we needed to understand how COVID-19 was affecting different populations. That’s what makes this Databank so vital and a first for Canada. For the first time, we will have a large data set that researchers across the country and world can tap into.

What will the databank accomplish?

In the shorter term, it will help us understand why some people are genetically predisposed to more severe, or mild, symptoms. This information can be used to inform the approach to medical care for those people. For example, if we know an individual’s natural predisposition to disease severity, we can start to use host sequencing data to make better decisions across the healthcare system, such as who should be admitted to ICU, and who would be better off at home.

In the medium term, we will start to mine the data for potential targets for therapeutics. When we consider global health equity and the challenges around vaccine efficacy, storage and delivery, host sequencing data has the potential to help us develop targeted vaccines or other drugs that could be taken orally, for example.

Longer term, when another pandemic or major public health challenge arrives, we can mine this databank and learn from what we did for COVID-19.

Why was CGEn the right organization to lead design and delivery of this initiative?

CGEn was created to be response-ready to large-scale Canadian scientific challenges. CGEn has links regionally, nationally and internationally, which allow us to respond rapidly to emerging needs. The advantage is that many Canadian genomics researchers are our clients and users of the CGEn infrastructure. We already have a network of people we can call upon and who are familiar with the quality of research output and impact that CGEn delivers. CGEn was established in 2014 and is funded primarily by the Canada Foundation for Innovation, with support from other stakeholders like Genome Canada and the host institutions. We support more than 2,000 independent research groups every year.

What challenges did you encounter while getting the Databank off the ground?

The biggest challenge was the lack of preparedness in terms of governance policies to conduct genomics health research on a national scale. Other countries have large-scale programs that have been running for many years. We also have a healthcare system that is funded federally and administered provincially, which makes the governance of national projects a challenge. It would normally take years to put together, for example, a national policy on common consent. Because of the pandemic, we were able to do that in under 10 weeks, which is something all involved should be very proud of.

Given the underrepresentation of diverse populations in a lot of genomic data collection, how are you ensuring all of Canada is represented in the data collected?

We must realize and acknowledge that the barriers to inclusion can be very complicated. Our model prioritizes establishing diverse partnerships, bringing unique participant cohorts into the HostSeq Databank. We are doing this to ensure that this national resource is inclusive and representative of the Canadian population. We have a strong national strategy that addresses the barriers to inclusion faced by underrepresented groups, including Indigenous, Black, South Asian and Hispanic communities, for example. It’s about meaningful collaboration with communities and taking the extra measures to ensure the data is representative of the actual Canadian population.

How is the data generated by HostSeq being shared and who can access it?

We have a unique collaborative national research infrastructure. Our sequencing and analysis platform is accessible to Canadian and international researchers from all disciplines in academic, private and government sectors. We offer complete high-throughput, low-cost and high-quality sequence generation and analysis services to decode whole genomes. The HostSeq data is accessible by national and international researchers through a Data Access Compliance Office (DACO).

How will you ensure maximum impact?

We are making sure that we provide access to as many researchers as possible – putting data into the hands of people who will use it is key here.

How will privacy concerns be addressed?

It’s important for Canadians to know that, while we are making sequencing and clinical data widely accessible to researchers, their privacy is paramount. The samples we collect are de-identified before we perform whole genome sequencing and the corresponding information is added to the Databank. CGEn does not store any identifiers that might link personal information to a participant. This information remains with the consenting investigator, which ensures privacy of all participants.


The Canadian COVID-19 Genomics Network (CanCOGeN) is on a mission to respond to COVID-19 by generating accessible and usable data from viral and host genomes to inform public health and policy decisions, and guide treatment and vaccine development. This pan-Canadian consortium is led by Genome Canada, in partnership with six regional Genome Centres, the National Microbiology Lab and provincial public health labs, genome sequencing centres (through CGEn), hospitals, academia and industry across the country.