This past month, scientists from the Hospital for Sick Children and the Wellcome Trust Sanger Institute collaborated to create a 3-D map of the human genome. Much like a GPS, the map allows for navigation through the human genome like never before. It details genetic variation, in particular copy number variations (CNVs), that scientists were unable to detect until now.
CNVs are differences in the number of copies of a gene or stretch of DNA. Most genes are present in two copies per genome, but some occur in three copies or more, or sometimes not at all. Although many of these variations are benign, others influence predisposition to disease. For example, individuals with a particular deletion in one of the two copies of chromosome five are at a considerably higher risk of developing Crohn’s disease. CNVs are also associated with Parkinson’s disease and Alzheimer’s. In general, the genes that tend to have variable copy numbers are associated with the immune system and brain development and activity.
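For readers who think in code, the idea of copy number can be sketched in a few lines of Python. This is only an illustration of the counting described above, not the researchers' method: the function name, the labels, and the gene names and counts are all hypothetical and are not data from the study.

```python
def describe_copy_number(copies: int) -> str:
    """Label a copy count relative to the usual two copies per genome."""
    if copies < 0:
        raise ValueError("copy count cannot be negative")
    if copies == 0:
        return "absent"                   # the gene does not occur at all
    if copies == 1:
        return "loss (one copy deleted)"  # a deletion in one of the two copies
    if copies == 2:
        return "typical"                  # the usual two copies per genome
    return "gain"                         # three copies or more


# Hypothetical counts for illustration only; not data from the study.
example_counts = {"gene_A": 2, "gene_B": 3, "gene_C": 1, "gene_D": 0}

labels = {gene: describe_copy_number(n) for gene, n in example_counts.items()}
for gene, label in labels.items():
    print(gene, label)
```

A label such as "loss" or "gain" says nothing on its own about health. As noted above, many such differences are benign.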
CNVs were first discovered in 2004 when Dr. Stephen Scherer and his research team at The Hospital for Sick Children detailed the concept of copy number variation. At the time, little was known about detecting disease-causing abnormalities, and the relationship between CNVs and disease was yet to be identified.
“It had been known for 15 years that there were large structural changes in chromosomes called cytogenetically abnormal chromosomes. You see these when people have [chronic] diseases [and] these large changes affect millions of base pairs of DNA,” explained Scherer.
As a result of the Human Genome Project, information concerning single nucleotide base pair changes along chromosomes became widely available. But nothing was known about changes ranging from roughly 500 base pairs up to a million base pairs in size, a middle class of genetic variation.
In order to focus on this middle-class variation, Scherer and his team used microarray technology to analyze gene expression (a term used to describe the transcription of DNA into messenger RNA from which proteins are created) and identify CNVs along the chromosome. “It was still pretty low resolution,” mentioned Scherer. “In any given data sample, we found 12 or so CNVs and that was surprising because we had only seen these changes among people with diseases.”
In 2006, the research team began a new, larger project attempting to identify CNVs in sample populations. The CNVs were catalogued for use as a reference database. The first high-resolution map of CNVs was completed in 2006 after the DNA from 270 people across four populations was analyzed and catalogued. This map became the standard across the field of biology, allowing other scientists to compare their own data against it. However, this map did not identify all of the CNVs: only approximately 100 CNVs were found in each genome, so another experiment needed to be conducted.
The experiments had to be carried out in steps because of the cost and the lack of available data. “We did the experiment with 40 individuals first because each experiment costs us about $20,000,” said Scherer. “And with all of the CNVs that we identified in the so-called discovery phase of that project, we then ran DNA samples across microarrays, so we [genotyped] across the population. Ultimately, we generated a great resource of information. We actually integrated all of the data, so now we have an integrated genetic variation map of the human genome.” In this third step, which took three years with 12 people working on the project full-time at Sick Kids, the research teams at Sick Kids and the Sanger Institute divided the research equally.
The Hospital for Sick Children houses the Database of Genomic Variants (CNVs that occur in the general population), while the Wellcome Trust Sanger Institute holds the database of CNVs that are related to clinical conditions.
In an interview with News at U of T, Scherer said, “Variation is indeed the spice of life and we now know that nature buffers this variation by using CNVs. We are harnessing this knowledge to fight disease.”
“We haven’t gotten all of the CNV variation between one nucleotide and 500 [base pair] because microarrays can’t do that, so we’d have to [accomplish that] by DNA sequencing,” says Scherer. The Sanger Institute is currently working on another project, the 1000 Genomes Project, which will expand upon this research.
The team’s research paper, published in the journal Nature, is a major stepping stone toward the complete analysis and cataloguing of CNVs in the human genome.
“This paper will be the one that scientists will use [as a comparison for their studies] for the next two years. It is a quantum leap from what we knew in 2006.”