www.scienceboard.org
Perspectives

Genomic Structural Variation - the end of the array?

by Richard Wintle, Ph.D.

The human genome is a large and complex place, variable in its sequence content and organization in ways that were not contemplated when the first draft human genome sequences were published in 2001 [1,2]. Many types of genomic changes, including insertions, deletions, duplications, inversions and complex copy-number variations (CNVs), have since been detected, although rare examples of these had been described, in some cases many years previously [3,4]. Indications that CNV events were both common and widespread came from fortuitous observation of unusual patterns of intensity in genome-wide microarray data [5].

Since that time, studies of genomic structural variants have become common. New technologies have in some cases been created to enable these studies – for example, clone-based comparative genomic hybridization (CGH) arrays, or hybrid microarrays containing single nucleotide polymorphism (SNP) and CNV probes. Existing technologies such as DNA sequencing and mapping of clone ends have also been applied to identify and catalogue these types of variation (e.g. [6]).

The power of using whole-genome sequencing to discover and catalogue genetic variation has been recently illustrated by two studies: the sequencing of Dr. Craig Venter's genome by conventional Sanger capillary-electrophoresis sequencing [7] and of Dr. James Watson's genome using next-generation Roche/454 pyrosequencing technology [8]. Each of these studies identified millions of single-base changes, as compared with reference sequences, as well as a host of larger structural variants: insertions, deletions, CNVs and inversions.
Importantly for studies of genetic predisposition to disease or other traits of interest, a complete sequence captures the entire SNP content of the genome, eliminating the need for whole-genome array-based, or locus-specific targeted, genotyping.

So, is this the end of the genome-wide DNA microarray? Even complete genomic sequencing cannot reliably detect all classes of genomic variation. In particular, direct duplications with perfect or near-perfect homology will likely be mis-assembled as single regions, although these might be revealed by counting read depth of sequence coverage. Other types of events, particularly those in “hard to assemble” regions, might be similarly missed. Paired-end sequencing, whereby sequence reads are derived from each end of a single DNA fragment, can help to overcome this obstacle by allowing for identification of inversions (when sequence ends map in an unexpected orientation), or insertions and deletions (when they map nearer to, or farther away from, each other than is expected). Even so, the sizes of events that can be detected by paired-end technologies are limited by the size of the fragments that are sequenced. These range from tens to hundreds of kilobases (for clones such as fosmids [9] or bacterial artificial chromosomes) down to a few kilobases or smaller (for fragment libraries made for analysis on next-generation instruments from Illumina, Applied Biosystems, 454/Roche, or others).

By contrast, microarray-based approaches such as array-comparative genomic hybridization (array-CGH), or derivation of genomic copy number by examination of signal intensities on SNP or non-polymorphic oligonucleotide arrays, are relatively robust methods of detecting certain classes of structural variation [10]. These methods, typically based on arrays from Affymetrix, Agilent, Illumina, Nimblegen, and various vendors of clone-based CGH arrays, are widely used in both research and clinical settings.
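
As a rough illustration of how copy number might be derived from array signal intensities, the sketch below computes a log2 ratio of test to reference intensity for a probe and applies simple thresholds. The function names and the threshold values of 0.3 and -0.3 are illustrative assumptions, not taken from any vendor's software; real analyses rely on normalization and segmentation algorithms that are considerably more involved.

```python
import math


def log2_ratio(test_intensity, reference_intensity):
    """Return the log2 ratio of test to reference signal for one probe."""
    return math.log2(test_intensity / reference_intensity)


def call_copy_number(ratio, gain_threshold=0.3, loss_threshold=-0.3):
    """Classify a log2 ratio as a gain, a loss, or no change.

    The default thresholds are hypothetical values chosen for illustration.
    """
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "normal"


# Example: a probe whose test signal is 1.5 times the reference signal
print(call_copy_number(log2_ratio(1500.0, 1000.0)))
```

In an idealized diploid region, a single-copy gain corresponds to a ratio of 3/2 (log2 of about 0.58) and a single-copy loss to a ratio of 1/2 (log2 of -1). Measured signals are noisier than this idealization, which is one reason the thresholds here should be read as placeholders.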
However, none of these array-based methods can detect balanced events such as translocations or inversions, necessitating other approaches such as sequencing, karyotyping or fluorescence in situ hybridization (FISH). Further, extracting CNV data from SNP and CGH arrays relies on various computational algorithms, and the application and interpretation of these can be challenging [11,12].

As with sequence-based analysis, array-based approaches also suffer from resolution limitations. Arrays can detect copy number events with resolution determined by the technology used: hundreds of kb in size for BAC-CGH arrays, down to perhaps tens of kb for high-density SNP arrays. With the promise of even higher density arrays, including a 10-million locus single array now in development by Affymetrix [13], and the potential to custom-create very high density, multiple array, genome-wide sets using Affymetrix, Agilent, Nimblegen or other technologies, there would seem to be no practical reason why even the smallest genomic events could not be detected using microarrays. Indeed, the premise of resequencing arrays is that single-base substitutions can be identified.

The point at which whole-genome sequencing becomes preferable and routine is now held back only by cost, both of the sequencing itself and of the computational resources needed to analyze the resulting data. Eventually, the expense of creating and using higher and higher density arrays will outstrip the ever-decreasing cost of whole-genome sequencing. Once this tipping point is reached, sequencing will become the method of choice for comprehensive assessment of genomic structural variation. In the meantime, microarrays will remain the tool of choice for genome-wide analysis.
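
To make the paired-end reasoning discussed earlier concrete (ends mapping in an unexpected orientation, or nearer to or farther from each other than expected), here is a minimal sketch of a read-pair classifier. The function name, its parameters, and the idea of a fixed size tolerance are assumptions made for illustration; production tools typically require several supporting pairs before reporting an event.

```python
def classify_read_pair(left_pos, right_pos, orientation_ok,
                       expected_size, tolerance):
    """Suggest the kind of variant a single mapped read pair might indicate.

    left_pos, right_pos: reference coordinates where the two ends map.
    orientation_ok: True if the ends map in the expected relative orientation.
    expected_size: expected distance between ends, given the fragment size.
    tolerance: hypothetical allowance for normal fragment-size variation.
    """
    if not orientation_ok:
        # Ends mapping in an unexpected orientation suggest an inversion.
        return "possible inversion"
    span = abs(right_pos - left_pos)
    if span > expected_size + tolerance:
        # Ends farther apart on the reference: the sample lacks sequence.
        return "possible deletion"
    if span < expected_size - tolerance:
        # Ends closer together on the reference: the sample has extra sequence.
        return "possible insertion"
    return "concordant"


# Example: ends of a 3 kb fragment library mapping 8 kb apart
print(classify_read_pair(10_000, 18_000, True, 3_000, 500))
```

The sketch also shows the size limitation noted above: an insertion longer than the fragment itself cannot bring the mapped ends any closer than zero, so it cannot be sized, or even seen, by this distance test alone.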