HUMAN MUTATION 0, 1–10, 2008
REVIEW ARTICLE
Emerging Themes and New Challenges in Defining
the Role of Structural Variation in Human Disease
Andrew J. Sharp*
Department of Genetic Medicine and Development, University of Geneva Medical School, University Medical Center (CMU), Geneva,
Switzerland
The widespread use of array-comparative genomic hybridization (array-CGH) for the detection of copy number
variants (CNVs) in both research and clinical laboratories has created a renaissance in the field of molecular
cytogenetics, revealing that the human genome contains both a wealth of structural polymorphism and many
novel genomic disorders. A new generation of experimental platforms enables structural variants to be identified
with increasing resolution, and will require the development of more sophisticated methods to assess the
pathogenic significance of novel structural variants if these technologies are to be of clinical utility. Indeed, we
are now entering an era in which technologies to detect CNVs have advanced much faster than our
understanding of the consequences of these variants on human phenotypes, and I argue that over the last few
years the problem has now become one of interpretation rather than identification. This problem is made more
complex by the realization that many genomic disorders show highly variable penetrance, blurring the boundary
of how to define benign vs. pathogenic variants. I discuss insights from recent research which shed light on
potential mechanisms that may underlie this phenomenon, and possible methods to determine the genetic
elements that are responsible for the associated phenotype. Furthermore, there is now a growing appreciation
that the underlying chromosomal architecture which catalyses many genomic disorders is polymorphic within
the general population, and I discuss potential mechanisms by which inversion polymorphisms might create
predispositions to genomic disorders. Hum Mutat 0, 1–10, 2008.
© 2008 Wiley-Liss, Inc.
KEY WORDS: CNV; deletion; duplication; inversion
INTRODUCTION
Copy number variants (CNVs) have emerged as a major force,
shaping both genetic and phenotypic variation. Current best-
estimates suggest that in terms of the total number of base pairs of
genetic difference between any two individuals, CNVs contribute
approximately twice as much as single nucleotide polymorphisms
(SNPs) [Tuzun et al., 2005; Korbel et al., 2007]. The
rapid progress in developing new technologies for the study of
CNVs has led to an explosion in the amount of data now available.
Techniques such as array-comparative genomic hybridization
(array-CGH) are now becoming standard tools in most clinical
laboratories, and may soon even begin to replace conventional
karyotyping as the primary method for investigating some
disorders. Despite this rapid growth, our understanding of the
link between structural variants and human disease is somewhat
limited. It is now clear that there is considerable complexity when
attempting to assess the role of CNVs in human phenotypes, much
of which is not immediately apparent to the casual observer. In this
article, I attempt to provide an overview of several emerging
themes in the field of structural variation and discuss the
implications of these, both in the clinical and research setting.
Rather than discussing the connection between common structur-
al polymorphism and common phenotypic variation, such as that
observed with the beta defensin locus in Crohn disease [Fellermann
et al., 2006] and psoriasis [Hollox et al., 2008], this review
focuses mainly on the role of rare variants (frequency typically
<1%) in human disease.
CURRENT CNV MAPS ARE INCOMPLETE AND
CONTAIN SIGNIFICANT ERRORS
A recent meta-analysis of published data shows that more than
600 Mb, or approximately 20%, of the human genome sequence is
currently annotated as CNV [Cooper et al., 2007]. However, as
the field of structural variation is still immature and there is wide
variation in the methods by which these CNVs have been
annotated, great care should be taken when interpreting these
data. Most importantly, this number almost certainly represents an
overestimate of the true amount of structural variation in humans.
The main reasons for this are two-fold. First, the majority of
putative CNVs identified to date have been annotated using BAC
array-CGH. While this platform has arguably been the major
catalyst for the study of CNV, the large size of the probes used in
BAC-based microarray studies means that these platforms lack
precision, and crucially will tend to overestimate the size of the
Published online in Wiley InterScience (www.interscience.wiley.com).
DOI 10.1002/humu.20843
Received 29 January 2008; accepted revised manuscript 20 May 2008.
*Correspondence to: Andrew J. Sharp, Department of Genetic
Medicine and Development, University of Geneva Medical School,
CMU, Room 9148, 1 rue Michel-Servet, 1211 Geneva, Switzerland.
E-mail: andrew.sharp@medecine.unige.ch
Grant sponsor: European Community's Seventh Framework
Program; Grant number: 219250.
CNVs that are predicted. From the use of much more precise
methods, such as paired-end mapping [Tuzun et al., 2005; Korbel
et al., 2007; Kidd et al., 2008], it is now clear that the majority of
CNVs in humans are <50 kb in size. However, by their very
nature, array-CGH platforms are unable to accurately define any
CNV that is smaller than the probe used to detect it. For example,
for a situation in which a CNV covers only 50% of a BAC probe,
the entire region corresponding to the BAC is labeled as a site of
CNV. As BAC arrays with an average probe size of ~100 to 200 kb
have been the most widely used tool for the study of structural
variation until recent times, this has resulted in an exaggeration of
the amount of the genome that is defined as structurally variant.
This exaggeration of CNV size was conclusively demonstrated by
Perry et al. [2008], who used high-resolution oligonucleotide
arrays to fine map 1,153 putative CNV loci and found that in
reality, 88% were smaller in size than is recorded in the Database
of Genomic Variants (http://projects.tcag.ca/variation). Impor-
tantly, this exaggeration was considerable, with the putative size
of 76% of CNV loci being overestimated by more than two-fold.
The implication of these results is that a more accurate estimate of
the proportion of the human genome that is structurally variant is
perhaps in the order of ~5 to 10%.
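The size inflation described above can be illustrated with a minimal sketch. It assumes a simplified calling rule that is not taken from any of the cited studies: a probe is reported as variant, in its entirety, whenever at least half of its length overlaps the true CNV. The function name, the 150-kb clone, the 5-kb oligonucleotide tiling and all coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates in base pairs


def reported_cnv_span(true_cnv: Interval, probes: List[Interval],
                      min_fraction: float = 0.5) -> int:
    """Total bases labeled as CNV when whole probes are the unit of reporting.

    Assumed rule (illustrative only): a probe is called variant if at least
    `min_fraction` of its length overlaps the true CNV, and the full probe
    interval is then annotated as copy-number variable.
    """
    cnv_start, cnv_end = true_cnv
    called = 0
    for p_start, p_end in probes:
        overlap = max(0, min(cnv_end, p_end) - max(cnv_start, p_start))
        if overlap / (p_end - p_start) >= min_fraction:
            called += p_end - p_start
    return called


# Hypothetical 75-kb CNV that covers half of a single 150-kb BAC clone
true_cnv = (1_000_000, 1_075_000)
true_size = true_cnv[1] - true_cnv[0]

bac_probes = [(1_000_000, 1_150_000)]
# The same region tiled with hypothetical 5-kb oligonucleotide probes
oligo_probes = [(s, s + 5_000) for s in range(1_000_000, 1_150_000, 5_000)]

bac_reported = reported_cnv_span(true_cnv, bac_probes)
oligo_reported = reported_cnv_span(true_cnv, oligo_probes)

print("true size:", true_size)
print("BAC-reported size:", bac_reported, "fold:", bac_reported / true_size)
print("oligo-reported size:", oligo_reported, "fold:", oligo_reported / true_size)
```

Under these assumptions the BAC annotation is twice the true size, whereas the finer tiling recovers the true extent. Real calling algorithms use log-ratio thresholds and segmentation rather than a fixed overlap fraction.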
Further evidence to illustrate this bias comes from the study of
Redon et al. [2006], in which both whole-genome BAC and
oligonucleotide SNP arrays were used side-by-side to study the
same population of 269 HapMap individuals (www.hapmap.org).
While the number of CNV loci identified by the two platforms was
similar (980 vs. 913), therefore demonstrating that they have
similar sensitivity, there was a more than three-fold difference in
the median size of the CNVs reported by each method (63 kb with
the SNP array vs. 228 kb with the BAC array). Additional indirect
evidence of the tendency of BAC arrays to overestimate the
boundaries of CNVs comes from the observation that many
dosage-sensitive genes that are essential for normal development
are contained within the putative boundaries of CNVs that have
been assayed by BAC array-CGH in control populations [Redon
et al., 2006; Wong et al., 2007]. While it is possible that known
disease genes undergo polymorphic deletion/duplication, a far
more probable explanation is that most such genes are not in fact
polymorphic in copy number in the general population, but merely
occur in proximity to a CNV whose size has been overestimated by
array-CGH and have thus been aberrantly included within the
putative CNV. The use of techniques with superior resolution,
such as high-density oligonucleotide arrays and paired-end
mapping approaches, will lead to more accurate definition of
CNV boundaries such that these errors can be avoided, and it is
critical that these refinements [Perry et al., 2008] are incorporated
into current CNV maps to improve their accuracy.
Another problem contributing to the overestimation of CNVs
within currently defined maps is the presence of a significant
amount of erroneous data. The presence of experimental artifacts
and the desire to maximize sensitivity for CNV detection when
performing genome-wide screens can often result in a significant
false-positive rate. Although these false positives probably
represent only a minority of reported structural variants, it seems
likely that most, if not all, studies published to date contain some
proportion of sites which represent annotation errors, rather than
real CNVs. Independent validation remains the most effective
method to assign confidence to any putative CNV, and with the
availability of data from multiple different studies and assay types,
overlapping sites that have been reported by more than one
methodology can now be defined with high confidence (Fig. 1). A
systematic effort to validate individual CNVs will be necessary in
the future to minimize the number of falsely annotated sites, and,
as with the overestimation of CNV size, it is important that these
data are used to produce a high-quality next-generation CNV
map.
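The concordance argument can be sketched as a simple interval-overlap count. The study names, coordinates and the any-overlap criterion below are hypothetical placeholders rather than data or methods from the cited studies. A stricter criterion, such as reciprocal overlap above some fraction, could be substituted.

```python
from typing import Dict, List, Tuple

Call = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def overlaps(a: Call, b: Call) -> bool:
    """True if two calls lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def supporting_studies(call: Call, studies: Dict[str, List[Call]]) -> int:
    """Number of studies (including the originating one) with an overlapping call."""
    return sum(
        any(overlaps(call, other) for other in calls)
        for calls in studies.values()
    )


# Hypothetical calls from four studies of the same individuals
studies = {
    "study_A": [("chr22", 24_000_000, 24_300_000), ("chr22", 24_900_000, 24_950_000)],
    "study_B": [("chr22", 24_050_000, 24_320_000)],
    "study_C": [("chr22", 24_020_000, 24_280_000)],
    "study_D": [("chr22", 23_990_000, 24_310_000)],
}

for call in studies["study_A"]:
    n = supporting_studies(call, studies)
    if n == len(studies):
        label = "high confidence"
    elif n == 1:
        label = "possible false positive"
    else:
        label = "partial support"
    print(call, n, label)
```

A site seen in only one study is not necessarily an artifact: it may simply fall below the size threshold detectable by the other platforms, so resolution should be considered before discarding it.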
Contrary to the inevitable overestimates of CNV boundaries
and false data points that exist in current maps of human CNVs, it
is also clear that current structural variation maps are by no means
complete. There are two main reasons for this. First, many
techniques that have been used to date, notably BAC array-CGH
or lower-density oligonucleotide arrays, lack sufficient resolution
to reliably detect CNVs <50 kb in size. Second, many studies
have lacked the required depth of sampling, such that they have
insufficient power to detect all common variants with high
confidence.
However, exactly what constitutes a complete map of
structural variation is something of an open-ended question. It is
clear that the size of genetic variants in the human genome is a
continuum, from single-nucleotide differences at one end of the
spectrum to entire chromosomal aneuploidies at the other end.
Although the working definition of a CNV clearly lies somewhere
between these two extremes, this point serves to highlight the fact
that any survey of CNVs is only as comprehensive as the
resolution of the technique used. Given the limited resolution of
the majority of studies published to date, small CNVs are generally
below the reliable detection limit, and are thus underrepresented
in current databases. However, new technologies, such as next-
generation oligonucleotide arrays comprising millions of optimized
probes [Sharp et al., 2007a] and novel paired-end mapping
approaches combined with high-throughput sequencing [Tuzun
et al., 2005; Korbel et al., 2007; Kidd et al., 2008], open the
possibility of reliably detecting CNVs that are one to two orders of
FIGURE 1. Identification of probable false-positive CNVs in current
maps of structural variation. Four genome-wide studies of
CNVs have each assayed the same 270 individuals from the HapMap
population using different methodologies [Conrad et al.,
2006; McCarroll et al., 2006; Locke et al., 2006; Redon et al.,
2006]. As real CNVs above a certain size threshold should be
detected by all four of these studies, high-confidence CNVs can be
assigned based on concordance between these four datasets. In
contrast, putative CNVs seen in only one study likely represent
false-positive signals. The image shows a screenshot from the
University of California, Santa Cruz (UCSC) Genome Browser
(http://genome.ucsc.edu) showing a 1.5-Mb region of chr22
(hg18, chr22:23,650,000–25,150,000). Note the overlap of the
high-confidence CNV with a segmental duplication cluster
[Sharp et al., 2005].
magnitude smaller than those assayed previously. Given that there
is an inverse relationship between the number of structural
variants in the human genome and their size [Korbel et al., 2007],
future work will likely lead to the identification of many thousands
of polymorphic variations which are <10 kb in size.
Furthermore, it is currently unclear as to the sampling depth
that is needed to reliably capture the majority of polymorphic
structural variants. Although the classical definition of a
polymorphism is a variant with a frequency of ≥1%, it is
becoming clear that there are many CNVs present within the
general population with frequencies lower than this that are
important in human disease. The majority of studies performed to
date have used relatively small sample sizes (n<100), with
correspondingly limited power to detect rarer variants. However,
there are now a number of efforts either published [Zogopoulos
et al., 2007] or underway to screen much larger populations
comprising many hundreds or thousands of individuals, including
studies of cohorts in the Wellcome Trust Case Control Consortium
(www.wtccc.org.uk), and the Centre d'Etude du Polymorphisme
Humain (CEPH) diversity panel (www.cephb.fr/HGDP-CEPH-
Panel). The completion of high-quality maps from these large
populations will likely yield a much deeper view of the extent and
nature of CNVs in humans, and will provide a much-needed
baseline to allow the assessment of the role of CNVs in human
disease. However, until these data become available, current CNV
maps represent a sliding scale in which the majority of common
large rearrangements have likely been identified, with a trend that
as events become smaller and rarer they are increasingly lacking
from our current annotations.
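The dependence of discovery on sampling depth can be made concrete with a rough calculation. The sketch assumes an autosomal variant, independently sampled chromosomes and an assay with perfect sensitivity, none of which is guaranteed in practice. The function name and the sample sizes are illustrative.

```python
def detection_probability(allele_frequency: float, n_individuals: int) -> float:
    """Chance that a variant is sampled at least once among 2n chromosomes.

    Assumes independent chromosomes and perfect assay sensitivity.
    """
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_frequency) ** chromosomes


for n in (100, 1000, 5000):
    for freq in (0.01, 0.001):
        p = detection_probability(freq, n)
        print(f"n={n:>5}  frequency={freq:.3f}  P(seen at least once)={p:.3f}")
```

Under these assumptions, a cohort of 100 individuals has a good chance of sampling a variant at 1% frequency at least once but will usually miss one at 0.1%. Cohorts in the thousands are needed before the rarer class is captured with comparable probability. Seeing a variant once is also weaker than estimating its frequency accurately, which requires larger samples still.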
PROBLEMS IN ASSESSING THE PATHOGENIC
SIGNIFICANCE OF CNVS
Long before array-CGH revealed the wealth of structural
polymorphism that is present in the human genome, G-banding
studies had demonstrated that even some relatively large
cytogenetically-visible variants could be directly transmitted
through pedigrees without any obvious phenotypic effect [re-
viewed in Barber, 2005]. As a result, the established paradigm for
distinguishing pathogenic microdeletions/duplications from benign
CNVs is based on the presumption that pathogenic events will
show segregation of genotype with phenotype. In the majority of
situations in the clinical laboratory, this involves two steps: 1)
consulting with published maps of CNVs to assess whether the
variant identified in a diseased patient has been observed within
the normal population; and 2) testing of parental samples to
determine if a CNV has arisen either de novo in the proband, or
instead has been inherited from a phenotypically normal parent.
However, these tests 1) require the availability of high-resolution
CNV maps derived from sufficiently large control populations such
that all CNVs are determined with sufficient accuracy and
frequency in the general population; and 2) presume that the
mutation in question is fully penetrant.
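The two-step paradigm can be written out as a short decision function, which makes its built-in assumptions explicit. This is only a schematic of the logic described above, not a clinical classification scheme. The names and output labels are hypothetical.

```python
from enum import Enum


class Inheritance(Enum):
    DE_NOVO = "de novo"
    INHERITED_UNAFFECTED = "inherited from unaffected parent"
    UNKNOWN = "parents not tested"


def triage(seen_in_controls: bool, inheritance: Inheritance) -> str:
    """Schematic of the classical segregation-based paradigm.

    Built-in assumptions: the control CNV map is complete and accurate
    (step 1), and pathogenic variants are fully penetrant (step 2).
    """
    # Step 1: compare against published CNV maps from the normal population
    if seen_in_controls:
        return "likely benign"
    # Step 2: test parental samples
    if inheritance is Inheritance.DE_NOVO:
        return "likely pathogenic"
    if inheritance is Inheritance.INHERITED_UNAFFECTED:
        return "likely benign"
    return "uncertain"


print(triage(False, Inheritance.DE_NOVO))
print(triage(False, Inheritance.INHERITED_UNAFFECTED))
print(triage(True, Inheritance.DE_NOVO))
```

Both "likely benign" branches fail whenever one of the two assumptions is violated: an error in the control map affects step 1, and incomplete penetrance affects step 2.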
As discussed above, the incomplete nature and inherent
inaccuracies of current maps of (presumably benign) CNVs that
have been identified in the general population means that this
methodology is imperfect (Fig. 2). In many cases, comparison of
single insertion/deletion events detected in a diseased patient
against a control database comprised of multiple smaller, partially
overlapping CNVs will be inconclusive. Added complexity comes
from the fact that insertions and deletions of the same region may
be differentially tolerated or have different phenotypic outcome,
thus making these comparisons even more difficult to interpret.
FIGURE 2. False-positives and overestimation of CNVs in current
maps of structural variation can mask the presence of pathogenic
rearrangement syndromes. Chromosome band 1q21.1 represents one of
the most structurally complex regions of the genome, containing
numerous assembly gaps and a high content of segmental
duplications, many of which show extreme structural polymorphism.
Amalgamation of all published studies of CNVs in this region
results in the entire 1q21.1 band being annotated as structurally
variant in the normal population. As a result, the classical
mutation screening paradigm would suggest that any CNVs found in
1q21.1 in disease patients represent benign polymorphisms.
However, 1q21.1 is now known to contain at least two loci (shaded
regions) that undergo recurrent rearrangement associated with
genetic disease: 1) the thrombocytopenia-absent radius (TAR)
syndrome deletion region [Klopocki et al., 2007], and 2) the
recurrent 1q21.1 deletion/duplication syndrome region, which is
associated with a wide range of physical and cognitive
abnormalities [Christiansen et al., 2004; de Vries et al., 2005;
Sharp et al., 2006; Redon et al., 2006; AGPC, 2007; Cheroki et
al., 2008] (H. Mefford and A. Sharp, unpublished results). There
are multiple reasons underlying the current annotation of 1q21.1
as structurally polymorphic: 1) the use of large insert BAC clones
for array-CGH, which tend to overestimate the boundaries of CNVs;
2) the presence of false-positive CNVs in some datasets; 3) the
presence of multiple structural polymorphisms in this region that
have likely been inappropriately merged into single large
contiguous CNVs by data analysis algorithms; 4) the complex
paralogous nature of 1q21.1 that results in the mapping of array
and sequence-based CNV assays to multiple loci (so-called
"shadowing" effects); and 5) the low penetrance and variable
expressivity of both the TAR syndrome deletion and the recurrent
1q21.1 deletion/duplication syndrome, which are frequently
observed in patients with a mild or asymptomatic clinical
phenotype. The image shows a screenshot from the Database of
Genomic Variants (http://projects.tcag.ca/variation), showing
RefSeq genes, putative CNVs >5 kb in size, and segmental
duplications >90% identity within 1q21.1.
Therefore, while studies have suggested an apparent positive
correlation between the size of a CNV and its probability of being
de novo and thus presumably pathogenic [de Vries et al., 2005],
the testing of parental samples generally remains the key step
when attempting to determine the pathogenic significance of a
novel variant. A review of 432 chromosomal imbalances detected
by array-CGH showed that in 37.5% of cases, the same
rearrangement identified in an affected child was also present in
one of the parents [Menten et al., 2006]. However, it is becoming
apparent that for some loci, the distinction between disease-
causing events (generally presumed to be those that are de novo)
and benign variants (traditionally viewed as those that are
inherited from an unaffected parent) is blurring [Bisgaard et al.,
2007]. There are now several examples in the literature of
microdeletion/duplication syndromes that are associated with
highly variable phenotypes:
1. The 200-kb to 500-kb deletions of 1q21.1 associated with
thrombocytopenia-absent radius (TAR) syndrome [Klopocki
et al., 2007]. Although all patients examined to date have
deletion of a common region, only ~25% of these deletions
occur de novo, with the majority instead being inherited from a
phenotypically normal parent.
2. The 1.65-Mb reciprocal deletions and duplications of 16p13.11
associated with autism, mental retardation, and congenital
anomalies [Ullmann et al., 2007; Hannes et al., 2008]. These
deletion and duplication events are observed both de novo, and
in apparently unaffected parents. Testing of a large population
of control individuals showed that the duplication, but not the
deletion, also occurs at an appreciable frequency (~0.3%) in
the general population.
3. Duplications of 22q11.21 (representing the reciprocal event to
the common velocardiofacial syndrome [VCFS] deletion) are
associated with highly variable phenotypes, ranging from severe
mental retardation with congenital anomalies to almost
completely asymptomatic, and are frequently observed to be
inherited from apparently normal individuals [Edelmann et al.,
1999; Ensenauer et al., 2003; Yobb et al., 2005; de la
Rochebrochard et al., 2006].
As a result, the paradigm of causality for CNVs can no longer
always be viewed simply in the context of whether a CNV is
inherited or de novo. While this criterion will likely remain
accurate in the majority of cases, future studies will require
additional methods to draw firm connections between genotype
and phenotype. As illustrated by the examples of TAR syndrome,
duplication/deletion 16p13.11, and duplication 22q11.21, some
genomic disorders show highly variable penetrance that can easily
confound traditional segregation-based studies when viewed on a
case-by-case basis. It is likely that many other similar disorders
await discovery, and unless a more holistic approach is taken, there
is a danger that these may easily be labeled as mere genomic
polymorphisms without phenotypic consequence. The effective
identification of such regions will likely require collaborative efforts
by multiple centers, such that sufficient numbers of patients
carrying the same structural variant can be collected for study. By
drawing conclusions from multiple cases, the power of a study is
markedly increased, making it possible to more accurately define
the associated phenotype, if any, of a particular CNV. On the basis
that multiple individuals with a particular pathogenic variant will
likely show at least some degree of phenotypic concordance even
where penetrance is incomplete, causality can also be inferred
through the use of phenotype-genotype correlations. Finally,
accurate assessment of the prevalence of specific CNVs in large
populations comprising thousands of control individuals will also
be necessary to determine if the variant in question occurs at
significantly increased frequencies in the phenotype under
investigation [Weiss et al., 2008; Hannes et al., 2008].
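One way to frame the frequency comparison is as a test of whether carriers are over-represented among cases. The sketch below uses a one-sided Fisher's exact test implemented with the standard library. The choice of test and all of the counts are illustrative assumptions, not the analysis or data of the cited studies.

```python
from math import comb


def fisher_one_sided(case_carriers: int, case_total: int,
                     control_carriers: int, control_total: int) -> float:
    """P-value for carrier enrichment among cases (hypergeometric upper tail).

    Under the null of no association, the number of carriers falling in the
    case group follows a hypergeometric distribution.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    numer = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


# Hypothetical counts: 5 carriers among 1,000 patients, 1 among 4,000 controls
p_value = fisher_one_sided(5, 1000, 1, 4000)
print(f"one-sided p = {p_value:.2e}")
```

With counts this small, the result is sensitive to a single additional carrier in the control group. This is one reason why control cohorts numbering in the thousands are needed for rare variants.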
Anecdotally, CNV studies of large numbers of individuals from
the general population have shown the presence of genomic
rearrangements generally thought to be associated with some
degree of clinical abnormality, further supporting the idea that
many genomic disorders show variable penetrance. These include
rearrangements such as deletions of 15q24 [Sharp et al., 2007b],
deletions of 17p11.2 [Greenberg et al., 1991], duplications of
17p12 [Lupski et al., 1991], and duplications of 22q11.21
[Edelmann et al., 1999] (G. Cooper, personal communication).
These observations have a number of important implications. First,
they call into question the current presumption that all CNVs
which are annotated in public databases are benign polymorph-
isms. If the phenomenon of variable penetrance is common, it is
quite likely that some CNVs that have been identified in the
general population and have accordingly been labeled as benign
variants may, in some individuals, be associated with a clinical
phenotype. In this regard, it is noteworthy that the Database of
Genomic Variants (http://projects.tcag.ca/variation) includes a
number of loci that were originally ascertained in patients with
unexplained mental retardation, but because they were also
present in a phenotypically normal parent, these CNVs were
considered unrelated to the patient's disease [de Vries et al.,
2005]. Some of these loci are quite large (>1 Mb), and may in fact
represent disease regions associated with incomplete penetrance,
for which further investigation is warranted. Second, the
observation of these rearrangements in the general population
supports the emerging view that many microdeletion/
duplications can be associated with highly variable, and
often mild (subclinical) phenotypes. And third, they suggest
that the rearrangements which cause these genomic disorders may
be significantly more common than we currently appreciate
[Turner et al., 2008].
A further problem that clouds the interpretation of the
relationship between the de novo status of CNVs and pathogeni-
city is the potentially high rate of new mutations that are observed
at some CNV loci. As the resolution of array-CGH platforms
improves, a greater number of progressively smaller CNVs can be
detected, theoretically giving a greater probability of detecting a
pathogenic change. In this context, the increasing power of new
microarray platforms to detect CNVs represents something of a
double-edged sword. Specifically, as the ascertainment of CNVs
within an individual genome becomes more complete, there is a
danger that the background mutation rate for structural variants
will become routinely detectable. This is because estimates of the
mutation rate for CNVs suggest that any one genome has a
relatively high chance of containing a novel CNV that is not
present in either parent.
In humans, the average mutation rate for single base pair
mutations is estimated to be 2.5 × 10⁻⁸ per nucleotide per
generation [Nachman and Crowell, 2000]. In a 6-Gb diploid
genome, this equates to approximately 150 new SNPs per
individual. While the number of new large-scale structural
mutations per generation will likely be much lower than this,
current evidence suggests that the de novo rate of CNV formation
is still appreciable. Utilizing data collected on the occurrence of
pathogenic CNVs within the Duchenne muscular dystrophy
(DMD) locus and extrapolating this genome-wide, it has been
estimated that the overall rate for de novo CNV formation is
approximately 0.14 insertion/deletion events per generation [van
Ommen, 2005]. However, as this measure is indirect and does not
take into account the high mutation rates that are observed at
some loci, this number almost certainly represents an under-
estimate of the true situation. More accurate measures of
rearrangement were obtained by recent studies of single sperm
cells, in which the de novo duplication/deletion rate at four
different genomic disorder loci was directly assayed [Turner et al.,
2008]. At each of these sites, all of which are known to undergo
recurrent rearrangement by nonallelic homologous recombination
(NAHR), it was found that the frequency of rearrangement varies
from approximately 6 × 10⁻⁵ for the hereditary neuropathy with
liability to pressure palsies (HNPP)/Charcot-Marie-Tooth disease
type 1A (CMT1A) locus, to 2 × 10⁻⁶ at the Smith-Magenis
syndrome locus. Population data shows that for other sites, e.g.,
the recurrent 22q11.21 deletion associated with VCFS, the
mutation rate can be as high as 2.5 × 10⁻⁴ [Wilson et al., 1994].
Although the number of sites within the genome that will show
similar mutation rates to these disease loci is unknown, it seems
likely that de novo structural rearrangements occur at a significant
frequency within the general population. CNVs that show
hallmarks of recurrent rearrangement, such as breakpoint hetero-
geneity in different individuals and/or low levels of linkage
disequilibrium with flanking SNP markers [Locke et al., 2006;
Redon et al., 2006], are strong candidates to undergo recurrent
mutation.
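The arithmetic behind these rates can be reproduced in a few lines. The rates are those quoted in the text. The Poisson model used to convert 0.14 events per generation into a per-genome probability is an added assumption, and the "one in N" figures are simple reciprocals of the quoted per-locus rates.

```python
import math

# Values quoted in the text
snv_rate_per_bp = 2.5e-8          # single base pair mutations per nucleotide per generation
diploid_genome_bp = 6e9           # 6-Gb diploid genome
cnv_events_per_generation = 0.14  # genome-wide de novo CNV estimate [van Ommen, 2005]

# Expected number of new single-nucleotide changes per individual
expected_new_snvs = snv_rate_per_bp * diploid_genome_bp

# Assumption (not from the text): de novo CNV counts per genome are Poisson
# distributed, so P(at least one) = 1 - exp(-mean)
p_at_least_one_cnv = 1.0 - math.exp(-cnv_events_per_generation)

# Per-locus rearrangement rates quoted in the text
locus_rates = {
    "HNPP/CMT1A": 6e-5,
    "Smith-Magenis": 2e-6,
    "22q11.21 (VCFS)": 2.5e-4,
}
one_in = {locus: round(1.0 / rate) for locus, rate in locus_rates.items()}

print(f"expected new SNVs per individual: {expected_new_snvs:.0f}")
print(f"P(at least one de novo CNV): {p_at_least_one_cnv:.3f}")
for locus, n in one_in.items():
    print(f"{locus}: about one event in {n:,}")
```

Under the Poisson assumption, roughly 13% of genomes would carry at least one de novo CNV even at the indirect estimate of 0.14 events per generation, which the text argues is probably an underestimate.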
FACTORS INFLUENCING THE PENETRANCE OF
GENOMIC DISORDERS
The existence of chromosome rearrangements that can be
associated with highly variable phenotypes raises questions as to
potential mechanisms that might underlie this phenomenon. In
addition to purely stochastic or environmental effects, a number of
different genetic mechanisms exist, and are discussed below.
Imprinting
The potential involvement of epigenetics in genomic disorders is
well established at several loci, the best characterized of which is
the Prader-Willi/Angelman syndrome (PWS/AS) region. Here, the
presence of several imprinted transcripts that are expressed
uniparentally means that genomic deletions of this region only
result in disease when transmitted through one parental lineage.
The result of this is that some or all of the phenotypic effects of a
PWS/AS-associated rearrangement can essentially skip genera-
tions in a manner that is dependent on whether the transmission
occurs maternally or paternally, thus presenting with widely
variable penetrance [Ming et al., 2000]. Although the involve-
ment of imprinting in most other genomic disorders is generally
thought to be minimal, it remains possible that imprinted genes
may contribute toward more subtle phenotypes in many other
diseases [Eliez et al., 2001]. However, the current lack of any
comprehensive catalogue of imprinting in the human genome
often means that only gross epigenetic defects can readily be
identified unless sufficiently large populations of patients are
studied in detail. The future development of genome-wide maps of
imprinted genes [Luedi et al., 2007] will likely aid in the
identification of the phenotypic effects of CNVs.
Unmasking of Recessive Alleles
One result of genomic deletions is that they effectively reduce a
normally diploid genome to a haploid state within the deleted
locus. As such, genomic deletions can result in the unmasking of
otherwise recessive alleles that may remain on the intact homolog.
This mechanism can result in increased phenotypic variability of
deletion variants in a manner that is dependent upon the
haplotype of the single remaining allele, and there are now several
confirmed examples of this in the literature [Lee et al., 1994;
Flipsen-ten Berg et al., 2007]. However, recent investigations of
candidate genes within the 16p13.11 deletion locus associated
with autism/mental retardation [Hannes et al., 2008] and the
1q21.1 locus associated with TAR syndrome [Klopocki et al.,
2007] have failed to uncover any point mutations that might
explain the wide variations in phenotypic outcome associated with
these deletions, suggesting the involvement of alternate mechan-
isms at these loci.
Background Genomic Variation
Several lines of evidence now convincingly show that
polymorphic variation within the genome can influence the
phenotypic outcome of genomic disorders. The use of mouse
models shows that the penetrance of heart defects associated with
VCFS deletions is strongly dependent upon the background strain
in which the deletion occurs [Taddei et al., 2001], suggesting the
existence of genetic modifier loci. Furthermore, in TAR syndrome,
it is postulated that in addition to the common 1q21.1 deletion, a
modifier locus, termed mTAR, dictates the penetrance of the
phenotype [Klopocki et al., 2007]. However, it is unclear if these
putative modifier variants occur genome-wide, or instead might be
contained within the haploid deletion region (as discussed above).
Studies of VCFS patients have clearly demonstrated that at least
for some psychiatric symptoms associated with 22q11.21 deletions,
the latter is true. The presence of a common functional amino acid
polymorphism in the single remaining allele of the COMT (MIM#
116790) gene, an enzyme that metabolizes several key neuro-
transmitters and that is reduced to single-copy in VCFS deletions,
is associated with cognitive decline and the prevalence of
schizophrenia in 22q11.21 deletion carriers [Bearden et al.,
2004; Gothelf et al., 2005]. As such, it seems likely that future
studies using large cohorts of genomic disorder patients to
investigate haplotype variation, either specifically within the
single-copy region associated with genomic deletions, or alter-
natively on a larger genome-wide scale, will reveal further factors
that influence the phenomenon of variable penetrance associated
with CNVs.
METHODS FOR DETERMINING THE CAUSE OF A
GENOMIC DISORDER PHENOTYPE
Despite the identification of an increasing number of genomic
disorders that result in the deletion or duplication of defined
chromosomal regions and cause well-defined phenotypes, the
specific genes responsible for many of these syndromes have often
remained elusive. In the case of any new deletion or duplication
that is identified in association with a given syndrome, it is
generally presumed that the rearrangement acts by altering the
expression of a dosage sensitive gene or genes that are located
within the aneusomic segment. While this represents the most
parsimonious explanation, it is becoming clear that several
alternative mechanisms can also operate in the case of CNVs,
complicating the determination of the specific elements that are
responsible for the phenotype.
A classical technique in disease mapping is the study of multiple
patients with overlapping deletions or duplications to delineate a
common minimal region in which a candidate gene must lie.
Subsequent detailed study of patients with similar phenotypes, but