DNA Sequence Analysis Method
The DNA sequencing assay used by Correlagen employs polymerase chain reaction (PCR) and Sanger (dideoxy) sequencing methodologies for variant detection. During PCR, the DNA sequence between two oligonucleotides (primers) is specifically amplified from genomic DNA (template) that has been derived from the patient sample. The PCR products (amplicons) are then sequenced bi-directionally using Sanger sequencing methodology. (Small deletions or insertions render the sequencing output for downstream sequences uninterpretable. If such a small deletion or insertion occurs close to one primer binding site, the sequence for this amplicon can then only be derived from one direction.) For the index patient, sequence is typically determined for all coding exons of a gene that are represented in the most prevalent mRNA isoform or, if known, the mRNA isoform most relevant to the expression of the disease phenotype. For genes with more than one exon, flanking intronic sequences containing the highly conserved splice sites for each exon are also analyzed. When confirming presence or absence of certain variants, eg, for the purpose of family testing, sequencing may be limited to specific amplicons.
Limitations of DNA Sequence Analysis Method
While Sanger sequencing is generally considered the “gold standard” for mutation detection, it has a number of limitations.
- The method does not allow any conclusion as to whether a heterozygous variant is present on the maternal or the paternal chromosome copy. For this reason, the DNA sequence analysis performed here cannot determine if two different heterozygous variants are located on the same or on different chromosome copies. For the purposes of this report, two different heterozygous variants are "by default" assumed to be located on separate chromosome copies. Parent testing can be used to determine if two different heterozygous variants are located on the same or on different chromosome copies. If both variants are inherited from the same parent, they are likely to be located on the same chromosome copy. If each variant is inherited from a different parent, they are likely to be located on different chromosome copies.
- The method does not reliably detect mosaic variants. The sequencing output reflects the sum of all sequence versions present in the PCR product. Presence of a variant in only some of the templates will lead to a mixed-base signal at the variant position. While heterozygous variants, which are present in about half of all templates, can be detected with 99% reliability by the software algorithm used, mosaic variants, which could be present in only a small proportion of templates, may or may not be detected.
- The method cannot detect large deletions. If one or both of the primer binding sites for an amplicon are deleted from a template, eg, as part of a large deletion, the amplicon cannot be generated from this template. If all template versions carry the deletion, no PCR product will be generated. If, for example, only the template derived from the maternal chromosome copy carries the deletion, the amplicon can still be generated from the paternal chromosome copy. The only indication that sequence was derived only from the paternal chromosome copy would be that all variants detected on that amplicon would appear homozygous in the final sequence output. The prevalence of large deletions varies widely between genes.
- The method cannot detect large duplications, inversions, or other re-arrangements. A template carrying a re-arrangement that disrupts an amplicon will not be amplified during PCR. Re-arrangements, such as inversions, that preserve an amplicon will not affect generation of the PCR product in any way and will therefore not be detectable through the sequencing output. Duplications will also not be detected, since the sequence output does not allow any conclusion about the number of template copies present.
- The method is affected by allele dropout. If a template contains a variant in a primer binding site for an amplicon, the amplicon cannot be generated from this template. If, for example, the template derived from the maternal chromosome copy carries a variant in a primer binding site, the amplicon can only be generated from the paternal chromosome copy. In this case, the only indication that the PCR product and the sequence derived from it reflect only the paternal chromosome copy would be that all variants detected on that amplicon would appear homozygous in the final sequence output. Allele dropout should be a rare event, since primer binding sites are specifically chosen not to cover any known variant location.
- The method may not be able to determine the exact numbers of T/A or microsatellite repeats. During PCR and/or during the sequencing reaction, the polymerase may slip on a long stretch of T’s (or A’s) or a microsatellite (such as a CA repeat) in the template, leading to a variable number of T’s (or A’s) or a variable number of microsatellite-repeats in the sequence output. Such slippage may prevent accurate determination of the number of T’s (or A’s) or of microsatellite repeats actually present in the template.
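The large-deletion and allele-dropout items above both name the same warning sign: every variant detected on an amplicon appears homozygous. A minimal sketch of such a check follows. The function name, the input layout, and the zygosity labels are assumptions made for illustration and do not describe Correlagen's software. A flagged amplicon is only a hint, since genuinely homozygous variants produce the same pattern.

```python
def flag_possible_single_allele(calls):
    """Return amplicons on which every detected variant appears homozygous.

    calls: dict mapping an amplicon name to a list of zygosity labels
           ("homozygous" or "heterozygous"), one per detected variant.
    Amplicons without any detected variant carry no information and are skipped.
    """
    flagged = []
    for amplicon, zygosities in calls.items():
        if zygosities and all(z == "homozygous" for z in zygosities):
            flagged.append(amplicon)
    return flagged


# Hypothetical example data
example_calls = {
    "exon2": ["homozygous", "homozygous"],
    "exon3": ["heterozygous", "homozygous"],
    "exon4": [],
}
print(flag_possible_single_allele(example_calls))
```

A single heterozygous variant on an amplicon shows that both chromosome copies were amplified, which is why one heterozygous call is enough to clear the flag.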
Sequence Variant Naming
Correlagen numbers and names all variants relative to the human reference sequence published by http://genome.ucsc.edu in March of 2006 (hg18) and according to the system suggested by the Human Genome Variation Society (http://www.genomic.unimelb.edu.au/mdi/mutnomen), regardless of whether the cited publication does or does not adhere to this convention. According to the HGVS system, the start of the coding sequence (ie, the "A" of the start codon ATG) is designated as +1. All coding nucleotides, ie, all exonic nucleotides, in the designated mRNA isoform are numbered consecutively. Intronic nucleotides are numbered relative to the nearest exonic nucleotide.
Variant Numbering

| Region | Annotation | Nucleotides | Positions |
|---|---|---|---|
| Exon 1 | 5'UTR | G A G | -5, -4, -3 |
| Intron 1 | | G T A G | -3+1, -3+2, -2-2, -2-1 |
| Exon 2 | 5'UTR | G T | -2, -1 |
| Exon 2 | Met | A T G | 1, 2, 3 |
| Exon 2 | Glu | G A G | 4, 5, 6 |
| Intron 2 | | G T A G | 6+1, 6+2, 7-2, 7-1 |
| Exon 3 | Val | G T A | 7, 8, 9 |
| Exon 3 | stop | T G A | 10, 11, 12 |
| Exon 3 | 3'UTR | G A | 13, 14 |
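The numbering in the table above can be reproduced with a short sketch. The function name and the input layout (a list of exon and intron segments plus the exonic index of the start codon) are assumptions made for illustration. The sketch follows the table as shown, including consecutive numbering past the stop codon, and assumes the segment list begins and ends with an exon.

```python
def number_positions(segments, cds_start):
    """Label every nucleotide relative to the A of the start codon.

    segments:  list of (kind, sequence) tuples in 5'->3' order,
               where kind is "exon" or "intron".
    cds_start: 0-based index of the A of ATG, counted over exonic
               nucleotides only.
    Returns a list of (base, label) tuples.
    """
    def exon_label(i):
        offset = i - cds_start
        # There is no position 0: -1 is followed directly by 1.
        return offset + 1 if offset >= 0 else offset

    labels = []
    exon_index = 0  # number of exonic nucleotides seen so far
    for kind, seq in segments:
        if kind == "exon":
            for base in seq:
                labels.append((base, str(exon_label(exon_index))))
                exon_index += 1
        else:
            upstream = exon_label(exon_index - 1)   # last base of previous exon
            downstream = exon_label(exon_index)     # first base of next exon
            for k, base in enumerate(seq):
                from_donor = k + 1
                from_acceptor = len(seq) - k
                # Intronic bases are numbered from the nearest exonic base.
                if from_donor <= from_acceptor:
                    labels.append((base, f"{upstream}+{from_donor}"))
                else:
                    labels.append((base, f"{downstream}-{from_acceptor}"))
    return labels


segments = [
    ("exon", "GAG"), ("intron", "GTAG"),
    ("exon", "GTATGGAG"), ("intron", "GTAG"),
    ("exon", "GTATGAGA"),
]
for base, label in number_positions(segments, cds_start=5):
    print(base, label)
```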
Sequence variants are named according to the change they cause in the DNA sequence.
The most common types of changes are:
- Substitutions of one nucleotide for another nucleotide (eg, c.3G>C).
- Deletions of one or more nucleotides (eg, c.4_6delGAG).
- Insertions of one or more nucleotides (eg, c.4_5insT).
- Substitutions of a group of nucleotides for a group of different nucleotides,
where the number of deleted and inserted nucleotides can be different (eg, c.4_6delinsT).
A more detailed description of the numbering and naming rules used by Correlagen
is available from Correlagen.
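To make the four variant name forms listed above concrete, here is a minimal sketch that applies each one to a coding sequence. The function name and the regular expressions are assumptions made for illustration; they cover only the simple exonic forms shown in the examples and are not a full parser for the HGVS system. The test sequence is the coding region from the Variant Numbering table (ATG GAG GTA TGA).

```python
import re


def apply_variant(cds, variant):
    """Apply a simple coding-DNA variant name to cds.

    cds starts at the A of the start codon, so c.1 is cds[0].
    Supported forms: c.3G>C, c.4_6delGAG, c.4_5insT, c.4_6delinsT.
    """
    m = re.fullmatch(r"c\.(\d+)([ACGT])>([ACGT])", variant)
    if m:  # substitution
        pos, ref, alt = int(m.group(1)), m.group(2), m.group(3)
        if cds[pos - 1] != ref:
            raise ValueError(f"reference base at c.{pos} is not {ref}")
        return cds[:pos - 1] + alt + cds[pos:]

    m = re.fullmatch(r"c\.(\d+)_(\d+)delins([ACGT]+)", variant)
    if m:  # deletion combined with insertion
        start, end, inserted = int(m.group(1)), int(m.group(2)), m.group(3)
        return cds[:start - 1] + inserted + cds[end:]

    m = re.fullmatch(r"c\.(\d+)_(\d+)del([ACGT]*)", variant)
    if m:  # deletion
        start, end, deleted = int(m.group(1)), int(m.group(2)), m.group(3)
        if deleted and cds[start - 1:end] != deleted:
            raise ValueError("deleted bases do not match the reference")
        return cds[:start - 1] + cds[end:]

    m = re.fullmatch(r"c\.(\d+)_(\d+)ins([ACGT]+)", variant)
    if m:  # insertion between two adjacent positions
        start, end, inserted = int(m.group(1)), int(m.group(2)), m.group(3)
        if end != start + 1:
            raise ValueError("insertion must lie between adjacent positions")
        return cds[:start] + inserted + cds[start:]

    raise ValueError(f"unsupported variant name: {variant}")


cds = "ATGGAGGTATGA"
for name in ("c.3G>C", "c.4_6delGAG", "c.4_5insT", "c.4_6delinsT"):
    print(name, apply_variant(cds, name))
```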
Mutation types reflect the predicted effect of a variant on the mRNA or the protein level.
The most common mutation types are:
- Splice-site mutations destroy an existing splice site or create a new splice site.
Both types of variations can lead to altered mRNA processing and a dramatically different mature mRNA sequence,
which translates into a dramatically different protein sequence.
- Nonsense mutations introduce a stop codon in the middle of the coding region,
leading to truncation of the protein. Nonsense mutations are commonly caused by a single-nucleotide substitution,
as shown in the example below:
| | Codon 1 | Codon 2 | Codon 3 | Codon 4 | Codon 5 |
|---|---|---|---|---|---|
| Reference DNA | GGG | TTG | AAA | ACA | GCG |
| Reference protein | Glycine | Leucine | Lysine | Threonine | Alanine |
| Variant DNA | GGG | TAG | AAA | ACA | GCG |
| Variant protein | Glycine | stop | | | |
- Missense mutations change one amino acid in the protein into another.
Missense mutations are commonly caused by a single-nucleotide substitution, as shown in the example below:
| | Codon 1 | Codon 2 | Codon 3 | Codon 4 | Codon 5 |
|---|---|---|---|---|---|
| Reference DNA | GGG | CTT | AAA | ACA | GCG |
| Reference protein | Glycine | Leucine | Lysine | Threonine | Alanine |
| Variant DNA | GGG | CCT | AAA | ACA | GCG |
| Variant protein | Glycine | Proline | Lysine | Threonine | Alanine |
- Synonymous mutations do not cause a change in the amino acid sequence of the
protein:
| | Codon 1 | Codon 2 | Codon 3 | Codon 4 | Codon 5 |
|---|---|---|---|---|---|
| Reference DNA | GGG | CTT | AAA | ACA | GCG |
| Reference protein | Glycine | Leucine | Lysine | Threonine | Alanine |
| Variant DNA | GGG | CTC | AAA | ACA | GCG |
| Variant protein | Glycine | Leucine | Lysine | Threonine | Alanine |
- Frameshift mutations cause a shift in reading frame, leading to a complete change of
the amino acid sequence downstream of the frameshift site. Since stop codons tend to be enriched in the two unused
reading frames, frameshift mutations often lead to truncation of the protein. A frameshift mutation is caused by a
net deletion or net insertion of a number of nucleotides not divisible by 3. Of note, the amino acid sequence may
not change until several amino acids downstream of the actual frameshift site, as shown in the example below:
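| | Codon 1 | Codon 2 | Codon 3 | Codon 4 | Codon 5 |
|---|---|---|---|---|---|
| Reference DNA | GGG | CTT | AAA | ACA | GCG |
| Reference protein | Glycine | Leucine | Lysine | Threonine | Alanine |
| Variant DNA (third G deleted, shown in the new reading frame) | GGC | TTA | AAA | CAG | CG... |
| Variant protein | Glycine | Leucine | Lysine | Glutamine | Arg... |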
- In-frame deletions and/or insertions lead to deletion and/or insertion of one or more
amino acids from/into the protein. In-frame deletions and/or insertions do not alter the reading frame and therefore
do not change the amino acid sequence downstream of the deletion and/or insertion site. In-frame deletions and/or
insertions may or may not lead to a missense mutation, as shown in the example of a 3-nucleotide insertion
(GCA) below:
| | Codon 1 | Codon 2 | Codon 3 | Codon 4 | Codon 5 |
|---|---|---|---|---|---|
| Reference DNA | GGG | CTT | AAA | ACA | GCG |
| Reference protein | Glycine | Leucine | Lysine | Threonine | Alanine |
| Variant DNA (GCA inserted) | GGG | CGC | ATT | AAA | ACA |
| Variant protein | Glycine | Arginine | Isoleucine | Lysine | Threonine |
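The mutation types described above can be told apart by translating the reference and variant coding sequences and comparing the results. The following is a minimal sketch under simplifying assumptions: the function names are hypothetical, the input must be in frame and contain only A, C, G, and T, splice-site effects are not modeled, and an in-frame change that also creates a stop codon is still reported as in-frame. The sequences are adapted from the examples above.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with T, C, A, G at each position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons; stop after the first stop codon ('*')."""
    protein = ""
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein += aa
        if aa == "*":
            break
    return protein


def classify(ref_dna, var_dna):
    """Classify the predicted coding effect of var_dna relative to ref_dna."""
    if (len(var_dna) - len(ref_dna)) % 3 != 0:
        return "frameshift"
    if len(var_dna) != len(ref_dna):
        return "in-frame deletion/insertion"
    ref_protein, var_protein = translate(ref_dna), translate(var_dna)
    if var_protein == ref_protein:
        return "synonymous"
    if var_protein.endswith("*") and len(var_protein) < len(ref_protein):
        return "nonsense"
    return "missense"


print(classify("GGGTTGAAAACAGCG", "GGGTAGAAAACAGCG"))     # T>A in codon 2
print(classify("GGGCTTAAAACAGCG", "GGGCCTAAAACAGCG"))     # T>C in codon 2
print(classify("GGGCTTAAAACAGCG", "GGGCTCAAAACAGCG"))     # third base of codon 2
print(classify("GGGCTTAAAACAGCG", "GGCTTAAAACAGCG"))      # one G deleted
print(classify("GGGCTTAAAACAGCG", "GGGCGCATTAAAACAGCG"))  # GCA inserted
```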
Correlagen’s Variant Scoring Method
Meaning of Correlagen’s Variant Scores:
Correlagen’s variant scores reflect the probability of association with monogenic disease
as well as the strength of the supporting data, ie, the confidence that the score is correct (see Figure 1).
The variant scores do not reflect severity of disease. The variant scores also do not reflect
the probability of association with disease in an oligogenic or polygenic rather than monogenic manner. In other
words, a variant score of "unlikely to be associated" does not exclude the possibility that a variant may
"weakly" contribute to a disease in association with several other variants in the same or different genes.
Correlagen’s variant scores are based on Correlagen’s scoring algorithms and may differ from
the variant scores proposed by the authors of a publication. To request detailed information on how the score for
a specific variant found in your patient’s sample was derived, please call Correlagen at 1-866-647-0735.
How Are Correlagen’s Variant Scores Determined?
Correlagen’s variant scores are based on the following considerations (summarized in Figure 2):
Has the variant been observed in the general population or normal controls? If a variant
is observed more frequently in the general population than is compatible with the prevalence and mode of inheritance
of the disease, then this variant is assumed to be non-pathogenic. Data for variant frequency in the general population
are derived from dbSNP (NCBI EntrezGene), from publications [2], or from prevalence studies conducted at Correlagen.
[2] Peer-reviewed English-language publications listed in PubMed
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi).
Has the variant been observed in affected individuals? If a variant is observed only
in diseased individuals and not in the general (healthy) population, it is assumed to be at least possibly
associated with disease. The probability of association depends on such parameters as the number of diseased
individuals with the variant and the consistency of co-migration of variant and disease within families.
Data for variant frequency in affected individuals are derived from the peer-reviewed published literature.
What effect does the variant have in a controlled experimental system? A significant
effect of a variant on the synthesis, cellular location, and/or the function of the encoded protein in an experimental system suggests
that the variant is pathogenic. While experimental systems can provide powerful information, the results must also
be seen with caution, since an experimental environment lacks many of the complexities of the actual in-vivo
environment. Data for variant effect in an experimental system are derived from the peer-reviewed published
literature.
What is the predicted effect of the variant on synthesis and/or function of the encoded
protein [3]? If the variant leads to truncation of the gene product due to a nonsense mutation or a
frameshift mutation, it is assumed to be pathogenic (for diseases related to loss-of-function mutations). If
the variant affects one of the highly conserved donor or acceptor splice sites, it is predicted to lead to exon
skipping and is assumed to be pathogenic. If the variant is a missense variant, it is considered to be
possibly pathogenic, in the absence of other information. If the variant is located in the coding region away from
exon/intron junctions and does not lead to a change in the amino acid sequence (synonymous variant), it is
considered unlikely to be pathogenic, although pathogenicity cannot be excluded. If a variant is located in the
coding sequence close to an exon/intron junction or in an intron away from the exon/intron junction, its effect
cannot be predicted, and it is classified as a variant of unknown significance.
[3] Predictions are based on the mRNA isoform chosen for reporting. Often,
the same gene sequence can give rise to several mRNA isoforms. For genes with many exons, different mRNA
isoforms may contain sequence from different permutations of exons. The exons reflected in a particular mRNA
isoform define the actual coding region and thus the predicted effect of a variant on protein synthesis
and/or function.
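The prediction rules in the paragraph above amount to a small decision table, sketched below. The function name, the variant-type labels, and the `near_junction` flag are assumptions made for illustration. The text does not say how a missense variant close to an exon/intron junction is handled, so this sketch applies the junction rule to synonymous variants only. It covers the predicted-effect consideration alone and is not Correlagen's scoring algorithm.

```python
def predicted_effect(variant_type, near_junction=False):
    """Map a variant type to the default prediction described in the text.

    variant_type:  "nonsense", "frameshift", "splice_site", "missense",
                   "synonymous", or "intronic" (intron, away from the junction).
    near_junction: True if a coding variant lies close to an exon/intron junction.
    """
    if variant_type in ("nonsense", "frameshift"):
        return "assumed pathogenic (loss-of-function diseases)"
    if variant_type == "splice_site":
        return "assumed pathogenic"
    if variant_type == "missense":
        return "possibly pathogenic"
    if variant_type == "synonymous" and not near_junction:
        return "unlikely pathogenic"
    if variant_type in ("synonymous", "intronic"):
        return "unknown significance"
    raise ValueError(f"unknown variant type: {variant_type}")


print(predicted_effect("frameshift"))
print(predicted_effect("synonymous", near_junction=True))
```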
A number of algorithms (eg, SIFT, PolyPhen, Align-GVGD) [4] have been developed in an
attempt to predict the impact of a missense variant on protein function. Prediction algorithms are typically
based on evolutionary conservation, structure of the protein at the site of the variant, and/or amino acid
properties. While Correlagen routinely uses such algorithms for variant evaluation, it does not base the variant
score on a prediction from any single one of the algorithms, since their specificity and sensitivity are limited
and their predictions frequently contradict each other.
[4] SIFT: http://blocks.fhcrc.org/sift/SIFT.html
PolyPhen: http://coot.embl.de/PolyPhen/
Align-GVGD: http://agvgd.iarc.fr/agvgd_input.php
Additional considerations for scoring include co-occurrence of a variant with
known pathogenic variants, occurrence of a variant in mutually exclusive disease phenotypes, predicted
effect of synonymous variants on splicing, and certain gene-specific and/or disease-specific properties.
What is the Difference Between the Variant Score and the Result Interpretation?
The variant score reflects the relationship of an individual variant to a disease phenotype.
The result interpretation considers the variant scores of the two most significant variants in the context of
gene/disease-specific parameters (eg, mode of inheritance) and patient-specific parameters (eg, variant zygosity
and patient sex). Two variants are considered, since an autosomal recessive disease may be caused by a combination
of two different heterozygous variants. Significance of a variant is determined by the probability of its
association with the test phenotype. For example, a variant scored as associated is more significant than a variant
scored as possibly associated.
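Selecting the two most significant variants can be sketched as a simple ranking. The three score labels below are the ones mentioned in the text; their numeric ranks, the function name, and the example variants are assumptions made for illustration, and the full score scale may contain further levels.

```python
# Higher rank means a higher probability of association with the test phenotype.
SIGNIFICANCE = {
    "associated": 2,
    "possibly associated": 1,
    "unlikely to be associated": 0,
}


def two_most_significant(variants):
    """Return up to two (name, score) pairs with the highest significance.

    Ties keep their original order because Python's sort is stable.
    """
    ranked = sorted(variants, key=lambda v: SIGNIFICANCE[v[1]], reverse=True)
    return ranked[:2]


# Hypothetical patient result
found = [
    ("c.3G>C", "unlikely to be associated"),
    ("c.4_6delGAG", "possibly associated"),
    ("c.4_5insT", "associated"),
]
print(two_most_significant(found))
```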
The patient symptoms are assumed to be consistent with the test indication. Consequently,
if a variant is found that has not previously been reported in association with any disease but that is predicted to
be pathogenic, this variant is assumed to be associated with the test phenotype. If a patient’s symptoms are not
consistent with the test indication, the interpretation given in the report may be wrong.
The patient’s sex, if not specified, is assumed to be consistent with the primary test
indication, unless heterozygous variants in an X-linked gene indicate female sex. This assumption is important
in the context of tests that are indicated primarily for one sex only. If this assumption is wrong, the
interpretation given in the report may be wrong.
Can a negative result exclude disease in the patient?
Unless the patient was tested for a known familial variant, a negative result (absence of pathogenic variants) cannot exclude disease. Testing may have failed to detect the pathogenic sequence variant causing the disease for one of the following reasons:
- The test method used has inherent limitations.
- The variant is mosaic. Depending on which cell lineage is affected by mosaicism
and the ratio of cells containing and not containing the variation, Correlagen’s test methodology may or
may not be able to detect a mosaic variant.
- The variant is located in a gene region not included in the test.
- The variant is located in a gene not included in the test.
How Can Parent Testing Help To Interpret the Sequencing Results for a Child?
Parent testing can help to determine if two heterozygous variants are located on the same
or on different chromosome copies in the child. If each variant is inherited from a different parent, the
variants are likely to be located on different chromosome copies. If both variants are inherited from the same
parent, the variants are likely to be located on the same chromosome copy. This information is often important
since a recessively inherited disease is only expressed if the variants are located on different chromosome
copies, and a dominantly inherited disease is often more severe if the variants are located on different
chromosome copies. Information about whether two variants are located on the same or on different chromosome
copies is also relevant for genetic counseling or in a situation where expression of the disease is influenced
by whether a pathogenic variant was inherited from the father or from the mother.
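The parent-testing logic described above can be written as a short sketch. The function name, the use of sets of variant names for each parent, and the result labels are assumptions made for illustration. A variant found in neither parent or in both parents cannot be assigned to one parent here, so the sketch reports those cases as undetermined.

```python
def likely_phase(variant_a, variant_b, mother_variants, father_variants):
    """Infer whether two heterozygous variants in a child likely share a copy.

    mother_variants / father_variants: sets of variant names found in each parent.
    """
    def origin(variant):
        in_mother = variant in mother_variants
        in_father = variant in father_variants
        if in_mother and not in_father:
            return "mother"
        if in_father and not in_mother:
            return "father"
        return None  # found in neither parent or in both parents

    origin_a, origin_b = origin(variant_a), origin(variant_b)
    if origin_a is None or origin_b is None:
        return "undetermined"
    if origin_a == origin_b:
        return "likely same chromosome copy"
    return "likely different chromosome copies"


# Hypothetical family: each parent carries one of the child's two variants.
print(likely_phase("c.3G>C", "c.4_5insT", {"c.3G>C"}, {"c.4_5insT"}))
```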
How Can Family Testing Help To Interpret the Sequencing Results for a Patient?
If a variant is suspected of being associated with disease, this variant interpretation
can be strengthened by detecting the variant in all blood relatives who are affected with the disease but not,
or only very rarely, in blood relatives not affected with disease.
Copyright © 2009, Correlagen Diagnostics, Inc. All rights reserved.