More Information on Correlagen's RightReport™


DNA Sequence Analysis Method

Correlagen uses two different DNA sequencing methods in its assays, known as Sanger sequencing and next-generation sequencing-by-synthesis. Each requires a different method for preparing the patient DNA for sequencing, and each has its own strengths and limitations (see below). Sanger sequencing is currently considered the “gold standard” for sequence analysis because of its high sensitivity for detecting most small variations and its low false positive rate. However, Sanger sequencing is best suited for analyzing small numbers of genes, for a “narrow and deep” genetic analysis. In contrast, next-generation sequencing offers the capacity for analyzing very large numbers of genes simultaneously, thereby increasing the clinical sensitivity of a test, although with currently somewhat lower analytical sensitivity, for a “broader and more shallow” genetic analysis.

At the moment, next-generation sequencing shows lower sensitivity for detecting certain variants, such as deletion and/or insertion variants spanning several nucleotides or variants that occur in duplicated gene regions. For substitution variants, however, which account for the vast majority of variants found in clinical sequencing tests, the sensitivity of next-generation sequencing is close to that of Sanger sequencing. In addition, the sensitivity limitations of next-generation sequencing are mostly based on current data analysis methods and may be overcome by re-analysis of existing data sets once the analysis methods have further evolved. Improved data analysis methods may then also allow detection of variants that cannot be detected by Sanger sequencing, such as very large deletion and/or insertion variants (copy-number variants) or variants present in only a subset of gene copies tested (mosaic or somatic variants).

For the index patient, sequence is typically determined for all coding exons of a gene that are represented in the most prevalent mRNA isoform or, if known, the mRNA isoform most relevant to the expression of the disease phenotype. For genes with more than one exon, flanking intronic sequences containing the two highly conserved splice sites are also analyzed. Additional intronic sequences and untranslated (UTR) sequences upstream of the first coding exon and downstream of the last coding exon are generally, but not always, also analyzed, unless they are known to contain pathogenic variants. When confirming presence or absence of certain variants, eg, for the purpose of family testing, sequencing may be limited to specific amplicons.

 

Sanger Sequencing

In Sanger sequencing, the region of interest is broken up into small units of 200 to 600 base pairs (usually corresponding to exons), and each unit is enriched and sequenced separately from all other units. Enrichment is achieved by employing a technique known as polymerase chain reaction (PCR) that leads to exponential amplification of DNA sequences located between two specific oligonucleotides (primers). The PCR products (amplicons) are then sequenced bi-directionally by extending primers bound to the end of the PCR product, randomly terminating the extension process at every possible position in the extension product through using a mixture of normal nucleotides and fluorescently labeled “chain-terminator” nucleotides, and determining the length of each extension product and the nature (A, G, T, or C) of the terminal base at that length. Occasionally, the sequence can only be determined in one direction, since small deletions or insertions at the beginning of an extension product render the downstream sequence uninterpretable.
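The read-out principle described above (one terminated extension product per position, separated by length, with the labeled terminal base read at each length) can be illustrated with a toy Python sketch. It is a deliberate simplification and not a model of the laboratory process: the function name `chain_termination_read` is hypothetical, strand complementarity and signal noise are ignored, and exactly one product per length is assumed.

```python
import random


def chain_termination_read(extension_sequence, seed=0):
    """Toy model: recover a sequence from its terminated extension products."""
    rng = random.Random(seed)
    # One product per possible termination position:
    # (product length, fluorescently labeled terminal base).
    products = [
        (length, extension_sequence[length - 1])
        for length in range(1, len(extension_sequence) + 1)
    ]
    # In the reaction tube the products form an unordered mixture.
    rng.shuffle(products)
    # Separating by size orders the products by length; reading the
    # terminal base at each successive length spells out the sequence.
    return "".join(base for _, base in sorted(products))
```

Sorting by length stands in for the size separation step; the order in which the products were generated does not matter.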


Limitations of Sanger Sequencing

  • The method does not allow any conclusion as to whether a heterozygous variant is present on the maternal or the paternal chromosome copy. For this reason, the DNA sequence analysis performed here cannot determine if two different heterozygous variants are located on the same or on different chromosome copies. For the purposes of this report, two different heterozygous variants are "by default" assumed to be located on separate chromosome copies. Parent testing can be used to determine if two different heterozygous variants are located on the same or on different chromosome copies. If both variants are inherited from the same parent, they are likely to be located on the same chromosome copy. If each variant is inherited from a different parent, they are likely to be located on different chromosome copies.


  • The method does not reliably detect mosaic variants. The sequencing output reflects the sum of all sequence versions present in the PCR product. Presence of a variant in only some of the templates will lead to a mixed-base signal at the variant position. While heterozygous variants, which are present in about half of all templates, can be detected with 99% reliability by the software algorithm used, mosaic variants, which could be present in only a small proportion of templates, may or may not be detected.


  • The method cannot detect large deletions. If one or both of the primer binding sites for an amplicon are deleted from a template, eg, as part of a large deletion, the amplicon cannot be generated from this template. If all template versions carry the deletion, no PCR product will be generated. If, for example, only the template derived from the maternal chromosome copy carries the deletion, the amplicon can still be generated from the paternal chromosome copy. The only indication that sequence was derived only from the paternal chromosome copy would be that all variants detected on that amplicon would appear homozygous in the final sequence output. The prevalence of large deletions varies widely between genes.


  • The method cannot detect large duplications, inversions, or other re-arrangements. Re-arrangements that disrupt an amplicon will not serve as a template during PCR. Re-arrangements, such as inversions, that preserve an amplicon will not affect generation of the PCR product in any way and will therefore not be detectable through the sequencing output. Duplications will also not be detected, since the sequence output does not allow any conclusion about the number of template copies present.


  • The method is affected by allele dropout. If a template contains a variant in a primer binding site for an amplicon, the amplicon cannot be generated from this template. If, for example, the template derived from the maternal chromosome copy carries a variant in a primer binding site, the amplicon can only be generated from the paternal chromosome copy. In this case, the only indication that the PCR product and the sequence derived from it reflect only the paternal chromosome copy would be that all variants detected on that amplicon would appear homozygous in the final sequence output. Allele dropout should be a rare event, since primer binding sites are specifically chosen not to cover any known variant location.


  • The method may not be able to determine the exact numbers of T/A or microsatellite repeats. During PCR and/or during the sequencing reaction, the polymerase may slip on a long stretch of T’s (or A’s) or a microsatellite (such as a CA repeat) in the template, leading to a variable number of T’s (or A’s) or a variable number of microsatellite-repeats in the sequence output. Such slippage may prevent accurate determination of the number of T’s (or A’s) or of microsatellite repeats actually present in the template.


Sequence Variant Naming

Correlagen numbers and names all variants relative to the human reference sequence published by http://genome.ucsc.edu in March of 2006 (hg18) and according to the system suggested by the Human Genome Variation Society (http://www.hgvs.org/rec.html), regardless of whether the cited publication does or does not adhere to this convention. According to the HGVS system, the start of the coding sequence (ie, the "A" of the start codon ATG) is designated as +1. All coding nucleotides, ie, all exonic nucleotides, in the designated mRNA isoform are numbered consecutively. Intronic nucleotides are numbered relative to the nearest exonic nucleotide.

Variant Numbering
Region     Feature   Nucleotides   Position numbers
Exon 1     5'UTR     G A G         -5 -4 -3
Intron 1             G T A G       -3+1 -3+2 -2-2 -2-1
Exon 2     5'UTR     G T           -2 -1
Exon 2     Met       A T G         1 2 3
Exon 2     Glu       G A G         4 5 6
Intron 2             G T A G       6+1 6+2 7-2 7-1
Exon 3     Val       G T A         7 8 9
Exon 3     stop      T G A         10 11 12
Exon 3     3'UTR     G A           13 14
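The numbering shown in the table can be reproduced with a short Python sketch. It is an illustration under stated assumptions rather than Correlagen's software: the function name `number_positions` and the `(kind, sequence)` segment representation are hypothetical, the transcript is assumed to begin and end with an exon, and an intron of odd length assigns its central nucleotide to the upstream exon. Like the table, the sketch simply continues the consecutive numbering through the stop codon and the 3'UTR.

```python
def number_positions(segments, start_codon_index):
    """Label every nucleotide of a gene model with its position number.

    segments: list of (kind, sequence) with kind "exon" or "intron".
    start_codon_index: 0-based index of the "A" of ATG within the
        concatenated exonic sequence.
    """
    def exonic_label(i):
        # There is no position 0: the nucleotide before +1 is -1.
        n = i - start_codon_index
        return n + 1 if n >= 0 else n

    labels = []
    exonic_seen = 0
    for kind, seq in segments:
        if kind == "exon":
            for _ in seq:
                labels.append(str(exonic_label(exonic_seen)))
                exonic_seen += 1
        else:
            # Intronic nucleotides are numbered relative to the nearest
            # exonic nucleotide: "+" from upstream, "-" from downstream.
            upstream = exonic_label(exonic_seen - 1)
            downstream = exonic_label(exonic_seen)
            length = len(seq)
            first_half = (length + 1) // 2
            for j in range(length):
                if j < first_half:
                    labels.append(f"{upstream}+{j + 1}")
                else:
                    labels.append(f"{downstream}-{length - j}")
    return labels


# Gene model from the table: three exons separated by two 4-nucleotide introns.
EXAMPLE = [
    ("exon", "GAG"), ("intron", "GTAG"), ("exon", "GTATGGAG"),
    ("intron", "GTAG"), ("exon", "GTATGAGA"),
]
EXAMPLE_LABELS = number_positions(EXAMPLE, start_codon_index=5)
```

With `start_codon_index=5` (the "A" of ATG is the sixth exonic nucleotide), `EXAMPLE_LABELS` matches the position row of the table.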

Sequence variants are named according to the change they cause in the DNA sequence. The most common types of changes are:

  • Substitutions of one nucleotide for another nucleotide (eg, c.3G>C).
  • Deletions of one or more nucleotides (eg, c.4_6delGAG).
  • Insertions of one or more nucleotides (eg, c.4_5insT).
  • Substitutions of a group of nucleotides for a group of different nucleotides, where the number of deleted and inserted nucleotides can be different (eg, c.4_6delinsT).
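As a minimal sketch of how the four name types listed above map to sequence changes, the following Python function applies a variant name to a coding sequence. It is an assumption-laden illustration, not a full HGVS parser: the function name `apply_variant` is hypothetical, and only positive coding positions (no intronic or UTR positions) and the four forms listed above are supported.

```python
import re

# Matches c.3G>C, c.4_6delGAG, c.4_5insT and c.4_6delinsT style names.
PATTERN = re.compile(
    r"^c\.(?P<start>\d+)(?:_(?P<end>\d+))?"
    r"(?:(?P<ref>[ACGT])>(?P<alt>[ACGT])"
    r"|(?P<kind>delins|del|ins)(?P<bases>[ACGT]*))$"
)


def apply_variant(cds, name):
    """Return the coding sequence after applying a simple variant name."""
    m = PATTERN.match(name)
    if m is None:
        raise ValueError(f"unsupported variant name: {name}")
    start = int(m["start"])
    end = int(m["end"]) if m["end"] else start
    if m["ref"]:  # substitution of one nucleotide
        if cds[start - 1] != m["ref"]:
            raise ValueError("reference nucleotide does not match sequence")
        return cds[:start - 1] + m["alt"] + cds[start:]
    kind, bases = m["kind"], m["bases"]
    if kind == "del":
        if bases and cds[start - 1:end] != bases:
            raise ValueError("deleted nucleotides do not match sequence")
        return cds[:start - 1] + cds[end:]
    if kind == "ins":
        if end != start + 1:
            raise ValueError("insertion must name two adjacent positions")
        return cds[:start] + bases + cds[start:]
    # delins: remove start..end and insert the new nucleotides in their place
    return cds[:start - 1] + bases + cds[end:]
```

Applied to the coding sequence from the numbering table (ATGGAGGTATGA), `c.4_6delGAG` removes the Glu codon and yields ATGGTATGA.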

Please click here for a more detailed description of the numbering and naming rules used by Correlagen.

Mutation types reflect the predicted effect of a variant on the mRNA or the protein level. The most common mutation types are:

  • Splice-site mutations destroy an existing splice site or create a new splice site. Both types of variations can lead to altered mRNA processing and a dramatically different mature mRNA sequence, which translates into a dramatically different protein sequence.


  • Nonsense mutations introduce a stop codon in the middle of the coding region, leading to truncation of the protein. Nonsense mutations are commonly caused by a single-nucleotide substitution, as shown in the example below:


    Reference:  GGG        TTG        AAA        ACA        GCG
                Glycine    Leucine    Lysine     Threonine  Alanine
    Variant:    GGG        TAG        AAA        ACA        GCG
                Glycine    stop


  • Missense mutations change one amino acid in the protein into another. Missense mutations are commonly caused by a single-nucleotide substitution, as shown in the example below:

    Reference:  GGG        CTT        AAA        ACA        GCG
                Glycine    Leucine    Lysine     Threonine  Alanine
    Variant:    GGG        CCT        AAA        ACA        GCG
                Glycine    Proline    Lysine     Threonine  Alanine


  • Synonymous mutations do not cause a change in the amino acid sequence of the protein:

    Reference:  GGG        CTT        AAA        ACA        GCG
                Glycine    Leucine    Lysine     Threonine  Alanine
    Variant:    GGG        CTC        AAA        ACA        GCG
                Glycine    Leucine    Lysine     Threonine  Alanine


  • Frameshift mutations cause a shift in reading frame, leading to a complete change of the amino acid sequence downstream of the frameshift site. Since stop codons tend to be enriched in the two unused reading frames, frameshift mutations often lead to truncation of the protein. A frameshift mutation is caused by a net deletion or net insertion of a number of nucleotides not divisible by 3. Of note, the amino acid sequence may not change until several amino acids downstream of the actual frameshift site, as shown in the example below:

    Reference:  GGG        CTT        AAA        ACA        GCG
                Glycine    Leucine    Lysine     Threonine  Alanine
    Variant (one G deleted from the first codon):
                GGC        TTA        AAA        CAG        CG...
                Glycine    Leucine    Lysine     Glutamine  Arg...


  • In-frame deletions and/or insertions lead to deletion and/or insertion of one or more amino acids from/into the protein. In-frame deletions and/or insertions do not alter the reading frame and therefore do not change the amino acid sequence downstream of the deletion and/or insertion site. In-frame deletions and/or insertions may or may not lead to a missense mutation, as shown in the example of a 3-nucleotide insertion (GCA) below:

    Reference:  GGG        CTT        AAA        ACA        GCG
                Glycine    Leucine    Lysine     Threonine  Alanine
    Variant:    GGG        CGC        ATT        AAA        ACA
                Glycine    Arginine   Isoleucine Lysine     Threonine
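The mutation types above can be told apart by translating the reference and variant coding sequences and comparing the results. The Python sketch below is a simplified illustration, not Correlagen's classification software: the names `translate` and `classify` are hypothetical, the standard genetic code is assumed, both sequences are assumed to start in frame, and splice-site effects (which depend on mRNA processing rather than translation) are not covered.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" is stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate complete codons into one-letter amino acids, up to a stop."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def classify(reference, variant):
    """Classify a variant coding sequence relative to the reference."""
    if (len(variant) - len(reference)) % 3 != 0:
        # Net insertion or deletion not divisible by 3.
        return "frameshift"
    if len(variant) != len(reference):
        return "in-frame deletion/insertion"
    ref_protein, var_protein = translate(reference), translate(variant)
    if ref_protein == var_protein:
        return "synonymous"
    if var_protein.endswith("*") and len(var_protein) < len(ref_protein):
        # A premature stop codon truncates the protein.
        return "nonsense"
    return "missense"
```

For the nonsense example above, `classify("GGGTTGAAAACAGCG", "GGGTAGAAAACAGCG")` returns `"nonsense"`. The sketch compares whole sequences instead of parsing variant names, which keeps it independent of any particular naming convention.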



Correlagen’s Variant Scoring Method

 

Meaning of Correlagen’s Variant Scores:

Correlagen’s variant scores reflect the probability of association with monogenic disease as well as the strength of the supporting data, ie, the confidence that the score is correct (see Figure 1).

The variant scores do not reflect severity of disease. The variant scores also do not reflect the probability of association with disease in an oligogenic or polygenic rather than monogenic manner. In other words, a variant score of "unlikely to be associated" does not exclude the possibility that a variant may "weakly" contribute to a disease in association with several other variants in the same or different genes.

Figure 1: Monogenic Disease Probability Graph

Correlagen’s variant scores are based on Correlagen’s scoring algorithms and may differ from the variant scores proposed by the authors of a publication. To request detailed information on how the score for a specific variant found in your patient’s sample was derived, please call Correlagen at 1-866-647-0735.


How Are Correlagen’s Variant Scores Determined?

Correlagen’s variant scores are based on the following considerations (summarized in Figure 2):

Has the variant been observed in the general population or normal controls? If a variant is observed more frequently in the general population than is compatible with the prevalence and mode of inheritance of the disease, then this variant is assumed to be non-pathogenic. Data for variant frequency in the general population are derived from dbSNP (NCBI EntrezGene), from publications,² or from prevalence studies conducted at Correlagen.

² Peer-reviewed English-language publications listed in PubMed (http://www.ncbi.nlm.nih.gov/gene).

Has the variant been observed in affected individuals? If a variant is observed only in diseased individuals and not in the general (healthy) population, it is assumed to be at least possibly associated with disease. The probability of association depends on such parameters as the number of diseased individuals with the variant and the consistency of co-migration of variant and disease within families. Data for variant frequency in affected individuals are derived from the peer-reviewed published literature.

What effect does the variant have in a controlled experimental system? A significant effect of a variant on the synthesis, cellular location, and/or the function of the encoded protein in an experimental system suggests that the variant is pathogenic. While experimental systems can provide powerful information, the results must also be seen with caution, since an experimental environment lacks many of the complexities of the actual in-vivo environment. Data for variant effect in an experimental system are derived from the peer-reviewed published literature.

What is the predicted effect of the variant on synthesis and/or function of the encoded protein³? If the variant leads to truncation of the gene product due to a nonsense mutation or a frameshift mutation, it is assumed to be pathogenic (for diseases related to loss-of-function mutations). If the variant affects one of the highly conserved donor or acceptor splice sites, it is predicted to lead to exon skipping and is assumed to be pathogenic. If the variant leads to a missense variant, it is considered to be possibly pathogenic, in absence of other information. If the variant is located in the coding region away from exon/intron junctions and does not lead to a change in the amino acid sequence (synonymous variant), it is considered unlikely to be pathogenic, although pathogenicity cannot be excluded. If a variant is located in the coding sequence close to an exon/intron junction or in an intron away from the exon/intron junction, its effect cannot be predicted, and it is classified as a variant of unknown significance.

³ Predictions are based on the mRNA isoform chosen for reporting. Often, the same gene sequence can give rise to several mRNA isoforms. For genes with many exons, different mRNA isoforms may contain sequence from different permutations of exons. The exons reflected in a particular mRNA isoform define the actual coding region and thus the predicted effect of a variant on protein synthesis and/or function.
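The predicted-effect rules in the paragraph above can be paraphrased as a small decision function. The Python sketch below is only a reading of that paragraph and not Correlagen's scoring algorithm, which also weighs population, patient, and experimental data. The function name, the mutation-type labels, and the `near_junction` flag are hypothetical, and the sketch assumes that the "close to an exon/intron junction" rule is applied to synonymous coding variants.

```python
def predicted_effect(mutation_type, near_junction=False,
                     loss_of_function_disease=True):
    """Paraphrase of the predicted-effect rules (illustrative only).

    mutation_type: "nonsense", "frameshift", "splice-site", "missense",
        "synonymous", or "intronic" (intronic, away from the junction).
    near_junction: True if a coding variant lies close to an
        exon/intron junction.
    """
    if mutation_type in ("nonsense", "frameshift"):
        # The text states this rule only for loss-of-function diseases.
        if loss_of_function_disease:
            return "assumed pathogenic"
        return "not covered by this sketch"
    if mutation_type == "splice-site":
        # Conserved donor or acceptor site: exon skipping is predicted.
        return "assumed pathogenic"
    if mutation_type == "missense":
        return "possibly pathogenic"
    if mutation_type == "synonymous" and not near_junction:
        return "unlikely pathogenic"
    if mutation_type in ("synonymous", "intronic"):
        # Effect cannot be predicted from sequence alone.
        return "unknown significance"
    raise ValueError(f"unknown mutation type: {mutation_type}")
```

The returned strings describe the sequence-based prediction only; they are not the final variant scores.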

A number of algorithms (eg, SIFT, PolyPhen, Align-GVGD) have been developed in an attempt to predict the impact of a missense variant on protein function. Prediction algorithms are typically based on evolutionary conservation, structure of the protein at the site of the variant, and/or amino acid properties. While Correlagen routinely uses such algorithms for variant evaluation, it does not base the variant score on a prediction from any single one of the algorithms, since their specificity and sensitivity are limited and their predictions frequently contradict each other.

Additional considerations for scoring include co-occurrence of a variant with known pathogenic variants, occurrence of a variant in mutually exclusive disease phenotypes, predicted effect of synonymous variants on splicing, and certain gene-specific and/or disease-specific properties.

Figure 2: Variant Scoring Flow


What is the Difference Between the Variant Score and the Result Interpretation?

The variant score reflects the relationship of an individual variant to a disease phenotype. The result interpretation considers the variant scores of the two most significant variants in the context of gene/disease-specific parameters (eg, mode of inheritance) and patient-specific parameters (eg, variant zygosity and patient sex). Two variants are considered, since an autosomal recessive disease may be caused by a combination of two different heterozygous variants. Significance of a variant is determined by the probability of its association with the test phenotype. For example, a variant scored as associated is more significant than a variant scored as possibly associated.

The patient symptoms are assumed to be consistent with the test indication. Consequently, if a variant is found that has not previously been reported in association with any disease but that is predicted to be pathogenic, this variant is assumed to be associated with the test phenotype. If a patient’s symptoms are not consistent with the test indication, the interpretation given in the report may be wrong.

The patient’s sex, if not specified, is assumed to be consistent with the primary test indication, unless heterozygous variants in an X-linked gene indicate female sex. This assumption is important in the context of tests that are indicated primarily for one sex only. If this assumption is wrong, the interpretation given in the report may be wrong.

Can a negative result exclude disease in the patient? Unless the patient was tested for a known familial variant, a negative result (absence of pathogenic variants) cannot exclude disease. Instead, testing may have failed to detect the pathogenic sequence variant causing the disease for one of the following reasons:

  • The test method used has inherent limitations.
  • The variant is mosaic. Depending on which cell lineage is affected by mosaicism and the ratio of cells containing and not containing the variation, Correlagen’s test methodology may or may not be able to detect a mosaic variant.
  • The variant is located in a gene region not included in the test.
  • The variant is located in a gene not included in the test.


How Can Parent Testing Help To Interpret the Sequencing Results for a Child?

Parent testing can help to determine if two heterozygous variants are located on the same or on different chromosome copies in the child. If each variant is inherited from a different parent, the variants are likely to be located on different chromosome copies. If both variants are inherited from the same parent, the variants are likely to be located on the same chromosome copy. This information is often important since a recessively inherited disease is only expressed if the variants are located on different chromosome copies, and a dominantly inherited disease is often more severe if the variants are located on different chromosome copies. Information about whether two variants are located on the same or on different chromosome copies is also relevant for genetic counseling or in a situation where expression of the disease is influenced by whether a pathogenic variant was inherited from the father or from the mother.
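The inheritance reasoning above can be written down as a small Python sketch. It is an illustration only: the function name `infer_phase` and the representation of parental results as sets of carrier parents are hypothetical, and any case the text does not address (for example, a variant found in both parents or in neither) is reported as undetermined.

```python
def infer_phase(parents_with_variant_a, parents_with_variant_b):
    """Infer whether two heterozygous variants in a child are likely to lie
    on the same or on different chromosome copies.

    Each argument is the collection of parents ("mother", "father") in whom
    the respective variant was detected.
    """
    a = set(parents_with_variant_a)
    b = set(parents_with_variant_b)
    if len(a) != 1 or len(b) != 1:
        # Variant seen in both parents or in neither: origin is ambiguous.
        return "undetermined"
    if a == b:
        # Both variants inherited from the same parent.
        return "likely same chromosome copy"
    # Each variant inherited from a different parent.
    return "likely different chromosome copies"
```

For example, `infer_phase({"mother"}, {"father"})` returns `"likely different chromosome copies"`, the configuration in which a recessively inherited disease is expressed.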


How Can Family Testing Help To Interpret the Sequencing Results for a Patient?

If a variant is suspected of being associated with disease, this variant interpretation can be strengthened by detecting the variant in all blood relatives who are affected with the disease but not, or only very rarely, in blood relatives not affected with disease.
