Orthologue Alignments


By default, the orthologues aligned and displayed in Alamut® Visual are taken from the Ensembl Compara database. As of March 2010, the only alignments available that were not based on Ensembl were the manually-curated alignments of ATM (NM_000051 and U82828), provided by Tavtigian et al. (2009), and of BRCA1 (NM_007294), provided by IARC with Align GVGD.

Building new alignments

Although Ensembl Compara is a very valuable information source, manual selection of orthologues and manual curation of alignments are necessary for optimal missense interpretation and for scoring systems such as SIFT, PolyPhen, and Align GVGD.

This is why we have designed a semi-automatic procedure for orthologue alignment construction, briefly described here (Deforche and Blavier 2010).

Orthologous sequences are searched for with the BlastP program, first against the UniProt/Swiss-Prot database. If sequences from distant species are not found, BlastP is then run against the RefSeq database and finally, if needed, against the NCBI non-redundant protein sequence database.
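The fallback order described above can be sketched as a small loop. This is only an illustration, not the actual implementation: `tiered_search`, the `search` callable, and the database labels are hypothetical names introduced here, and the real BlastP invocation is left out.

```python
from typing import Callable, Dict, Iterable, List

# Search order taken from the text: Swiss-Prot, then RefSeq, then nr.
# The labels are placeholders for whatever identifiers the search tool expects.
DATABASES: List[str] = ["swissprot", "refseq_protein", "nr"]


def tiered_search(
    search: Callable[[str], Dict[str, str]],
    required_species: Iterable[str],
    databases: List[str] = DATABASES,
) -> Dict[str, str]:
    """Query databases in order until every required species has a hit.

    `search(db)` is a hypothetical callable that runs the protein search
    against one database and returns {species: sequence_id}.
    """
    required = set(required_species)
    hits: Dict[str, str] = {}
    for db in databases:
        missing = required - set(hits)
        if not missing:
            break  # nothing left to look for, skip the broader databases
        for species, seq_id in search(db).items():
            # Keep the hit from the earliest database that provided one.
            if species in missing:
                hits[species] = seq_id
    return hits
```

A species found in an earlier database is never overwritten by a later one, which reflects the preference for the curated source expressed in the text.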

The set of orthologues is then filtered manually, based on sequence length, identity with the human sequence, and available annotations.
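Filtering is done manually in our procedure, but a simple pre-screen on two of the criteria above (sequence length and identity with the human sequence) could look like the following sketch. The `Candidate` type, the `prefilter` function, and both thresholds are assumptions made for illustration; they are not values used by Interactive Biosoftware.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    species: str
    sequence: str
    identity: float  # fraction of identity with the human sequence, 0 to 1


def prefilter(
    candidates: List[Candidate],
    human_length: int,
    min_identity: float = 0.3,     # hypothetical threshold
    max_length_diff: float = 0.2,  # hypothetical threshold (relative)
) -> List[Candidate]:
    """Keep candidates close in length to the human protein and similar enough."""
    kept = []
    for c in candidates:
        length_diff = abs(len(c.sequence) - human_length) / human_length
        if length_diff <= max_length_diff and c.identity >= min_identity:
            kept.append(c)
    return kept
```

Annotation-based checks are not covered here, since they depend on curator judgement.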

Selected sequences are then aligned with M-Coffee, a meta-multiple sequence alignment program.

In order to adjust the alignment depth, two quality criteria are calculated. These criteria are based on work published by Tavtigian et al. (2008, 2009) and on recommendations published on the SIFT web site (Sorting Intolerant From Tolerant, a variant classification system). Correct alignments should contain on average three substitutions per position, and the median information content should be less than or equal to 3.25. If the alignment does not satisfy the quality criteria, sequences creating large gaps are removed, and new sequences are added if needed to raise information content.
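As an illustration of the second criterion, the sketch below computes a median information content over alignment columns. It assumes the usual Shannon-based definition, log2(20) minus the column entropy, with gap characters ignored; the exact computation used by SIFT may differ, and the function names are introduced here only for the example. The average number of substitutions per position is not covered.

```python
import math
from collections import Counter
from statistics import median
from typing import List

MAX_BITS = math.log2(20)  # upper bound for a 20-letter amino-acid alphabet


def column_information(column: str) -> float:
    """Information content (bits) of one alignment column, gaps ignored."""
    residues = [r for r in column if r not in "-."]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return MAX_BITS - entropy


def median_information(alignment: List[str]) -> float:
    """Median per-column information content of equal-length aligned sequences."""
    columns = ["".join(col) for col in zip(*alignment)]
    return median([column_information(col) for col in columns])


# Toy alignment, for illustration only.
toy = ["MKV", "MKL", "MRI", "M-V"]
meets_criterion = median_information(toy) <= 3.25
```

For this toy alignment the median is about 3.40 bits, above the 3.25 threshold, so under the assumed definition more diverse sequences would be needed.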

As a final step, alignments are optimized manually.

Available manually-curated alignments (last update: 17 March 2016)

Gene     Transcript      Origin
ABCA4    NM_000350.2     Interactive Biosoftware
ACVRL1   NM_000020.2     Interactive Biosoftware
ADCK3    NM_020247.4     Interactive Biosoftware
APC      NM_000038.4     Interactive Biosoftware
APC      NM_000038.5     Interactive Biosoftware
APC      NM_001127510.2  Interactive Biosoftware
ATM      NM_000051.3     Interactive Biosoftware
ATM      U82828.1        Interactive Biosoftware
BBS1     NM_024649.4     Interactive Biosoftware
BRCA1    NM_007294.2     IARC
BRCA1    NM_007294.3     IARC
BRCA2    NM_000059.3     IARC
BSCL2    NM_001122955.2  Interactive Biosoftware
BSCL2    NM_001122955.3  Interactive Biosoftware
C3       NM_000064.2     Interactive Biosoftware
C3       NM_000064.3     Interactive Biosoftware
CDKN2A   NM_058195.2     Interactive Biosoftware
CDKN2A   NM_058195.3     Interactive Biosoftware
CHD7     NM_017780.2     Interactive Biosoftware
CHD7     NM_017780.3     Interactive Biosoftware
COL3A1   NM_000090.3     Interactive Biosoftware
CYP21A2  NM_000500.5     Interactive Biosoftware
CYP21A2  NM_000500.7     Interactive Biosoftware
DEPDC5   NM_001242896.1  Interactive Biosoftware
DMD      NM_004006.2     Interactive Biosoftware
DYNC2H1  NM_001080463.1  Interactive Biosoftware
ENG      NM_001114753.1  Interactive Biosoftware
EYS      NM_001142800.1  Interactive Biosoftware
F8       NM_000132.3     Interactive Biosoftware
FAT4     NM_024582.4     Interactive Biosoftware
GATA2    NM_001145661.1  Interactive Biosoftware (GRCh38 LRG_295)
GATA2    NM_032638.4     Interactive Biosoftware (GRCh38 LRG_295)
GATA2    NM_001145662.1  Interactive Biosoftware (GRCh38)
GCK      NM_000162.3     Interactive Biosoftware
GCK      NM_033507.1     Interactive Biosoftware
GJB2     NM_004004.5     Interactive Biosoftware
GLA      NM_000169.2     Interactive Biosoftware
HBB      NM_000518.4     Interactive Biosoftware
KCNQ1    NM_000218.2     Interactive Biosoftware
KCTD7    NM_153033.1     Interactive Biosoftware
KCTD7    NM_153033.4     Interactive Biosoftware
KIT      NM_000222.2     Interactive Biosoftware
KRAS     NM_033360.2     Interactive Biosoftware
L1CAM    NM_000425.3     Interactive Biosoftware
L1CAM    NM_000425.4     Interactive Biosoftware
LDLR     NM_000527.3     Interactive Biosoftware
LDLR     NM_000527.4     Interactive Biosoftware
LMNA     NM_170707.2     Interactive Biosoftware
LMNA     NM_170707.3     Interactive Biosoftware
MAP3K14  NM_003954.3     Interactive Biosoftware
MAP3K14  NM_003954.4     Interactive Biosoftware
MECP2    NM_001110792.1  Interactive Biosoftware
MEN1     NM_000244.3     Interactive Biosoftware
MLH1     NM_000249.2     IARC
MLH1     NM_000249.3     IARC
MSH2     NM_000251.1     IARC
MSH2     NM_000251.2     IARC
MSH6     NM_000179.1     IARC
MSH6     NM_000179.2     IARC
MUTYH    NM_001128425.1  Interactive Biosoftware
MYBPC3   NM_000256.3     Interactive Biosoftware
MYH7     NM_000257.2     Interactive Biosoftware
MYL2     NM_000432.3     Interactive Biosoftware
NEFL     NM_006158.2     Interactive Biosoftware
NEFL     NM_006158.3     Interactive Biosoftware
NEFL     NM_006158.4     Interactive Biosoftware
NF1      NM_001042492.2  Interactive Biosoftware
NOTCH3   NM_000435.2     Interactive Biosoftware
NRXN1    NM_001135659.1  Interactive Biosoftware
ORC1     NM_004153.3     Interactive Biosoftware
PKD1     L33243.1        Interactive Biosoftware
PKD1     NM_001009944.2  Interactive Biosoftware
PKP2     NM_004572.3     Interactive Biosoftware
PMS2     NM_000535.5     IARC
RB1      NM_000321.2     Interactive Biosoftware
RECQL4   NM_004260.3     Interactive Biosoftware
SCN1A    AB093548.1      Interactive Biosoftware
SCN1A    NM_001165963.1  Interactive Biosoftware
SCN5A    NM_198056.2     Interactive Biosoftware
SDHB     NM_003000.2     Interactive Biosoftware
SH3TC2   NM_024577.3     Interactive Biosoftware
SMCHD1   NM_015295.2     Interactive Biosoftware
SPRED1   NM_152594.2     Interactive Biosoftware
SRD5A2   NM_000348.3     Interactive Biosoftware
TARDBP   NM_007375.3     Interactive Biosoftware
TFR2     NM_003227.3     Interactive Biosoftware
TP53     NM_000546.4     Interactive Biosoftware
TP53     NM_000546.5     Interactive Biosoftware
TTN      NM_133378.4     Interactive Biosoftware
TTN      NM_001256850.1  Interactive Biosoftware
VHL      NM_000551.3     Interactive Biosoftware
WNK1     NM_018979.2     Interactive Biosoftware

We intend to regularly add new alignments for the most frequently studied genes. If you would like an alignment for a gene that is not in the above list, please send us a request.

Acknowledgments

We would like to express our thanks to the Genetic Cancer Susceptibility Group at IARC for their kind help in defining our alignment protocol.

References

Tavtigian, SV., Greenblatt, MS., Lesueur, F., Byrnes, GB. (2008). In silico analysis of missense substitutions using sequence-alignment based methods. Hum Mutat. 29(11): 1327-36.

Tavtigian, SV., Oefner, PJ., Babikyan, D. et al. (2009). Rare, evolutionarily unlikely missense substitutions in ATM confer increased risk of breast cancer. Am J Hum Genet. 85: 427-46.

Deforche, A., Blavier, A. (2010). Systematic Building of Multiple Protein Alignments for Variant Interpretation. Human Genome Meeting poster.


© 2020 Interactive Biosoftware - Last modified: 30 December 2017