Importing Variants

Alamut® Visual can import variant annotations from external sources. When variants are successfully imported they are saved as standard Alamut® Visual internal variants and can then be handled like other internal variants.

Alamut® Visual can import variant annotations expressed in cDNA or gDNA coordinates, as well as in VCF (Variant Call Format).

Preparing import files

To import variants into Alamut® Visual, you must prepare a tabular import file containing variant descriptions and annotations. Import files must follow a precise format. For cDNA and gDNA, this format is most easily created with a spreadsheet application such as Excel or OpenOffice Calc; VCF files are generated by dedicated software.

cDNA variants files

For cDNA import, the gene specified in the file must already be loaded in Alamut® Visual.

Here is an example:

Gene	Transcript	Variant	Pathogenic	Patient ID	Family ID	Phenotype	Comment
MSH2	NM_000251.1	c.5C>A	unknown	1	123	Adam+
MSH2	NM_000251.1	c.15_18del	yes	2	456
MSH2	NM_000251.1	c.1276+4A>C	Class 3

The column order must be strictly observed; however, only the first three columns are mandatory.

The header line (with column labels) is not mandatory. If it is present then columns must be named as in the example above.

Let's review the contents of each column:

Gene — The official symbol of the gene carrying the variant.
Transcript — The accession number of the transcript used to describe the variant. The transcript must be known to Alamut® Visual.
Variant — The cDNA-level variant description, using the HGVS nomenclature. The "c." prefix is not mandatory.
Pathogenic — The variant classification: "no", "unknown" or "yes" when using the simple 3-class scheme, or "Class (1-5)" when using the 5-class scheme. (If no value is supplied here, "unknown" is assumed.)
Patient ID — Free content field.
Family ID — Free content field.
Phenotype — Free content field.
Comment — Free content field.
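
As an alternative to a spreadsheet, the column layout described above can be produced by a short script. The following is only a minimal sketch in Python: the function names `format_cdna_row` and `cdna_import_text` are hypothetical, and dropping trailing empty cells is an assumption based on the example rows, not a documented requirement of Alamut® Visual.

```python
import csv
import io

# Column order taken from the description above; only the first three are mandatory.
CDNA_COLUMNS = ["Gene", "Transcript", "Variant", "Pathogenic",
                "Patient ID", "Family ID", "Phenotype", "Comment"]
MANDATORY = CDNA_COLUMNS[:3]


def format_cdna_row(record):
    """Return the record's fields in the documented column order (hypothetical helper)."""
    missing = [name for name in MANDATORY if not record.get(name)]
    if missing:
        raise ValueError("missing mandatory column(s): " + ", ".join(missing))
    row = [str(record.get(name, "")) for name in CDNA_COLUMNS]
    # Assumption: trailing empty cells can be dropped, as in the example rows.
    while row and row[-1] == "":
        row.pop()
    return row


def cdna_import_text(records, with_header=True):
    """Render records as tab-delimited text that can be saved to a .txt file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    if with_header:
        writer.writerow(CDNA_COLUMNS)
    for record in records:
        writer.writerow(format_cdna_row(record))
    return buffer.getvalue()


records = [
    {"Gene": "MSH2", "Transcript": "NM_000251.1", "Variant": "c.5C>A",
     "Pathogenic": "unknown", "Patient ID": "1", "Family ID": "123",
     "Phenotype": "Adam+"},
    {"Gene": "MSH2", "Transcript": "NM_000251.1", "Variant": "c.1276+4A>C",
     "Pathogenic": "Class 3"},
]
text = cdna_import_text(records)
```

Writing `text` to a file yields a tab-delimited import file with the header line followed by one line per variant.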

gDNA variants files

For gDNA import, you must load in Alamut® Visual each gene that may hold variants from the file, and run the import for each of these genes. Only variants located within the gene locus are processed.

Here is an example:

Assembly	Chromosome	Variant	Pathogenic	Patient ID	Family ID	Phenotype	Comment
GRCh37	chr2	g.47630335C>A	unknown	1	123	Adam+
GRCh37	2	g.47630347_47630350del	yes	2	456
GRCh37	chr2	g.47657084A>C	Class 3

The column order must be strictly observed; however, only the first three columns are mandatory.

The header line (containing column labels) is not mandatory, but it's a good idea to keep it.

Let's review the contents of each column:

Assembly — GRCh37 or NCBI36.
Chromosome — chrN or N.
Variant — The gDNA-level variant description, using the HGVS nomenclature. The "g." prefix is not mandatory.
Pathogenic — The variant classification: "no", "unknown" or "yes" when using the simple 3-class scheme, or "Class (1-5)" when using the 5-class scheme. (If no value is supplied here, "unknown" is assumed.)
Patient ID — Free content field.
Family ID — Free content field.
Phenotype — Free content field.
Comment — Free content field.
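
The first three gDNA columns can be checked before import with a few lines of code. This is a hedged sketch, not part of Alamut® Visual: `normalize_chromosome` and `check_gdna_row` are hypothetical names, and treating the assembly name as case-sensitive is an assumption.

```python
ASSEMBLIES = {"GRCh37", "NCBI36"}  # values listed for the Assembly column


def normalize_chromosome(value):
    """Strip an optional 'chr' prefix so that 'chr2' and '2' compare equal."""
    value = value.strip()
    return value[3:] if value.lower().startswith("chr") else value


def check_gdna_row(fields):
    """Return a list of problems found in one tab-split gDNA row (hypothetical helper)."""
    if len(fields) < 3:
        return ["fewer than the three mandatory columns"]
    problems = []
    assembly, chromosome, variant = fields[:3]
    # Assumption: assembly names must match exactly, including case.
    if assembly not in ASSEMBLIES:
        problems.append("unknown assembly: " + assembly)
    if not normalize_chromosome(chromosome):
        problems.append("empty chromosome")
    if not variant.strip():
        problems.append("empty variant")
    return problems
```

For instance, `check_gdna_row("GRCh37\t2\tg.47630347_47630350del\tyes\t2\t456".split("\t"))` returns an empty list, since both chromosome spellings are accepted.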

VCF (Variant Call Format) files

As with gDNA import, VCF import requires you to load in Alamut® Visual each gene that may hold variants from the file, and to run the import for each of these genes. Only variants located within the gene locus are processed.

Standard VCF files can be imported. Fields are #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
Only CHROM, POS, REF and ALT are used to set up the variant. ID is also used to set the variant comment.
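
To illustrate which parts of a VCF record matter here, the sketch below extracts only the fields mentioned above from one data line. It is an illustration under stated assumptions, not a description of how Alamut® Visual parses VCF: `parse_vcf_line` is a hypothetical name, and the splitting of ALT on commas and the handling of "." as a missing ID follow the general VCF conventions.

```python
def parse_vcf_line(line):
    """Extract CHROM, POS, REF, ALT and ID from one VCF data line (hypothetical helper)."""
    if line.startswith("#"):
        return None  # header or meta-information line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("expected at least CHROM, POS, ID, REF and ALT")
    chrom, pos, var_id, ref, alt = fields[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt.split(","),  # VCF lists multiple ALT alleles comma-separated
        # '.' is the VCF placeholder for a missing value
        "comment": "" if var_id == "." else var_id,
    }
```

The remaining columns (QUAL, FILTER, INFO, FORMAT and any sample columns) are simply ignored by this sketch.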

Creating import files (for cDNA and gDNA only)

When variants have been prepared in a spreadsheet application using the format described above, save the data in tab-delimited text format (e.g. in Excel: File > Save As > Save as type: Text (Tab delimited)(*.txt)).

The import process

To import variants from an import file as described above, in Alamut® Visual:

Open the appropriate gene and, if required, select the transcript that the variants to be imported refer to.
Go to menu 'Variants' > 'Import variants...' and specify the import file you have created. The import can process cDNA, gDNA or VCF files, but all records in a given file must be of the same type.

cDNA variants files

As an example, suppose you have prepared a file with gene MSH2 variants.

Before actually importing the variants, Alamut® Visual first analyzes the import file and reports valid and invalid entries as follows:

Import analysis

This example for cDNA import is somewhat contrived in order to highlight a few points:

The first line is rejected because the transcript accession is incomplete (no version number).
The second line is rejected primarily because it refers to gene MLH1 rather than MSH2.
The third line is rejected because the variant c.4a>G is not correct (NM_000251.1:c.4 is a G, not an A).
The fourth line is rejected because '4 xG>A' is meaningless.
Two lines refer to variant c.4G>C, describing two occurrences of the variant. This is correct and is the appropriate way to import occurrences.
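
Some of the rejections listed above can be caught before the file is submitted. The sketch below is a rough pre-check under stated assumptions: `precheck_cdna_row` is a hypothetical helper, the accession pattern is a simplification, and the check cannot verify reference bases (such as the c.4 mismatch above) because that requires the transcript sequence.

```python
import re

# Simplified pattern (assumption): two-letter prefix, underscore, digits,
# then a dot and a version number, e.g. NM_000251.1
VERSIONED_ACCESSION = re.compile(r"^[A-Z]{2}_\d+\.\d+$")


def precheck_cdna_row(fields, expected_gene):
    """Return a list of likely rejection reasons for one tab-split cDNA row."""
    if len(fields) < 3:
        return ["fewer than the three mandatory columns"]
    problems = []
    gene, transcript, variant = fields[:3]
    if gene != expected_gene:
        problems.append(f"gene {gene} is not the open gene {expected_gene}")
    if not VERSIONED_ACCESSION.match(transcript):
        problems.append(f"transcript {transcript} is not a versioned accession")
    if re.search(r"\s", variant):
        problems.append(f"variant {variant!r} contains whitespace")
    return problems
```

Repeated valid lines, such as the two occurrences of c.4G>C, are deliberately not flagged, since they are the appropriate way to import occurrences.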

At this step, if you click 'Import Now' the validated entries will actually get imported. A report then shows up:

Note that entries that don't add new information are marked as redundant.

gDNA variants and VCF files

An analogous process exists for gDNA variants and VCF files. In these cases, only variants located within the locus of the current gene are shown in the import analysis; variants on a different chromosome, or at a position outside the scope of the current gene, are excluded.

This example for gDNA import is somewhat contrived in order to highlight a few points:

The third line is rejected because the assembly is not the one currently in use.
The fourth line is rejected because '37050000T' is meaningless.
Three variants (7 − 4) are present in the file but not shown, since they are outside the scope of the current gene.
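
The locus filtering described above for gDNA and VCF imports can be pictured with a small sketch. Everything here is illustrative: the `Locus` class and `split_by_locus` function are hypothetical, the coordinates used in any call are placeholders rather than real gene boundaries, and treating the end position as inclusive is an assumption.

```python
from dataclasses import dataclass


def strip_chr(chromosome):
    """Accept both 'chr2' and '2' spellings."""
    return chromosome[3:] if chromosome.lower().startswith("chr") else chromosome


@dataclass(frozen=True)
class Locus:
    chromosome: str  # stored without the 'chr' prefix
    start: int
    end: int  # assumption: inclusive

    def contains(self, chromosome, position):
        same_chromosome = strip_chr(chromosome) == self.chromosome
        return same_chromosome and self.start <= position <= self.end


def split_by_locus(variants, locus):
    """Split (chromosome, position) pairs into those inside and outside the locus."""
    inside, outside = [], []
    for chromosome, position in variants:
        target = inside if locus.contains(chromosome, position) else outside
        target.append((chromosome, position))
    return inside, outside
```

With a file of seven variants of which four fall inside the locus, `split_by_locus` would return four entries in the first list and three in the second, mirroring the "7 − 4" count above.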

This example for VCF import shows a few variants, all matching dbSNP entries. Only a few hundred variants from the file are within the scope of the current gene.