MapView 3.4.1 (developed for Windows; it can also run on Linux using Mono: http://www.mono-project.com/)
MapView: visualization of short reads alignment on a desktop computer. Bioinformatics, 2009, 25 (12) : 1554-1555
Download: http://evolution.sysu.edu.cn/mapview/
Visualizing the huge volumes of alignment data produced by next-generation sequencing on a desktop computer presents many informatics challenges. Most alignment viewers were designed to load and process large assembly files in the ACE format. This memory-based design requires a large amount of memory (>10G), which is not typically available to desktop computer users.
We introduce MapView, a visual analytics tool that facilitates visualization of large-scale short-read alignment data and analysis of genetic variation. MapView can handle hundreds of millions of short reads on a desktop computer with limited memory. We developed a novel binary file format and a fast loading algorithm that provide very fast (<2s) and memory-efficient (<60M) visualization of very large short-read alignments. MapView also supports multitasking and multithreading: it can process multiple tasks (i.e. whole-genome SNP detection, coverage computation and alignment visualization) in parallel.
Windows:
Linux:
For Linux, e.g. Ubuntu:
sudo apt-get install libmono-winforms2.0-cil
mono MapView.exe
Computational efficiency comparison
INPUT
1. Single-end reads
Prepare the reference sequence in FASTA format and a text-based alignment results file (output by Eland, Maq, SOAP, MapNext, SeqMap … …).
(1) If the reference file and the alignment results file each contain only one reference sequence ID, click MVFmaker, input the two files, and make an MVF format file. You can then select the MVF file to view alignments and SNPs.
(2) If the reference file or the alignment results file contains multiple reference sequence IDs, click Splitter to split the file into multiple files, each containing only one reference ID. Then follow (1).
(3) If the alignment results file is not in Eland, Maq, SOAP or MapNext format, define the format in the file MVFmaker_NewFormat.txt. Then, when you input the reference and alignment results to make the MVF file, select the format <User-defined> (the one you defined in MVFmaker_NewFormat.txt).
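To illustrate what step (2) produces, here is a minimal Python sketch that splits a multi-record FASTA text into one record per reference ID. It is not the Splitter tool itself; the function name split_fasta_by_id and the choice of the first word after '>' as the ID are assumptions made for this example.

```python
def split_fasta_by_id(text):
    """Return {sequence_id: fasta_record_text} for a multi-record FASTA string.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = [line]
        elif current_id is not None and line.strip():
            # Sequence line belonging to the most recent header.
            records[current_id].append(line)
    return {seq_id: "\n".join(lines) + "\n" for seq_id, lines in records.items()}


example = ">chr1 test\nACGT\nGGCC\n>chr2\nTTAA\n"
parts = split_fasta_by_id(example)
for seq_id, record in parts.items():
    print(seq_id, len(record.splitlines()) - 1, "sequence line(s)")
```

Each value in parts could then be written to its own file and passed to MVFmaker as in (1).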
2. Paired-end reads
MapView has preliminary support for paired-end reads. Prepare the reference sequence in FASTA format and the paired-end alignment results file (one or two files) from SOAP. (Note: the current version of MapView supports only SOAP's paired-end output files.)
1. Click MVFmaker and tick the Paired-end data checkbox.
2. Load the paired alignment results file and the reference file. You can optionally load the unpaired alignment data (output by SOAP) at the same time.
3. Click the "Save as" button and specify the .MVF file name.
4. Go back to the main window and click "Open MVF file" to load the MVF file you just generated.
3. Text-based alignment results file formats
(1) MAQ
8:18:1354:1553 chr1 597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA GGBFPOLQOPLHYRDOCLYM`OO^VSP``YR`_]T
8:22:173:821 chr1 2597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA NHELOUPVTZUT_WU^```````````````````
8:29:309:1409 chr1 5397 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA HLHOBQCOOEGTP]LNJXVX`````J`````````
(2) SOAP
8:1:3:1697 GTCTAGATATCGCACAATCTTNAATCTTTAAAATG hhhhhhhhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 1266 0
8:1:3:1804 CCTAGGGTTGATTTAGAAACGNGAGCATTTTGTTG hhhhb^hhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 208 0
8:1:3:1247 AGGTTAATCCTGCNTATACATGCAGCTCTAATTCA hhhhhhhhhhhhh;hhhhhhhhhhhhhhhhhhh^h 1 a 35 + chr1 122 0
(3) ELAND
I326_2_FC306FCAAXX:8:1:211:212 GATTTATCATGTAAGAGGTTGTCATTCAGAATGGT U0 1 0 0 chr1 4 R
I326_2_FC306FCAAXX:8:1:913:158 GAAAAAGGAACCTCTGCAGACATATCATGCAACTG U0 1 0 0 chr1 164 R
I326_2_FC306FCAAXX:8:1:183:435 GGGAGATTCCCATATCTTTTCCACTTTCCTCTTCC U0 1 0 0 chr1 530 F
(4) User-defined
R A Q C P S x <User-defined> 1 0
[Fields are TAB-separated. The symbols mean:]
x: ignored (skipped) column
R: Read ID
A: Read sequence (ATGC...)
P: Position
Q: Quality score (may be x)
S: Strand (F/R or +/-)
C: Chromosome
<FormatName> Sort(0/1) Reverse(0/1)
Sort 1 means the alignment positions are not sorted, so MapView will sort them.
Reverse 1 means that reads on the minus strand (-) must be reverse-complemented for display.
NOTE: MapView 3.1.2 supports only ungapped alignments.
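As an illustration of how such a format line can be read, the following Python sketch parses a format definition and applies it to one TAB-separated alignment line. It is only a sketch under stated assumptions: the function names, the field names in FIELD_NAMES, and the sample alignment line are hypothetical and are not taken from MapView.

```python
# Hypothetical mapping from the one-letter symbols to readable field names.
FIELD_NAMES = {"R": "read_id", "A": "sequence", "Q": "quality",
               "C": "chromosome", "P": "position", "S": "strand"}


def parse_format_spec(spec):
    """Split a format line into column symbols, format name, Sort and Reverse."""
    *columns, name, sort_flag, reverse_flag = spec.split()
    return {"columns": columns, "name": name,
            "sort": sort_flag == "1", "reverse": reverse_flag == "1"}


def parse_alignment_line(line, columns):
    """Map the TAB-separated values of one alignment line onto the symbols."""
    record = {}
    for symbol, value in zip(columns, line.rstrip("\n").split("\t")):
        if symbol == "x":  # 'x' columns are skipped
            continue
        record[FIELD_NAMES[symbol]] = value
    if "position" in record:
        record["position"] = int(record["position"])
    return record


def reverse_complement(seq):
    """Reverse-complement a read, as needed for display when Reverse is 1."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


fmt = parse_format_spec("R\tA\tQ\tC\tP\tS\tx\t<User-defined>\t1\t0")
rec = parse_alignment_line("read1\tACGT\thhhh\tchr1\t122\t+\t0", fmt["columns"])
print(fmt["name"], rec)
```

With Sort set to 1, a reader of this format would additionally sort the parsed records by position before building the MVF file.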
HELP
1. Main window
Click a nucleotide on a short read to show:
P: Position on reference sequence
Q: Quality score of the nucleotide you clicked
PP: Paired read alignment position
PD: Pair distance
Click a nucleotide on the reference sequence to show:
Count of A, C, G, T and N
Coverage information
Variant frequency
2. Quality score
Solexa quality score: ASCII code - 64
For example, 'h' means a quality score of 40.
Phred quality score: ASCII code - 33
For example, 'I' means a quality score of 40.
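The two conversions can be checked with a few lines of Python. This is a small sketch of the arithmetic only; the function name decode_quality and the constant names are chosen for this example.

```python
SOLEXA_OFFSET = 64  # Solexa quality score = ASCII code - 64
PHRED_OFFSET = 33   # Phred quality score = ASCII code - 33


def decode_quality(quality_string, offset):
    """Convert each quality character to its numeric score."""
    return [ord(ch) - offset for ch in quality_string]


print(decode_quality("h", SOLEXA_OFFSET))     # [40]
print(decode_quality("I", PHRED_OFFSET))      # [40]
print(decode_quality("hh;h", SOLEXA_OFFSET))  # [40, 40, -5, 40]
```

The last call uses characters from the SOAP example lines and shows that the ASCII-64 encoding can yield negative scores for low-quality bases such as ';'.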
3. MVR file
The file containing the list of SNPs.
4. SNP detection
SNP detection examines each position in the contig to determine whether there is a SNP at that position. To make a reliable assessment, it uses three thresholds:
(1) Minimum quality of central base. Bases with a quality score below this value are not considered in the SNP calculation at this position.
(2) Minimum coverage. SNPs called in areas of low coverage produce more false positives, so you can set the minimum coverage required for a SNP to be called. Note that coverage is counted as the number of valid reads at the current position (i.e. the reads remaining after the quality filter has removed the bad ones).
(3) Minimum variant frequency. If only one read has a variant base, you probably do not want it to count as a SNP. This threshold sets the minimum frequency for a variant to be called a SNP. By default the value is 0.4, which means that a variant base must be present in at least 40% of the valid reads before a SNP is called. Note that if there are two different variants, each with a frequency of e.g. 20%, no SNP is called. If you sequence diploid genomes, you may have to lower this value to detect all SNPs.
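The way the three thresholds interact can be sketched in Python as follows. This is an illustration of the rules described above, not MapView's actual implementation; the function name call_snp and the default values for min_quality and min_coverage are assumptions (only the 0.4 frequency default comes from the text).

```python
from collections import Counter


def call_snp(ref_base, pileup, min_quality=20, min_coverage=4,
             min_variant_freq=0.4):
    """Return (variant_base, frequency) if a SNP is called, else None.

    pileup is a list of (base, quality) pairs for the reads covering one
    position. min_quality and min_coverage defaults are hypothetical.
    """
    # (1) Drop bases below the minimum quality.
    valid = [base for base, quality in pileup if quality >= min_quality]
    # (2) Coverage counts only the valid reads.
    coverage = len(valid)
    if coverage < min_coverage:
        return None
    # (3) The most frequent non-reference base must reach the threshold.
    variants = Counter(base for base in valid if base != ref_base)
    if not variants:
        return None
    variant, count = variants.most_common(1)[0]
    frequency = count / coverage
    return (variant, frequency) if frequency >= min_variant_freq else None


one_variant = [("A", 30)] * 6 + [("G", 30)] * 4
two_variants = [("A", 30)] * 6 + [("G", 30)] * 2 + [("T", 30)] * 2
low_quality = [("G", 5)] * 10

print(call_snp("A", one_variant))   # ('G', 0.4)
print(call_snp("A", two_variants))  # None: each variant is only at 20%
print(call_snp("A", low_quality))   # None: no valid reads remain
```

For a heterozygous site in a diploid genome the expected variant frequency is around 50%, so a threshold close to that value can miss SNPs when sampling is uneven, which is why lowering min_variant_freq may be needed.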