SEQanswers

01-10-2009, 04:41 AM #1
baohua100
Senior Member
 
Location: Canada

Join Date: Jun 2008
Posts: 103
MapView: a short reads alignment viewer for genetic variation analysis

MapView 3.4.1 (developed for Windows, but you can also run it on Linux using Mono: http://www.mono-project.com/)


MapView: visualization of short reads alignment on a desktop computer. Bioinformatics, 2009, 25 (12) : 1554-1555

http://bioinformatics.oxfordjournals...act/25/12/1554

Download: http://evolution.sysu.edu.cn/mapview/

Visualizing the huge amount of alignment data produced by next-generation sequencing on a desktop computer presents many informatics challenges. The great majority of alignment viewers were designed to load and process big assembly files in the ACE format. This memory-based design requires a huge amount of memory (>10 GB), which is not typically available to desktop computer users.

We introduce a new visual analytics tool, MapView, to facilitate visualization of large-scale short-read alignment data and genetic variation analysis. MapView can handle hundreds of millions of short reads on a desktop computer with limited memory. We developed a novel binary file format and a fast loading algorithm for superfast (<2 s), memory-efficient (<60 MB) visualization of huge amounts of short-read alignments. Moreover, MapView supports multitasking and multithreading: it can process multiple tasks (e.g. whole-genome SNP detection, coverage computation and alignment visualization) in parallel.


For Linux, e.g. Ubuntu:

sudo apt-get install libmono-winforms2.0-cil

mono MapView.exe




[Figure: Computational efficiency comparison]



INPUT
1. single-end reads
Prepare the reference sequence in FASTA format and a text-based alignment results file (output by Eland, Maq, SOAP, MapNext, SeqMap, …).

(1) If the reference file and the alignment results file contain only one reference sequence id, click MVFMaker, input these two files and make an MVF file. Then select the MVF file to view alignments and SNPs.
(2) If the reference file or the alignment results file contains multiple reference sequence ids, click Splitter to split the file into multiple files, each containing only one reference id. Then follow (1).
(3) If the alignment results file is not in Eland, Maq, SOAP or MapNext format, define the format in the file MVFmaker_NewFormat.txt. Then, when you input the reference and alignment results to make the MVF file, select the format <User-defined> (the one you defined in MVFmaker_NewFormat.txt).
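The following minimal Python sketch does the splitting in (2) for a FASTA reference outside MapView. It is a hypothetical stand-in, not MapView's Splitter. It assumes the reference id is the first word after `>` and writes each record to `<out_dir>/<id>.fa`. The name `split_fasta` is made up for illustration.

```python
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each FASTA record to <out_dir>/<id>.fa; return the paths written.

    Hypothetical helper, not MapView's Splitter. The record id is assumed to be
    the first whitespace-delimited word of the '>' header line.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    with open(fasta_path) as fh:
        for line in fh:
            if line.startswith(">"):
                if out is not None:
                    out.close()
                words = line[1:].split()
                ref_id = words[0] if words else f"record{len(written) + 1}"
                path = out_dir / f"{ref_id}.fa"
                out = open(path, "w")
                written.append(path)
            if out is not None:  # lines before the first header are skipped
                out.write(line)
    if out is not None:
        out.close()
    return written
```

For example, `split_fasta("genome.fa", "by_chrom")` would produce `by_chrom/chr1.fa`, `by_chrom/chr2.fa`, and so on. A repeated id would overwrite the earlier file. The alignment results file would need the same treatment, keyed on its chromosome column.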

2. Paired-end reads
MapView has preliminary support for paired-end reads. Prepare the reference sequence in FASTA format and the paired-end alignment results (one or two files) from SOAP. (Note: the current version of MapView only supports SOAP's paired-end output files.)

1. Click MVFmaker and tick the Paired-end data checkbox.
2. Upload the paired alignment results file and the reference file. You can also upload the unpaired alignment data (output by SOAP) at the same time (optional).
3. Click the "Save as" button and specify the .MVF file name.
4. Go back to the main window and click "Open MVF file" to load the MVF file you just generated.

3. About text-based alignment results files

(1) MAQ
8:18:1354:1553 chr1 597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA GGBFPOLQOPLHYRDOCLYM`OO^VSP``YR`_]T
8:22:173:821 chr1 2597 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA NHELOUPVTZUT_WU^```````````````````
8:29:309:1409 chr1 5397 - 0 0 99 99 99 0 0 1 0 35 GGTGGGACCGTTCGTGAAGGCTGGCCCATTGAGGA HLHOBQCOOEGTP]LNJXVX`````J`````````
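The MAQ lines above come from `maq mapview`. As I understand the maq documentation, each line has 16 tab-separated columns:

- read name, chromosome, 1-based position, strand
- insert size, paired flag
- mapping quality, single-end mapping quality, alternative mapping quality
- number of mismatches of the best hit, sum of mismatched-base qualities
- number of 0-mismatch and 1-mismatch hits of the first 24 bp
- read length, sequence and quality

A minimal Python sketch that pulls out the fields a viewer needs might look like this. `MaqHit` and `parse_maq_mapview` are hypothetical names, and the column layout is an assumption to check against your maq version.

```python
from typing import NamedTuple


class MaqHit(NamedTuple):
    read_id: str
    chrom: str
    pos: int     # 1-based leftmost reference position
    strand: str  # '+' or '-'
    mapq: int    # mapping quality (column 7)
    seq: str
    qual: str    # assumed Phred+33 encoded


def parse_maq_mapview(line):
    """Parse one line of `maq mapview` output, assuming the 16-column layout."""
    cols = line.split()
    if len(cols) != 16:
        raise ValueError(f"expected 16 columns, got {len(cols)}")
    hit = MaqHit(cols[0], cols[1], int(cols[2]), cols[3], int(cols[6]),
                 cols[14], cols[15])
    # Column 14 is the read length; sequence and quality should both match it.
    if len(hit.seq) != int(cols[13]) or len(hit.qual) != len(hit.seq):
        raise ValueError("read length, sequence and quality disagree")
    return hit
```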

(2) SOAP
8:1:3:1697 GTCTAGATATCGCACAATCTTNAATCTTTAAAATG hhhhhhhhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 1266 0
8:1:3:1804 CCTAGGGTTGATTTAGAAACGNGAGCATTTTGTTG hhhhb^hhhhhhhhhhhhhhh;hhhhhhhhhhhhh 1 a 35 - chr1 208 0
8:1:3:1247 AGGTTAATCCTGCNTATACATGCAGCTCTAATTCA hhhhhhhhhhhhh;hhhhhhhhhhhhhhhhhhh^h 1 a 35 + chr1 122 0
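The SOAP lines can be parsed the same way. As I understand SOAP (v1) output, the first ten columns are:

- read id, sequence, quality
- number of best hits, pair tag (a/b)
- read length, strand, reference id
- 1-based position, hit type (0 for an exact match)

Any extra columns carry mismatch details. This hypothetical parser keeps only those ten columns. Treat the layout as an assumption to verify against your SOAP version.

```python
from typing import NamedTuple


class SoapHit(NamedTuple):
    read_id: str
    seq: str
    qual: str
    n_hits: int    # number of equally best hits
    pair_tag: str  # 'a' or 'b'
    length: int
    strand: str    # '+' or '-'
    chrom: str
    pos: int       # 1-based position of the first aligned base
    hit_type: str  # '0' means exact match; kept as text


def parse_soap(line):
    """Parse the first ten columns of one SOAP alignment line."""
    cols = line.split()
    if len(cols) < 10:
        raise ValueError(f"expected at least 10 columns, got {len(cols)}")
    rid, seq, qual, hits, tag, length, strand, chrom, pos, htype = cols[:10]
    return SoapHit(rid, seq, qual, int(hits), tag, int(length), strand,
                   chrom, int(pos), htype)
```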

(3) ELAND
I326_2_FC306FCAAXX:8:1:211:212 GATTTATCATGTAAGAGGTTGTCATTCAGAATGGT U0 1 0 0 chr1 4 R
I326_2_FC306FCAAXX:8:1:913:158 GAAAAAGGAACCTCTGCAGACATATCATGCAACTG U0 1 0 0 chr1 164 R
I326_2_FC306FCAAXX:8:1:183:435 GGGAGATTCCCATATCTTTTCCACTTTCCTCTTCC U0 1 0 0 chr1 530 F
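The ELAND lines follow the eland_result layout:

- read name, sequence, match code
- counts of exact, 1-mismatch and 2-mismatch hits
- chromosome (or genome file) name, 1-based position, strand (F/R)

Match codes U0/U1/U2 denote a unique hit with 0/1/2 mismatches. Lines such as NM (no match) or QC carry no position. The sketch below is hypothetical, not MapView's reader. It returns None for lines without a position and normalizes F/R to +/-.

```python
from typing import NamedTuple, Optional


class ElandHit(NamedTuple):
    read_id: str
    seq: str
    code: str    # e.g. 'U0' = unique hit with 0 mismatches
    chrom: str
    pos: int     # 1-based
    strand: str  # '+' or '-'


def parse_eland(line) -> Optional[ElandHit]:
    """Parse one eland_result line; return None if it has no alignment position."""
    cols = line.split()
    if len(cols) < 9:  # lines without a reported position stop early
        return None
    rid, seq, code = cols[0], cols[1], cols[2]
    chrom, pos, strand = cols[6], int(cols[7]), cols[8]
    return ElandHit(rid, seq, code, chrom, pos, "+" if strand == "F" else "-")
```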

(4) User-defined
R A Q C P S x <User-defined> 1 0
[TAB is used as the separator; the symbols mean:] x: ignored (column skipped); R: read ID; A: read sequence (ATGC...); P: position; Q: quality score (may be x); S: strand (F/R or +/-); C: chromosome; <FormatName>: name of the format; Sort (0/1); Reverse (0/1)
Sort = 1 means the alignment positions are not sorted, so MapView will sort them.
Reverse = 1 means the sequence of a read on the (-) strand must be reverse-complemented for display.
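Here is a hypothetical Python sketch that reads a format line like the one above and applies it to tab-separated alignment lines. It reflects my reading of the description, not MapView's code: column codes come first, then `<FormatName>`, then the Sort flag and the Reverse flag. The function names are made up. Treating both `-` and `R` as the minus strand is an assumption.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_spec(spec):
    """Split a format line into (column codes, format name, sort flag, reverse flag)."""
    tokens = spec.split()
    i = next(k for k, t in enumerate(tokens) if t.startswith("<"))
    return tokens[:i], tokens[i].strip("<>"), tokens[i + 1] == "1", tokens[i + 2] == "1"


def parse_record(line, columns, reverse):
    """Map one TAB-separated alignment line onto the column codes; 'x' columns are skipped."""
    fields = line.rstrip("\n").split("\t")
    rec = {code: value for code, value in zip(columns, fields) if code != "x"}
    rec["P"] = int(rec["P"])
    if reverse and rec.get("S") in ("-", "R"):
        # Reverse-complement minus-strand reads for display; reverse qualities to match.
        rec["A"] = rec["A"].translate(COMPLEMENT)[::-1]
        if "Q" in rec:
            rec["Q"] = rec["Q"][::-1]
    return rec


def load_user_defined(lines, spec):
    """Parse all non-empty lines with `spec`, sorting by position if Sort = 1."""
    columns, _name, sort_needed, reverse = parse_spec(spec)
    records = [parse_record(l, columns, reverse) for l in lines if l.strip()]
    if sort_needed:  # Sort = 1: input is unsorted, so sort by position
        records.sort(key=lambda r: r["P"])
    return records
```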

NOTE: MapView 3.1.2 only supports ungapped alignments.

HELP
1. Main window
Click a nucleotide on a short read to see:
P: position on the reference sequence
Q: quality score of the clicked nucleotide
PP: alignment position of the paired read
PD: pair distance

Click a nucleotide on the reference sequence to see:
Counts of A, C, G, T and N
Coverage information
Variant frequency
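The counts shown when you click a reference base can be reproduced from ungapped alignments with a few lines of Python. This is a hypothetical sketch, not MapView's implementation. It assumes:

- each read is given as a (1-based start, sequence) pair;
- any non-ACGT character counts as N, and N is included in the coverage;
- the frequency of each non-reference base is reported separately.

```python
from collections import Counter


def column_counts(reads, pos):
    """Count A, C, G, T and N at 1-based reference position `pos`.

    `reads` holds (start, sequence) pairs for ungapped alignments, where
    `start` is the 1-based reference position of the read's first base.
    """
    counts = Counter(dict.fromkeys("ACGTN", 0))
    for start, seq in reads:
        offset = pos - start
        if 0 <= offset < len(seq):
            base = seq[offset].upper()
            counts[base if base in "ACGT" else "N"] += 1
    return counts


def coverage_and_variant_freqs(counts, ref_base):
    """Return (coverage, {non-reference base: frequency})."""
    coverage = sum(counts.values())
    if coverage == 0:
        return 0, {}
    ref = ref_base.upper()
    freqs = {b: counts[b] / coverage for b in "ACGT" if b != ref and counts[b]}
    return coverage, freqs
```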

2. Quality score
Solexa quality score: ASCII code - 64
For example, 'h' means a quality score of 40.
Phred quality score: ASCII code - 33
For example, 'I' means a quality score of 40.
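The two offsets above translate directly into code; the function names in this minimal Python sketch are illustrative. One caveat goes beyond the post. Scores from early Solexa pipelines are log-odds values, -10*log10(p/(1-p)), rather than Phred values, -10*log10(p). The conversion Q_phred = 10*log10(10^(Q_solexa/10) + 1) only matters at low scores. Whether your files use that scale depends on the pipeline version.

```python
import math


def solexa_offset_scores(qual):
    """Decode a quality string stored as ASCII code - 64."""
    return [ord(c) - 64 for c in qual]


def phred_offset_scores(qual):
    """Decode a quality string stored as ASCII code - 33."""
    return [ord(c) - 33 for c in qual]


def solexa_to_phred(q_solexa):
    """Convert a Solexa log-odds score to the Phred scale."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)
```

For example, the ';' in the SOAP sample qualities decodes to -5 with the ASCII-64 offset. That is only meaningful on the Solexa log-odds scale, where it corresponds to a Phred score of about 1.2.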

3. MVR file
The file containing the SNP list.

4. SNP detection

SNP detection examines each position in the contig to determine whether there is a SNP at that position. To make a qualified and significant assessment, it uses three thresholds:

(1) Minimum quality of the central base. Bases with a quality score below this value are not considered in the SNP calculation at this position.

(2) Minimum coverage. Calling SNPs in areas of low coverage gives more false positives, so you can set the minimum coverage required for a SNP to be called. Note that coverage is counted as the number of valid reads at the current position (i.e. the reads remaining after the quality filter has removed the bad ones).

(3) Minimum variant frequency. If only one read has a variant base, you probably do not want this to count as a SNP. This threshold sets the minimum frequency for a variant to be called a SNP. By default it is 0.4, meaning that a variant base must appear in at least 40% of the valid reads before a SNP is called. Note that if you have two different variants, each with e.g. 20% frequency, the position will not be called a SNP. If you sequence diploid genomes, you may have to lower this value to detect all SNPs.
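The three thresholds can be expressed as a small Python sketch. It follows the description above but is not MapView's code:

- the quality filter is applied first;
- coverage is counted on the surviving reads;
- each variant base is judged on its own frequency.

The `pileup` input as (base, quality) pairs is an assumption. So are the defaults for minimum base quality (20) and minimum coverage (3). Only the 0.4 frequency default comes from the text.

```python
from collections import Counter


def call_snp(pileup, ref_base, min_base_qual=20, min_coverage=3,
             min_variant_freq=0.4):
    """Apply the three thresholds to one reference position.

    pileup: list of (base, quality) pairs observed at this position.
    Returns (variant_base, frequency) if a SNP is called, otherwise None.
    min_base_qual and min_coverage defaults are assumptions; 0.4 is from the text.
    """
    # (1) Drop bases whose quality is below the minimum.
    valid = [base.upper() for base, qual in pileup if qual >= min_base_qual]
    # (2) Coverage = number of valid reads left after the quality filter.
    coverage = len(valid)
    if coverage == 0 or coverage < min_coverage:
        return None
    # (3) Judge each variant base on its own frequency, so two 20% variants
    #     do not add up to a 40% call.
    ref = ref_base.upper()
    variants = Counter(b for b in valid if b != ref and b in "ACGT")
    if not variants:
        return None
    alt, n = variants.most_common(1)[0]
    freq = n / coverage
    return (alt, freq) if freq >= min_variant_freq else None
```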

Last edited by baohua100; 06-05-2009 at 05:54 PM. Reason: 3.4.0 version
01-10-2009, 07:02 AM #2
baohua100
Senior Member
 
Location: Canada

Join Date: Jun 2008
Posts: 103

If you have any problems or suggestions, please reply here!
01-10-2009, 06:26 PM #3
jfshao1984
Junior Member
 
Location: Hangzhou,China

Join Date: Mar 2008
Posts: 5

MapView sounds useful.
I found a bug: if the reference FASTA file has 60 nt per line, MapView adds one extra nucleotide after each line and the alignment view becomes garbled. When I changed the reference sequence to a single line (all nucleotides on one line), the view was correct.
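The workaround described here, putting the whole reference sequence on one line, can be scripted. This is a minimal Python sketch assuming a plain FASTA file. `unwrap_fasta` is a hypothetical helper, and it also strips Windows-style CRLF line endings.

```python
def unwrap_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto a single line.

    Sequence lines that appear before the first '>' header are dropped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        elif line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header
        yield "".join(chunks)


# Example:
#   with open("ref.fa") as src, open("ref_oneline.fa", "w") as dst:
#       for out_line in unwrap_fasta(src):
#           dst.write(out_line + "\n")
```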
01-14-2009, 07:28 AM #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Is it still useful for human data? Should I load the entire human reference sequence, or would you suggest loading just a single chromosome and viewing it?

I have mate-pair data that I wish to visualize; does anything change when using mate-pair reads? Can I upload the two read files?

Thanks
01-14-2009, 01:35 PM #5
dvh
Member
 
Location: london, uk

Join Date: Jul 2008
Posts: 35

As well as the paired-end read question above, does MapView cope with gapped alignments within each read? (maqview, developed alongside the maq package, does not.)
I would also be interested to hear from Heng Li about any developments in the SAMtools viewer.
01-14-2009, 07:41 PM #6
baohua100
Senior Member
 
Location: Canada

Join Date: Jun 2008
Posts: 103

The reference sequence and alignment results should contain only one reference sequence id, so you had better load only one chromosome. If your reference sequence or alignment file contains multiple reference sequence ids, you can use Splitter to split the file and then view each part separately.

We are now developing the function for viewing paired-end data.
01-15-2009, 01:00 AM #7
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693

To dvh:

The viewer that comes with samtools displays gapped alignments. It shows whether a read is an orphan but does not show pairing information directly. For now, I do not have any plan to implement a fancy alignment viewer. The current viewer is only ~300 lines of C code based on the samtools library, and I want to keep it as it is. I would appreciate it if someone else would implement a nice viewer; a good viewer would benefit the community and the samtools project as well. In addition, you may also keep an eye on GAP5 development.

To baohua100:

I think it would be nice if an alignment viewer may have the following features:

* Scalability. It should work with huge alignments (e.g. 10~100 GB of compressed alignment) using limited memory.

* Portability. At least where I work, Linux and Mac dominate. A Windows-only application would push away many potential users.

* Efficiency over the network. We prefer to keep huge alignments on a supercomputer or a large cluster while viewing them on a small personal desktop or laptop. This requires transferring alignment data/graphics over the network, so it would be nice to have a built-in server-client mode or, alternatively, X11 support.

* Usability. You can learn from mainstream assembly viewers such as consed, hawkeye, eagleview and staden/gap4.
01-15-2009, 11:46 PM #8
yasutake
Member
 
Location: Japan

Join Date: Sep 2008
Posts: 11

To baohua100,

This must be useful!

I tried to use MapView but couldn't understand how to use it.

I'd be happy if you made a tutorial or a manual.

Besides, let me ask some questions.

1. Can MAQ map files be imported into MapView directly?
2. Should I prepare the reference sequence in FASTA format?
01-15-2009, 11:52 PM #9
baohua100
Senior Member
 
Location: Canada

Join Date: Jun 2008
Posts: 103

To yasutake:

Prepare ref.fasta and the map file, click MVFmaker to make an MVF file, then select this MVF file to view the alignment results.

We are writing the manual now and will put it on the website. I have also updated the software and added a guide on how to input files.

You can download it from http://evolution.sysu.edu.cn/software/mapview.rar

Last edited by baohua100; 01-16-2009 at 12:12 AM.
01-16-2009, 12:14 AM #10
baohua100
Senior Member
 
Location: Canada

Join Date: Jun 2008
Posts: 103

To lh3:

Thanks for your suggestions!
01-28-2009, 07:42 AM #11
Aengus
Junior Member
 
Location: london

Join Date: Sep 2008
Posts: 6

Just to clarify regarding maq format files:

You need to run "maq mapview" and then use the resulting mapview file, along with the reference sequence in FASTA format, to build the MapView MVF file in MVFMaker. The rest of the interaction is pretty intuitive: the arrow keys move around, and the "Fast Positioning" box lets you enter a base position to jump to.

I had to look at MapView because I have a scientist who uses only a PC, and I couldn't get maqview-0.2.4 to build under Cygwin on the PC.

A cross-platform viewer, as Heng Li suggested, would be great and would go some way toward alleviating my issues with supporting scientists and their data analysis.
01-28-2009, 08:03 AM #12
Aengus
Junior Member
 
Location: london

Join Date: Sep 2008
Posts: 6

Sorry, I hit send too soon.

With my maq mapview data, MapView is not showing the quality scores associated with each base. This is quite important to see, especially when evaluating SNPs.
01-29-2009, 10:17 AM #13
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Feature request - Zoom-out

Is there no zoom feature in the MapView tool? (apart from the quality information mentioned in the previous post)
01-30-2009, 08:09 AM #14
Vince_Funari
Junior Member
 
Location: Los Angeles

Join Date: Jan 2009
Posts: 2
Paired end data

I am also interested. This looks very useful, as many people have had issues lately with recent builds of maqview for maq, and it looks like a very user-friendly tool. However, displaying paired-end data is critical for these short reads, as it really helps the confidence levels of the alignments.

My question is: does this support paired-end alignments generated by MAQ yet?

Thank you,
Vince
Quote:
Originally Posted by bioinfosm
Is it still useful for human data? Loading the entire human refseq, or you suggest that I rather upload just a single chromosome and view it..

I have mate-pair data that I wish to visualize; any changes when using mate-pair reads? Can I upload the 2 reads files?

Thanks
02-02-2009, 02:20 PM #15
unionicola
Junior Member
 
Location: Wisconsin

Join Date: Feb 2009
Posts: 2

MapView looks great, but I'm having trouble loading data into it. In the MVFMaker tool I loaded the reference genome I aligned my data to and the .map file I obtained from MAQ (using the easyrun command).

However, when I load the MVF file into MapView, an error message appears: "No Reads in MVF!". Is there a specific setting I should use in the "Single Line Format" of the MVFMaker tool? I have been using the "maq" format since I used MAQ to build the .map file.

Any help anyone can provide would be very appreciated.

Thanks!
02-18-2009, 01:39 AM #16
zlu
Member
 
Location: UK

Join Date: Nov 2008
Posts: 32

I was trying to use MapView to look at some ELAND alignment results but didn't get very far. Since I'm new to next-generation sequencing, please excuse my ignorant questions:

1. Which ELAND alignment file should I use in MVFmaker? I tried both the s_*_*export.txt and s_*_*_sorted.txt files together with my reference genome. All I got were red x marks and no short reads.

2. My Solexa data comes from paired-end sequencing of a bacterial strain. How can I input the second read?

Thank you.
02-18-2009, 08:34 AM #17
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

I doubt MapView works for paired-end data!
02-20-2009, 07:51 AM #18
griffon42
Member
 
Location: New York

Join Date: Jan 2009
Posts: 23
MapView overflow error

@baohua100

Thanks for all of your work on MapView. It's been very helpful so far and, for the most part, works well.

I am having one problem that has given me some trouble. I'm using MapView to look at MAQ data aligned to the mouse genome (mm9). Using the MVF splitter, I look at each chromosome individually. This works fine up to a certain base number, but once I jump to (or scroll past) around position 10000000, MapView crashes with an OVERFLOW ERROR.

I've tried running MapView on a number of different hardware platforms and always encounter the same problem at the same positions.

Any advice anyone can offer would be much appreciated.

Thanks.
02-27-2009, 05:24 AM #19
WJW-Davy
Member
 
Location: China

Join Date: Feb 2009
Posts: 23

Thanks for your valuable suggestions. A new version is available. Download site: https://sites.google.com/site/wjwdavy/

We are currently busy programming and testing, and MapView is being updated. A detailed manual will be available later. If you have any questions or suggestions, please do not hesitate to contact us.

(See MapView Home or MapView Lab)

Download Link 1: https://sites.google.com/site/wjwdav...attredirects=0





[Change Log - main versions]
3.4.0 [2009.6.1]
Runs successfully on Linux (e.g. Ubuntu 9).
Some bugs were fixed.

3.3.0 [2009.5.10]
The 'Overview Bar' map feature was added.
Double-clicking a line of the SNP list now jumps to the SNP's position rather than its SNP No.

3.2.0 [2009.5.7]
The bug in handling ambiguous nucleotides was fixed. (Note: to handle ambiguous nucleotides correctly, a new MVF file must be made.)
The sta2txt feature was added.

3.1.4 [2009.5.2]
Text can be copied from the SNP list.
MVR files can be converted to text files, and the SNP list can be exported to a flat file.

3.0 [2009.3.8]
Significant improvement.
Support was added for quality scores, paired-end data, structural variation, coverage distribution, quality distribution, text quick view, zoom in/out, and other features.

2.0 [2008.12.16]
Significant improvement.
Support for many formats: FASTA reference files and text-based alignment results files (output by Eland, Maq, SOAP, MapNext, SeqMap, …).

1.0 [2008.11.28]
This was the first attempt.
Only MapNext's format was supported.
__________________
WJW-Davy
HomePage: http://hi.baidu.com/wjwdavy
Download Center: https://sites.google.com/site/wjwdavy/

Last edited by WJW-Davy; 06-01-2009 at 06:15 PM.
02-27-2009, 09:04 PM #20
baohua100
Senior Member
 
Location: Canada

Join Date: Jun 2008
Posts: 103

We have now added a ZOOM IN/OUT feature to MapView:

http://evolution.sysu.edu.cn/software/mapview.htm

Next week, MapView will be able to display quality scores.
All times are GMT -8.

