
**Chipper** · 12-09-2009, 12:59 PM

BFAST is giving me the highest % mapped reads so I am surprised by your numbers. Did you use all ten indexes?

**nilshomer** · 12-09-2009, 04:44 PM

Originally posted by cczhong

Hi All,

Could you recommend software for mapping SOLiD reads?

I do not care about the running time and splice mapping. My purpose is to map as many reads as possible to the reference genome.

I tested a few programs, including corona, socs, and bfast (with default parameters). It seems that corona can map ~55% of the reads, while socs maps ~25% and bfast maps ~35%.

Could you give me some suggestions?

Thank you!

-Cuncong

You should see approximately 55-60% of all bases map with BFAST when mapping human genomic DNA sequenced on SOLiD v2 & v3. As for software for SOLiD (I am the author of BFAST), my other three favorites are:
BWA
Mosaik
MAQ

Keep checking back every once in a while, since new software for SOLiD is always being developed.

**cczhong** · 12-09-2009, 09:27 PM

hum... it seems that I have used only one index for BFAST. I will try the whole set of them and report the mapped fraction as soon as I get the result. Thank you!

**nilshomer** · 12-09-2009, 09:59 PM

Originally posted by cczhong

hum... it seems that I have used only one index for BFAST. I will try the whole set of them and report the mapped fraction as soon as I get the result. Thank you!

Great! What organism are you sequencing? Depending on the polymorphism rate of the organism (e.g., mouse), you may need to increase the sensitivity, since the ten indexes were designed for human resequencing. I look forward to your results.

**Chipper** · 12-10-2009, 12:57 AM

Compared to corona (50.4), BFAST produced significantly more unique alignments (68% vs. 41%; human data, ungapped). If you plan to use it, I would suggest filtering reads on quality first, since it reports far too many alignments from low-quality reads. Filtering by mapping quality helps to some extent, but in general reads with an average QV < 10 just increase the noise.
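A minimal sketch of the pre-filter suggested above, assuming the per-read quality values have already been parsed into lists of integers. The function name `filter_reads` and the example reads are hypothetical and not part of corona or BFAST.

```python
def filter_reads(reads, min_mean_qv=10.0):
    """Yield (name, qvs) pairs whose mean quality value is at least min_mean_qv."""
    for name, qvs in reads:
        # Skip empty reads so we never divide by zero.
        if qvs and sum(qvs) / len(qvs) >= min_mean_qv:
            yield name, qvs


# Hypothetical reads: (read name, list of per-color quality values).
reads = [
    ("read_1", [25, 30, 28, 22]),  # mean 26.25 -> kept
    ("read_2", [4, 6, 9, 5]),      # mean 6.0   -> dropped
    ("read_3", [10, 10, 10, 10]),  # mean 10.0  -> kept (threshold is "< 10")
]

kept = [name for name, _ in filter_reads(reads)]
print(kept)
```

The threshold of 10 comes from the post; whether it suits a given data set is something to check rather than assume.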

**westerman** · 12-10-2009, 06:29 AM

The new corona-replacement software from ABI/Lifetech, called 'Bioscope', is supposed to give better mapping results. I do not have good numbers from it at the moment, but I will try to get at least a bfast vs. bioscope comparison done within the week.

**jlli** · 12-10-2009, 09:16 AM

We just got Bioscope. It would be great if you could share the results of bfast vs. bioscope.

**aguffanti** · 12-15-2009, 06:51 AM

Mapping software for SOLiD data

Hi. Although it is quite slow (but much faster if you use the precompiled binaries built with Intel's compiler rather than recompiling from scratch yourself), I would recommend SHRiMP: local alignment and excellent sensitivity. You will need around 16 GB for 4 million 50bp reads vs. the human genome.

Regards

Alessandro

Originally posted by cczhong

Hi All,

Could you recommend software for mapping SOLiD reads?

I do not care about the running time and splice mapping. My purpose is to map as many reads as possible to the reference genome.

I tested a few programs, including corona, socs, and bfast (with default parameters). It seems that corona can map ~55% of the reads, while socs maps ~25% and bfast maps ~35%.

Could you give me some suggestions?

Thank you!

-Cuncong

**ech** · 12-15-2009, 04:34 PM

speed of bfast

We tried mapping 25bp SOLiD reads against the E. coli genome. The bfast mapping is very, very slow (more than 20 times slower than bwa). Perhaps this is caused by the masks we used or by other settings. Can anyone provide an estimate of how many million SOLiD reads per hour can be mapped on a typical CPU using bfast (assuming we allow up to 3 errors, including a short indel)?

**rdeborja** · 05-19-2010, 01:39 PM

I've installed and run data using Bioscope v1.0 for the last few months. With the progressive mapping approach, we are receiving significantly better alignment results with an average alignment read length of 40bp from a 50bp run. We have been consistently obtaining 70-80% mapped reads on human and mouse data.

I haven't run BFAST much on SOLiD data so can't provide a comment there.

I'll have to post a few caveats that I ran into when installing Bioscope. There are a few counterintuitive issues when installing on a decent-size cluster (i.e., ~500 nodes).

**pmiguel** · 05-26-2010, 04:44 AM

Originally posted by rdeborja

I've installed and run data using Bioscope v1.0 for the last few months. With the progressive mapping approach, we are receiving significantly better alignment results with an average alignment read length of 40bp from a 50bp run. We have been consistently obtaining 70-80% mapped reads on human and mouse data.

I haven't run BFAST much on SOLiD data so can't provide a comment there.

I'll have to post a few caveats that I ran into when installing Bioscope. There are a few counterintuitive issues when installing on a decent-size cluster (i.e., ~500 nodes).

I was told at the SOLiD User's Summit in September of last year that BioScope used a "seeded extension" method -- which might be characterized as a "progressive mapping approach". But, if only for historical reasons, I suppose it should be distinguished from the "progressive extension" methodology deployed by Global SETs. Since Global SETs was sold (BioScope is "free" as in "free beer" -- well, as long as you own a SOLiD), and the one person I heard talk about it seemed to have a low opinion of Global SETs, I might well be accused of being pedantic for even bringing it up.

Anyway, I'm with you. We seem to be getting very high levels of mapping using v3+ chemistry and BioScope 1.x: generally 70-80% with both mate pair and fragment runs on small to moderate complexity genomes (yeast, rice, chicken). Okay, the rice and chicken used mate pair and the yeast was fragment.

Of course, the rub is that validating the mapping positions would be, well, non-trivial.

--
Phillip

**drio** · 05-26-2010, 07:56 AM

Originally posted by ech

We tried mapping 25bp SOLiD reads against the E. coli genome. The bfast mapping is very, very slow (more than 20 times slower than bwa). Perhaps this is caused by the masks we used or by other settings. Can anyone provide an estimate of how many million SOLiD reads per hour can be mapped on a typical CPU using bfast (assuming we allow up to 3 errors, including a short indel)?

Bfast aligns first (it looks for CALs, i.e. candidate alignment locations, against the indexes/SA) and then performs SW (Smith-Waterman) against those. You cannot specify the number of mismatches directly, but you can tweak the SW parameters (I haven't had to tweak them).

That being said, you should be able to map 10M reads (50bp, with the recommended default human indexes) in about 5 hours on a typical 8-core machine with 32G of RAM. These numbers can change depending on your hardware, particularly the storage system.
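As a rough back-of-the-envelope sketch, the figures above translate into a per-hour and per-core rate as follows. It assumes throughput scales linearly with read count on the same machine, which the storage caveat above suggests may not hold; `estimate_hours` is a hypothetical helper, not part of bfast.

```python
# Figures quoted in the post: 10M reads, about 5 hours, 8 cores.
READS = 10_000_000
WALL_HOURS = 5
CORES = 8

reads_per_hour = READS / WALL_HOURS            # whole machine
reads_per_core_hour = reads_per_hour / CORES   # single core, assuming even scaling


def estimate_hours(n_reads, rate_per_hour=reads_per_hour):
    """Linear extrapolation of wall-clock time on the same machine (an assumption)."""
    return n_reads / rate_per_hour


print(f"{reads_per_hour:,.0f} reads/hour on the machine")
print(f"{reads_per_core_hour:,.0f} reads/hour per core")
print(f"{estimate_hours(4_000_000):.1f} hours for 4M reads")
```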

Yes bwa is faster and generates pretty good alignments for CS data. But Bfast is the only open source aligner I know that truly works in CS natively. In addition, the reported alignments (SAM) come with very useful CS information (namely, original CS calls/quals, CS correction, etc...)
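For readers new to color space (CS), here is a minimal sketch of the SOLiD 2-base encoding, in which each color describes the transition between two adjacent bases. The function names are made up for illustration and this is not code from bfast or bwa. It also shows why a single color error corrupts every downstream base if a read is naively decoded before alignment.

```python
# Standard 2-bit base codes; the color for a pair of bases is the XOR of their codes.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def encode_colors(seq):
    """Return the color string for the adjacent base pairs of seq."""
    return "".join(str(_CODE[a] ^ _CODE[b]) for a, b in zip(seq, seq[1:]))


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and a color string."""
    bases = [first_base]
    for color in colors:
        bases.append(_BASE[_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


colors = encode_colors("ATGGCA")
print(colors)                          # color-space form of the sequence
print(decode_colors("A", colors))      # round trip back to bases

# Flip one color (third position) and decode naively:
# every base after the error changes, not just one.
print(decode_colors("A", "31131"))
```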

I also like the fact you get small indels (up to 20 bp) with BF.

Give bfast a try, you won't regret it.

**epigen** · 06-28-2010, 07:07 AM

Comparing aligners for SOLiD data

Originally posted by westerman

The new corona-replacement software from ABI/Lifetech, called 'Bioscope', is supposed to give better mapping results. I do not have good numbers from it at the moment, but I will try to get at least a bfast vs. bioscope comparison done within the week.

What happened to your comparison? The result would be very interesting to know because no one seems to compare those two tools, which both claim to be the best for SOLiD data. BFAST is the only popular aligner I haven't tried yet because no one in my institution has any experience with it.
So far, I have used BWA version 0.5.8, BioScope 1.2, and NovoalignCS beta on a set of ~60 million reads from a human transcriptome project (of course not representative, but unbiased). BWA was fastest (3.5 h on 8 CPUs) but only mapped 34% of the reads with default parameters. I will try -l 25 -n 8 as recommended in a thread and see if it gets better. The BioScope WT pipeline was slower (due to merging reads mapped to genome, filters, and splice junctions, in total 10 h on 16 CPUs) and mapped 79% to the genome. NovoalignCS mapped 57% and was by far the slowest (it took almost a week on 16 CPUs), but it's beta after all.
I agree that installing BioScope is complicated and I dislike the fact that it's quite inscrutable how the programs inside work. Using BWA is easy but the conversion into pseudo-colorspace has the huge disadvantage of missing color space sequences in the sam/bam file (there is no CS tag). For NovoalignCS, parameters that work well for test data will have to be adjusted for real world data in order to make it really comparable.
Another question is if the number of mapped reads is a good criterion ...
Edit:
BWA -l 25 -n 8 took 20h and mapped 50%.
With the new defaults, NovoalignCS improved speed by almost 50% to 75h but unfortunately, the mapping rate became a bit lower.
BFAST took 25h on 8 CPUs and mapped 69% using the option to output only reads that have a unique best-scoring alignment (this underestimates the number of reads that could be mapped, in contrast to BWA, where a random best alignment is output).
The only disadvantage for BFAST is big files: for the human genome, I ended up with 10 index files of 12 GB that have to be read into memory for mapping, then there is a temp file for each for storing the matches, each almost 5 GB => a lot of I/O going on, in which our cluster is not that great.
After learning that BioScope does a lot of hardclipping and ungapped alignment (see below), I think the winner in the category "gapped aligners for SOLiD" with criteria "highest number of mapped reads" is quite obvious!
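To put the timings above on a common footing, here is a small sketch that converts them to CPU-hours. It assumes "CPUs" means cores kept busy for the whole wall-clock time, and it includes only the runs for which hours, CPU count, and mapping rate are all stated; the BWA -l 25 -n 8 and NovoalignCS runs are left out because one of those numbers is missing or approximate.

```python
# (aligner, wall-clock hours, CPUs, percent of reads mapped), as quoted in the post.
runs = [
    ("BWA 0.5.8 (defaults)", 3.5, 8, 34),
    ("BioScope 1.2 WT pipeline", 10, 16, 79),
    ("BFAST (unique best only)", 25, 8, 69),
]


def cpu_hours(wall_hours, cpus):
    """Total compute, assuming all CPUs were busy for the full wall-clock time."""
    return wall_hours * cpus


# Sort by mapping rate, highest first.
summary = sorted(
    ((name, cpu_hours(hours, cpus), pct) for name, hours, cpus, pct in runs),
    key=lambda row: row[2],
    reverse=True,
)

for name, total, pct in summary:
    print(f"{name:<28}{total:>8.1f} CPU-h {pct:>4d}% mapped")
```

The BFAST figure counts only reads with a unique best alignment, so the mapped percentages are not strictly comparable.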

**nilshomer** · 06-28-2010, 06:44 PM

Originally posted by epigen

What happened to your comparison? The result would be very interesting to know because no one seems to compare those two tools, which both claim to be the best for SOLiD data. BFAST is the only popular aligner I haven't tried yet because no one in my institution has any experience with it.
So far, I have used BWA version 0.5.8, BioScope 1.2, and NovoalignCS beta on a set of ~60 million reads from a human transcriptome project (of course not representative, but unbiased). BWA was fastest (3.5 h on 8 CPUs) but only mapped 34% of the reads with default parameters. I will try -l 25 -n 8 as recommended in a thread and see if it gets better. The BioScope WT pipeline was slower (due to merging reads mapped to genome, filters, and splice junctions, in total 10 h on 16 CPUs) and mapped 79% to the genome. NovoalignCS mapped 57% and was by far the slowest (it took almost a week on 16 CPUs), but it's beta after all.
I agree that installing BioScope is complicated and I dislike the fact that it's quite inscrutable how the programs inside work. Using BWA is easy but the conversion into pseudo-colorspace has the huge disadvantage of missing color space sequences in the sam/bam file (there is no CS tag). For NovoalignCS, parameters that work well for test data will have to be adjusted for real world data in order to make it really comparable.
Another question is if the number of mapped reads is a good criterion ...

(Author of BFAST here :P). Mapped reads really only tells you when something went wrong (low mapping rate), since a high mapping rate can be achieved by aligning everything to chromosome 1 position 1. What matters are the variant calls produced at the end of the day. Unfortunately, this involves post-alignment filtering and then plugging the alignments into variant calling (i.e. many places to go wrong and add bias). I would be happy to help you set up BFAST for your comparison.
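The "chromosome 1 position 1" point can be illustrated with a toy simulation. This is a sketch with made-up names (`simulate_truth`, `degenerate_aligner`) and synthetic positions, not real read data: an aligner that places every read at the same spot reports a perfect mapping rate while getting no read right.

```python
import random


def simulate_truth(n_reads, genome_len, seed=0):
    """Hypothetical true origins: 1-based positions on 'chr1', never position 1."""
    rng = random.Random(seed)
    return [("chr1", rng.randrange(2, genome_len + 1)) for _ in range(n_reads)]


def degenerate_aligner(_read):
    """Maps every read to chromosome 1, position 1, regardless of content."""
    return ("chr1", 1)


truth = simulate_truth(1000, 1_000_000)
calls = [degenerate_aligner(t) for t in truth]

mapped_rate = sum(call is not None for call in calls) / len(calls)
correct_rate = sum(call == t for call, t in zip(calls, truth)) / len(calls)

print(f"mapped:  {mapped_rate:.0%}")
print(f"correct: {correct_rate:.0%}")
```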

Some useful links for your own edification:
(BWA author) http://lh3lh3.users.sourceforge.net/NGSalign.shtml
(BFAST author) http://www.nilshomer.com/index.php?title=NGS_Alignment


The best software for mapping SOLiD reads?
