**Brian Bushnell** · 03-19-2014, 10:14 PM

The best way to test how accurate an aligner is (in fact, I would say, the only way) is with synthetic data. The E. coli reference would work well for this: just download that file and rename it to ecoli.fa (alternatively, you could use human chromosome 21).

If you download BBMap, you can generate random reads like this:

randomreads.sh -Xmx1g ref=ecoli.fa build=1 out=reads.fq maxq=10 minq=10 len=100 reads=100000

That will generate 100000 reads, each 100bp long, all at quality 10 (meaning a 10% chance of error per base; quality 20 is 1%, quality 30 is 0.1%, etc.). They will be randomly distributed around the E. coli genome, and every read will have a header indicating its genomic origin. You can also add insertions and deletions with other flags, like "delrate=0.5 maxdellen=20 maxdels=3", which would put deletions of length 1 to 20 in 50% of the reads, with up to 3 deletions per read: specifically, a 50% chance of 1+ deletions, a 25% chance of 2+ deletions, and a 12.5% chance of 3 deletions.
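For anyone who wants to check the arithmetic, here is a minimal Python sketch. The Phred conversion is the standard one (error probability = 10^(-Q/10)). The deletion part assumes each additional deletion is added with probability equal to delrate, up to the cap, which is my reading of how the percentages above arise and not something taken from the BBMap source. The function names are made up for illustration.

```python
def phred_to_error_prob(q: float) -> float:
    """Per-base error probability implied by Phred quality q."""
    return 10 ** (-q / 10)


def chance_of_at_least(k: int, rate: float, max_events: int) -> float:
    """Chance that a read carries k or more events.

    Assumption (not from the BBMap source): every additional event is
    added with probability `rate`, and no read gets more than max_events.
    """
    if k <= 0:
        return 1.0
    if k > max_events:
        return 0.0
    return rate ** k


# Error chance per base at the qualities mentioned in the post
for q in (10, 20, 30):
    print(f"Q{q}: {phred_to_error_prob(q):.1%} error chance per base")

# Expected number of miscalled bases in one 100bp read at quality 10
print("expected errors per read:", 100 * phred_to_error_prob(10))

# Deletion chances for delrate=0.5 maxdels=3 under the assumption above
for k in (1, 2, 3):
    print(f"{k}+ deletions: {chance_of_at_least(k, 0.5, 3):.1%}")
```

At quality 10 a 100bp read carries about 10 substitution errors on average, so this is a fairly harsh test for an aligner.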

After you map with an aligner, you will get a sam file. You can evaluate it like this:

gradesam.sh in=mapped.sam reads=100000

This will give you the true positive, false positive, and false negative mapping rates, both strict (requiring both read ends to map back to the exact origin) and loose (requiring at least 1 end to map back to within 20bp of the origin), as well as the rate of ambiguous mapping.
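To make the strict/loose distinction concrete, here is a rough Python sketch of that kind of grading. It is not the GradeSamFile implementation: it takes already-parsed (start, stop) coordinates instead of reading headers or SAM records, ignores contig and strand, skips ambiguous mappings, and counts any read missing from the mapped output as a false negative. All names are hypothetical.

```python
from collections import Counter


def grade(true_span, mapped_span, tolerance=20):
    """Classify one mapped read; spans are (start, stop) on the same contig."""
    start_off = abs(true_span[0] - mapped_span[0])
    stop_off = abs(true_span[1] - mapped_span[1])
    if start_off == 0 and stop_off == 0:
        return "strict"   # both ends at the exact origin
    if min(start_off, stop_off) <= tolerance:
        return "loose"    # at least one end within tolerance
    return "wrong"


def summarize(pairs, total_reads, tolerance=20):
    """pairs: (true_span, mapped_span) tuples; mapped_span is None if unmapped."""
    counts = Counter()
    for true_span, mapped_span in pairs:
        if mapped_span is not None:
            counts[grade(true_span, mapped_span, tolerance)] += 1
    mapped = counts["strict"] + counts["loose"] + counts["wrong"]
    return {
        "true_positive_strict": counts["strict"] / total_reads,
        "true_positive_loose": (counts["strict"] + counts["loose"]) / total_reads,
        "false_positive_strict": (counts["loose"] + counts["wrong"]) / total_reads,
        "false_positive_loose": counts["wrong"] / total_reads,
        "false_negative": (total_reads - mapped) / total_reads,
    }


# Four made-up reads: exact hit, near hit, wrong location, unmapped
example = [
    ((100, 199), (100, 199)),
    ((500, 599), (497, 599)),
    ((900, 999), (4000, 4099)),
    ((1200, 1299), None),
]
for name, rate in summarize(example, total_reads=4).items():
    print(f"{name}: {rate:.2f}")
```

Passing the total read count separately mirrors the reads=100000 argument above: reads the aligner drops entirely still count against it.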

P.S. If you want to do everything in Windows, the shell scripts won't work. You need to have Java installed, and run the programs like this:

java -Xmx1g -cp path/to/bbmap/current align2.RandomReads3 ref=ecoli.fa build=1 out=reads.fq maxq=10 minq=10 len=100 reads=100000

and

java -Xmx1g -cp path/to/bbmap/current align2.GradeSamFile in=mapped.sam reads=100000

BBMap also runs in Windows. You can run it like this:

java -Xmx1g -cp path/to/bbmap/current align2.BBMap ref=ecoli.fa in=reads.fq out=mapped.sam
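If you would rather drive the three Java commands above from one script that works the same on Windows and Linux, something like the following Python sketch could assemble them. The helper name bbmap_java_cmd is made up, the class names and parameters are copied from the commands above, and path/to/bbmap is a placeholder you would replace with your own install location. The script only prints the command lines; it does not run anything.

```python
def bbmap_java_cmd(main_class, params, bbmap_dir="path/to/bbmap", memory="1g"):
    """Build the argument list for one BBMap Java invocation.

    params is a dict because 'in' is a reserved word in Python and
    cannot be passed as a keyword argument.
    """
    cmd = ["java", f"-Xmx{memory}", "-cp", f"{bbmap_dir}/current", main_class]
    cmd += [f"{key}={value}" for key, value in params.items()]
    return cmd


# The three steps from this post: generate reads, map them, grade the result
steps = [
    bbmap_java_cmd("align2.RandomReads3", {
        "ref": "ecoli.fa", "build": 1, "out": "reads.fq",
        "maxq": 10, "minq": 10, "len": 100, "reads": 100000,
    }),
    bbmap_java_cmd("align2.BBMap", {
        "ref": "ecoli.fa", "in": "reads.fq", "out": "mapped.sam",
    }),
    bbmap_java_cmd("align2.GradeSamFile", {
        "in": "mapped.sam", "reads": 100000,
    }),
]

for cmd in steps:
    print(" ".join(cmd))
```

To actually execute a step, each list could be handed to subprocess.run(cmd, check=True), which stops the script if Java exits with an error.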

**calatian** · 03-20-2014, 09:53 PM

Originally posted by Brian Bushnell

The best way - in fact, I would say, the only way - to test how accurate an aligner is, would be with synthetic data. The e.coli reference would be nice for this, just download that file and rename it to ecoli.fa (alternately you could just use human chromosome 21).

Brian,

Thank you so much for such a thorough and clear response. This is exactly the kind of direction I needed (and much more than I expected). I will try it out right away. Thanks again!

**Brian Bushnell** · 03-21-2014, 09:00 AM

Originally posted by calatian

Brian,

Thank you so much for such a thorough and clear response. This is exactly the kind of direction I needed (and much more than I expected). I will try it out right away. Thanks again!

You're welcome!
