SEQanswers





10-09-2011, 04:13 PM   #1
54016565
Junior Member
 
Location: Washington DC

Join Date: Jun 2011
Posts: 5
How to align 15bp reads with SHRiMP

I am testing different aligners for mapping single-end 15bp reads back to the genome. What would be an appropriate parameter setting for SHRiMP? Its parameter system seems much more complicated than those of other aligners such as Bowtie and BWA, and I could not find a parameter to set the seed length.
10-17-2011, 07:56 AM   #2
mateidavid
Junior Member
 
Location: Toronto

Join Date: Oct 2011
Posts: 3

15bp single-end reads are too short: mathematically, it is impossible to determine where in the human genome they originate. This has nothing to do with the choice of mapper. Here is why.

Given a fixed 15bp read, and assuming (for the purposes of the math) that the reference is a uniformly random string, a perfect random hit (15 matches) occurs at any one location with probability 4^-15. The human genome is about 3*4^15 bp long (4^15 is roughly 1.07 billion), so you expect about 3 perfect random hits per read. You have no mathematical way of distinguishing those from the location where the read really originates.
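The expected-hit argument can be checked numerically. The sketch below assumes the same simplified model as the post (uniformly random reference, one strand, no edge effects); the function names are hypothetical, chosen for illustration. It computes the expected number of exact random hits and, because a 3e9 bp simulation is impractical, runs a toy Monte Carlo with 8bp reads against a random genome of length 3*4^8, which keeps the same genome-length-to-4^k ratio.

```python
import random
from collections import Counter


def expected_exact_hits(read_len: int, genome_len: int) -> float:
    """Expected exact random matches of one fixed read in a uniformly random genome.

    Each of the genome_len start positions matches with probability 4**-read_len
    (one strand only, edge effects ignored, as in the post's simplified model).
    """
    return genome_len * 4.0 ** -read_len


def simulated_exact_hits(read_len: int, genome_len: int, n_reads: int, seed: int = 0) -> float:
    """Toy Monte Carlo: average number of exact occurrences of random reads in a random genome."""
    rng = random.Random(seed)
    genome = "".join(rng.choices("ACGT", k=genome_len))
    # Count every read_len-mer in the genome once, then look up each random read.
    kmer_counts = Counter(genome[i:i + read_len] for i in range(genome_len - read_len + 1))
    total = 0
    for _ in range(n_reads):
        read = "".join(rng.choices("ACGT", k=read_len))
        total += kmer_counts[read]
    return total / n_reads


if __name__ == "__main__":
    human_like = 3 * 4**15  # about 3.2e9 bp, the post's approximation of the human genome
    print(expected_exact_hits(15, human_like))  # 3.0
    # Same ratio at toy scale: 8bp reads against a 3 * 4**8 bp random genome.
    print(simulated_exact_hits(8, 3 * 4**8, n_reads=200))
```

The simulated average should land close to 3, matching the formula. Searching both strands, as real mappers do, would roughly double these counts.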

Moreover, the set of random hits grows a lot the moment you allow any polymorphisms. E.g., suppose you allow a single SNP. The probability that the read matches a random string with exactly one mismatch is 15*(3/4)*(1/4^14) = 45*4^-15: there are 15 choices for the mismatched position, that position differs with probability 3/4, and the other 14 bases must all match. Since there are about 3*4^15 locations in the genome, you expect 45*3 = 135 random hits with exactly one SNP relative to your read.
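An equivalent way to count the one-mismatch case is to count neighbours: there are C(15,1)*3 = 45 distinct strings that differ from a 15bp read at exactly one position, and each occurs at a given location with probability 4^-15. This gives the exactly-one-mismatch probability as 15*(3/4)*4^-14, not 15*(1/3)*4^-14. The sketch below assumes the same uniform random model; the helper names are hypothetical. It includes a brute-force enumeration to confirm the neighbour formula on short reads.

```python
from itertools import product
from math import comb


def hamming_neighbors(k: int, d: int) -> int:
    """Number of distinct length-k DNA strings at Hamming distance exactly d from a fixed read."""
    return comb(k, d) * 3**d


def expected_hits_with_mismatches(read_len: int, genome_len: int, mismatches: int) -> float:
    """Expected random hits at exactly `mismatches` substitutions (uniform random genome, one strand)."""
    return genome_len * hamming_neighbors(read_len, mismatches) * 4.0 ** -read_len


def brute_force_neighbors(read: str, d: int) -> int:
    """Enumerate all 4**len(read) strings and count those at distance exactly d (short reads only)."""
    return sum(
        1
        for cand in product("ACGT", repeat=len(read))
        if sum(a != b for a, b in zip(read, cand)) == d
    )


if __name__ == "__main__":
    genome_len = 3 * 4**15
    print(hamming_neighbors(15, 1))                          # 45
    print(expected_hits_with_mismatches(15, genome_len, 1))  # 135.0
    print(brute_force_neighbors("ACGTAC", 1))                # 18
```

Allowing up to one mismatch therefore yields about 3 + 135 = 138 random candidate locations per read under this model.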

With the above in mind, you cannot map 15bp single-end reads back to the human genome and expect, on average, to find where they originate. The best you can hope for is a list of candidate locations, but as explained above, that list becomes quite large the moment you allow even a single SNP.

Mappers based on spaced seeds (SHRiMP, BFAST) beat mappers based on exact string matching with the Burrows-Wheeler Transform (BWA, Bowtie) in sensitivity when dealing with highly polymorphic reads (or very noisy data). For example, a 50bp read with 10 mismatches will be mapped much more reliably by the former than by the latter. However, in that situation you still have 40 matches to go on, which is very unlikely to arise by chance in the human genome (roughly 4^-40 per location vs. about 3*4^15 locations).
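The 4^-40 figure is a simplification: it ignores the many ways the 10 mismatches can be placed along a 50bp read. A minimal sketch, assuming the same uniform random model (one strand, substitutions only, no indels; the function name is hypothetical), sums the neighbour counts C(k,d)*3^d over all allowed mismatch counts d to give the expected number of random hits within a mismatch budget.

```python
from math import comb


def expected_hits_up_to(read_len: int, genome_len: int, max_mismatches: int) -> float:
    """Expected random hits within `max_mismatches` substitutions.

    Assumes a uniformly random genome, one strand, and no indels.
    """
    # Number of strings within Hamming distance max_mismatches of the read.
    neighbors = sum(comb(read_len, d) * 3**d for d in range(max_mismatches + 1))
    return genome_len * neighbors / 4**read_len


if __name__ == "__main__":
    genome_len = 3 * 4**15
    print(f"{expected_hits_up_to(15, genome_len, 0):.1f}")    # 3.0
    print(f"{expected_hits_up_to(15, genome_len, 1):.1f}")    # 138.0
    print(f"{expected_hits_up_to(50, genome_len, 10):.2e}")   # on the order of 1e-6
```

Even with the combinatorial factor included, a 50bp read with up to 10 mismatches is expected to have on the order of 10^-6 random hits, compared with about 138 for a 15bp read with up to one mismatch. That contrast is the core of the argument.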

In conclusion, I don't think you need a highly sensitive mapper such as SHRiMP for your data.
10-18-2011, 06:47 AM   #3
54016565
Junior Member
 
Location: Washington DC

Join Date: Jun 2011
Posts: 5

Thanks, mateidavid : )
All times are GMT -8.