SEQanswers how to align 15bp length reads by SHRiMP

 10-09-2011, 04:13 PM #1 54016565 Junior Member   Location: Washington DC Join Date: Jun 2011 Posts: 5 how to align 15bp length reads by SHRiMP I am testing different aligners for mapping single-end 15bp reads back to the genome. For SHRiMP, what would be an appropriate parameter setting? Its parameter system seems much more complicated than that of other aligners such as bowtie and bwa, and I did not find a parameter to set seed_length.
 10-17-2011, 07:56 AM #2 mateidavid Junior Member   Location: Toronto Join Date: Oct 2011 Posts: 3 15bp single-end reads are too short: mathematically, it is impossible to pinpoint where they originate in the human genome. This has nothing to do with the choice of mapper. Here is why.

Take a fixed 15bp read and assume, for the sake of the math, that the reference is a uniform random string. A perfect (15 matches) random hit then occurs at any one location with probability 4^-15. However, the length of the human genome is about 3*4^15. Hence you expect about 3 perfect random hits per read, and you have no mathematical way of distinguishing those from the location where the read really originates.

Moreover, the set of random hits grows a lot the moment you allow any polymorphism. E.g., suppose you allow a single SNP. The probability that the read matches a random string with exactly one mismatch is 15*(3/4)*(1/4^14) = 45*4^-15 (15 choices for the mismatched position, 3 wrong bases at that position). Since there are 3*4^15 locations in the genome, you expect 3*45 = 135 random hits at a distance of 1 SNP from your read.

With the above in mind, you cannot map 15bp single-end reads back to the human genome and hope to find where they originate (on average). The best you can hope for is a list of possible locations, but as explained above, that list becomes quite large the moment you allow as little as 1 SNP.

Mappers based on spaced seeds (SHRiMP, BFAST) beat mappers based on exact string matching/Burrows-Wheeler Transform (BWA, Bowtie) in sensitivity when dealing with highly polymorphic reads (or very noisy data). E.g., a 50bp read with 10 mismatches will be mapped much more reliably by the former than by the latter. However, in that situation you still have 40 matches to go on, and those are unlikely to arise by chance in the human genome (a per-location probability on the order of 4^-40, against about 3*4^15 locations).

In conclusion, in my opinion you don't need highly sensitive mappers (such as SHRiMP) to deal with your data.
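The back-of-the-envelope estimate above can be checked with a few lines of Python. This is only a sketch under the same simplifying assumptions as the post (uniform random reference, one strand, substitutions only, genome length taken as 3*4^15); the function name `expected_random_hits` is made up for illustration and is not part of any mapper. Note that counting 15 positions times 3 alternative bases gives 45*4^-15 per location, so this sketch reports about 135 one-mismatch hits.

```python
from math import comb

# Assumption from the post: the human genome has about 3 * 4**15 (~3.2e9) positions.
GENOME_LEN = 3 * 4**15


def expected_random_hits(read_len, mismatches, genome_len=GENOME_LEN):
    """Expected number of positions in a uniform random genome that match a
    fixed read with exactly `mismatches` substitutions (one strand, no indels)."""
    # Number of distinct strings at that Hamming distance from the read:
    # choose the mismatched positions, then one of 3 wrong bases at each.
    matching_strings = comb(read_len, mismatches) * 3**mismatches
    # Each specific string occurs at a given position with probability 4**-read_len.
    return genome_len * matching_strings / 4**read_len


print(expected_random_hits(15, 0))   # perfect random hits for a 15bp read
print(expected_random_hits(15, 1))   # random hits with exactly one mismatch
print(expected_random_hits(50, 10))  # 50bp read with 10 mismatches: far below 1
```

For the 50bp case the binomial factor is included, so the result is larger than a bare 4^-40 estimate would suggest, but it still comes out many orders of magnitude below one expected random hit.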
 10-18-2011, 06:47 AM #3 54016565 Junior Member   Location: Washington DC Join Date: Jun 2011 Posts: 5 Thanks, mateidavid : )