  • How to align 15bp reads with SHRiMP

    I am testing different aligners for mapping single-end 15bp reads back to the genome. For SHRiMP, what would be an appropriate parameter setting? Its parameter system seems much more complicated than those of other aligners like Bowtie and BWA, and I could not find a parameter to set seed_length.

  • #2
    15bp single-end reads are too short: mathematically, it is impossible to determine where in the human genome they originate. This has nothing to do with the choice of mapper. Here is why.

    Given a fixed 15bp read, and assuming (for the sake of the math) that the reference is a uniform random string, a perfect (15 matches) random hit occurs at any one location with probability 4^-15. However, the human genome is about 3*4^15 bp long (4^15 is roughly 1.07 billion). Hence, you expect 3*4^15 * 4^-15 = 3 perfect random hits per read. You have no mathematical way of distinguishing between those and the location the read really originates from.
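
    The expected-count argument above can be checked numerically. The Python sketch below (function names are hypothetical, chosen for illustration) computes the analytic value, number of locations times 4^-L, and also runs a scaled-down simulation that keeps the same genome-to-4^L ratio (8bp reads against a random 3*4^8 bp genome), since building a full 3 Gbp random genome is impractical in a quick script. It assumes a uniform random reference and searches one strand only.

```python
import random
from collections import Counter

BASES = "ACGT"


def expected_exact_hits(read_len: int, genome_len: int) -> float:
    """Expected exact random matches of a fixed read in a uniform random genome:
    one trial per location, each succeeding with probability 4^-read_len."""
    return genome_len / 4 ** read_len


def simulate_exact_hits(read_len: int, genome_len: int, n_reads: int, seed: int = 0) -> float:
    """Average number of exact occurrences of random reads in a random genome
    (one strand only, overlapping occurrences counted)."""
    rng = random.Random(seed)
    genome = "".join(rng.choices(BASES, k=genome_len))
    # Index every k-mer of the genome once, then look each read up in the index.
    kmer_counts = Counter(genome[i:i + read_len] for i in range(genome_len - read_len + 1))
    total = 0
    for _ in range(n_reads):
        read = "".join(rng.choices(BASES, k=read_len))
        total += kmer_counts[read]
    return total / n_reads


if __name__ == "__main__":
    # Full-scale analytic value: 15bp read, genome of 3 * 4^15 bp.
    print(expected_exact_hits(15, 3 * 4 ** 15))  # 3.0
    # Scaled-down simulation with the same ratio: 8bp reads, genome of 3 * 4^8 bp.
    print(simulate_exact_hits(8, 3 * 4 ** 8, n_reads=2000))
```

    The simulated average should land close to 3. Searching both strands, as mappers normally do, would roughly double the count under the same random-reference assumption.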

    Moreover, the set of random hits grows considerably the moment you allow any polymorphism. E.g., suppose you allow a single SNP. The probability that the read matches a random string with exactly one mismatch is 15*(3/4)*(1/4)^14 = 45*4^-15: 15 choices for the mismatch position, probability 3/4 that the random base differs there, and probability 1/4 for each of the 14 matches. Since there are 3*4^15 locations in the genome, you expect 45*3 = 135 random hits at a distance of 1 SNP from your read.
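
    The one-SNP count follows the binomial formula: choose the k mismatch positions, give each mismatch probability 3/4 (any of the three other bases) and each match probability 1/4, then multiply by the number of locations. A minimal Python sketch, assuming a uniform random reference and one strand; the function name is hypothetical:

```python
from math import comb


def expected_hits_exactly_k(read_len: int, genome_len: int, k: int) -> float:
    """Expected number of genome locations whose random sequence differs from a
    fixed read at exactly k positions.

    Per location: comb(read_len, k) ways to place the mismatches, 3 choices of
    base at each mismatch, out of 4^read_len equally likely strings.
    """
    # Keep the numerator as an exact integer and divide once to avoid rounding.
    return genome_len * comb(read_len, k) * 3 ** k / 4 ** read_len


if __name__ == "__main__":
    genome_len = 3 * 4 ** 15
    print(expected_hits_exactly_k(15, genome_len, 0))  # 3.0
    print(expected_hits_exactly_k(15, genome_len, 1))  # 135.0
```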

    With the above in mind, you cannot map 15bp single-end reads back to the human genome and hope to find where they originate (on average). The best you can hope for is a list of possible locations, but as explained above, that list will be quite large the moment you allow even a single SNP.
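
    One way to picture the "list of possible locations" is as the Hamming ball around the read: every string within k mismatches of it, each of which is expected to occur genome_len / 4^15 times in a uniform random reference. The Python sketch below enumerates that ball by brute force (names are hypothetical, one strand only, uniform random reference assumed). It illustrates the counting only, not how any particular mapper builds its candidate list.

```python
from itertools import combinations, product

BASES = "ACGT"


def hamming_neighbors(read: str, max_mismatches: int) -> set:
    """All strings within Hamming distance max_mismatches of read (read included)."""
    result = {read}
    for k in range(1, max_mismatches + 1):
        for positions in combinations(range(len(read)), k):
            # At each chosen position, try each of the three other bases.
            alternatives = [[b for b in BASES if b != read[p]] for p in positions]
            for subs in product(*alternatives):
                chars = list(read)
                for p, b in zip(positions, subs):
                    chars[p] = b
                result.add("".join(chars))
    return result


def expected_candidates(read: str, max_mismatches: int, genome_len: int) -> float:
    """Expected number of random genome locations within max_mismatches of read.

    Each location holds exactly one L-mer, so summing the expected occurrences
    of the distinct neighbors counts locations without double counting.
    """
    n = len(hamming_neighbors(read, max_mismatches))
    return n * genome_len / 4 ** len(read)


if __name__ == "__main__":
    read = "ACGTACGTACGTACG"  # arbitrary 15bp example read
    for k in range(3):
        print(k, len(hamming_neighbors(read, k)), expected_candidates(read, k, 3 * 4 ** 15))
```

    For a 15bp read the ball holds 46 strings at k ≤ 1 and 991 at k ≤ 2, i.e. 138 and 2973 expected random locations in a 3*4^15 bp genome.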

    Mappers based on spaced seeds (SHRiMP, BFAST) beat mappers based on exact string matching with the Burrows-Wheeler Transform (BWA, Bowtie) in sensitivity when dealing with highly polymorphic reads (or very noisy data). E.g., a 50bp read with 10 mismatches will be mapped much more reliably by the former than by the latter. However, in that situation you still have 40 matches to go on, which is unlikely to arise by chance in the human genome (4^-40 per location vs. 3*4^15 locations).
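
    The comparison 4^-40 vs. 3*4^15 ignores where the 10 mismatches fall. A slightly fuller count sums over every placement and identity of up to k mismatches, as in this Python sketch (uniform random reference, one strand, hypothetical function name). For a 50bp read with up to 10 mismatches the expected number of random locations comes out on the order of 10^-6, so the conclusion that such a hit is unlikely by chance still holds, whereas the same formula gives over a hundred for a 15bp read with a single mismatch.

```python
from math import comb

GENOME_LEN = 3 * 4 ** 15  # ~3.2e9 bp, the approximation of the human genome used above


def expected_random_hits(read_len: int, max_mismatches: int, genome_len: int = GENOME_LEN) -> float:
    """Expected number of genome locations (one strand) whose random sequence
    is within max_mismatches of a fixed read of length read_len."""
    # Count the strings within Hamming distance max_mismatches of the read.
    favourable = sum(comb(read_len, k) * 3 ** k for k in range(max_mismatches + 1))
    return genome_len * favourable / 4 ** read_len


if __name__ == "__main__":
    # Naive estimate: 40 specific matches, ignoring where the mismatches fall.
    print(GENOME_LEN / 4 ** 40)
    # Accounting for every placement and identity of up to 10 mismatches in 50bp.
    print(expected_random_hits(50, 10))
    # The 15bp, single-mismatch case for comparison.
    print(expected_random_hits(15, 1))
```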

    In conclusion, in my opinion you don't need highly sensitive mappers (such as SHRiMP) to deal with your data.
    -- Matei David

  • #3
    Thanks, mateidavid :)
