  • Color-space mapping with Bowtie

    Hi, all.

    I am using bowtie to map color-space reads.
    While checking the alignments, I found that some read sequences reported in the output file do not match the original reads (in base space).

    For example, read A,
    Code:
    T010130203200100320123211 (color-space)
     TGGTAAGGCTTTGGGCTTGATCAC (base-space)
    was mapped as below:
    Code:
    >bowtie -v 2 -m 1 --best -t -p 10 -y -C hsa_miRBase17_hairpin_c -c T010130203200100320123211
    0       +       hsa-mir-191     16      AACGGAATCCCAAATCCAGCTG  qqqqqqqqqqqqqqqqqqqqqq  0       14:A>T,15:G>C
    Note that the reported read sequence differs from the original read in base space.

    So I checked the bowtie index files.

    Code:
    >bowtie-inspect -e hsa_miRBase17_hairpin_c | grep -A 1 hsa-mir-191
    >hsa-mir-191
     TATGCAGCCGTTAATCACTAGATGAACAAAGTCGTGCCACCGGGACGGGTCTAGACGTGCTTTGACAGTAAGTCGAAAGCTGGGGAGCTAG
    Assuming the above sequence was double-encoded, I decoded it back and looked at where the read was mapped.
    Code:
     TATGCAGCCGTTAATCACTAGATGAACAAAGTCGTGCCACCGGGACGGGTCTAGACGTGCTTTGACAGTAAGTCGAAAGCTGGGGAGCTAG
     3032102112330031013020320010002312321101122201222313020123213332010230023120002132222021302
    CGGCTGGACAGCGGGCAACGGAATCCCAAAAGCAGCTGTTGTCTCCAGAGCATTCCAGCTGCGCTTGGATTTCGTCCCCTGCTCTCCTGCCT
                  T010130203200100320123211
    (I prepended 'C' so that the decoding starts correctly.)
    Ignoring the first color, '10130203200100320123211' matches the reference in color space within two mismatches (-v 2). But apparently the alignment is wrong.
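The check above can be scripted. This sketch follows the poster's assumption that `bowtie-inspect -e` prints the colorspace reference double-encoded (the letters A, C, G, T standing for colors 0 to 3); all helper names are hypothetical. It turns the printed sequence back into colors, rebuilds the base-space reference from a leading C, lists every reference offset where the read's colors (first color dropped) fit within two mismatches, and then decodes the read starting from the reference base at color offset 15.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def undouble(seq: str) -> str:
    """Turn a double-encoded sequence (A/C/G/T = colors 0-3) back into color digits."""
    return "".join(str(BASE_TO_BITS[ch]) for ch in seq)


def decode_colors(primer: str, colors: str) -> str:
    """Translate colors into bases starting from `primer` (primer not returned)."""
    current = BASE_TO_BITS[primer]
    bases = []
    for color in colors:
        current ^= int(color)
        bases.append(BITS_TO_BASE[current])
    return "".join(bases)


def color_mismatches(read_colors: str, ref_colors: str, offset: int) -> list:
    """Positions within the read where its colors differ from the reference window."""
    window = ref_colors[offset:offset + len(read_colors)]
    return [i for i, (r, g) in enumerate(zip(read_colors, window)) if r != g]


# Sequence printed by bowtie-inspect -e for hsa-mir-191 (assumed double-encoded colors).
ref_double = ("TATGCAGCCGTTAATCACTAGATGAACAAAGTCGTGCCACCGGGACGGGTCTAGACG"
              "TGCTTTGACAGTAAGTCGAAAGCTGGGGAGCTAG")
ref_colors = undouble(ref_double)
ref_bases = "C" + decode_colors("C", ref_colors)  # leading C, as in the post

read_colors = "010130203200100320123211"[1:]  # drop the first color

# Every offset where the read fits within two color mismatches.
hits = [off for off in range(len(ref_colors) - len(read_colors) + 1)
        if len(color_mismatches(read_colors, ref_colors, off)) <= 2]
print(hits)
print(color_mismatches(read_colors, ref_colors, 15))   # -> [14, 16]
print(decode_colors(ref_bases[15], read_colors)[:22])  # -> AACGGAATCCCAAATCCAGCTG
```

With these inputs, offset 15 gives two color mismatches (read positions 14 and 16), and the first 22 decoded bases are exactly the `AACGGAATCCCAAATCCAGCTG` that bowtie reported, including its two base changes at positions 14 and 15. So the reported sequence is consistent with decoding the read's colors from the reference base rather than from the read's own primer; this sketch does not show that bowtie computes it this exact way.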

    I tested with other reads, and some reads were mapped correctly and some were not. Did I get something wrong, or is this a bug in bowtie?

    Thanks in advance!

  • #2
    bowtie matching

    Originally posted by ikarus97
    Code:
    T010130203200100320123211 (color-space)
     TGGTAAGGCTTTGGGCTTGATCAC (base-space)
    was mapped as below:
    Code:
    >bowtie -v 2 -m 1 --best -t -p 10 -y -C hsa_miRBase17_hairpin_c -c T010130203200100320123211
    0       +       hsa-mir-191     16      AACGGAATCCCAAATCCAGCTG  qqqqqqqqqqqqqqqqqqqqqq  0       14:A>T,15:G>C
    This looks like a reasonable match to me. However, it's only 22 bases, with 2 mismatches, in colour space, so there's some chance that you've hit a false-positive match. If you consider that any mismatch in colour space produces a substantially different sequence in base space, it may make more sense why this was selected as a match.

    For any colour-space sequence, there are really 4 different sequences being matched at the same time (one for each starting base). A mismatch in colour space results in a jump between these sequences, so you get hybrid sequences that look only a bit like the parents (hmm... reminds me of recombination).

    As you get further away from the start of a read, the reliability of the base-space conversion is reduced, so don't put too much faith in the base-space sequence that you generated from your read.

    If you fully trust your own colour-space to base-space conversions, what's the point of matching in colour space at all? You might as well convert everything to base space first, then match.



    • #3
      Thanks for your reply.

      Yes, I got your point, but it still seems counter-intuitive to me.
      I thought the read and its match should look alike in both color-space and base-space.

      Then, what IS the point of matching in color-space?
      As you said, 4 different sequences can be matched to the given color-space sequence. If we consider mismatches, the number of such sequences would grow exponentially.
      It feels like projecting different sequences onto the same point, then trying to find which point is closest to the point of interest.



      • #4
        You match in color space to get high accuracy relative to the previous base. The reads are more reliable, but when there's a sequencing error, it junks the rest of the base-space sequence. It makes it easier to detect sequencing errors and point mutations, but it seems (to me) to be much more trouble than it's worth for de novo sequencing and/or mapping.
