  • Perfect match disagreement between bowtie and BLAT on human genome

    Hello,

    I am currently testing a number of aligners for miRNA sequencing and have come across a curious problem with bowtie. I run bowtie with the following options, so I should get all the perfect matches:
    Code:
    ./bowtie -p 4 --solexa-quals --best -k 100  -t h_sapiens_asm ../GDB1.fastq GDB1.map
    The index file is the human genome as supplied by the makers of bowtie.

    For the sequence "TGGGAATACCGGGTGCTGTAGGCTTT" I get two hits, one on chromosome 12 and the other on the X. When I BLAT this sequence I get 22 hits (chr1 * 16, chr12 * 2, chrX * 2, chr17, chr19).

    Does anyone know why there is a difference?

    I also applied the same dataset to novoalign,
    Code:
    ./novoalign -rAll -f ../GDB1.fastq -d hsapiens > GDB1.map
    and get 23 perfect matches, with an extra chromosome 1 match.

    I am very confused as to why there are so many differences and would welcome any help in this area.

    Thanks

  • #2
    Hi there,

    I can't reproduce this. When I run "./bowtie -c --solexa-quals --best -k 100 h_sapiens_asm TGGGAATACCGGGTGCTGTAGGCTTT", I get 23 hits; presumably the same ones as novoalign:

    sycamore:~/research/bowtie $ ./bowtie -c --solexa-quals --best -k 100 /fs/szasmg/langmead/ebwts/h_sapiens_asm TGGGAATACCGGGTGCTGTAGGCTTT
    0 + gi|89161218|ref|NC_000023.9|NC_000023 68809142 TGGGAATACCGGGTGCTGTAGGCTTT IIIIIIIIIIIIIIIIIIIIIIIIII 1
    0 + gi|89161190|ref|NC_000012.10|NC_000012 34249995 TGGGAATACCGGGTGCTGTAGGCTTT IIIIIIIIIIIIIIIIIIIIIIIIII 1
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226823814 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226826034 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226819358 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226837238 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226821599 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226814876 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226832757 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226812635 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226817117 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226828276 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226848407 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226839463 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226834998 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226841704 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226846176 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226843935 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161185|ref|NC_000001.9|NC_000001 226830515 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161190|ref|NC_000012.10|NC_000012 36841532 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|42406306|ref|NC_000019.8|NC_000019 21087771 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161218|ref|NC_000023.9|NC_000023 28910887 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    0 - gi|89161213|ref|NC_000007.12|NC_000007 139733047 AAAGCCTACAGCACCCGGTATTCCCA IIIIIIIIIIIIIIIIIIIIIIIIII 20
    Reported 23 alignments to 1 output stream(s)

    Is there another example where Bowtie does not produce the expected output that I can try?

    Ben



    • #3
      I think I have worked out the problem. Novoalign outputs the original read sequence regardless of which strand it matched, whereas bowtie reports the sequence as it reads on the forward reference strand, so hits on the reverse strand show the reverse complement of the read. I was filtering only on "TGGGAATACCGGGTGCTGTAGGCTTT", whereas the hits on the reverse strand were reported as "AAAGCCTACAGCACCCGGTATTCCCA".

      Sorry for the confusion.
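
      For anyone who hits the same thing, here is a minimal sketch of a filter that accepts either orientation of the read. It assumes bowtie's default output with whitespace-separated columns (read name, strand, reference, offset, sequence, ...) and read names that contain no whitespace. The helper names `reverse_complement` and `count_hits` are made up for illustration and are not part of bowtie.

```python
from collections import Counter

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_hits(bowtie_lines, read_seq):
    """Count hits per (reference, strand) for one read.

    A line is counted when its sequence column equals the read or its
    reverse complement, so minus-strand hits are not dropped.
    Assumes columns: name, strand, reference, offset, sequence, ...
    """
    targets = {read_seq, reverse_complement(read_seq)}
    counts = Counter()
    for line in bowtie_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        _name, strand, ref, _offset, seq = fields[:5]
        if seq in targets:
            counts[(ref, strand)] += 1
    return counts
```

      Summing the counts over a map file (for example `sum(count_hits(open("GDB1.map"), read).values())`) should then include the reverse-strand hits that a plain string match on the read misses.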



      • #4
        When I enter your sequence in ISAS I get 23 perfect matches. Below is a transcript of an interactive session.

        ========================================================
        Enter next command, or type "?" (and ENTER) for list of commands.

        limit=30
        For each sequence, the search will stop if 30 hits are found.
        Allocated buffer for 58.4 million sequences (0.0 sec.)

        Enter next command, or type "?" (and ENTER) for list of commands.

        sequence=TGGGAATACCGGGTGCTGTAGGCTTT

        23 matches found in 9.0 micro seconds.

        Match no. 1: Reverse Chr. 1 Positions 226812661..226812636, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 2: Reverse Chr. 1 Positions 226814902..226814877, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 3: Reverse Chr. 1 Positions 226817143..226817118, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 4: Reverse Chr. 1 Positions 226819384..226819359, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 5: Reverse Chr. 1 Positions 226821625..226821600, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 6: Reverse Chr. 1 Positions 226823840..226823815, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 7: Reverse Chr. 1 Positions 226826060..226826035, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 8: Reverse Chr. 1 Positions 226828302..226828277, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 9: Reverse Chr. 1 Positions 226830541..226830516, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 10: Reverse Chr. 1 Positions 226832783..226832758, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 11: Reverse Chr. 1 Positions 226835024..226834999, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 12: Reverse Chr. 1 Positions 226837264..226837239, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 13: Reverse Chr. 1 Positions 226839489..226839464, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 14: Reverse Chr. 1 Positions 226841730..226841705, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 15: Reverse Chr. 1 Positions 226843961..226843936, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 16: Reverse Chr. 1 Positions 226846202..226846177, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 17: Reverse Chr. 1 Positions 226848433..226848408, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 18: Reverse Chr. 7 Positions 139733073..139733048, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 19: Forward Chr. 12 Positions 34249996..34250021, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 20: Reverse Chr. 12 Positions 36841558..36841533, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 21: Reverse Chr. 19 Positions 21087797..21087772, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 22: Reverse Chr. 23 Positions 28910913..28910888, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions

        Match no. 23: Forward Chr. 23 Positions 68809143..68809168, 0 Mismatches

        TGGGAATACCGGGTGCTGTAGGCTTT
        TGGGAATACCGGGTGCTGTAGGCTTT
        0 substitutions
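
        To compare these positions with the bowtie output in #2, note that bowtie reports a 0-based offset of the leftmost aligned base on the forward strand, while the positions above look like 1-based inclusive ranges (68809143..68809168 here versus 68809142 from bowtie). The sketch below converts one "Match no." line into a bowtie-style tuple. The 1-based reading of the ISAS positions is inferred from the numbers in this thread, and `parse_isas_match` is a made-up helper name, not part of ISAS.

```python
import re

# Matches lines such as:
#   Match no. 1: Reverse Chr. 1 Positions 226812661..226812636, 0 Mismatches
_ISAS_MATCH = re.compile(
    r"Match no\. \d+: (Forward|Reverse) Chr\. (\d+) Positions (\d+)\.\.(\d+)"
)


def parse_isas_match(line):
    """Return (chromosome, strand, 0-based leftmost offset, length) or None.

    Assumes the two positions are 1-based and inclusive, listed in read
    orientation (so the first is larger than the second for reverse hits).
    """
    m = _ISAS_MATCH.search(line)
    if m is None:
        return None
    orientation, chrom, first, last = m.groups()
    first, last = int(first), int(last)
    strand = "+" if orientation == "Forward" else "-"
    offset = min(first, last) - 1      # convert to 0-based leftmost base
    length = abs(last - first) + 1     # inclusive range length
    return chrom, strand, offset, length
```

        With that conversion, the chromosome 1 hit at 226812661..226812636 maps to offset 226812635, which is one of the minus-strand offsets bowtie lists.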
