Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • HELP!! XS tag to filter unique reads Do NOT match bowtie2 report

    I find the XS tag filtering the uniquely mapped reads number is different from the bowtie2 report. I first extract the flag number of 147/99 + 163/83 and it matches the reported 'aligned concordantly exactly 1 time' + 'aligned concordantly >1 times'. But when I extract the pairs with both reads tagged XS and find the number is a bit larger than the reported 'aligned concordantly >1 times'. I wonder what reads bowtie2 reported as unique...

    Here is the output of bowtie2:
    28115453 reads; of these:
    28115453 (100.00%) were paired; of these:
    9177458 (32.64%) aligned concordantly 0 times
    15185809 (54.01%) aligned concordantly exactly 1 time
    3752186 (13.35%) aligned concordantly >1 times
    ----
    9177458 pairs aligned concordantly 0 times; of these:
    2358270 (25.70%) aligned discordantly 1 time
    ----
    6819188 pairs aligned 0 times concordantly or discordantly; of these:
    13638376 mates make up the pairs; of these:
    9978644 (73.17%) aligned 0 times
    1757628 (12.89%) aligned exactly 1 time
    1902104 (13.95%) aligned >1 times
    82.25% overall alignment rate

    and the numbers I get:
    flag:
    147/99 9471127
    163/83 9466868

    Pairs with XS tag
    0 read get XS tag:13757424
    1 read get XS tag:1358209
    2 reads get XS tag:3822362

  • #2
    Mismatch between bowtie2 report and manual XS-flag count

    We are having a similar problem. Our bowtie2 report does not match the manual count of XS flags:

    192904649 reads; of these:
    192904649 (100.00%) were paired; of these:
    30256829 (15.68%) aligned concordantly 0 times
    85757625 (44.46%) aligned concordantly exactly 1 time
    76890195 (39.86%) aligned concordantly >1 times
    ----
    30256829 pairs aligned concordantly 0 times; of these:
    8088262 (26.73%) aligned discordantly 1 time
    ----
    22168567 pairs aligned 0 times concordantly or discordantly; of these:
    44337134 mates make up the pairs; of these: (correct, counted flag 'YT:Z:UP')
    26074256 (58.81%) aligned 0 times (correct, counted flag 'YT:Z:UP' && not 'AS:i')
    4769246 (10.76%) aligned exactly 1 time (??? we get: 10728515, counted flag 'YT:Z:UP' && 'AS:i' && not 'XS:i')
    13493632 (30.43%) aligned >1 times (??? we get: 7534363, counted flag 'YT:Z:UP' && 'XS:i')
    93.24% overall alignment rate

    Any ideas of where it goes wrong??

    We did a quick check and re-run bowtie2 with the 'k 2' option, i.e. bowtie2 reports 2 alignments wherever 2 valid alignments are found. And indeed, we find reads with the 'YT:Z:UP' && 'AS:i' && not 'XS:i' flags, where a second alignment is reported nevertheless. Does this suggest that the XS flag is not always properly set?

    I'd be thankful for any advise!
    Juliana and Katrin

    Comment


    • #3
      The presence of an XS tag doesn't mean it aligns more than once. It'll align more than one time if AS==XS (though I'd have to double check the bowtie2 source code to ensure that that's how it's actually calculated).

      Comment


      • #4
        Hi dpryan,

        Thanks for your reply. What does the XS flag stand for in your opinion?

        As I understood the bowtie2 manual, the XS flag is set, when a second valid alignment for the same read is found. AS==XS would then be the case, if the second valid alignment (XS) has a score as high as the first (reported as "best", AS) valid alignment. Please correct me, if I am wrong.

        Best,
        Katrin

        Comment


        • #5
          That's correct, but one needs to remember that the existence of a secondary alignment doesn't mean that the read doesn't map uniquely. If that were not the case, then one could decrease the threshold for declaring an alignment valid low enough that there would be no possible unique alignments. This rather goes against what we generally mean when we conceive of reads mapping uniquely vs. multiply.

          If AS==XS then everyone agrees that the read is multimapping. If that's not the case, then most people would agree that the read originated from the first rather than the second position, but simply temper that belief by whatever the computed MAPQ turns out to be (N.B., the bowtie2 algorithm is weird). A quick perusal of the bowtie2 code (btw, apparently there's a new version since yesterday) suggests that these values are computed in AlnSinkWrap::finishRead in the aln_sink.cpp file. Have a look there if you really want to know exactly how the reported numbers were aligned (I've never looked through that function since I rely on those numbers for anything important).

          Comment


          • #6
            Hi dpryan,

            Thanks again, also for clarifying terminology. I always assumed the bowtie2 summary stated whether one or more valid alignments are found irrespective of the scores. For single-end reads, this is actually the case. The count of XS flags in the bowtie2 output is equal to the ">1 times" in the summary. So for single-end data, I disregarded all (multiple) alignments with XS==AS by manual filtering.

            Now I was trying to find a way for such a (uniqueness) filter for paired-end data and got stuck on the XS flag counts. But even if you are right, my counts of XS should result in a higher number than ">1 times" in the summary (because I count the mere presence of XS, i.e. both AS>XS and AS==XS). My count is smaller though. I am puzzled!? So either XS is not always set properly, or the bowtie2 summary is wrong, or -of course- I still don't get it.

            Thanks for your help and thoughts,
            Katrin

            Comment


            • #7
              True, someone would need to go through the code to see how those numbers are derived.

              Comment


              • #8
                Yes, probably. It would be nice to know what is going on. I suppose filtering alignments for uniqueness is an issue of general interest. And since this option is gone in bowtie2 ('-m' in bowtie version 1), filtering based on XS flags would be an easy alternative.

                Do you have experience with paired-end DNA-sequence data? If so, do you apply any filter, or do you keep all reads?

                Comment


                • #9
                  I usually just filter by MAPQ score (5 or 10 is a good enough threshold to ensure that an alignment is actually unique).

                  Comment


                  • #10
                    Could you point me to any site that explains how MAPQ is calculated in bowtie2, or is it very complicated? So far I was hesitant to filter based on MAPQ because I only understood it on a superficial level from what I read in the manual (Q = -10*log10(p), where p is the probability that the read does not originate from the aligned location, i.e. for Q=5, p=.32 ... but don't know what p is based on ... can be a not-so-good alignment, can be the presence of another valid alignment, probably altogether).

                    Thanks.

                    Comment


                    • #11
                      I posted the algorithm previously here. There's only a vague relationship between bowtie2's MAPQ scores and the idealized MAPQ scores specified in the SAM spec.

                      Comment


                      • #12
                        I should note that I just corrected the code in that link I posted (the output of 2 or 6 was wrong before).

                        Comment


                        • #13
                          Thanks for the notice. Did you extract this code from bowtie2?

                          Comment


                          • #14
                            It's very close to the code in bowtie2. This is from bison (a BS-seq aligner I wrote that uses bowtie2 as the underlying aligner), which is written in C, rather than bowtie2, which is written in C++, but aside from the language/interface difference most of the code will be identical (there aren't that many ways to code that algorithm, frankly).

                            Edit: I believe the bowtie2 function is in unique.h, if you're curious (there are 3 versions there, #2 seems to be the one used, at least up to version 2.1.0).
                            Last edited by dpryan; 02-11-2014, 09:45 AM.

                            Comment


                            • #15
                              Just to let you know. The new bowtie2 version (2.2.0) is fine! I re-run my analysis -- and the XS counts give the same results as the bowtie2 summary. I should say that the summary is more or less the same, implying that the XS tags were not always properly set before.

                              Best & thanks again,
                              Katrin

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM
                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Yesterday, 06:37 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 06:07 PM
                              0 responses
                              8 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-22-2024, 10:03 AM
                              0 responses
                              49 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-21-2024, 07:32 AM
                              0 responses
                              66 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X