
  • #16
    Good to know, thanks!



    • #17
      Hello,
      I find that filtering on the XS tag also removes the reads that aligned discordantly exactly 1 time.

      Thanks
      Amrita



      • #18
        Hi all,
        I am having similar problems filtering my bowtie2 output, although not as severe as the originally reported numbers. I mapped reads back to a de novo assembled transcriptome with the following settings:

        --all --end-to-end --score-min L,-0.1,-0.1 --no-discordant --no-mixed
        and got the following bowtie2 summary output:

        44223325 reads; of these:
        44223325 (100.00%) were paired; of these:
        7691237 (17.39%) aligned concordantly 0 times
        25521175 (57.71%) aligned concordantly exactly 1 time
        11010913 (24.90%) aligned concordantly >1 times
        82.61% overall alignment rate
        I was able to confirm the number of reads that aligned concordantly 0 times by searching for the 'YT:Z:UP' tag. However, when I look for reads that aligned concordantly exactly 1 time, I get a number (25,531,318) that is about 10,000 higher than the number reported by bowtie2 (25,521,175). I counted all lines in the .sam file where the 'XS:i' and 'YT:Z:UP' tags are not present. The number does not change if I also require the 'AS:i' tag to be present.
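A minimal sketch of the count described above, assuming a plain-text SAM file and standard awk. The function name `count_no_xs_not_up` is made up for illustration. It counts alignment lines that carry neither an `XS:i` tag nor `YT:Z:UP`.

```shell
# Hypothetical helper (not part of bowtie2 or samtools):
# count SAM alignment lines that have neither an XS:i tag nor YT:Z:UP.
count_no_xs_not_up() {
    awk -F'\t' '
        /^@/ { next }                        # skip SAM header lines
        {
            has_xs = 0; is_up = 0
            for (i = 12; i <= NF; i++) {     # optional tags start at column 12
                if ($i ~ /^XS:i:/)   has_xs = 1
                if ($i == "YT:Z:UP") is_up = 1
            }
            if (!has_xs && !is_up) n++
        }
        END { print n + 0 }                  # print 0 when nothing matched
    ' "$@"
}

# Usage (file name is a placeholder):
#   count_no_xs_not_up my_filename.sam
```

Note that this counts SAM lines. In paired-end output each mate has its own line, so a line count is not necessarily directly comparable to the per-pair numbers in the bowtie2 summary.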

        I am aware that the --all setting leaves the MAPQ without much meaning, so using it to filter out uniquely mapped reads is not possible. I could redo the analysis to avoid this problem, but would rather continue working with the same dataset to keep things consistent. There are no mentions of this setting causing any other problems, though.

        Is there something I am missing, or did my settings do something unexpected?

        Best,
        Jan Philip



        • #19
          The presence of an XS auxiliary tag doesn't mean that an alignment isn't unique (n.b., "unique" isn't really a useful term, there's a reason for MAPQ scores). bowtie2 should only count an alignment as not unique if the XS and AS scores are the same.

          Note that you're typically best off simply filtering by MAPQ score.
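To see how AS and XS relate in a given file, something like the following could be used. It is only a sketch of the comparison described above, not a reproduction of how bowtie2 builds its summary. The function name `as_vs_xs` and the three labels are invented for illustration.

```shell
# Hypothetical helper: for each aligned SAM line print
#   read name, MAPQ, AS, XS (or NA), and a label comparing AS with XS.
as_vs_xs() {
    awk -F'\t' -v OFS='\t' '
        /^@/ { next }                            # skip header lines
        {
            a_score = "NA"; x_score = "NA"
            for (i = 12; i <= NF; i++) {         # optional tags start at column 12
                if ($i ~ /^AS:i:/) a_score = substr($i, 6)
                if ($i ~ /^XS:i:/) x_score = substr($i, 6)
            }
            if (a_score == "NA") next            # no AS tag: nothing to compare
            if (x_score == "NA")                 label = "no_XS"
            else if (x_score + 0 == a_score + 0) label = "AS_equals_XS"
            else                                 label = "AS_differs_from_XS"
            print $1, $5, a_score, x_score, label
        }
    ' "$@"
}

# Usage (file name is a placeholder): tally the labels
#   as_vs_xs my_filename.sam | cut -f 5 | sort | uniq -c
```

Printing MAPQ next to the two scores makes it easy to check how the mapping quality behaves for each of the three cases in your own data.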



          • #20
            Thank you for the swift reply. However, I do not fully understand how uniqueness is defined, if not by the fact that a given read only maps to one location (given the set score thresholds etc.). I understand that if the first location a read maps to is significantly better than the alternate locations (by MAPQ score, for example), it could probably also be considered unique in some respect.

            However, if it is true that uniquely mapped reads could also have an XS tag, I would have expected the number of reads without it to be significantly lower, not higher, than the number reported by bowtie. So I am still pretty puzzled by my results.

            I will try to have a look at your posts regarding the MAPQ scores and seriously consider redoing my analyses.



            • #21
              Therein lies the problem: there is no single definition of "uniqueness". There are multiple incompatible definitions. Further, if we relax the --score-min settings enough, then by some definitions there will never be any unique alignments. This is why MAPQ is generally a more useful concept, and you'd be better served by just forgetting about the term "unique" in this context.



              • #22
                Thank you @devon



                • #23
                  Thanks for the help, I will try to figure out how to best solve the issue for my experiments.

                  I found this, which might be of interest to others trying to understand how bowtie2 assigns scores: link. There are also some interesting thoughts on uniqueness discussed in this and an older blog post.



                  • #24
                    Hi all,
                    This worked for me, but I don't know if it is a general solution. If you set the -k parameter in Bowtie2 to >=2, the name of any read that aligns to more than one location should appear at least twice in your SAM file. You can use that to remove reads that appear >1 times in the file my_filename.sam. This way you don't have to understand how Bowtie sets tags and flags.
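                    prefix="my_filename"
                    # Collect read names that occur on more than one alignment line (header lines start with "@").
                    # Mates of a pair share a read name, so this is only meaningful for single-end data.
                    grep -v '^@' "$prefix.sam" | cut -f 1 | sort | uniq -d > "$prefix.toremove"
                    # Keep header lines and alignments whose read name (column 1) is not in that list.
                    awk -F'\t' 'FILENAME == ARGV[1] { dup[$1]; next } /^@/ || !($1 in dup)' "$prefix.toremove" "$prefix.sam" > "$prefix.unique.sam"
                    rm "$prefix.toremove"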
                    Comments appreciated.
                    Last edited by keo; 03-30-2017, 07:18 PM.



                    • #25
                      Originally posted by dpryan
                      The presence of an XS auxiliary tag doesn't mean that an alignment isn't unique (n.b., "unique" isn't really a useful term, there's a reason for MAPQ scores). bowtie2 should only count an alignment as not unique if the XS and AS scores are the same.

                      Note that you're typically best off simply filtering by MAPQ score.
                      How would you filter with MAPQ score? I have rows with
                      MAPQ=39 ... AS:i:0 XS:i:0

                      39 seems like an arbitrary value. In my case, the lines that don't have an XS tag have a MAPQ of 42.
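For reference, a MAPQ filter can be as simple as a threshold on column 5 of the SAM file. The sketch below assumes a plain-text SAM file and standard awk. The function name `filter_by_mapq` is hypothetical, and the threshold in the usage line is an arbitrary placeholder, not a recommended value.

```shell
# Hypothetical helper: keep SAM header lines plus alignments
# whose MAPQ (column 5) is at least the given threshold.
filter_by_mapq() {
    min_mapq="$1"   # first argument: minimum MAPQ to keep
    shift           # remaining arguments: SAM file(s)
    awk -F'\t' -v min="$min_mapq" '/^@/ || $5 + 0 >= min + 0' "$@"
}

# Usage (file names and the threshold 10 are placeholders):
#   filter_by_mapq 10 my_filename.sam > my_filename.mapq10.sam
```

If samtools is installed, `samtools view -h -q 10 my_filename.sam` applies the same kind of threshold. Which cutoff is appropriate depends on the data and on how the aligner was run.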
