Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Unique mapping with Tophat2

    Hello everyone,

    I have seen posts about filtering unique hits after default run with Tophat2 but some say I should get the reads with MAPQ=255, some say I should get the reads with NH:i:1. Can you briefly describe how to collect the unique hits after tophat2 run, I am a little confused :/

  • #2
    I think the most recent versions use a MAPQ of 50 for unique alignments, but in either case a threshold of 5 will filter out the multimappers (the current possible MAPQs are 0, 1, 2, 3, and 50, if I remember correctly). At least htseq-count will automatically filter by NH tag if it's there, though I always specified a MAPQ threshold anyway.

    Comment


    • #3
      Thanks to devon, I assume I could get the unique hits from a default run (-g 20).

      But to confirm if I get the unique hits, I performed several runs. So, if I substract
      "-g 1" number of reads from "-g 2" number of reads, I should find the number of multi-
      hitting reads, right? In my case it is: 20,064,410 - 19,499,641 = 564,749 reads seems to
      be multi mapping. If I substract 564,749 from "-g 1" number of reads, I should find the
      uniquely mapping reads: 19,499,461 - 564,749 = 18,934,712? But when I typed
      "samtools view -q 50 -o output_file accepted_hits.bam" code to get the number of unique
      hits from a default run (-g 20), it is: 19,170,015 :/ so it's not equal, am I thinking
      wrong?

      Also what I have realized is that if I try to filter unique hits from a "-g 2", the number is: 19,136,250 but if I try to filter unique hits from a "-g 20" (default), it is: 19,170,015 , I expected that they would be also equal in number. What am I missing?

      Comment


      • #4
        I'm not sure what's going on in your case, but there are certainly problems with the mapq encoding in tophat. Specifically it seems to calculate the mapq value after doing the -g filtering instead of before, so if you specify -g 1 then all reads (even ones with lots of possible mapping positions) will get a mapq of 50.

        If you're using -g > 1 then you could also compare the mapq values to the NH:i:x values since these should correctly report whether the hit is unique or not.

        Comment


        • #5
          yes, you are right. If you do a "-g 1" run, all the reads have MAPQ of 50, so it defines uniqueness depending on the output file.

          Comment


          • #6
            Originally posted by sazz View Post
            yes, you are right. If you do a "-g 1" run, all the reads have MAPQ of 50, so it defines uniqueness depending on the output file.
            Indeed, but it really shouldn't do that as it's completely misleading. The NH:i;x tags are specifically designed to show the number of redundant hits within the current file, but mapq should be a measure of mapping quality which is independent of how many hits you reported.

            Comment


            • #7
              I tried to find the number of unique mappings with "NH:i:1" tag but the result is exactly same as the number I find with MAPQ:50.

              "-g 2": unique hits both with NH:i:1 and MAPQ 50 is 19,136,250
              "-g 20": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015
              "-g 1000": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015

              Still I don't understand why there is difference between "-g 2" and "-g 20"?

              Comment


              • #8
                Originally posted by sazz View Post
                I tried to find the number of unique mappings with "NH:i:1" tag but the result is exactly same as the number I find with MAPQ:50.

                "-g 2": unique hits both with NH:i:1 and MAPQ 50 is 19,136,250
                "-g 20": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015
                "-g 1000": unique hits both with NH:i:1 and MAPQ 50 is 19,170,015

                Still I don't understand why there is difference between "-g 2" and "-g 20"?
                I'm not sure. In the controlled tests I did to get to the bottom of this we got back what we expected, with the -g filter being applied correctly. We also tested the --report-secondary-alignments options which also seemed to work OK. The only problem we hit was with the mapq values.

                As another check you could look at the 0x100 position in the flag value. This should be false for only one read in each set (the primary alignment) so finding the number of reads for which this is false should also give you a measure of different hits.

                Comment

                Latest Articles

                Collapse

                • seqadmin
                  Strategies for Sequencing Challenging Samples
                  by seqadmin


                  Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                  03-22-2024, 06:39 AM
                • seqadmin
                  Techniques and Challenges in Conservation Genomics
                  by seqadmin



                  The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                  Avian Conservation
                  Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                  03-08-2024, 10:41 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by seqadmin, Yesterday, 06:37 PM
                0 responses
                10 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, Yesterday, 06:07 PM
                0 responses
                9 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-22-2024, 10:03 AM
                0 responses
                51 views
                0 likes
                Last Post seqadmin  
                Started by seqadmin, 03-21-2024, 07:32 AM
                0 responses
                67 views
                0 likes
                Last Post seqadmin  
                Working...
                X