Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to get uniquely mapped reads from Tophat

    Hi,

    I was wondering if anyone has used the "--max-multihits" option in Tophat. I specified "--max-multihits 1" in my command line with the hope of getting uniquely mapped reads; however, when I checked the run.log file, I noticed that the program still uses the default "--max-multihits 40". It seems that specifying the value of --max-multihits doesn't make a change in the analysis.

    Has anyone experienced similar problems before?

    Thanks,

    Subeet

  • #2
    I've used -g 1 successfully, which is the same as --max-multihits according to the manual.

    Comment


    • #3
      If you want to get uniquely mapped reads from Tophat output bam file,you can use the command like this:
      Code:
      samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam
      In the bam file ,when the 5 colum is 255,it means that this read is an uniquely mapped read,which also contain a string"NH:i:1" in the last field.You can see the statement of bam/sam format file for detail.
      Also,you can use another command as follow,which will do the same as the first one:
      Code:
      samtools view accepted_hits.bam | grep -w "NH:i:1" > uniquely_mapped_reads.sam

      Comment


      • #4
        We use -g 1 as part of our standard tophat pipeline. Seems to work OK and definitely doesn't produce the same results as running with the default options.

        Comment


        • #5
          Originally posted by wisense View Post
          If you want to get uniquely mapped reads from Tophat output bam file,you can use the command like this:
          Code:
          samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam
          In the bam file ,when the 5 colum is 255,it means that this read is an uniquely mapped read,which also contain a string"NH:i:1" in the last field.You can see the statement of bam/sam format file for detail.
          Also,you can use another command as follow,which will do the same as the first one:
          Code:
          samtools view accepted_hits.bam | grep -w "NH:i:1" > uniquely_mapped_reads.sam
          Hi,

          When I read the SAM file format, it states:
          "MAPQ: ...A value 255 indicates that the mapping quality is not available."

          Does that mean uniquely mapped?

          thanks.

          Comment


          • #6
            Originally posted by simonandrews View Post
            We use -g 1 as part of our standard tophat pipeline. Seems to work OK and definitely doesn't produce the same results as running with the default options.
            For RNA-seq study, is it too stringent to use -g 1? I recall I read somewhere, that one will miss gene famlily memebers.

            what number is more optimal? the default is 20, i believe.

            Comment


            • #7
              Originally posted by JQL View Post
              Hi,

              When I read the SAM file format, it states:
              "MAPQ: ...A value 255 indicates that the mapping quality is not available."

              Does that mean uniquely mapped?

              thanks.
              For Tophat it does.

              I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:

              Tophat/bowtie don't report mapping quality
              values that are as meaningful as BWA, but there is some information in the
              mapping quality values tophat reports. Tophat yields 4 distinct values for
              its mapping quality values (you can do a "unique" count on the mapping
              quality field of any SAM file from tophat to verify this):



              255 = unique mapping

              3 = maps to 2 locations in the target

              2 = maps to 3 locations

              1 = maps to 4-9 locations

              0 = maps to 10 or more locations.

              Comment


              • #8
                Originally posted by westerman View Post
                For Tophat it does.

                I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:
                thanks. that info is helpful. It is makes senes to me now. Because it takes two mappings to calculate quality. So uniquely mapped reads do not have mapping quality to report, hence, 255.

                Comment


                • #9
                  Originally posted by westerman View Post
                  For Tophat it does.

                  I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:
                  The information is helpful. Does tophat in Galaxy adopt such protocal?

                  However, I found .bam file from tophat.v2 used a different version:
                  255: unmapped reads
                  50: unique mapped reads
                  3: two mapped locations
                  1: 3 or 4 mapped locations
                  0: >4 mapped locations

                  Comment


                  • #10
                    Originally posted by Triple_W View Post
                    The information is helpful. Does tophat in Galaxy adopt such protocal?

                    However, I found .bam file from tophat.v2 used a different version:
                    255: unmapped reads
                    50: unique mapped reads
                    3: two mapped locations
                    1: 3 or 4 mapped locations
                    0: >4 mapped locations
                    You're right.I also find this change in tophat v2.0.4. In my opinion,maybe the best way to find the uniquely mapped reads is using the "NH:i:1" tag in the last field of every alignment.

                    Comment


                    • #11
                      With version v2.0.6 I do not find this exact behavior. But perhaps it is because my reads get separated into 'accepted_hits.bam' and 'unmapped.bam'. The latter does, indeed, have all reads at mapq 255 but the former does not. The former has the mapq of 50, 3, 1 or 0. So when I look at the "interesting" reads -- i.e., those that map in accepted_hits.bam -- then just doing a samtools filtering by mapq is good enough.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Strategies for Sequencing Challenging Samples
                        by seqadmin


                        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                        03-22-2024, 06:39 AM
                      • seqadmin
                        Techniques and Challenges in Conservation Genomics
                        by seqadmin



                        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                        Avian Conservation
                        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                        03-08-2024, 10:41 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 03-27-2024, 06:37 PM
                      0 responses
                      13 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-27-2024, 06:07 PM
                      0 responses
                      11 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-22-2024, 10:03 AM
                      0 responses
                      53 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-21-2024, 07:32 AM
                      0 responses
                      69 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X