Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to get uniquely mapped reads from Tophat

    Hi,

    I was wondering if anyone has used the "--max-multihits" option in Tophat. I specified "--max-multihits 1" in my command line with the hope of getting uniquely mapped reads; however, when I checked the run.log file, I noticed that the program still uses the default "--max-multihits 40". It seems that specifying the value of --max-multihits doesn't make a change in the analysis.

    Has anyone experienced similar problems before?

    Thanks,

    Subeet

  • #2
    I've used -g 1 successfully, which is the same as --max-multihits according to the manual.

    Comment


    • #3
      If you want to get uniquely mapped reads from Tophat output bam file,you can use the command like this:
      Code:
      samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam
      In the bam file ,when the 5 colum is 255,it means that this read is an uniquely mapped read,which also contain a string"NH:i:1" in the last field.You can see the statement of bam/sam format file for detail.
      Also,you can use another command as follow,which will do the same as the first one:
      Code:
      samtools view accepted_hits.bam | grep -w "NH:i:1" > uniquely_mapped_reads.sam

      Comment


      • #4
        We use -g 1 as part of our standard tophat pipeline. Seems to work OK and definitely doesn't produce the same results as running with the default options.

        Comment


        • #5
          Originally posted by wisense View Post
          If you want to get uniquely mapped reads from Tophat output bam file,you can use the command like this:
          Code:
          samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam
          In the bam file ,when the 5 colum is 255,it means that this read is an uniquely mapped read,which also contain a string"NH:i:1" in the last field.You can see the statement of bam/sam format file for detail.
          Also,you can use another command as follow,which will do the same as the first one:
          Code:
          samtools view accepted_hits.bam | grep -w "NH:i:1" > uniquely_mapped_reads.sam
          Hi,

          When I read the SAM file format, it states:
          "MAPQ: ...A value 255 indicates that the mapping quality is not available."

          Does that mean uniquely mapped?

          thanks.

          Comment


          • #6
            Originally posted by simonandrews View Post
            We use -g 1 as part of our standard tophat pipeline. Seems to work OK and definitely doesn't produce the same results as running with the default options.
            For RNA-seq study, is it too stringent to use -g 1? I recall I read somewhere, that one will miss gene famlily memebers.

            what number is more optimal? the default is 20, i believe.

            Comment


            • #7
              Originally posted by JQL View Post
              Hi,

              When I read the SAM file format, it states:
              "MAPQ: ...A value 255 indicates that the mapping quality is not available."

              Does that mean uniquely mapped?

              thanks.
              For Tophat it does.

              I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:

              Tophat/bowtie don't report mapping quality
              values that are as meaningful as BWA, but there is some information in the
              mapping quality values tophat reports. Tophat yields 4 distinct values for
              its mapping quality values (you can do a "unique" count on the mapping
              quality field of any SAM file from tophat to verify this):



              255 = unique mapping

              3 = maps to 2 locations in the target

              2 = maps to 3 locations

              1 = maps to 4-9 locations

              0 = maps to 10 or more locations.

              Comment


              • #8
                Originally posted by westerman View Post
                For Tophat it does.

                I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:
                thanks. that info is helpful. It is makes senes to me now. Because it takes two mappings to calculate quality. So uniquely mapped reads do not have mapping quality to report, hence, 255.

                Comment


                • #9
                  Originally posted by westerman View Post
                  For Tophat it does.

                  I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:
                  The information is helpful. Does tophat in Galaxy adopt such protocal?

                  However, I found .bam file from tophat.v2 used a different version:
                  255: unmapped reads
                  50: unique mapped reads
                  3: two mapped locations
                  1: 3 or 4 mapped locations
                  0: >4 mapped locations

                  Comment


                  • #10
                    Originally posted by Triple_W View Post
                    The information is helpful. Does tophat in Galaxy adopt such protocal?

                    However, I found .bam file from tophat.v2 used a different version:
                    255: unmapped reads
                    50: unique mapped reads
                    3: two mapped locations
                    1: 3 or 4 mapped locations
                    0: >4 mapped locations
                    You're right.I also find this change in tophat v2.0.4. In my opinion,maybe the best way to find the uniquely mapped reads is using the "NH:i:1" tag in the last field of every alignment.

                    Comment


                    • #11
                      With version v2.0.6 I do not find this exact behavior. But perhaps it is because my reads get separated into 'accepted_hits.bam' and 'unmapped.bam'. The latter does, indeed, have all reads at mapq 255 but the former does not. The former has the mapq of 50, 3, 1 or 0. So when I look at the "interesting" reads -- i.e., those that map in accepted_hits.bam -- then just doing a samtools filtering by mapq is good enough.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Advancing Precision Medicine for Rare Diseases in Children
                        by seqadmin




                        Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
                        12-16-2024, 07:57 AM
                      • seqadmin
                        Recent Advances in Sequencing Technologies
                        by seqadmin



                        Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

                        Long-Read Sequencing
                        Long-read sequencing has seen remarkable advancements,...
                        12-02-2024, 01:49 PM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 12-17-2024, 10:28 AM
                      0 responses
                      26 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 12-13-2024, 08:24 AM
                      0 responses
                      43 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 12-12-2024, 07:41 AM
                      0 responses
                      29 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 12-11-2024, 07:45 AM
                      0 responses
                      42 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X