Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to get uniquely mapped reads from Tophat

    Hi,

    I was wondering if anyone has used the "--max-multihits" option in Tophat. I specified "--max-multihits 1" in my command line with the hope of getting uniquely mapped reads; however, when I checked the run.log file, I noticed that the program still uses the default "--max-multihits 40". It seems that specifying the value of --max-multihits doesn't make a change in the analysis.

    Has anyone experienced similar problems before?

    Thanks,

    Subeet

  • #2
    I've used -g 1 successfully, which is the same as --max-multihits according to the manual.

    Comment


    • #3
      If you want to get uniquely mapped reads from Tophat output bam file,you can use the command like this:
      Code:
      samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam
      In the bam file ,when the 5 colum is 255,it means that this read is an uniquely mapped read,which also contain a string"NH:i:1" in the last field.You can see the statement of bam/sam format file for detail.
      Also,you can use another command as follow,which will do the same as the first one:
      Code:
      samtools view accepted_hits.bam | grep -w "NH:i:1" > uniquely_mapped_reads.sam

      Comment


      • #4
        We use -g 1 as part of our standard tophat pipeline. Seems to work OK and definitely doesn't produce the same results as running with the default options.

        Comment


        • #5
          Originally posted by wisense View Post
          If you want to get uniquely mapped reads from Tophat output bam file,you can use the command like this:
          Code:
          samtools view accepted_hits.bam | awk '$5==255{print $0}' > uniq_mapped_reads.sam
          In the bam file ,when the 5 colum is 255,it means that this read is an uniquely mapped read,which also contain a string"NH:i:1" in the last field.You can see the statement of bam/sam format file for detail.
          Also,you can use another command as follow,which will do the same as the first one:
          Code:
          samtools view accepted_hits.bam | grep -w "NH:i:1" > uniquely_mapped_reads.sam
          Hi,

          When I read the SAM file format, it states:
          "MAPQ: ...A value 255 indicates that the mapping quality is not available."

          Does that mean uniquely mapped?

          thanks.

          Comment


          • #6
            Originally posted by simonandrews View Post
            We use -g 1 as part of our standard tophat pipeline. Seems to work OK and definitely doesn't produce the same results as running with the default options.
            For RNA-seq study, is it too stringent to use -g 1? I recall I read somewhere, that one will miss gene famlily memebers.

            what number is more optimal? the default is 20, i believe.

            Comment


            • #7
              Originally posted by JQL View Post
              Hi,

              When I read the SAM file format, it states:
              "MAPQ: ...A value 255 indicates that the mapping quality is not available."

              Does that mean uniquely mapped?

              thanks.
              For Tophat it does.

              I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:

              Tophat/bowtie don't report mapping quality
              values that are as meaningful as BWA, but there is some information in the
              mapping quality values tophat reports. Tophat yields 4 distinct values for
              its mapping quality values (you can do a "unique" count on the mapping
              quality field of any SAM file from tophat to verify this):



              255 = unique mapping

              3 = maps to 2 locations in the target

              2 = maps to 3 locations

              1 = maps to 4-9 locations

              0 = maps to 10 or more locations.

              Comment


              • #8
                Originally posted by westerman View Post
                For Tophat it does.

                I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:
                thanks. that info is helpful. It is makes senes to me now. Because it takes two mappings to calculate quality. So uniquely mapped reads do not have mapping quality to report, hence, 255.

                Comment


                • #9
                  Originally posted by westerman View Post
                  For Tophat it does.

                  I am not sure who made the following observation on a Galaxy mailing list but it is what I refer to:
                  The information is helpful. Does tophat in Galaxy adopt such protocal?

                  However, I found .bam file from tophat.v2 used a different version:
                  255: unmapped reads
                  50: unique mapped reads
                  3: two mapped locations
                  1: 3 or 4 mapped locations
                  0: >4 mapped locations

                  Comment


                  • #10
                    Originally posted by Triple_W View Post
                    The information is helpful. Does tophat in Galaxy adopt such protocal?

                    However, I found .bam file from tophat.v2 used a different version:
                    255: unmapped reads
                    50: unique mapped reads
                    3: two mapped locations
                    1: 3 or 4 mapped locations
                    0: >4 mapped locations
                    You're right.I also find this change in tophat v2.0.4. In my opinion,maybe the best way to find the uniquely mapped reads is using the "NH:i:1" tag in the last field of every alignment.

                    Comment


                    • #11
                      With version v2.0.6 I do not find this exact behavior. But perhaps it is because my reads get separated into 'accepted_hits.bam' and 'unmapped.bam'. The latter does, indeed, have all reads at mapq 255 but the former does not. The former has the mapq of 50, 3, 1 or 0. So when I look at the "interesting" reads -- i.e., those that map in accepted_hits.bam -- then just doing a samtools filtering by mapq is good enough.

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Choosing Between NGS and qPCR
                        by seqadmin



                        Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                        10-18-2024, 07:11 AM
                      • seqadmin
                        Non-Coding RNA Research and Technologies
                        by seqadmin




                        Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                        Nobel Prize for MicroRNA Discovery
                        This week,...
                        10-07-2024, 08:07 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 06:09 AM
                      0 responses
                      10 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 10-30-2024, 05:31 AM
                      0 responses
                      12 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 10-24-2024, 06:58 AM
                      0 responses
                      21 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 10-23-2024, 08:43 AM
                      0 responses
                      52 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X