  • How can I determine the mapping rates of tophat output such as accepted_hits.bam?

    I ran TopHat on the same RNA-Seq data with different -r/--mate-inner-dist and --mate-std-dev settings.

    Here are the parameters:
    1. -r 160, --mate-std-dev (default) 20
    2. -r (default) 50, --mate-std-dev (default) 20
    3. -r 0, --mate-std-dev 60

    After TopHat finished, I ran samtools flagstat to evaluate the results.

    The results are listed below in order:
    1. -r 160, --mate-std-dev (default) 20
    Code:
    27139030 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    27139030 + 0 mapped (100.00%:-nan%)
    27139030 + 0 paired in sequencing
    14171642 + 0 read1
    12967388 + 0 read2
    22063409 + 0 properly paired (81.30%:-nan%)
    24154960 + 0 with itself and mate mapped
    2984070 + 0 singletons (11.00%:-nan%)
    516422 + 0 with mate mapped to a different chr
    217580 + 0 with mate mapped to a different chr (mapQ>=5)
    4141901 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    4141901 + 2091 paired in sequencing
    1533088 + 997 read1
    2608813 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    2. -r (default) 50, --mate-std-dev (default) 20
    Code:
    27639199 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    27639199 + 0 mapped (100.00%:-nan%)
    27639199 + 0 paired in sequencing
    14422450 + 0 read1
    13216749 + 0 read2
    21085751 + 0 properly paired (76.29%:-nan%)
    24654856 + 0 with itself and mate mapped
    2984343 + 0 singletons (10.80%:-nan%)
    706460 + 0 with mate mapped to a different chr
    215918 + 0 with mate mapped to a different chr (mapQ>=5)
    4142842 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    4142842 + 2091 paired in sequencing
    1533869 + 997 read1
    2608973 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    3. -r 0, --mate-std-dev 60
    Code:
    41145664 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    41145664 + 0 mapped (100.00%:-nan%)
    41145664 + 0 paired in sequencing
    21422982 + 0 read1
    19722682 + 0 read2
    22975306 + 0 properly paired (55.84%:-nan%)
    37774543 + 0 with itself and mate mapped
    3371121 + 0 singletons (8.19%:-nan%)
    10967682 + 0 with mate mapped to a different chr
    207758 + 0 with mate mapped to a different chr (mapQ>=5)
    2934463 + 2091 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    0 + 0 mapped (0.00%:0.00%)
    2934463 + 2091 paired in sequencing
    906826 + 997 read1
    2027637 + 1094 read2
    0 + 0 properly paired (0.00%:0.00%)
    0 + 0 with itself and mate mapped
    0 + 0 singletons (0.00%:0.00%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)
    The sample had 31387112 input reads in total, so at first I was confused by result 3: accepted_hits.bam contained far more reads than the input.

    After checking the BAM file, I found that many reads were repeated because of multihits (reads reported at more than one location).

    So the numbers I got from samtools flagstat are not accurate mapping rates.
    Is there any way to estimate the mapping rate and the unique mapping rate, or anything similar?

    Hoping for your help!

  • #2
    Try these to see if they suit your needs.

    BAMStats: http://bamstats.sourceforge.net/

    Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats


    • #3
      Originally posted by lucyyang1991
      After I checked the bam file, I found there were lots of repeats because of the multihits.

      So the results I've got from the samtools flagstat were not that accurate.
      Is there any way to estimates the mapping rates and unique mapping rates or anything else?
      Hi, a quick shortcut to get the mapping rate is to count the reads in the BAM file where TopHat puts the unmapped reads, called unmapped.bam or something like that. Your mapping rate would then be (total reads - reads in unmapped.bam) / total reads. For uniquely mapped reads you could use the MAPQ score, provided TopHat sets it correctly to reflect uniqueness of mapping.

      Dario
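The formula in the post above can be written as a small helper. This is only a sketch: the function name `mapping_rate` is made up for illustration, and it assumes you have already obtained the two counts yourself (for example with `samtools view -c unmapped.bam` for the unmapped reads) and that both are counted in the same unit, either reads or pairs.

```python
def mapping_rate(total_reads, unmapped_reads):
    """Return (total reads - unmapped reads) / total reads as a fraction.

    Both arguments are plain counts supplied by the caller; the helper
    itself does not read any BAM file.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= unmapped_reads <= total_reads:
        raise ValueError("unmapped_reads must be between 0 and total_reads")
    return (total_reads - unmapped_reads) / total_reads


if __name__ == "__main__":
    # Hypothetical counts, not taken from the runs in this thread.
    print(f"{mapping_rate(1000, 250):.2%}")
```

Because each read is counted once in the input and at most once in unmapped.bam, this rate is not inflated by multihits the way the flagstat totals are.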


      • #4
        Originally posted by GenoMax
        Try these to see if they suit your needs.

        BAMStats: http://bamstats.sourceforge.net/
        Thanks a lot for your help!
        I've downloaded BAMStats. After unzipping 'BAMStats-1.25-src.zip', I couldn't find 'BAMStats-GUI-1.25.jar', and I don't know how to use the program even though the instructions say to run:
        Code:
        java -Xmx4g -jar BAMStats-1.25.jar -i <bam file>


        • #5
          Originally posted by GenoMax
          Try these to see if they suit your needs.

          BAMStats: http://bamstats.sourceforge.net/

          Bam_utils: http://genome.sph.umich.edu/wiki/BamUtil:_stats
          Hi,
          I've tried all the methods and found that BamUtil gives the same result as samtools flagstat. So I still can't tell which parameter set is better, because neither tool rules out the repeats caused by multihits.


          • #6
            If you are only interested in uniquely mapped reads then see post #14 in this thread: http://seqanswers.com/forums/showthread.php?t=25096

            Here is one more option for summarizing read mappings: http://bioinf.wehi.edu.au/featureCounts/
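For the multihit problem raised earlier in the thread, one possible approach is to count each read once instead of counting alignment lines. The sketch below is not the method from the linked thread; it reads SAM text (for example piped from `samtools view accepted_hits.bam`) and assumes that multihits either carry an `NH:i:` tag greater than 1 or appear on more than one line. The function name `count_unique_and_multi` is hypothetical.

```python
import sys


def count_unique_and_multi(sam_lines):
    """Count distinct mapped reads, split into unique and multi-mapped.

    A read is identified by (read name, mate number), so the two mates of
    a pair are counted separately. A read is treated as multi-mapped if
    any of its lines has NH > 1 or if it occurs on more than one line.
    """
    seen = {}  # (name, mate) -> [number of alignment lines, largest NH]
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        nh = 1  # assumed value when the NH tag is absent
        for tag in fields[11:]:
            if tag.startswith("NH:i:"):
                nh = int(tag[5:])
                break
        entry = seen.setdefault((fields[0], mate), [0, 1])
        entry[0] += 1
        entry[1] = max(entry[1], nh)
    unique = sum(1 for lines, nh in seen.values() if lines == 1 and nh == 1)
    return unique, len(seen) - unique


if __name__ == "__main__":
    unique, multi = count_unique_and_multi(sys.stdin)
    print(f"uniquely mapped reads: {unique}")
    print(f"multi-mapped reads:    {multi}")
```

Dividing either count by the number of input reads gives a rate that can be compared across parameter sets. The dictionary holds one entry per read, so for tens of millions of reads this needs a few gigabytes of memory.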
