  • rajesh1989
    Junior Member
    • Feb 2015
    • 7

    What is wrong with samtools flagstat or read mapping with TopHat?

    I have 6,673,385 (around 6.7 million) reads in each paired-end file after quality filtering. But when I map them with TopHat and run samtools flagstat on the BAM file, I get the following output:
    1343686 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    1343686 + 0 mapped (100.00%:-nan%)
    1343686 + 0 paired in sequencing
    670808 + 0 read1
    672878 + 0 read2
    1203600 + 0 properly paired (89.57%:-nan%)
    1311198 + 0 with itself and mate mapped
    32488 + 0 singletons (2.42%:-nan%)
    15874 + 0 with mate mapped to a different chr
    452 + 0 with mate mapped to a different chr (mapQ>=5)
    I am not sure how to interpret the samtools flagstat output. As I understand it, only 670808 reads (around 0.67 million) from pair 1 and 672878 (around 0.67 million) from pair 2 are mapped. Is that correct? That is about 1/10th of the total input reads. Where are the rest of my reads?

    The report produced by TopHat shows different statistics:

    Left reads:
    Input : 468668
    Mapped : 443344 (94.6% of input)
    of these: 216780 (48.9%) have multiple alignments (1 have >20)
    Right reads:
    Input : 468668
    Mapped : 444468 (94.8% of input)
    of these: 217699 (49.0%) have multiple alignments (1 have >20)
    94.7% overall read mapping rate.
    Aligned pairs: 433356
    of these: 211726 (48.9%) have multiple alignments
    5512 ( 1.3%) are discordant alignments
    91.3% concordant pair alignment rate.

    Why does TopHat say it mapped around 94% of the reads when there were around 6.7 million reads at the beginning?
    How should I interpret all these numbers? Thank you.
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    samtools flagstat can only report what's in the file, so if there are no unmapped reads in the BAM file then the calculated mapping rate will be 100% (with some reduction in that due to unpaired and low-quality mappings, if included).
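
    To make the point above concrete, here is a rough Python sketch (not part of samtools) that parses the flagstat text from the first post. `parse_flagstat` is a hypothetical helper, and `input_reads` assumes both paired-end files hold the 6,673,385 reads mentioned in the first post. Keep in mind that flagstat counts alignment records, so a multi-mapped read may contribute more than one record.

```python
import re

# flagstat text copied from the first post (older samtools output format).
FLAGSTAT = """\
1343686 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
1343686 + 0 mapped (100.00%:-nan%)
1343686 + 0 paired in sequencing
670808 + 0 read1
672878 + 0 read2
1203600 + 0 properly paired (89.57%:-nan%)
"""

# "<QC-passed> + <QC-failed> <label>"
LINE = re.compile(r"^(\d+) \+ (\d+) (.+)$")


def parse_flagstat(text):
    """Return {label: QC-passed count}; any trailing '(...)' is dropped from the label."""
    counts = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            label = m.group(3).split(" (")[0]
            # Keep the first occurrence if two labels collapse to the same key.
            counts.setdefault(label, int(m.group(1)))
    return counts


counts = parse_flagstat(FLAGSTAT)
records = counts["in total"]

# The percentage flagstat prints is relative to what is in the BAM file...
mapped_pct = 100.0 * counts["mapped"] / records

# ...not relative to the reads given to the aligner.
input_reads = 2 * 6673385  # assumption: two paired-end files, per the first post
records_vs_input_pct = 100.0 * records / input_reads

print(f"mapped / records in BAM:      {mapped_pct:.2f}%")
print(f"records in BAM / input reads: {records_vs_input_pct:.2f}%")
```

    The 100% figure only says that every record in this BAM file is mapped; it says nothing about reads that were written elsewhere (for example to unmapped.bam) or never processed.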


    • rajesh1989
      Junior Member
      • Feb 2015
      • 7

      #3
      Thank you for the reply.
      This is the content of TopHat's prep_reads.info:

      left_min_read_len=25
      left_max_read_len=101
      left_reads_in =6673385
      left_reads_out=6667431
      right_min_read_len=25
      right_max_read_len=101
      right_reads_in =6673385
      right_reads_out=6673220

      Where are the rest of the reads if TopHat didn't map them? I also checked unmapped.bam; its size is very small.


      • fanli
        Senior Member
        • Jul 2014
        • 197

        #4
        Originally posted by rajesh1989
        Report produced by tophat shows some other statistics

        Left reads:
        Input : 468668
        Mapped : 443344 (94.6% of input)
        of these: 216780 (48.9%) have multiple alignments (1 have >20)
        Right reads:
        Input : 468668
        Mapped : 444468 (94.8% of input)
        of these: 217699 (49.0%) have multiple alignments (1 have >20)
        94.7% overall read mapping rate.
        Aligned pairs: 433356
        of these: 211726 (48.9%) have multiple alignments
        5512 ( 1.3%) are discordant alignments
        91.3% concordant pair alignment rate.
        This says your input to tophat is only ~460k read pairs. This directly contradicts what you posted in the prep_reads.info. Are you sure you don't have mismatched files?


        • rajesh1989
          Junior Member
          • Feb 2015
          • 7

          #5
          Hello,

          Everything I have written here is correct; I just copied the details and pasted them here.

          What do you mean by mismatched files?

          That is my actual question: why is TopHat taking only ~460k read pairs?


          • fanli
            Senior Member
            • Jul 2014
            • 197

            #6
            I mean, for example, that the prep_reads.info is from one sample and the TopHat align_summary is from another.


            • rajesh1989
              Junior Member
              • Feb 2015
              • 7

              #7
              No, they are not; I have those two files in the very same folder.


              • fanli
                Senior Member
                • Jul 2014
                • 197

                #8
                Perhaps you have mixed up files in your script. You may want to check the logs in your tophat output directory.

                As an example, here's what my align_summary.txt looks like:
                Code:
                Left reads:
                          Input     :   6551998
                           Mapped   :   5980941 (91.3% of input)
                            of these:    199516 ( 3.3%) have multiple alignments (10560 have >10)
                Right reads:
                          Input     :   6551998
                           Mapped   :   5574400 (85.1% of input)
                            of these:    184354 ( 3.3%) have multiple alignments (10346 have >10)
                88.2% overall read mapping rate.
                
                Aligned pairs:   5394272
                     of these:    177939 ( 3.3%) have multiple alignments
                                  148603 ( 2.8%) are discordant alignments
                80.1% concordant pair alignment rate.
                and the corresponding prep_reads.info:
                Code:
                left_min_read_len=75
                left_max_read_len=75
                left_reads_in =6551998
                left_reads_out=6544622
                right_min_read_len=75
                right_max_read_len=75
                right_reads_in =6551998
                right_reads_out=6495499
                Note that both files refer to 6551998 as the number of read pairs input.
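
                If you want to automate that comparison, something like the following Python sketch could work. It is not a TopHat tool: `parse_prep_reads`, `parse_summary_inputs` and `check_consistency` are hypothetical helpers, and the sketch assumes (based on the example above) that the "Input" lines of align_summary.txt should equal the `*_reads_in` values of prep_reads.info. The embedded text is taken from the original poster's files.

```python
import re

# Contents copied from the original poster's prep_reads.info.
PREP_READS_INFO = """\
left_min_read_len=25
left_max_read_len=101
left_reads_in =6673385
left_reads_out=6667431
right_min_read_len=25
right_max_read_len=101
right_reads_in =6673385
right_reads_out=6673220
"""

# Relevant lines copied from the original poster's TopHat report.
ALIGN_SUMMARY = """\
Left reads:
Input : 468668
Mapped : 443344 (94.6% of input)
Right reads:
Input : 468668
Mapped : 444468 (94.8% of input)
"""


def parse_prep_reads(text):
    """Parse 'key =value' lines into a dict of integers."""
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            info[key.strip()] = int(value)
    return info


def parse_summary_inputs(text):
    """Return the 'Input' counts in file order (left reads, then right reads)."""
    return [int(n) for n in re.findall(r"Input\s*:\s*(\d+)", text)]


def check_consistency(prep_text, summary_text):
    """Return a list of mismatch messages; an empty list means the counts agree."""
    prep = parse_prep_reads(prep_text)
    problems = []
    for side, summary_in in zip(("left", "right"), parse_summary_inputs(summary_text)):
        expected = prep[f"{side}_reads_in"]
        if summary_in != expected:
            problems.append(
                f"{side}: align_summary Input={summary_in}, "
                f"prep_reads.info reads_in={expected}"
            )
    return problems


for problem in check_consistency(PREP_READS_INFO, ALIGN_SUMMARY):
    print(problem)
```

                With the numbers from this thread, both the left and the right side are flagged, which is the discrepancy being discussed here.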


                • rajesh1989
                  Junior Member
                  • Feb 2015
                  • 7

                  #9
                  I got the answer. I think this is an issue with multithreading: when I run TopHat on a single core, I get correct results. Other people have also reported this issue.
