  • shurjo
    Senior Member
    • Jan 2009
    • 132

    properly paired reads in TopHat output

    Hi,

    The samtools flagstat results for a TopHat BAM file from my data are reproduced below. I am a little concerned about the percentage of "properly paired" reads (~33%). I guess that in RNA-Seq data the two reads in a pair may map to different exons, which may cause the pair to deviate from whatever the definition of proper pairing is. Does anyone have any comments on whether these numbers are within reasonable limits for an RNA-Seq data set?

    Code:
    106172059 in total
    0 QC failure
    0 duplicates
    106172059 mapped (100.00%)
    106172059 paired in sequencing
    53427448 read1
    52744611 read2
    34923852 properly paired (32.89%)
    89669616 with itself and mate mapped
    16502443 singletons (15.54%)
    0 with mate mapped to a different chr
    0 with mate mapped to a different chr (mapQ>=5)
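As a quick sanity check, the percentages above can be re-derived from the raw counts. The sketch below is illustrative only: the dictionary keys are hypothetical labels, not samtools field names, and the last ratio (properly paired reads as a share of reads whose mate also mapped) is an extra figure that flagstat does not print.

```python
# Counts copied from the flagstat output above; the key names are
# hypothetical labels chosen for this sketch.
counts = {
    "total": 106172059,
    "properly_paired": 34923852,
    "both_mapped": 89669616,   # "with itself and mate mapped"
    "singletons": 16502443,
}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The two percentages that flagstat reports, relative to all reads.
pct_proper = percent(counts["properly_paired"], counts["total"])
pct_singletons = percent(counts["singletons"], counts["total"])

# Extra figure (not printed by flagstat): properly paired reads as a
# share of the reads whose mate is also mapped.
pct_proper_of_both = percent(counts["properly_paired"], counts["both_mapped"])

print(f"properly paired / total:       {pct_proper:.2f}%")
print(f"singletons / total:            {pct_singletons:.2f}%")
print(f"properly paired / both mapped: {pct_proper_of_both:.2f}%")
```

Note that the "both mapped" and singleton counts add up exactly to the total, so every read in this file is accounted for as one or the other.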
    The reads were 76bp paired-end, the insert size was 200bp, and the command used to run TopHat was:
    Code:
    tophat -o 108971.testing.out -m 2 -p 4 -r 200 --mate-std-dev 50 -g 10 /home/sensh/bin/bowtie-0.12.7/indexes/hg18_inclusive 108971.read1.fastq 108971.read2.fastq
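One thing worth checking in the command above: TopHat's -r (--mate-inner-dist) option is the expected inner distance between the two mates, not the full fragment length. Assuming the 200bp "insert size" refers to the whole fragment including both 76bp reads (the post does not say which is meant), the inner distance would work out as in the sketch below. The function name is made up for illustration.

```python
def mate_inner_distance(fragment_len: int, read_len: int) -> int:
    """Inner distance between two mates of equal length.

    Assumes fragment_len covers both reads plus the unsequenced gap
    between them, and excludes adapters.
    """
    return fragment_len - 2 * read_len


# Values from the post: 76bp paired-end reads and a 200bp "insert size",
# interpreted here (an assumption) as the full fragment length.
inner = mate_inner_distance(fragment_len=200, read_len=76)
print(inner)  # 48
```

Under that assumption, -r 48 would describe the library better than -r 200. If the 200bp figure already denotes the gap between the reads, the original setting stands.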
    Thanks,

    Shurjo
  • kopi-o
    Senior Member
    • Feb 2008
    • 319

    #2
    I am curious about this issue as well. I have several human RNA-seq data sets where the percentage of "properly paired" reads was about 50% in TopHat output, but much higher in bwa output. Of course, this depends both on the actual mapping and on the criterion that each aligner uses to set the "properly paired" flag in the BAM file. In my case, about 20% of the alignments in the TopHat output were spliced, which would explain part of the remaining reads that are not "properly paired".

    Does anyone know what criterion TopHat uses for setting the "properly paired" flag?


    • shurjo
      Senior Member
      • Jan 2009
      • 132

      #3
      Actually, I just looked at the FLAG field and I am even more puzzled. Issuing this command:
      Code:
      samtools view accepted_hits.bam | awk '{print $2}' | sort | uniq
      returns only 0 and 16. So I am not sure how samtools flagstat is deciding which reads are properly paired, given that neither of these flags indicates pairing at all, let alone proper pairing. At this point, I suspect that TopHat is not really populating this field in its BAM output, so a different approach may be needed to determine what percentage of PE reads are paired in the output.
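For reference, FLAG values can be decoded bit by bit. The bit meanings below follow the SAM format specification; the helper name decode_flag and the short label strings are made up for this sketch. It shows that 0 and 16 differ only in the reverse-strand bit, and that neither has the paired (0x1) or proper-pair (0x2) bit set. The value 99 is included only as a contrasting example of a flag that does mark a properly paired read.

```python
from typing import List

# Bit meanings from the SAM format specification; the label strings
# are informal names chosen for this sketch.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for flag in (0, 16, 99):
    print(flag, decode_flag(flag))
```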
