SEQanswers
  • properly paired reads in TopHat output

    Hi,

    The samtools flagstat results for a TopHat BAM file from my data are reproduced below. I am a little concerned about the percentage of "properly paired" reads (~33%). My guess is that in RNA-Seq data the two reads of a pair may map to different exons, which could make the pair deviate from whatever the definition of proper pairing is. Does anyone have comments on whether these numbers are within reasonable limits for an RNA-Seq data set?

    Code:
    106172059 in total
    0 QC failure
    0 duplicates
    106172059 mapped (100.00%)
    106172059 paired in sequencing
    53427448 read1
    52744611 read2
    34923852 properly paired (32.89%)
    89669616 with itself and mate mapped
    16502443 singletons (15.54%)
    0 with mate mapped to a different chr
    0 with mate mapped to a different chr (mapQ>=5)
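A quick arithmetic cross-check of the flagstat numbers above, treating each line as a count of BAM records (an interpretation of flagstat's output, not something it states). The variable names are illustrative; the values are copied from the output. Records with both mates mapped plus singletons account for every record, and the proper-pair fraction is somewhat higher when measured only against records whose mate also aligned.

```shell
#!/usr/bin/env bash
# Counts copied from the flagstat output above (interpreted as BAM record counts).
total=106172059
proper=34923852
both_mapped=89669616
singletons=16502443

# Check that "with itself and mate mapped" + "singletons" covers every record.
echo $(( both_mapped + singletons == total ))   # prints 1

# Proper-pair rate against all records vs. against records whose mate also mapped.
awk -v p="$proper" -v t="$total" -v m="$both_mapped" 'BEGIN {
  printf "%.2f%% of all records\n", 100 * p / t
  printf "%.2f%% of records with mate mapped\n", 100 * p / m
}'
# prints 32.89% of all records, then 38.95% of records with mate mapped
```

Using the mate-mapped denominator separates records that lost their mate during alignment (the singletons) from pairs where both mates aligned but were not flagged as proper.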
    The reads were 76 bp paired-end, the insert size was 200 bp, and the TopHat command was:
    Code:
    tophat -o 108971.testing.out -m 2 -p 4 -r 200 --mate-std-dev 50 -g 10 /home/sensh/bin/bowtie-0.12.7/indexes/hg18_inclusive 108971.read1.fastq 108971.read2.fastq
    Thanks,

    Shurjo
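For reference, TopHat's `-r` (`--mate-inner-dist`) option expects the inner distance between the mates, i.e. the fragment length minus both read lengths, rather than the fragment length itself; the TopHat manual's example is 300 bp fragments with 50 bp ends, giving `-r 200`. Assuming the 200 bp figure above is the full fragment length, the corresponding value would be about 48, whereas `-r 200` describes fragments of roughly 352 bp (200 + 2 × 76). The helper below is a hypothetical one-liner for that arithmetic, assuming both mates have the same length.

```shell
#!/usr/bin/env bash
# Hypothetical helper: TopHat's -r (--mate-inner-dist) value from the fragment
# length and read length, assuming both mates have the same read length and
# "insert size" refers to the whole fragment.
mate_inner_dist() {
  local fragment_len=$1 read_len=$2
  echo $(( fragment_len - 2 * read_len ))
}

mate_inner_dist 300 50   # the TopHat manual's example: prints 200
mate_inner_dist 200 76   # 76 bp reads, 200 bp fragments: prints 48
```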

  • #2
    I am curious about this issue as well. I have several human RNA-seq data sets where the percentage of "properly paired" reads was about 50% in the TopHat output, but much higher in the bwa output. Of course, this depends both on the actual mapping and on the criterion each aligner uses to set the "properly paired" flag in the BAM file. In my case, about 20% of the alignments in the TopHat output were spliced, which could explain some of the remaining "non-properly paired" reads.

    Does anyone know what criterion TopHat uses for setting the "properly paired" flag?
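Whatever rule an aligner applies, its decision is stored as bit 0x2 of the SAM FLAG field ("each segment properly aligned according to the aligner" in the SAM specification), and samtools flagstat only counts records carrying that bit. A minimal sketch of the test, assuming the behavior of recent samtools versions, where a record must also be paired (0x1) and mapped (0x4 clear) to be counted; `is_properly_paired` is a hypothetical helper name, not a samtools command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: would samtools flagstat count this FLAG value as
# "properly paired"? Requires the paired (0x1) and proper-pair (0x2) bits
# to be set and the read-unmapped bit (0x4) to be clear.
is_properly_paired() {
  local flag=$1
  if (( (flag & 0x1) && (flag & 0x2) && !(flag & 0x4) )); then
    echo yes
  else
    echo no
  fi
}

is_properly_paired 99   # paired, proper pair, mate reverse, read1: prints yes
is_properly_paired 97   # paired but 0x2 not set: prints no
```

On a real BAM file, `samtools view -c -f 2 accepted_hits.bam` counts the records that have the proper-pair bit set.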



    • #3
      Actually, I just looked at the FLAG field and I am even more puzzled. Issuing this command:
      Code:
       samtools view accepted_hits.bam | awk '{print $2}' | sort | uniq
      returns only 0 and 16. So I am not sure how samtools flagstat decides which reads are properly paired, given that neither value has even the paired bit (0x1) set, let alone the proper-pair bit (0x2). At this point I suspect that TopHat does not really populate this field, so a different approach may be needed to determine what percentage of PE reads are paired in the output.
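FLAG values can be decoded bit by bit against the SAM specification. In the sketch below, `decode_flag` is a hypothetical helper, and the bit names follow the ones printed by `samtools flags` in recent samtools versions. Under this decoding, 0 is a mapped, unpaired read on the forward strand and 16 is the same on the reverse strand, so neither carries any pairing information.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of the bits set in a SAM FLAG value,
# using the bit order from the SAM specification (0x1 first, 0x800 last).
decode_flag() {
  local flag=$1
  local -a names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                  READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local i out=""
  for i in "${!names[@]}"; do
    if (( flag & (1 << i) )); then
      out+="${names[i]} "
    fi
  done
  # Empty output means no bits are set: an unpaired read mapped to the forward strand.
  echo "${out% }"
}

decode_flag 0    # prints an empty line
decode_flag 16   # prints REVERSE
decode_flag 99   # prints PAIRED PROPER_PAIR MREVERSE READ1
```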
