
  • DESeq2 seems to under-report counts for SE reads

    Here are the alignment summary data from TopHat for single-end (SE) reads. These SE reads are NOT stranded.

    cat /PATH/align_summary.txt | grep Mapped
    Mapped : 66096436 (97.2% of input)
    Mapped : 59491205 (97.4% of input)
    Mapped : 58752388 (97.4% of input)
    Mapped : 61205947 (97.4% of input)

    So the average is about 60 million mapped reads per sample.

    Running htseq-count with default options (i.e., as I do for paired-end reads), "htseq-count -f bam -r name accepted_hits_sorted_QN.bam" (name-sorted BAM), gives these count sums:

    21237878 22854704 23350366 21377177 (sums for the four samples: 2 control, 2 experimental)


    So ~21 million. Roughly one third of the reads are reported.
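
The arithmetic behind that estimate can be written out as a small Python sketch using only the numbers posted above. The variable names are made up for illustration. The overall fraction does not depend on which htseq-count sum belongs to which TopHat "Mapped" line, so no sample order is assumed.

```python
# TopHat "Mapped" values, copied from align_summary.txt in the post.
mapped = [66096436, 59491205, 58752388, 61205947]

# Sums of htseq-count counts for the four samples, copied from the post.
counted = [21237878, 22854704, 23350366, 21377177]

# Average number of mapped reads per sample.
mean_mapped = sum(mapped) / len(mapped)

# Fraction of all mapped reads that ended up assigned to a feature.
overall_fraction = sum(counted) / sum(mapped)

print(f"mean mapped reads per sample: {mean_mapped:,.0f}")
print(f"overall fraction counted:     {overall_fraction:.1%}")
```

With these inputs the counted fraction comes out a little above one third, which matches the rough estimate in the post.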


    Running htseq-count on the unsorted BAM (i.e., the original *.bam file from TopHat) with the default value for "-s" (the default is "yes", i.e., stranded data are assumed) gives
    23794422

    Unsorted BAM with "-s no": "htseq-count -f bam -r name -s no accepted_hits.bam" (original BAM from TopHat)
    47138205
    Name-sorted BAM with "-s no": "htseq-count -f bam -r name -s no accepted_hits_sorted_QN.bam" (name-sorted BAM)

    44074576
    The closest I can get to the number of reads reported by TopHat is 47 of about 60 million.

    Can someone explain why I might be seeing such a low number of counts relative to the number of mapped reads reported by TopHat?

    The issue does NOT appear for paired-end (PE) reads. For PE reads, the number of reads reported by TopHat closely matches the sum of counts reported by htseq-count.

  • #2
    Hi,

    To attract the right attention, note that the title of the post should mention htseq-count, which is the software in question, rather than DESeq2 (DESeq2 does have an import function for htseq-count files, but these are separate software packages).

    I think the -r name option is only meant for paired-end files. Otherwise, I would guess that the file should be position-sorted.

    I would also check the alignment quality scores in these BAMs, as htseq-count applies a quality score filter.

    Also, what is the organism, and which GTF file are you using?
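
One way to see where the missing reads went is to look at the special counters that htseq-count appends to its output. In recent versions these are the lines whose names start with "__", such as __no_feature, __ambiguous, __too_low_aQual, __not_aligned and __alignment_not_unique; older releases may name them differently. Below is a minimal Python sketch, assuming tab-separated output with the count in the last column. The function name and the toy numbers are hypothetical and are not taken from this thread.

```python
def summarize_htseq_counts(lines):
    """Split htseq-count output into a gene-level total and the special counters.

    Assumes each non-empty line is tab-separated with the count in the last
    column, and that special counters are the rows whose name starts with "__".
    """
    gene_total = 0
    special = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        name, count = fields[0], int(fields[-1])
        if name.startswith("__"):
            special[name] = count   # reads that were NOT assigned to a gene
        else:
            gene_total += count     # reads assigned to a feature
    return gene_total, special


# Toy input for illustration only (made-up numbers, not from the post).
example = (
    "GeneA\t10\n"
    "GeneB\t5\n"
    "__no_feature\t7\n"
    "__ambiguous\t1\n"
    "__too_low_aQual\t3\n"
    "__not_aligned\t0\n"
    "__alignment_not_unique\t4\n"
)

gene_total, special = summarize_htseq_counts(example.splitlines())
print("assigned to genes:", gene_total)
for name, count in special.items():
    print(name, count)
```

If most of the unassigned reads sit in __no_feature, the strandedness setting or the annotation is the likelier culprit; large values in __too_low_aQual or __alignment_not_unique would instead point at the quality filter and multi-mapped reads.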

    • #3
      For what it's worth, it's MUCH faster to just use featureCounts rather than htseq-count.

      • #4
        Good point about the subject line.
        Mm_UCSC_Mm10_genome.gtf was used by TopHat/Bowtie2 to make the BAM and by htseq-count to count.

        I believe default TopHat/Bowtie2 BAMs are already coordinate (position) sorted, so I am now asking htseq-count to examine the original BAM with no -r option and with -s set to no.
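
The sort order does not have to be taken on faith: the SAM header usually records it in the SO tag of the @HD line. The sketch below assumes the header text has already been dumped, for example with "samtools view -H accepted_hits.bam", and simply reads that tag. The function name and the example header are hypothetical.

```python
def sort_order_from_header(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None  # no @HD line or no SO tag, so the sort order is undeclared


# Hypothetical header, for illustration only.
example_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chrTest\tLN:1000"
print(sort_order_from_header(example_header))
```

A value of "coordinate" corresponds to position sorting and "queryname" to name sorting. The tag only reports what the producing tool declared, so it is a quick check rather than proof.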

        Thanks for the tip.
        Among other careless mistakes, my tests did not actually include the -r option shown in the original post.
        The best count is still 47 million, with "htseq-count -f bam -s no" and the unaltered TopHat BAM.

        I'll check out featureCounts.
