  • jdsv
    Junior Member
    • Mar 2011
    • 3

    DEXSeq vs htseq-count/DESeq counting model

    I have found DESeq very useful and am giving DEXSeq a spin. After reading through the DEXSeq vignette, I thought the most efficient path would be to do read counting with the included 'dexseq_count.py' script, use the output for DEXSeq analysis, and then use the 'geneCountTable' function to get per-gene counts for DESeq.

    I already have count tables for some of these replicates, generated using 'htseq-count' with the union model. A quick look through the 'dexseq_count.py' source suggests that it uses the same union model, so I ran some quick comparisons to make sure the results were consistent. However, the count that 'htseq-count' generates for each gene is usually lower than the sum of the exon counts that 'dexseq_count.py' generates for the same dataset.

    It appears that when a read is split over an exon boundary, 'dexseq_count.py' includes the read in the count of each exon it touches. This results in a summed read count for the gene that is higher than the actual number of reads mapping to it. As far as I can tell, 'geneCountTable' simply sums up the exon counts, so its per-gene output for genes with spliced reads will be artificially high.
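
    A toy sketch of the effect described above, in plain Python with made-up coordinates. It does not use HTSeq or the actual 'dexseq_count.py' logic; the exon names, intervals, and helper functions are all hypothetical and only illustrate how counting a spliced read once per exon inflates the summed total.

```python
# Toy illustration only: not HTSeq or dexseq_count.py code.
# Coordinates are half-open [start, end) and entirely made up.

# One hypothetical gene with two exons.
exons = {"E001": (100, 200), "E002": (300, 400)}

# Each read is a list of aligned blocks; a spliced read has more than one.
reads = [
    [(120, 170)],              # falls entirely within E001
    [(320, 370)],              # falls entirely within E002
    [(180, 200), (300, 330)],  # spliced across the E001/E002 junction
]


def overlaps(a, b):
    """Return True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def exons_hit(read):
    """Return the set of exon names that any block of the read overlaps."""
    return {
        name
        for name, interval in exons.items()
        if any(overlaps(block, interval) for block in read)
    }


# Per-gene counting: each read contributes once, however many exons it spans.
gene_count = sum(1 for read in reads if exons_hit(read))

# Per-exon counting: a read contributes once to every exon it overlaps.
exon_counts = {name: 0 for name in exons}
for read in reads:
    for name in exons_hit(read):
        exon_counts[name] += 1

summed_exon_count = sum(exon_counts.values())

print("per-gene count:      ", gene_count)
print("per-exon counts:     ", exon_counts)
print("sum of exon counts:  ", summed_exon_count)
```

    With these three reads, the spliced read is counted in both exons, so the sum of the exon counts exceeds the number of reads assigned to the gene.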

    I'm wondering whether Simon or the other authors of DEXSeq and DESeq (or anyone else) have any input on this?

    Thanks,
    Jeremy
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    Your analysis is correct. This is why we still have two scripts and recommend against using 'dexseq_count.py' to produce input for DESeq. Maybe we should put a more explicit warning about this in the help page for 'geneCountTable' (or, better, simply remove this function; it was only an experimental one anyway and serves little purpose in DEXSeq).


    • jdsv
      Junior Member
      • Mar 2011
      • 3

      #3
      Okay, thanks for the reply. I will stick with generating separate count tables.
