SEQanswers

Old 05-07-2014, 08:10 AM   #1
Yohann
Junior Member
 
Location: Montréal

Join Date: Aug 2013
Posts: 7
DEXSeq - Counting with HTSeq at the exon level

Hi!

I'm currently looking for differentially expressed isoforms and got curious about the behaviour of HTSeq when counting exons. Basically, I wanted to check whether these two methods give similar results when estimating gene expression:
  • counting at the gene level with HTSeq (HTg)
  • counting at the exon level, then summing all the exons per gene (HTe)
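For readers who want to reproduce the HTe side, here is a minimal Python sketch of the summing step. It assumes the per-bin counts are already loaded into a dict keyed by "GENE_ID:BIN" strings; that key layout, the function name and the example values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def sum_bins_per_gene(bin_counts):
    """Sum per-bin counts into one total per gene.

    bin_counts maps "GENE_ID:BIN" keys (assumed layout) to read counts.
    """
    totals = defaultdict(int)
    for key, count in bin_counts.items():
        # Everything before the first ":" is taken as the gene identifier.
        gene_id, _, _bin_id = key.partition(":")
        totals[gene_id] += count
    return dict(totals)


# Made-up counts for two hypothetical genes.
example = {"geneA:001": 10, "geneA:002": 7, "geneB:001": 3}
hte = sum_bins_per_gene(example)
print(hte)
```

The resulting per-gene totals are what is called HTe here.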

The correlation is not that great (see attached pdf), and there is a global trend towards higher counts with my HTe method. Some of the highlighted genes show huge differences between the two methods:
Code:
ensembl_gene_id   value_HTg   value_HTe   ratio
ENSG00000205336   21          6806        0.003231967
ENSG00000165795   73          21996       0.003364095
When looking at ENSG00000205336 in a genome browser, I can count 21 mapped reads, which fits with HTg!
I believe that a read mapping across a splice junction is counted twice when using HTSeq at the exon level (once for each exon it touches), which may explain some of the differences.

In the first steps of the DEXSeq analysis, we have to process a GTF file (from Ensembl, for example) to obtain a GFF with "collapsed" exons from the different transcripts of the same gene. For my example gene, the script "dexseq_prepare_annotation.py" generates really small "exonic_part" features, some of them only 1 bp long!

GTF for ENSG00000205336
GFF for ENSG00000205336

Is this expected?
I think that each of those exonic parts will be treated as an exon in the DEXSeq analysis. Could that be a problem?

Thanks for your help!

EDIT:
I found a thread that is pretty similar to my question:
http://seqanswers.com/forums/showthread.php?t=25003

It seems that I can't sum the different exonic parts to estimate the gene-level value, since a read can be counted multiple times.
Attached Files
File Type: pdf Correlation_HT-seq_gene_and_exons_expression.pdf (943.9 KB, 7 views)

Last edited by Yohann; 05-07-2014 at 09:01 AM. Reason: found related thread
Old 05-09-2014, 11:35 AM   #2
Wolfgang Huber
Senior Member
 
Location: Heidelberg, Germany

Join Date: Aug 2009
Posts: 109

Dear Yohann

Thanks for the feedback. All of that behaviour is intended, and the rationale behind it is described in the DEXSeq paper. Briefly, a read that touches multiple counting bins provides evidence for the presence of each of those bins, so it is counted once for each of them. The evidence is not independent across bins, but since the testing in DEXSeq is marginal (bin by bin), the dependence is not a problem (in the same way that dependence between the expression of different genes is not a problem for gene-by-gene testing methods). As a consequence, the sum of the bin counts is typically larger than the gene count.
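To make the counting rule concrete, here is a toy Python sketch (not the actual HTSeq or DEXSeq code). It assumes half-open (start, end) coordinates, represents a spliced read as a list of aligned blocks, and uses made-up bins and reads. A read adds one to every bin it overlaps, but only one to the gene.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one base."""
    return a_start < b_end and b_start < a_end


def count_reads(bins, reads):
    """Return (gene_count, per_bin_counts) for the bins of a single gene.

    bins:  list of (start, end) counting bins.
    reads: list of reads, each a list of (start, end) aligned blocks,
           so a spliced read has more than one block.
    """
    bin_counts = [0] * len(bins)
    gene_count = 0
    for blocks in reads:
        hit = [any(overlaps(s, e, bs, be) for s, e in blocks) for bs, be in bins]
        for i, touched in enumerate(hit):
            bin_counts[i] += touched  # one count per bin the read touches
        gene_count += any(hit)        # but only one count for the gene
    return gene_count, bin_counts


# Hypothetical gene with three bins, one of them only 1 bp long.
bins = [(100, 200), (200, 201), (300, 400)]
reads = [
    [(120, 170)],              # lies inside the first bin only
    [(180, 201), (300, 330)],  # spliced read touching all three bins
    [(350, 390)],              # lies inside the last bin only
]
gene_count, bin_counts = count_reads(bins, reads)
print(gene_count, bin_counts, sum(bin_counts))
```

In this toy case three reads give a gene count of 3, while the bin counts sum to 5, because the spliced read contributes to three bins.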

Second, the preparation script splits exons into multiple parts (bins) wherever the transcripts in the GTF file give them different boundaries. The result is not always pretty and could probably be made coarser in some cases (you are welcome to do your own manual or automated curation of counting bins to this end!).
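As a rough illustration of how differing boundaries lead to tiny bins, here is a simplified Python sketch. It is not the implementation used in dexseq_prepare_annotation.py; it assumes half-open (start, end) coordinates for the exons of a single gene, and the function name and coordinates are made up.

```python
def split_into_bins(exons):
    """Cut overlapping exons into disjoint bins at every exon boundary.

    exons: (start, end) intervals collected from all transcripts of one gene.
    """
    boundaries = sorted({pos for start, end in exons for pos in (start, end)})
    bins = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Keep a segment only if some exon covers it (this drops introns).
        if any(start <= left and right <= end for start, end in exons):
            bins.append((left, right))
    return bins


# Two transcripts whose first exon ends differ by a single base,
# and whose last exon starts at different positions.
exons = [(100, 200), (100, 201), (300, 400), (350, 400)]
bins = split_into_bins(exons)
print(bins)
```

The one-base difference between the two first exons yields the bin (200, 201), which is the kind of 1 bp "exonic_part" described in the question.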

Hope this helps.

Kind regards
Wolfgang
__________________
Wolfgang Huber
EMBL

Tags
dexseq, exons, gene expression, ht-seq
