SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Making exonCountSet object using DEXSeq in R KHubbard Bioinformatics 1 09-26-2012 09:26 AM
DEXSeq - creating new ExonCountSet from scratch ahmetz Bioinformatics 1 10-21-2011 06:04 AM
Bfast index creation guillaum Bioinformatics 3 04-02-2010 09:40 AM
details of *.csfasta creation d17 SOLiD 4 02-21-2010 10:33 AM
New 454 emulsion creation boss_hoss 454 Pyrosequencing 0 12-14-2009 09:00 AM

Reply
 
Thread Tools
Old 06-19-2013, 11:29 AM   #1
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
Question creation of ExonCountSet in DEXSeq

Hi,

I am using DEXSeq for differential exon usage tests. In the vignette of producing ExonCountSet object, the author used dexseq_prepare_annotation.py to convert the GTF file to GFF file. In the GFF output, I see that (since the GTF is downloaded from Ensembl) the gene_id's start with "ENSG".

I know that the next step is to use dexseq_count.py on the GFF and SAM files to generate counts. However, because currently we have the count data file (which we prefer to use), we are hoping to use our own counts (i.e. the treated2fb.txt as in the vignette example) for the analysis. The issue is that, our count files contain EntrezGene ID's, NOT Ensembl IDs, and the conversion between the two is not bijective (i.e. 1-1). Therefore, we I run the read.HTSeqCounts() function in R, the error message "Count files do not correspond to the flattened annotation file" appears.

Question:

(1) is Ensembl GTF the only input for dexseq_prepare_annotation.py? It seems the resultant GFF file contains only Ensembl gene IDs, accordingly...

(2) in my case of non-Ensembl gene IDs, how can I instruct or manipulate the codes to generate an ExonCountSet object?

Thank you!
alittleboy is offline   Reply With Quote
Old 06-19-2013, 04:29 PM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Use the function "newExonCountDataSet" instead of "read.HTSeqCounts". See "?newExonCountDataSet" for details.
Simon Anders is offline   Reply With Quote
Old 06-19-2013, 05:39 PM   #3
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
Default

Quote:
Originally Posted by Simon Anders View Post
Use the function "newExonCountDataSet" instead of "read.HTSeqCounts". See "?newExonCountDataSet" for details.
Thank you so much for the information, Simon! I think that's the function that suits my current situation ;-)
alittleboy is offline   Reply With Quote
Old 06-20-2013, 09:58 AM   #4
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
Default

So I got one more question: if I don't use read.HTSeqCounts() which takes an annotation file, then it takes extra efforts to generate a plot using plotDEXSeq(ExonCountSet, ...), right? The function read.HTSeqCounts() automatically takes care of that, but in newExonCountSet(), I need to specify the "transcripts" argument in order to plot? Thanks!
alittleboy is offline   Reply With Quote
Old 06-20-2013, 10:09 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Exactly. Or more specifically, you need the "transcripts" argument (and the "exonIntervals" argument) if you want to get gene models at the bottom of your plot.
Simon Anders is offline   Reply With Quote
Old 06-24-2013, 05:50 AM   #6
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
Default

Quote:
Originally Posted by Simon Anders View Post
Exactly. Or more specifically, you need the "transcripts" argument (and the "exonIntervals" argument) if you want to get gene models at the bottom of your plot.
Hi Simon:

Thanks for the information! I have two question while using DEXSeq, and hopefully you can help me clarify:

1. in the newExonCountSet() function, you mentioned that in order to get the gene model at the bottom of plot, two more arguments need to be passed: "exonIntervals" and "transcripts" -- do you have an example of how these two inputs are formatted? Are they derived from the GFF files? Any functions in DEXSeq will help me to get the two inputs? (Sorry I didn't find any examples in the help document...)

2. in the vignette written by Alejandro, about "data preprocessing and creation of the data objects pasillaGenes and pasillaExons", up to section 5 all the steps aim to generate the per-exon read counts for each sample (finally in the form of .txt files to be used in R). My question is, if I have those per-exon read counts files (from other sources) and would like to use them directly, can I just start from section 5: "creation of CountDataSet" using the per-exon read counts I have at hand? Actually, I have another file of junction reads, and I don't know if DEXSeq can ever take it as inputs in whatever functions? Or DEXSeq just work with per-exon read counts?

Thank you so much!!
alittleboy is offline   Reply With Quote
Old 06-26-2013, 12:22 AM   #7
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Hi @alittleboy,

The function newExonCountSet will allow you to generate your ExonCountSet from basic R data structures. For an example of an ExonCountSet object you could have a look at the pasillaExons object in the pasilla package. You will find how this is supposed to be formatted. Regarding junction reads, you could also input them in DEXSeq, and it will be consider as an additional exon bin!

Alejandro
areyes is offline   Reply With Quote
Old 06-26-2013, 05:54 AM   #8
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
Default

Quote:
Originally Posted by areyes View Post
Hi @alittleboy,

The function newExonCountSet will allow you to generate your ExonCountSet from basic R data structures. For an example of an ExonCountSet object you could have a look at the pasillaExons object in the pasilla package. You will find how this is supposed to be formatted. Regarding junction reads, you could also input them in DEXSeq, and it will be consider as an additional exon bin!

Alejandro
Hi Alejandro:

Thank you for your suggestions! Would you please be more specific on how can I input my junction_reads file to DEXSeq? I read the related vignette and online documents, but didn't find a solution... Thanks ;-)
alittleboy is offline   Reply With Quote
Old 07-01-2013, 01:28 AM   #9
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Hi @alittleboy,

You probably realized that exons in DEXSeq are not "real" exons, but rather exon bins (defined as non overlapping exonic parts of transcripts, see the publication for more detail).

Now if you want to test your junction reads, you would have to add as a counting bin, e.g. as a row in your count data, a counting bin that reflects the junction between your exons of interest:

If you have this gene model with exons A, B and C:


---[ A ]----[ B ]----[ C ]---

Your counting bins would be A, B and C, you will count reads that fall into this exons and your matrix would look like this ( I am making the numbers up):

A 2
B 4
C 3


If you want to test your exons bins you would need a matrix like this:

A 2
A-B 1
B 4
B-C 3
C 3
A-C 2

And input this into DEXSeq

Best regards,
Alejandro
areyes is offline   Reply With Quote
Old 07-01-2013, 07:54 AM   #10
alittleboy
Member
 
Location: USA

Join Date: Apr 2011
Posts: 60
Default

Quote:
Originally Posted by areyes View Post
Hi @alittleboy,

You probably realized that exons in DEXSeq are not "real" exons, but rather exon bins (defined as non overlapping exonic parts of transcripts, see the publication for more detail).

Now if you want to test your junction reads, you would have to add as a counting bin, e.g. as a row in your count data, a counting bin that reflects the junction between your exons of interest:

If you have this gene model with exons A, B and C:


---[ A ]----[ B ]----[ C ]---

Your counting bins would be A, B and C, you will count reads that fall into this exons and your matrix would look like this ( I am making the numbers up):

A 2
B 4
C 3


If you want to test your exons bins you would need a matrix like this:

A 2
A-B 1
B 4
B-C 3
C 3
A-C 2

And input this into DEXSeq

Best regards,
Alejandro
Hi @areyes:

That's very clear! Thanks for the examples -- I see how important to understand it is the exonic region (exon bin) instead of real exon that constitutes the building block for DEXSeq counting.

I remember someone in the past asked why testing these exonic regions instead of real exons is more important biologically and in terms of interpretations. That is also my concern and needs to be clarified. Would you share your thoughts on this? Sorry maybe you've already discussed before, and I appreciate if you can redirect me to the posts ;-)

I think maybe it's more concrete to give an example: the top gene in my test comparing two groups is ENSG00000113845. In the HTML output of DEXSeq, there are 22 exonID's from E001 to E022, and the last exon E022 is deferentially expressed. However, from the Ensembl website on this gene here, there are 9 transcripts (splice variants) for this gene. How can I know E022, the exon counting bin, corresponds to which transcript? Or the inference is only limited to the gene-level (maybe wrong), i.e. this gene has at least 1 exon that is DEU, but we don't know which exon is DEU?

Thank you so much for your clarifications!

Last edited by alittleboy; 07-01-2013 at 06:21 PM.
alittleboy is offline   Reply With Quote
Old 11-15-2013, 07:31 PM   #11
nbahlis
Member
 
Location: Canada

Join Date: May 2013
Posts: 25
Default DEXSeq exon bins

Thank you for this useful discussion.
If I understand correctly the "exon bins" to not correspond to actual exons?
I did run DEXSeq on 22 samples with condition pre- and post-treament. Inspecting one of the genes of interest ENSG00000113851, in plotDEXSeq this gene is plotted as having 37 exons (or bins) while in reality this gene has only 11 exons. Is the difference due to bin (instead of exon) counting or something went wrong in my analysis?
nbahlis is offline   Reply With Quote
Old 11-16-2013, 05:05 AM   #12
areyes
Senior Member
 
Location: Heidelberg

Join Date: Aug 2010
Posts: 165
Default

Hi @nbahlis,

The preprocessing scripts from DEXSeq define new exon bins based on non overlapping regions of the transcripts (http://www.ncbi.nlm.nih.gov/pmc/arti...195/figure/F1/). If you look at this gene ID in ENSEMBL, (http://www.ensembl.org/Homo_sapiens/...190676-3221394), Two isoforms contain 11 exons, but there are many other isoforms as well. The 37 exon bins defined in DEXSeq come from the preprocessing considering all these isoforms.
areyes is offline   Reply With Quote
Reply

Tags
deseq_prepare_annotation, dexseq, exoncountset

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 02:18 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO