#1 |
Member
Location: China Join Date: Oct 2010
Posts: 22
Hey. RNA-Seq is an efficient way to estimate gene expression levels in terms of RPKM/FPKM and then to perform differential expression analysis. Which of the following methods do you prefer for calculating RPKM/FPKM values and for DEG analysis, especially when a single gene has multiple isoforms?

RPKM calculation
a) Reads falling on any exon of the gene are counted, then normalized by the total length of all exons and the library size.
b) Only reads falling on the constitutive exons are counted, then normalized by the length of the constitutive exons and the library size.
c) Reads falling on the representative isoform (e.g. the longest isoform) of a gene are counted, then normalized by isoform length and library size.
d) ERANGE software (http://www.nature.com/nmeth/journal/...meth.1226.html).
e) Cufflinks/Cuffdiff software, which tries to assign reads at the isoform/transcript level.

Differential expression gene analysis
a) DEGseq
b) edgeR
c) Cuffdiff

By the way, what are your comments on the Cufflinks/Cuffdiff software? I have heard both positive and critical opinions, which leaves me confused about whether it is reliable.
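Options (a) to (c) above differ only in which exons contribute reads and length to the usual RPKM formula, reads × 10^9 / (length in bp × total mapped reads). The following is a minimal Python sketch of that difference, not the method of any tool mentioned in this thread. The gene, exon names, read counts, isoform structure, and library size are all made up for illustration.

```python
"""Toy comparison of RPKM options (a)-(c) for one gene. All numbers are invented."""


def rpkm(read_count, length_bp, library_size):
    # RPKM = reads * 1e9 / (feature length in bp * total mapped reads)
    return read_count * 1e9 / (length_bp * library_size)


# Hypothetical gene: exon name -> (length in bp, reads landing on that exon).
exons = {"E1": (500, 100), "E2": (300, 10), "E3": (700, 140), "E4": (200, 30)}

# Hypothetical isoforms: only E1 is shared by both.
isoforms = {"long": ["E1", "E2", "E3"], "alt": ["E1", "E4"]}

library_size = 20_000_000  # total mapped reads (assumed)


def summarize(exon_names):
    """Return (total reads, total length) over the given exons."""
    reads = sum(exons[name][1] for name in exon_names)
    length = sum(exons[name][0] for name in exon_names)
    return reads, length


# (a) union of all exons of the gene
union = sorted({name for iso in isoforms.values() for name in iso})

# (b) constitutive exons only, i.e. exons present in every isoform
constitutive = sorted(set.intersection(*(set(iso) for iso in isoforms.values())))

# (c) representative isoform, here taken to be the longest one
longest = max(isoforms.values(), key=lambda iso: summarize(iso)[1])

results = {
    "a_union": rpkm(*summarize(union), library_size),
    "b_constitutive": rpkm(*summarize(constitutive), library_size),
    "c_longest": rpkm(*summarize(longest), library_size),
}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the constitutive-exon value comes out highest because the single shared exon is covered more densely than the gene as a whole. With real data the ordering depends on which isoforms are actually expressed.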
#2 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
DESeq(2), edgeR, or limma/voom are your best bets. They take raw read counts as input, so you can ignore the RPKM calculation issue. Regarding cufflinks/cuffdiff, the earlier versions were pretty suboptimal, though the more recent versions seem vastly improved.
#3 | |
Member
Location: China Join Date: Oct 2010
Posts: 22
|
![]() Quote:
Can you explain why previous versions of cufflinks/cuffdiff were suboptimal? Is it because cufflinks/cuffdiff works at the transcript level, while it is not possible to accurately define all the transcripts from RNA-Seq data? Or because the underlying statistical model is not good enough?
#4 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
It seems to have been more of a change in implementation, though I can't say for sure. Early on, different point versions were giving very different results, some of which were nonsensical. From reading the forums here, it seems that the most recent versions are more stable and use a somewhat different approach, though I'm not an expert on cufflinks/cuffdiff.
#5 |
Senior Member
Location: Austria Join Date: Apr 2009
Posts: 181
You should read the paper "Computational methods for transcriptome annotation and quantification using RNA-seq" in Nature Methods.
In that paper the cuffdiff authors argue that the transcript expression method is conceptually better than the exon union/intersection methods.
#6 | |
Member
Location: China Join Date: Oct 2010
Posts: 22
|
![]() Quote:
Last edited by chenjy; 07-31-2013 at 07:59 AM. |
#7 | |
Member
Location: China Join Date: Oct 2010
Posts: 22
|
![]() Quote:
#8 | |
Member
Location: SF Bay Area Join Date: Feb 2012
Posts: 62
|
![]() Quote:
#9 | |
Senior Member
Location: Austria Join Date: Apr 2009
Posts: 181
|
![]() Quote:
RNA-seq suffers from random sampling, so if the overall gene expression level is low, the chance is higher that reads will get assigned to another transcript, even in technical replicates. E.g. if you only have 2-4 supporting reads for a splice junction, and the next time you sample it you get 0 supporting reads for that same junction, I can imagine the reads being flipped to another transcript in the other technical replicate. What was the FPKM status of this transcript? If it is "OK" then that would be troubling...
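To put a rough number on the junction example above: if read counts across technical replicates are treated as Poisson-distributed (an assumption made here for illustration, not something stated in this thread), the chance of seeing zero reads at a junction with expected count λ is exp(-λ). A minimal Python sketch, with a hypothetical helper name:

```python
import math


def prob_zero_reads(expected_reads):
    """P(X = 0) for X ~ Poisson(expected_reads), i.e. exp(-lambda)."""
    return math.exp(-expected_reads)


# Expected junction-supporting reads: the 2 and 4 from the example, plus a
# well-covered junction for contrast.
for lam in (2, 4, 20):
    print(f"expected {lam:>2} junction reads -> P(0 observed) = {prob_zero_reads(lam):.2e}")
```

Under this model, a junction expected to get 2 reads shows none about 13.5% of the time, and one expected to get 4 reads about 1.8% of the time, so a junction dropping out between technical replicates is not surprising at low coverage.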
#10 |
Member
Location: SF Bay Area Join Date: Feb 2012
Posts: 62
The two transcripts happen to be a regular transcript and its accompanying snoRNA; it's certainly understandable that the reads are misassigned. At least it's not being called DE, but when the FPKM varies between 0 and 20 (or 20 and 40) it casts considerable doubt on the accuracy of the assignment. There are over 1000 reads in the 12kb region.
Code:
SNORA5A  SNORA5A  -  chr7:45139698-45151317  1a  1aU  OK  42.1291  0        -1.79769e+308  -1.79769e+308  0.402342  1  no
TBRG4    TBRG4    -  chr7:45139698-45151317  1a  1aU  OK  20.5787  22.3127  0.116716       -0.308804      0.75747   1  no

Anyway, this was one of my more depressing observations while attempting to figure out how the heck to validate RNA-seq quantifications. It may just be a bug in the software (given that the genomic regions are changing, it seems like it must be), but even conceptually, no matter how good (or complicated) you make your statistical models, you're going to misassign some reads.

I'll go ahead and re-run this with 2.11 tonight just for kicks. It was done in 2.02 originally.
Last edited by jparsons; 07-31-2013 at 08:54 AM. Reason: adding last bit.
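A side note on reading the two cuffdiff rows above: -1.79769e+308 matches the largest finite double-precision value with a minus sign, so it looks like a placeholder for an undefined log2 ratio (the second FPKM is 0) and not like a real estimate. The Python sketch below parses the two rows and flags such values. The column names are an assumption based on the usual layout of cuffdiff `.diff` files, since the post shows only the values, and the helper names are hypothetical.

```python
import math
import sys

# Assumed column names (the post itself shows no header row).
COLUMNS = [
    "test_id", "gene_id", "gene", "locus", "sample_1", "sample_2", "status",
    "value_1", "value_2", "log2_fold_change", "test_stat", "p_value",
    "q_value", "significant",
]

ROWS = """\
SNORA5A SNORA5A - chr7:45139698-45151317 1a 1aU OK 42.1291 0 -1.79769e+308 -1.79769e+308 0.402342 1 no
TBRG4 TBRG4 - chr7:45139698-45151317 1a 1aU OK 20.5787 22.3127 0.116716 -0.308804 0.75747 1 no
"""


def parse(line):
    """Split one whitespace-delimited row into a dict keyed by COLUMNS."""
    return dict(zip(COLUMNS, line.split()))


def is_sentinel(text, rel_tol=1e-5):
    """True if the value is (a rounded print of) +/- the largest finite double."""
    return math.isclose(abs(float(text)), sys.float_info.max, rel_tol=rel_tol)


records = [parse(line) for line in ROWS.splitlines()]

for rec in records:
    v1, v2 = float(rec["value_1"]), float(rec["value_2"])
    if is_sentinel(rec["log2_fold_change"]):
        note = "placeholder value, not a real fold change"
    else:
        # Recompute log2(value_2 / value_1) to compare with the reported column.
        note = f"recomputed log2 ratio = {math.log2(v2 / v1):.4f}"
    print(rec["test_id"], rec["log2_fold_change"], "->", note)
```

For TBRG4 the recomputed ratio should agree closely with the reported 0.116716; for SNORA5A any downstream filter on fold change would need to treat the placeholder specially.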
#11 | |
Senior Member
Location: Austria Join Date: Apr 2009
Posts: 181
|
![]() Quote:
Wow.. that one is a tricky case. They are both transcribed off the minus strand. I looked at it in the genome browser: http://genome-euro.ucsc.edu/cgi-bin/...3&ensGene=pack

But I'm confused about the annotation: the RefSeq track shows SNORA5A sitting between TBRG4 exons, while the UCSC Genes track shows TBRG4 exons overlapping with (a larger) SNORA5C. Ensembl also has the smaller SNORA5A and SNORA5C in between exons.

What are your settings for cuffdiff? Are you using --compatible-hits-norm? I would hope it would make things agree more with the GTF annotation. The defaults are not always the best and are actually flipped between cufflinks and cuffdiff (I have found from personal experience). Also, the documented defaults are not always the true defaults. For example, --max-frag-multihits is listed as unlimited in cuffdiff, but it's actually set to 1 in the program.
Last edited by NGSfan; 07-31-2013 at 09:29 AM.
#12 |
Member
Location: SF Bay Area Join Date: Feb 2012
Posts: 62
Unless "default=TRUE" isn't actually true for compatible-hits-norm, yes, I am using it. I wasn't setting the flag explicitly, however.
Interestingly, RSEM also says that the SNO disappears in 1aU. It gets the lengths correct, though. (Different reference, because these programs are picky about references, so you can't directly compare them without doing way too much extra work.)
Code:
sample  gene_id          transcript_id       length   effectivelength  exp.count  TPM    FPKM
1a:     ENSG00000206838  ENST00000384111     134.00   84.34            1.00       1.58   0.95
1aU:    ENSG00000206838  ENST00000384111     0.00     0.00             0.00       0.00   0.00
1a:     ENSG00000136270  ENST00000258770...  2020.44  1966.21          690.00     44.79  27.03
1aU:    ENSG00000136270  ENST00000258770...  1953.33  1900.96          585.00     47.69  28.96
Last edited by jparsons; 08-01-2013 at 09:18 AM.
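One quick sanity check on the RSEM table above: TPM is FPKM rescaled so that values sum to one million within a sample (TPM_i = 10^6 × FPKM_i / Σ_j FPKM_j), so the TPM/FPKM ratio should be the same for every transcript in a given sample. The Python sketch below computes that ratio from the printed values. The tuple layout and helper name are hypothetical, the truncated transcript ID is kept as posted, and the two-decimal rounding in the table limits how closely the ratios can agree.

```python
# (sample, transcript_id, TPM, FPKM) copied from the RSEM rows above.
rows = [
    ("1a", "ENST00000384111", 1.58, 0.95),
    ("1aU", "ENST00000384111", 0.00, 0.00),
    ("1a", "ENST00000258770...", 44.79, 27.03),
    ("1aU", "ENST00000258770...", 47.69, 28.96),
]


def tpm_per_fpkm(tpm, fpkm):
    """Return TPM/FPKM, or None when FPKM is zero and the ratio is undefined."""
    return None if fpkm == 0 else tpm / fpkm


# Group the ratios by sample; within one sample they should nearly match.
ratios = {}
for sample, transcript, tpm, fpkm in rows:
    ratio = tpm_per_fpkm(tpm, fpkm)
    if ratio is not None:
        ratios.setdefault(sample, []).append(ratio)

for sample, values in ratios.items():
    print(sample, [round(v, 3) for v in values])
```

The two 1a ratios agreeing to within rounding suggests the table is internally consistent, which fits the snoRNA going to zero in 1aU being a genuine change in read assignment, not a formatting glitch.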
#13 | |
Senior Member
Location: Austria Join Date: Apr 2009
Posts: 181
|
![]() Quote:
Are you using the same Ensembl GTF with Cuffdiff as you do with RSEM?