SEQanswers

02-24-2012, 05:25 PM   #1
PFS
Member
 
Location: USA

Join Date: Mar 2010
Posts: 55
skipping cufflinks-->cuffcompare ... straight to cuffdiff?

I see that most workflows include tophat --> cufflinks --> cuffcompare --> cuffdiff.

If I want to perform differential expression analysis on RNA-seq samples based on a known annotation (e.g. an Ensembl GTF), can I simply run tophat --> cuffdiff (with the known GTF)?

What would be the difference if I were to do tophat --> cufflinks --> cuffcompare and use that output gtf in cuffdiff?
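To make the two routes concrete, here is a minimal Python sketch that only assembles the command lines for each one (nothing is executed). All file and directory names (genome_index, known.gtf, the sample names, th_*, cl_*, cmp) are hypothetical placeholders, and only basic options are used (tophat -o, cufflinks -o, cuffcompare -r/-o, cuffdiff -o); check the manual of your installed version before running anything like this.

```python
from typing import Dict, List


def build_pipelines(samples_a: List[str], samples_b: List[str],
                    ref_gtf: str = "known.gtf",
                    index: str = "genome_index") -> Dict[str, List[List[str]]]:
    """Return two hypothetical pipelines as lists of argv lists (nothing is run)."""
    samples = samples_a + samples_b
    # Shared first step: align each sample with TopHat into its own output directory.
    align = [["tophat", "-o", f"th_{s}", index, f"{s}.fq"] for s in samples]
    bam = {s: f"th_{s}/accepted_hits.bam" for s in samples}
    # cuffdiff takes one comma-separated list of replicates per condition.
    groups = [",".join(bam[s] for s in samples_a),
              ",".join(bam[s] for s in samples_b)]

    # Route 1: tophat --> cuffdiff, quantifying directly against the known annotation.
    direct = align + [["cuffdiff", "-o", "diff_direct", ref_gtf] + groups]

    # Route 2: tophat --> cufflinks --> cuffcompare --> cuffdiff.
    assemble = [["cufflinks", "-o", f"cl_{s}", bam[s]] for s in samples]
    compare = [["cuffcompare", "-r", ref_gtf, "-o", "cmp"]
               + [f"cl_{s}/transcripts.gtf" for s in samples]]
    # cuffcompare writes <prefix>.combined.gtf, which takes the place of the known GTF.
    via_assembly = align + assemble + compare + [
        ["cuffdiff", "-o", "diff_assembled", "cmp.combined.gtf"] + groups]
    return {"direct": direct, "via_assembly": via_assembly}
```

For example, `" ".join(build_pipelines(["a1", "a2"], ["b1", "b2"])["direct"][-1])` gives the single cuffdiff call of the short route. Roughly speaking, the first route only tests transcripts already in the known GTF, while the second lets cuffdiff test transcripts assembled from your own reads, including ones missing from the reference.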
02-24-2012, 10:16 PM   #2
dietmar13
Senior Member
 
Location: Vienna

Join Date: Mar 2010
Posts: 107
depends on your experimental design

I compared several methods for DE with a 12 vs 12 paired data-set and found cuffdiff to produce by far the fewest significant genes.

The ascending order (by number of significant genes) was:
cuffdiff
Noiseq
DESeq
baySeq
edgeR
npSeq
SAMseq
poissonSeq

Therefore, if you have a design with biological replicates, every approach besides cuffdiff seems to be more adequate...
02-27-2012, 12:09 AM   #3
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 388

Quote:
Originally Posted by dietmar13
I compared several methods for DE with a 12 vs 12 paired data-set and found cuffdiff to produce by far the fewest significant genes.

The ascending order (by number of significant genes) was:
cuffdiff
Noiseq
DESeq
baySeq
edgeR
npSeq
SAMseq
poissonSeq

Therefore, if you have a design with biological replicates, every approach besides cuffdiff seems to be more adequate...

Just because something produces a shorter list of genes doesn't mean it is a worse approach, surely.
02-27-2012, 03:39 AM   #4
dietmar13
Senior Member
 
Location: Vienna

Join Date: Mar 2010
Posts: 107
of course,

but I analysed the same biological question (12 vs 12 paired, colon cancer vs. normal tissue) with microarrays and got ~6,000 significant genes.

I would say that 2 significant genes (which is what I got with cuffdiff) are a little too few and useless for further examination, and thus worse than the other approaches.

I also compared the gene lists from the other approaches with the gene list I got from the microarrays, and the approaches did not differ much in their overlap with it (and I know that microarrays are not the ground truth).
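One simple way to put a number on "overlap" between two significant-gene lists (say, one RNA-seq method vs. the microarray list) is sketched below. The gene IDs in the example are made up, and these particular measures (directional fractions plus a Jaccard index) are my choice for illustration, not necessarily what was used here.

```python
from typing import Iterable


def overlap_stats(list_a: Iterable[str], list_b: Iterable[str]) -> dict:
    """Compare two lists of significant gene IDs."""
    a, b = set(list_a), set(list_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        # Fraction of list A also found in list B (1.0 means A is a subset of B).
        "frac_a_in_b": len(shared) / len(a) if a else 0.0,
        # Fraction of list B also found in list A.
        "frac_b_in_a": len(shared) / len(b) if b else 0.0,
        # Jaccard index: shared genes over all genes called by either list.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }
```

For example, `overlap_stats(["G1", "G2", "G3", "G4"], ["G2", "G3", "G4", "G5", "G6"])` reports 3 shared genes, 0.75 of A in B, 0.6 of B in A and a Jaccard index of 0.5. Reporting both directional fractions matters when list sizes differ by orders of magnitude (2 genes vs. thousands): a tiny list can sit entirely inside a large one while the Jaccard index stays close to zero.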

Furthermore, I estimated the robustness of the lists with bootstrap validation and got acceptable results (although the values decreased as the number of significant genes increased).
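The post does not say exactly how the bootstrap validation was done. One common scheme, sketched here under that assumption, is to resample whole patients with replacement, rerun the DE call, and record how often each gene from the full-data list turns up again. `call_de`, `toy_call` and `LOG_RATIOS` are hypothetical; in practice `call_de` would rerun edgeR, DESeq, etc. on the resampled pairs.

```python
import random
from typing import Callable, Dict, List, Sequence, Set


def bootstrap_stability(pairs: Sequence[int],
                        call_de: Callable[[List[int]], Set[str]],
                        n_boot: int = 200, seed: int = 1) -> Dict[str, float]:
    """For each gene in the full-data list, the fraction of bootstrap resamples
    in which it is called significant again."""
    rng = random.Random(seed)
    original = call_de(list(pairs))
    hits = {g: 0 for g in original}
    for _ in range(n_boot):
        # Resample whole matched pairs so each tumour stays with its own normal tissue.
        sample = [rng.choice(pairs) for _ in pairs]
        for g in call_de(sample) & original:
            hits[g] += 1
    return {g: hits[g] / n_boot for g in original}


# Hypothetical toy data: per-patient log2(tumour/normal) ratios for three genes.
LOG_RATIOS = {
    "GENE_A": [2.0] * 12,              # up in every patient
    "GENE_B": [3.0] * 6 + [-1.0] * 6,  # up on average, driven by half the patients
    "GENE_C": [0.0] * 12,              # unchanged
}


def toy_call(patients: List[int]) -> Set[str]:
    """Hypothetical DE caller: 'significant' if the mean log-ratio exceeds 0.5."""
    return {g for g, r in LOG_RATIOS.items()
            if sum(r[p] for p in patients) / len(patients) > 0.5}
```

With `bootstrap_stability(list(range(12)), toy_call)`, GENE_A reappears in every resample (1.0), while GENE_B only reappears in a fraction of them (around 0.8 in expectation), which is the kind of decreasing stability described above for less clear-cut genes.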

Therefore, I would say all the gene lists are more or less plausible, also with regard to the expected expression differences between cancer and normal tissue.

The only way to validate all genes for sure would be RT-qPCR of every gene in the same samples...
02-27-2012, 03:57 AM   #5
Bukowski
Senior Member
 
Location: Aberdeen, Scotland

Join Date: Jan 2010
Posts: 388

Thanks for elaborating; without figures or reference to a microarray experiment, it was rather hard to take on faith.
02-28-2012, 03:11 PM   #6
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

I'm a bit skeptical about that Cuffdiff run. We've compared the lists produced by Cuffdiff against the lists produced by arrays run on *exactly* the same RNA, and found not only that Cuffdiff returns a superset of the genes returned by the array analysis, but also that the Cuffdiff lists are highly concordant with DESeq and edgeR (around 90% overlap). Are you sure you're running Cuffdiff correctly, and are you using a recent version?
02-29-2012, 03:08 AM   #7
dietmar13
Senior Member
 
Location: Vienna

Join Date: Mar 2010
Posts: 107
used syntax for cuffdiff

I used the cuffdiff that comes with cufflinks 1.3.

$GTF contains a GTF prepared as follows ($genome is a link to the chromosome sequences):

cuffcompare -s $genome -CG -r Homo_sapiens.hg19.gtf Homo_sapiens.hg19.gtf

cuffdiff -o $outdir -p 12 -N -u $GTF \
  $DIR/SRR317086.sam,$DIR/SRR317087.sam,$DIR/SRR317088.sam,$DIR/SRR317089.sam,$DIR/SRR317090.sam,$DIR/SRR317091.sam,$DIR/SRR317092.sam,$DIR/SRR317093.sam,$DIR/SRR317094.sam,$DIR/SRR317095.sam,$DIR/SRR317096.sam,$DIR/SRR317097.sam \
  $DIR/SRR317098.sam,$DIR/SRR317099.sam,$DIR/SRR317100.sam,$DIR/SRR317101.sam,$DIR/SRR317102.sam,$DIR/SRR317103.sam,$DIR/SRR317104.sam,$DIR/SRR317105.sam,$DIR/SRR317106.sam,$DIR/SRR317107.sam,$DIR/SRR317108.sam,$DIR/SRR317109.sam
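Hand-typing 24 comma-joined paths is error-prone (a stray space or a repeated run number silently moves samples between conditions), so a sketch like the one below generates the same argument list from the two run-number ranges. The helper names are hypothetical; the options simply mirror the command above. Note that the argument list only encodes two groups of replicates; nothing in it tells cuffdiff which tumour sample belongs with which normal sample.

```python
from typing import List


def replicate_group(directory: str, first: int, last: int) -> str:
    """Comma-join <directory>/SRR<n>.sam for run numbers first..last inclusive,
    i.e. one condition's replicates in the form cuffdiff expects."""
    return ",".join(f"{directory}/SRR{n}.sam" for n in range(first, last + 1))


def cuffdiff_argv(outdir: str, gtf: str, directory: str) -> List[str]:
    # Same options as the command above, then one argument per condition:
    # SRR317086-SRR317097 vs. SRR317098-SRR317109.
    return (["cuffdiff", "-o", outdir, "-p", "12", "-N", "-u", gtf]
            + [replicate_group(directory, 317086, 317097),
               replicate_group(directory, 317098, 317109)])
```

The resulting list can be passed unchanged to `subprocess.run`, which avoids shell quoting issues altogether.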
02-29-2012, 03:31 AM   #8
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

Hmm, it certainly looks OK. Do you have a lot of genes with status FAIL? Where'd you get the GTF?

Also, how did you map the reads?
02-29-2012, 05:47 AM   #9
dietmar13
Senior Member
 
Location: Vienna

Join Date: Mar 2010
Posts: 107

I created the GTF from an Ensembl GTF; to make it compatible with hg19, I added "chr" in front of the chromosome names (and renamed MT to M):

gawk 'BEGIN { FS = "\t"; OFS="\t" } ; $1 ~ /^([0-9]+|X|Y|MT)$/ { print "chr" $1 , $2 , $3 , $4 , $5 , $6 , $7 , $8 , $9 }' $in > cuffdiff/${out}.tmp

sed 's/chrMT/chrM/' cuffdiff/${out}.tmp > cuffdiff/${out}

rm cuffdiff/${out}.tmp
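For anyone without gawk, here is a minimal Python sketch of the same renaming step, assuming a standard nine-column, tab-separated Ensembl GTF. Like the gawk filter, it keeps only numbered chromosomes, X, Y and MT and prints only the nine GTF columns; unlike the sed call, it renames MT only in the chromosome column. Afterwards it is worth checking that the chromosome names match the @SQ names in the SAM headers, since features on mismatched names will have no reads assigned.

```python
import re
from typing import Iterable, Iterator

# Chromosome names the gawk filter keeps: numbered chromosomes, X, Y and MT.
KEEP = re.compile(r"^([0-9]+|X|Y|MT)$")


def ensembl_to_ucsc(lines: Iterable[str]) -> Iterator[str]:
    """Keep lines whose first field is a primary chromosome, prefix it with
    'chr', and rename MT to chrM (gawk + sed steps above in one pass)."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not KEEP.match(fields[0]):
            continue  # drops '#!' header lines and unplaced contigs/patches
        chrom = "chrM" if fields[0] == "MT" else "chr" + fields[0]
        yield "\t".join([chrom] + fields[1:9]) + "\n"
```

Usage: `with open(src) as fin, open(dst, "w") as fout: fout.writelines(ensembl_to_ucsc(fin))`, where `src` and `dst` are placeholder paths.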

I mapped the reads with TopHat and used the same alignments for the DE analyses with the other R packages (via HTSeq-count, unique).

gene_exp.diff
11050 FAIL
27 HIDATA
75 LOWDATA
35526 NOTEST
2995 OK


Why does it perform so many tests (on a gene basis or on a transcript basis)?
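A tally like the one above can be reproduced with a few lines of Python, which also makes it easy to see how few rows were actually tested: the five statuses above add up to 49,673 rows, of which only 2,995 have status OK. This sketch assumes the tab-separated gene_exp.diff has a header row with columns named `status` and `significant`, as in the cuffdiff versions I have seen; check the header of your own file (isoform_exp.diff should work the same way if it has the same columns).

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_gene_exp(lines: Iterable[str]) -> Tuple[Dict[str, int], int]:
    """Count the 'status' column of a cuffdiff *.diff table and the number
    of rows whose 'significant' column is 'yes'."""
    reader = csv.DictReader(lines, delimiter="\t")
    status: Counter = Counter()
    n_significant = 0
    for row in reader:
        status[row["status"]] += 1
        if row["significant"] == "yes":
            n_significant += 1
    return dict(status), n_significant
```

Usage: `with open("gene_exp.diff") as fh: counts, n_sig = tally_gene_exp(fh)`.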
03-01-2012, 08:26 AM   #10
DineshCyanam
Compendia Bio
 
Location: Ann Arbor

Join Date: Oct 2010
Posts: 35

Hi Cole,
I see something similar with my data too: cuffdiff produces very few significant (Significant: Yes) genes, although I have not compared it with other methods. I am using cufflinks v1.3.0, but the TopHat version I used was v1.1.4.

I ran TopHat more than a year ago, and I know TopHat has evolved a lot since then. Do you think I would see a significant difference between the latest version of TopHat and v1.1.4?

gene_exp.diff
---------------------------------
248975 FAIL
811 HIDATA
22270 LOWDATA
141410 NOTEST
134335 OK
80 YES (SIGNIFICANT)

Last edited by DineshCyanam; 03-01-2012 at 08:34 AM.
07-27-2012, 05:30 AM   #11
dietmar13
Senior Member
 
Location: Vienna

Join Date: Mar 2010
Posts: 107
cuffdiff 2.0.2 beta

I have now repeated my analysis with cuffdiff 2.0.2 beta and got zero significant transcripts or genes (CDS); cuffdiff 1.3 found two genes (see above).

The design was a 12 vs 12 matched-pairs experiment (normal colon mucosa vs. colon cancer tissue) with a median of only 2.5 million reads per sample.

Mapper: TopHat.

cuffdiff was given an Ensembl GTF file, and the analysis ran without any errors.

Using the raw digital count data (HTSeq-count), SAMseq, edgeR, and limma/voom found more than 4,000 genes, and DESeq more than 2,500.

I think cuffdiff (1.3 and 2.0.2 beta) is not the right choice for the statistical analysis of experimental designs with many, highly dispersed biological replicates and low read depth.
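To make the "matched pairs" point concrete, here is a deliberately simplified sketch of testing each gene on within-patient differences of count data, which removes patient-to-patient baseline variation. It is not what edgeR, limma/voom, DESeq or SAMseq actually do: it uses voom-style log-CPM values, log2((count + 0.5) / (library size + 1) * 10^6), with plain total-count library sizes and an ordinary paired t statistic. The function name and data layout (gene mapped to one count per patient, same patient order in both dicts) are assumptions for illustration.

```python
import math
from statistics import mean, stdev
from typing import Dict, List


def paired_t(normal: Dict[str, List[int]],
             tumour: Dict[str, List[int]]) -> Dict[str, float]:
    """Per-gene paired t statistic on log-CPM values; normal[g][i] and
    tumour[g][i] are HTSeq-count style counts for patient i."""
    genes = list(normal)
    n = len(normal[genes[0]])
    # Library size of each sample = total counted reads over all genes.
    lib_n = [sum(normal[g][i] for g in genes) for i in range(n)]
    lib_t = [sum(tumour[g][i] for g in genes) for i in range(n)]

    def lcpm(count: int, lib: int) -> float:
        # voom-style log-CPM; the 0.5 offset keeps zero counts finite.
        return math.log2((count + 0.5) / (lib + 1) * 1e6)

    result = {}
    for g in genes:
        # Tumour minus normal within each patient.
        d = [lcpm(tumour[g][i], lib_t[i]) - lcpm(normal[g][i], lib_n[i])
             for i in range(n)]
        sd = stdev(d)
        result[g] = mean(d) / (sd / math.sqrt(n)) if sd > 0 else float("nan")
    return result
```

For example, with `normal = {"UP": [10, 20, 15], "FLAT": [100, 200, 150]}` and `tumour = {"UP": [40, 90, 55], "FLAT": [100, 210, 140]}`, UP gets a large positive statistic, but FLAT comes out negative purely because UP takes a larger share of the tumour libraries. That composition effect is what the normalisation in edgeR (TMM) and DESeq is designed to reduce, and it is one reason to prefer those packages over a hand-rolled test like this.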

All times are GMT -8.