SEQanswers

Old 12-31-2012, 07:15 AM   #1
jparsons
Member
 
Location: SF Bay Area

Join Date: Feb 2012
Posts: 62
Checking Cuffdiff

I am using an interesting dataset to "test" differential isoform expression programs.

Unfortunately, I am not an expert in every (any?) program, so I could use some sanity checking.

I have 3 separate tissues: A, B, and C. I want to use (in this case) cuffdiff to identify isoforms that are uniquely expressed in A, B, or C, since I can use other "ground truth" runs to verify these claims.

I ran the program as follows, rotating which of A, B, and C is compared against the other two pooled:
Code:
 cuffdiff -p 8 -c 10 <ucsc.gtf> A1,A2,A3 B1,B2,B3,C1,C2,C3 -o outdir
I'm not using a cufflinks-derived gtf or (exclusively) tophat-mapped reads. I imagine I'm doing it all wrong. I have two main questions:

1) Can I get away with not using the entire cufflinks pathway here? (If not, why doesn't the program complain?)
2) Am I properly comparing the 3 tissues? Does A vs. B,C return transcripts differentially expressed (DE) only in A, as I intend it to?
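
One way to keep the three rotated runs consistent is to generate the command lines from a single description of the samples. The sketch below only builds the argument lists for the one-versus-pooled-rest layout described above and does not run anything. The replicate names, the `ucsc.gtf` path, the output directory naming, and the helper `one_vs_rest_command` are placeholders introduced for illustration.

```python
def one_vs_rest_command(replicates, target, gtf="ucsc.gtf", outdir="outdir"):
    """Build a cuffdiff argument list comparing `target` against all other tissues pooled.

    `replicates` maps a tissue name to its list of alignment files
    (hypothetical names here; substitute real SAM/BAM paths).
    """
    if target not in replicates:
        raise KeyError(f"unknown tissue: {target}")
    target_files = replicates[target]
    # Pool every replicate of every other tissue into one comma-separated condition.
    rest_files = [
        f
        for tissue in sorted(replicates)
        if tissue != target
        for f in replicates[tissue]
    ]
    return [
        "cuffdiff", "-p", "8", "-c", "10",
        "-o", f"{outdir}_{target}_vs_rest",  # one output directory per rotation
        gtf,
        ",".join(target_files),
        ",".join(rest_files),
    ]


replicates = {
    "A": ["A1", "A2", "A3"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3"],
}
commands = {tissue: one_vs_rest_command(replicates, tissue) for tissue in replicates}
for tissue, cmd in commands.items():
    print(" ".join(cmd))
```

Each condition is a single comma-separated list, so in this layout the pooled condition is treated as six replicates of one group rather than as two separate tissues.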
Old 01-09-2013, 05:27 AM   #2
rboettcher
Member
 
Location: Berlin

Join Date: Oct 2010
Posts: 71

Hello jparsons,

I used cufflinks and cuffdiff with GSNAP alignments and it worked fine, so you do not necessarily need to stick to TopHat, as long as the SAM/BAM files have all required columns.
However, I used the cufflinks -> cuffmerge -> cuffdiff variant to check my genes, since that way was suggested by the authors (although it was not very successful for me).

After following some discussions in this forum, see
http://seqanswers.com/forums/showthread.php?t=20702
and
http://seqanswers.com/forums/showthread.php?t=16528

I concluded that cufflinks/cuffdiff have a problem in their correction for variance. For my analysis, the bigger my sample groups were, the fewer genes were found significantly DE until none were left. Therefore I assume that pooling group B and C will result in a similar problem due to high variance between both groups.
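
The pooling concern can be illustrated with a toy calculation. This is only a sketch using made-up expression values for a single gene, and it uses a plain sample variance rather than cuffdiff's own dispersion model, which is not reproduced here.

```python
from statistics import variance

# Made-up expression values for one gene; not taken from any real run.
group_b = [10.0, 11.0, 12.0]
group_c = [30.0, 31.0, 32.0]
pooled = group_b + group_c  # B and C treated as replicates of one condition

var_b = variance(group_b)        # sample variance within B
var_c = variance(group_c)        # sample variance within C
var_pooled = variance(pooled)    # sample variance of the pooled "B+C" group

print(f"within B: {var_b}, within C: {var_c}, pooled B+C: {var_pooled}")
```

When B and C differ from each other, the pooled group looks far noisier than either tissue alone, which would make any comparison against A harder to call significant.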

Besides that, your command looks fine, so please keep us posted on your progress.
Old 01-09-2013, 12:23 PM   #3
jparsons
Member
 
Location: SF Bay Area

Join Date: Feb 2012
Posts: 62

Rboettcher,

Thanks for the response. I eventually compared the output from tophat->cufflinks->cuffmerge->cuffdiff to that from only cuffdiff and found that they were (mostly) identical. I am content using cuffdiff without going through the entire pipeline.

I got results for cuffdiff and finally managed to get RSEM to like me for long enough to spit out quantitations. When compared to the "truth" set (sadly only available on the gene level for now), the RSEM/cuffdiff lists are 'decent' individually, coming close to the expected ratio on average, but having numerous outliers. Taking the overlap set of genes called by both RSEM and cuffdiff makes for a much cleaner picture, with far less deviation from the ratio, and fewer false positives.
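
Taking the overlap set amounts to a set intersection over the per-tool gene lists. A minimal sketch, assuming each tool's calls have already been reduced to a list of gene identifiers (the gene names and the helper `consensus_calls` are hypothetical):

```python
def consensus_calls(calls_by_tool):
    """Return the genes that every tool called as differentially expressed."""
    call_sets = [set(genes) for genes in calls_by_tool.values()]
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


# Hypothetical call lists; real ones would come from each tool's output.
calls = {
    "cuffdiff": ["geneA", "geneB", "geneC", "geneD"],
    "rsem": ["geneB", "geneC", "geneE"],
}

overlap = consensus_calls(calls)
only_cuffdiff = set(calls["cuffdiff"]) - overlap  # called by cuffdiff alone
only_rsem = set(calls["rsem"]) - overlap          # called by RSEM alone

print(sorted(overlap), sorted(only_cuffdiff), sorted(only_rsem))
```

Keeping the tool-specific leftovers around makes it easier to check whether the outliers concentrate in genes called by only one method.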

I'm still working on making metrics that make sense, so 'decent' and 'cleaner' are the best I can offer for now. I imagine I will develop permissive and restrictive "true positive" lists at each ratio and then generate ROCs for each algorithm I can successfully test.
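
A rough sketch of how such ROCs could be computed from one algorithm's per-gene scores and a "true positive" list. The scores, gene names, and both truth lists are invented for illustration, and the choice of score (for example, one minus a q-value) is an assumption rather than anything decided in this thread.

```python
def roc_points(scores, truth):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    scores: dict mapping gene -> score, where higher means a more confident call
    truth:  set of genes treated as true positives
    """
    positives = sum(1 for gene in scores if gene in truth)
    negatives = len(scores) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false gene among the scored genes")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Genes with tied scores are added together so they share one threshold.
        j = i
        while j < len(ranked) and ranked[j][1] == ranked[i][1]:
            if ranked[j][0] in truth:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / negatives, tp / positives))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented scores and truth lists.
scores = {"g1": 0.9, "g2": 0.8, "g3": 0.7, "g4": 0.4, "g5": 0.2}
restrictive = {"g1", "g2"}
permissive = {"g1", "g2", "g4"}

for name, truth in (("restrictive", restrictive), ("permissive", permissive)):
    print(name, auc(roc_points(scores, truth)))
```

Running the same scores against both truth lists shows how sensitive an algorithm's apparent performance is to where the "true positive" line is drawn.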

I'm currently worried about algorithms making calls for downregulated genes or calling them as differentially expressed in cases where the assumption that "A>>B+C or A<<B+C" doesn't hold. I don't know how to handle that yet, and it may be the source of the outliers I mentioned before.
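
One possible way to isolate those cases is to label each gene by whether A sits clearly above or clearly below both B and C, and to set the rest aside. This is a sketch only: the `fold` threshold, the label names, and the use of per-tissue mean expression are all assumptions made for illustration.

```python
def classify_gene(a, b, c, fold=2.0):
    """Label a gene by how expression in A relates to both B and C.

    a, b, c: mean expression of the gene in each tissue (hypothetical inputs)
    fold:    how many times higher or lower A must be to count as ">>" or "<<"
    """
    highest_other = max(b, c)
    lowest_other = min(b, c)
    if a > fold * highest_other:
        return "A_high"      # A is well above both B and C
    if a * fold < lowest_other:
        return "A_low"       # A is well below both B and C
    return "ambiguous"       # A falls between, or too close to, B and C


examples = {
    "gene1": (100.0, 10.0, 20.0),
    "gene2": (1.0, 10.0, 20.0),
    "gene3": (15.0, 10.0, 20.0),
}
labels = {gene: classify_gene(*values) for gene, values in examples.items()}
print(labels)
```

Genes labelled "ambiguous" could then be evaluated separately, to see whether they account for most of the outliers.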

Overall, I am actually impressed with cuffdiff's performance, given how much grief it gets here. Neither algorithm is even remotely perfect, and neither is obviously superior.
All times are GMT -8.

