SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Problem with Cufflinks output admiral RNA Sequencing 0 06-01-2011 01:19 PM
HTSeq output not correlated with Cufflinks output... Help gen2prot Bioinformatics 5 01-31-2011 09:16 AM
Interesting cufflinks output jetspeeder Bioinformatics 8 12-21-2010 06:51 AM
Cufflinks output - Next wat? ritzriya RNA Sequencing 5 09-13-2010 03:03 AM
cufflinks cuffcompare output Mark Bioinformatics 1 07-19-2010 07:23 AM

Reply
 
Thread Tools
Old 06-29-2012, 03:00 AM   #1
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default inconsistancy between cufflinks and cuffdiffs output?

Hi all,

I'm working on differential expression analysis on bacterial transcriptome using cufflinks packages. However I found these result (table below) that confusing me. Some of the expression profiles reported in the cuffdiffs analysis are not consistent with the cufflinks analysis. Below is part of the combined results from cuffdiff (column 2- 10) and cufflinks (column 11 - 14).


Code:
no	gene	FPKM (sampleA)	FPKM (sampleB)	log2(fold_change)	fold_change	test_stat	p_value	q_value	significant	FPKM (SampleA-r1)	FPKM (SampleA-r2)	FPKM (SampleB-r1)	FPKM (SampleB-r2)
1	gene0090	0	0.280696	1.79769e+308	#VALUE!	1.79769e+308	0.26143	1	no	11.2027	11.0176	6.73073	14.1696
2	gene0091	0	0	0	1	0	1	1	no	44.6099	57.6125	27.8334	30.6339
3	gene1910	0	0.440214	1.79769e+308	#VALUE!	1.79769e+308	0.134216	1	no	11.7053	14.0353	33.6464	45.4033
4	gene2086	0	0	0	1	0	1	1	no	2.29351	1.15464	0.654272	6.82639
5	gene2223	0	0.277117	1.79769e+308	#VALUE!	1.79769e+308	0.341546	1	no	0	0	0	0.660894
6	gene2488	0	0	0	1	0	1	1	no	10.4025	11.889	8.49335	17.6908
7	gene3116	0	2.19548	1.79769e+308	#VALUE!	1.79769e+308	0.270146	1	no	43.535	54.3667	31.5445	29.4442
8	gene3375	0	8.83927	1.79769e+308	#VALUE!	1.79769e+308	0.156186	0.371228	no	0	0	8.83593	8.62098
9	genet08	0	47943.9	1.79769e+308	#VALUE!	1.79769e+308	0.216948	0.441909	no	215745	289492	283647	224841
10	genet15	0	0	0	1	0	1	1	no	0	32165.7	0	44968.3
11	genet19	0	0	0	1	0	1	1	no	0	0	0	22484.1
12	genet30	0	20919.6	1.79769e+308	#VALUE!	1.79769e+308	0.26143	0.486515	no	34511.2	109316	0	49890.9
13	genet35	0	47943.9	1.79769e+308	#VALUE!	1.79769e+308	0.216948	0.441909	no	61641.3	0	81041.9	44968.3
14	gene1061	0	4.64699	1.79769e+308	#VALUE!	1.79769e+308	0.0974253	1	no	0	0	239.346	308.699
15	gene2057	0	0	0	1	0	1	1	no	10.5358	12.5268	7.81085	17.5835
16	gene1219	279.782	126.536	-1.14476	0.452264919	4.02067	5.80E-05	0.000723236	yes	294.917	272.507	98.3558	163.011
17	gene0454	281.243	122.751	-1.19608	0.436459592	4.50118	6.76E-06	0.000105364	yes	319.036	254.564	112.823	133.568
18	gene0696	281.492	119.564	-1.23531	0.424751221	4.65254	3.28E-06	5.51E-05	yes	299.484	271.801	101.908	141.367
19	gene1908	295.083	175.718	-0.747857	0.595487447	2.83873	0.0045293	0.0283197	yes	293.55	302.417	149.607	207.991
20	gene3330	298.322	59.7793	-2.31915	0.200385497	8.05916	6.66E-16	4.98E-14	yes	316.381	288.917	47.3887	75.7079
21	gene2943	304.366	157.484	-0.950603	0.517416153	3.27473	0.00105763	0.00865789	yes	286.907	325.549	166.413	140.802
22	gene1286a	304.854	1411.33	2.21087	4.629543685	-6.29067	3.16E-10	1.26E-08	yes	368.954	256.093	1544.83	1187.39
23	gene2931	307.767	74.4498	-2.0475	0.241902905	6.94379	3.82E-12	1.86E-10	yes	398.538	236.189	80.5042	63.9738
24	gene2526	310.296	73.1319	-2.08507	0.235684698	5.75042	8.90E-09	2.70E-07	yes	293.458	331.069	70.5113	74.949
25	gene3402	323.668	104.076	-1.63688	0.321551116	5.94465	2.77E-09	9.15E-08	yes	318.393	334.796	101.337	105.236
26	gene0394	324.037	294.276	-0.138986	0.90815723	0.301882	0.762742	0.881605	no	335.479	320.832	244.124	357.452
27	gene2014	324.814	336.958	0.0529539	1.037386787	-0.190118	0.849217	0.931596	no	345.453	313.737	348.649	311.732
28	gene0458	325.072	154.641	-1.07184	0.475711893	4.01458	5.96E-05	0.000732211	yes	364.987	297.467	128.933	186.893
29	gene3221	326.916	193.231	-0.758593	0.591072498	2.64956	0.00805976	0.0447439	yes	315.978	342.966	138.708	265.134
30	gene1964	330.623	151.484	-1.12602	0.458177971	3.99545	6.46E-05	0.000786006	yes	322.485	344.348	132.149	174.834
For example in row no 9 (genet08), in cufflinks analysis FPKM in both sampleA replicates reported as high as 20000, but in cuffdiffs analysis the FPKM reported as 0. There are several rows in the table that show the same inconsistency between cuffdiffs and cufflinks analysis, I don't have the exact figure but i can estimate its around 5% of the total gene list have this kind of inconsistency.

Below is the command that I used to run cufflinks and cuffdiffs


cufflinks command (example for SampleA_r1)

Code:
cufflinks -G combined.gtf SampleA_r1-accepted_hits.bam
cuffdiff command

Code:
cuffdiff -o SampleA-vs-SampleB-cufflinks cuffcmp.combined.gtf SampleA-r1-accepted_hits.bam,SampleA-r2-accepted_hits.bam SampleB-r1-accepted_hits.bam,SampleB-r2-accepted_hits.bam
Hope someone can share their thought on this.

Thanks in advance

- kamal
masterpiece is offline   Reply With Quote
Old 07-30-2012, 05:55 AM   #2
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29
Default

I got the same problem.
sqcrft is offline   Reply With Quote
Old 08-01-2012, 01:07 AM   #3
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default

Great!, I thought I was the only person who got this problem. Bad thing is, I haven't solved the problem yet, but at the moment I just use results from cufflinks run as I looking into differential expression analysis
masterpiece is offline   Reply With Quote
Old 08-01-2012, 03:59 AM   #4
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29
Default

No, please don't use the previous result.
I believe the problem is already fixed in the new version. Try the new V2.02 version.

there are also huge difference between V1.3 and V2.02, it is long story, but I belive v2.02 is indeed better.
sqcrft is offline   Reply With Quote
Old 08-01-2012, 04:03 AM   #5
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29
Default

I meant, the version 2.02 cuffdiff also output the FPKM of all the replicates, so there is no need to run the cufflinks anymore.
sqcrft is offline   Reply With Quote
Old 08-01-2012, 11:52 PM   #6
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default

Hmm... I've tried rerun the same command using cufflink version 2.02 (previously version 2.01), however the problem still remain . At the moment I'm using the approach that I mentioned above

On another note, I hope for future cufflinks version (cuffdiff run) they will report out replicate variation as well (Something like DESeq output reporting). Sometime I want to see how much is the gene expression variation between replicate.
masterpiece is offline   Reply With Quote
Old 08-02-2012, 04:13 AM   #7
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29
Default

Are you using different gene annotation gtf files in cuffdiff and cufflinks?
combined.gtf and cuffcmp.combined.gtf
sqcrft is offline   Reply With Quote
Old 08-03-2012, 02:31 AM   #8
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default

Yes, I'm using different gtf files as the cuffdiff required for me to combined the gtf files by using the cuffmerge command. But technically both gtf files (combined.gtf and cuffcmp.combined.gtf ) have the same information except additional columns in the cuffcmp.combined.gtf added by cuffmerge.

I don't think my way of running the program is the problem, as we have already validate some of the FPKM data, seem that the results are very good. The problem here is only some of the genes which have inconsistent FPKM output from cuffdiff and cufflinks. From my calculation about 50 genes (~1% of total genes) have this kind inconsistency. I attached the simplified version of table I posted on my 1st post (ahh.. should have done it on the 1st place).

As an example in row no 9 (genet08), in cufflinks analysis FPKM in both sampleA replicates reported as high as 20000, but in cuffdiffs analysis the FPKM reported as 0
Attached Files
File Type: txt example of problem list.txt (1.7 KB, 18 views)
masterpiece is offline   Reply With Quote
Old 08-03-2012, 04:24 AM   #9
sqcrft
Member
 
Location: boston

Join Date: May 2012
Posts: 29
Default

Here is my experience, hope it may help a little bit.
The version 1.3 cuffdiff only give FPKM of a condition, but not the FPKMs of individual replicates. I realized that some condition FPKMs from cuffdiff are significantly different from FPKMs of the biological replicates from cufflinks.

The version 2.02 cuffdiff will output both condition FPKM and replicate FPKMs. For me, the problem is solved, since I don't need to use cufflinks to get the replicate FPKM any more.

I am not sure whether you tried the same gtf files in cufflinks and cuffdiff. Do you think you will get the same FPKM if you do that way?
sqcrft is offline   Reply With Quote
Old 08-10-2012, 12:06 PM   #10
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

sqcrft, the FPKM of each replicate that cuffdiff produces (this is the genes.read.group.tracking file) does not even match the FPKM that cufflinks produces. And the FPKM of the genes.read.grouptracking file that cuffdiff produces does not remotely come close to the "Values" that the gene_expr.diff files produce.

I am honestly baffled by all of this. Here is an example for gene GPR89C.

My Cufflinks files produce the following FPKM values for
Sample 1: 16,16,12 and for Sample 2: 16,21,15.

My Cuffdiff produces the genes.readgroup.tracking file that sqcrft is referring to and it produces the following FPKM values for
Sample 1: .2,.1,.08 and for Sample 2: .16,.19,.19

Lastly, Cuffdiff produces the gene.expr.diff file, the one that groups the values by condition and the one most people are interested in, produces the following values:
Sample 1: 0.12 and Sample 2: 3.44

I haven't checked other genes yet, but I will. I'm thinking the best thing might be to just use the output files from Cufflinks and use that for analysis.
billstevens is offline   Reply With Quote
Old 09-05-2012, 09:06 PM   #11
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default

sqcrft

Apology for late reply due to busy with other project.

I've tried run according to your suggestion, but the problem still there. No luck I think. Well I think at the moment I just focus on 99% genes which shown to be ok (so far).

billsteven,

Looks like we are in the same boat. Do tell me if you find solution for this prob

Last edited by masterpiece; 09-06-2012 at 03:26 AM.
masterpiece is offline   Reply With Quote
Old 09-11-2012, 05:39 AM   #12
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

My solution was switching to DESeq. My two cents are that its much preferable to my needs. Very simple to follow, very logical, and the authors are very active on the forum. Additionally, it has a lot of quality control metrics that I did not find with Cufflinks/CummeRbund, plus Cufflinks really messed me up by having a new version that gives such drastically different results. Again, that's just my two cents.
billstevens is offline   Reply With Quote
Old 09-11-2012, 06:05 AM   #13
glados
Member
 
Location: Aperture Science

Join Date: Mar 2012
Posts: 59
Default

I have had similar experience with Cuffdiff and have as well switched to DESeq as Bill Stevens did. Got completely different results compared to Cuffdiff which makes it pretty much impossible for me to say which method produces the most correct results.

Last edited by glados; 09-12-2012 at 03:31 AM.
glados is offline   Reply With Quote
Old 09-11-2012, 09:44 AM   #14
sudders
Member
 
Location: Sheffield, UK

Join Date: Dec 2011
Posts: 32
Default

We've been seeing this for quite a while now. There is another thread about this here: http://seqanswers.com/forums/showthread.php?t=16962

There was some claim that using 2.0.2 fixed this, although I havn't checked this yet.
There was also a dicussion on if it was due to unrealsable high FPKMs on short genes for which there is an explaination, although we see it in all genes.

The FPKMs from CuffDiff are much more believable than those from Cufflinks if you ask me. Our pipelines run but DESeq and CuffDiff, and I generally use the CuffDiff quantitiations for everything except differentially expression, for that we choose a CuffDIff, edgeR or DESeq based on the problem, often we use two or three methods and look at the overlap. There is generally a pretty good corrolation in the fold changes between CuffDiff and the count based methods, but quite a big difference in the p-values.

Of course if you are interested in doing more complex GLM type designs, you'll need to use something other than CuffDIff. On the other hand, if you are working in a system where you think differential splicing is going to be important, then CuffDiff is pretty much your only choice.
sudders is offline   Reply With Quote
Old 09-22-2012, 02:44 PM   #15
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default

Bill Steven,

Yup, I did tried using the DESeq for analysis and I prefer the way it report the results compared to cuffdiff. They report out "variation between replicates", which is not provided in cuffdiff. IMHO as molecular biologist, I think the stats is important for me to know how different is the expression between replicates,

But as mention by Sudders, for my coming project I may need to look into differential splicing, for this I have to go with cuffdiff. I think there is no such things as one software suits all.

Galdos,

I've tried to compared both software. Yes, the results are different because both using different algorithm. The way DESeq treat the number of read count is also different. DESeq only count reads that only uniquely mapped, Cufflink/cuffdiff they have their own way to count reads that are not uniquely mapped.

From my comparison test, they maybe different in term of FPKM/RPKM, but most of them have same profiles. So I think both of them still good. To be sure, you need to validate them with lab work.

Sudders,

Unfortunately the latest version still didn't solve the problem. At first I'm using 1.4.1, but even after I update to 2.01 and 2.02, the problem is still there.

I wish I don't have to use several software to get one job done because I hate when people questioning me "So tell me, which one gives the best result and explain me why (answer not more than 50 words and use only layman term)?"

Last edited by masterpiece; 09-22-2012 at 02:55 PM.
masterpiece is offline   Reply With Quote
Old 09-22-2012, 08:11 PM   #16
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Quote:
Originally Posted by masterpiece View Post
"So tell me, which one gives the best result and explain me why (answer not more than 50 words and use only layman term)?"
Haha, this might have been the best thing I've seen on SeqAnswers.

There's also DEXSeq, which does splicing, so at least you can have some consistency in your method. Whatever you do, don't stick with 1.4, as even the authors will tell you, that version was not good.

One thing you mentioned about lab work, I'm assuming you mean qPCR. How do you handle it with fold changes less than 2? I get many genes that DESeq says are differentially expressed, and the fold change might be as low as 1.25, and no one in my lab seems to think I can validate that with qPCR.
billstevens is offline   Reply With Quote
Old 09-23-2012, 04:00 AM   #17
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default

Quote:
Originally Posted by billstevens View Post

One thing you mentioned about lab work, I'm assuming you mean qPCR. How do you handle it with fold changes less than 2? I get many genes that DESeq says are differentially expressed, and the fold change might be as low as 1.25, and no one in my lab seems to think I can validate that with qPCR.
May I know do you have replicates in your analysis. Without replicates, the statistics shown will not be strong enough to support your results.

Normally in my case, apart from using p-value or FDR to get a list of differently expressed genes, I'll also include in log2foldchange set at +/-2. Maybe you should try include log2foldchange and see how many differently expressed genes left.
masterpiece is offline   Reply With Quote
Old 09-23-2012, 07:58 AM   #18
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Yes, I definitely have replicates.

If I only kept log2foldchange set at 2, I would lose the majority of my genes, and DESeq becomes much less powerful.

I guess the fundamental question I'm asking here is, if you can't do it verify via qPCR, should you call the gene differentially expressed. And if so, then the "power" of all of these software programs is kinda useless.
billstevens is offline   Reply With Quote
Old 09-24-2012, 11:28 PM   #19
masterpiece
Member
 
Location: malaysia

Join Date: Mar 2009
Posts: 40
Default

billstevens

Unfortunately I don't think I can help you on this. My task is generating results from sequence reads, run the statistics analysis. Later i pass them to other person in our lab for validation test through qPCR. I my self don't have any experience running the test. And I'm not sure if genes with fold change as low as 1.25 can be validated through qPCR.

As in our case we set p-val = 0.05 and log2foldchange =+/-2 for the genes to be differently express, so we don't face the problem that you have mentioned.
masterpiece is offline   Reply With Quote
Old 09-27-2012, 07:53 PM   #20
vinay052003
Member
 
Location: Atlanta, US

Join Date: Jan 2010
Posts: 59
Default

Hi guys,
After reading through the discussion above I am realizing that I should also try DESeq. I have two multi-replicate samples and ran cuffdiff on them. Cuffdiff produces a read count files in output that has three types of read counts (actual, internally adjusted andexternlly adjusted) for each transcript in each replicate. Which one should I use for DESeq?
vinay052003 is offline   Reply With Quote
Reply

Tags
cuffdiff, cufflinks

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 06:09 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO