![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Problem with Cufflinks output | admiral | RNA Sequencing | 0 | 06-01-2011 02:19 PM |
HTSeq output not correlated with Cufflinks output... Help | gen2prot | Bioinformatics | 5 | 01-31-2011 10:16 AM |
Interesting cufflinks output | jetspeeder | Bioinformatics | 8 | 12-21-2010 07:51 AM |
Cufflinks output - Next wat? | ritzriya | RNA Sequencing | 5 | 09-13-2010 04:03 AM |
cufflinks cuffcompare output | Mark | Bioinformatics | 1 | 07-19-2010 08:23 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]()
Hi all,
I'm working on differential expression analysis on bacterial transcriptome using cufflinks packages. However I found these result (table below) that confusing me. Some of the expression profiles reported in the cuffdiffs analysis are not consistent with the cufflinks analysis. Below is part of the combined results from cuffdiff (column 2- 10) and cufflinks (column 11 - 14). Code:
no gene FPKM (sampleA) FPKM (sampleB) log2(fold_change) fold_change test_stat p_value q_value significant FPKM (SampleA-r1) FPKM (SampleA-r2) FPKM (SampleB-r1) FPKM (SampleB-r2) 1 gene0090 0 0.280696 1.79769e+308 #VALUE! 1.79769e+308 0.26143 1 no 11.2027 11.0176 6.73073 14.1696 2 gene0091 0 0 0 1 0 1 1 no 44.6099 57.6125 27.8334 30.6339 3 gene1910 0 0.440214 1.79769e+308 #VALUE! 1.79769e+308 0.134216 1 no 11.7053 14.0353 33.6464 45.4033 4 gene2086 0 0 0 1 0 1 1 no 2.29351 1.15464 0.654272 6.82639 5 gene2223 0 0.277117 1.79769e+308 #VALUE! 1.79769e+308 0.341546 1 no 0 0 0 0.660894 6 gene2488 0 0 0 1 0 1 1 no 10.4025 11.889 8.49335 17.6908 7 gene3116 0 2.19548 1.79769e+308 #VALUE! 1.79769e+308 0.270146 1 no 43.535 54.3667 31.5445 29.4442 8 gene3375 0 8.83927 1.79769e+308 #VALUE! 1.79769e+308 0.156186 0.371228 no 0 0 8.83593 8.62098 9 genet08 0 47943.9 1.79769e+308 #VALUE! 1.79769e+308 0.216948 0.441909 no 215745 289492 283647 224841 10 genet15 0 0 0 1 0 1 1 no 0 32165.7 0 44968.3 11 genet19 0 0 0 1 0 1 1 no 0 0 0 22484.1 12 genet30 0 20919.6 1.79769e+308 #VALUE! 1.79769e+308 0.26143 0.486515 no 34511.2 109316 0 49890.9 13 genet35 0 47943.9 1.79769e+308 #VALUE! 1.79769e+308 0.216948 0.441909 no 61641.3 0 81041.9 44968.3 14 gene1061 0 4.64699 1.79769e+308 #VALUE! 1.79769e+308 0.0974253 1 no 0 0 239.346 308.699 15 gene2057 0 0 0 1 0 1 1 no 10.5358 12.5268 7.81085 17.5835 16 gene1219 279.782 126.536 -1.14476 0.452264919 4.02067 5.80E-05 0.000723236 yes 294.917 272.507 98.3558 163.011 17 gene0454 281.243 122.751 -1.19608 0.436459592 4.50118 6.76E-06 0.000105364 yes 319.036 254.564 112.823 133.568 18 gene0696 281.492 119.564 -1.23531 0.424751221 4.65254 3.28E-06 5.51E-05 yes 299.484 271.801 101.908 141.367 19 gene1908 295.083 175.718 -0.747857 0.595487447 2.83873 0.0045293 0.0283197 yes 293.55 302.417 149.607 207.991 20 gene3330 298.322 59.7793 -2.31915 0.200385497 8.05916 6.66E-16 4.98E-14 yes 316.381 288.917 47.3887 75.7079 21 gene2943 304.366 157.484 -0.950603 0.517416153 3.27473 0.00105763 0.00865789 yes 286.907 325.549 166.413 140.802 22 gene1286a 304.854 1411.33 2.21087 4.629543685 -6.29067 3.16E-10 1.26E-08 yes 368.954 256.093 1544.83 1187.39 23 gene2931 307.767 74.4498 -2.0475 0.241902905 6.94379 3.82E-12 1.86E-10 yes 398.538 236.189 80.5042 63.9738 24 gene2526 310.296 73.1319 -2.08507 0.235684698 5.75042 8.90E-09 2.70E-07 yes 293.458 331.069 70.5113 74.949 25 gene3402 323.668 104.076 -1.63688 0.321551116 5.94465 2.77E-09 9.15E-08 yes 318.393 334.796 101.337 105.236 26 gene0394 324.037 294.276 -0.138986 0.90815723 0.301882 0.762742 0.881605 no 335.479 320.832 244.124 357.452 27 gene2014 324.814 336.958 0.0529539 1.037386787 -0.190118 0.849217 0.931596 no 345.453 313.737 348.649 311.732 28 gene0458 325.072 154.641 -1.07184 0.475711893 4.01458 5.96E-05 0.000732211 yes 364.987 297.467 128.933 186.893 29 gene3221 326.916 193.231 -0.758593 0.591072498 2.64956 0.00805976 0.0447439 yes 315.978 342.966 138.708 265.134 30 gene1964 330.623 151.484 -1.12602 0.458177971 3.99545 6.46E-05 0.000786006 yes 322.485 344.348 132.149 174.834 Below is the command that I used to run cufflinks and cuffdiffs cufflinks command (example for SampleA_r1) Code:
cufflinks -G combined.gtf SampleA_r1-accepted_hits.bam Code:
cuffdiff -o SampleA-vs-SampleB-cufflinks cuffcmp.combined.gtf SampleA-r1-accepted_hits.bam,SampleA-r2-accepted_hits.bam SampleB-r1-accepted_hits.bam,SampleB-r2-accepted_hits.bam Thanks in advance - kamal |
![]() |
![]() |
![]() |
#2 |
Member
Location: boston Join Date: May 2012
Posts: 29
|
![]()
I got the same problem.
|
![]() |
![]() |
![]() |
#3 |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]()
Great!, I thought I was the only person who got this problem. Bad thing is, I haven't solved the problem yet, but at the moment I just use results from cufflinks run as I looking into differential expression analysis
|
![]() |
![]() |
![]() |
#4 |
Member
Location: boston Join Date: May 2012
Posts: 29
|
![]()
No, please don't use the previous result.
I believe the problem is already fixed in the new version. Try the new V2.02 version. there are also huge difference between V1.3 and V2.02, it is long story, but I belive v2.02 is indeed better. |
![]() |
![]() |
![]() |
#5 |
Member
Location: boston Join Date: May 2012
Posts: 29
|
![]()
I meant, the version 2.02 cuffdiff also output the FPKM of all the replicates, so there is no need to run the cufflinks anymore.
|
![]() |
![]() |
![]() |
#6 |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]()
Hmm... I've tried rerun the same command using cufflink version 2.02 (previously version 2.01), however the problem still remain
![]() On another note, I hope for future cufflinks version (cuffdiff run) they will report out replicate variation as well (Something like DESeq output reporting). Sometime I want to see how much is the gene expression variation between replicate. |
![]() |
![]() |
![]() |
#7 |
Member
Location: boston Join Date: May 2012
Posts: 29
|
![]()
Are you using different gene annotation gtf files in cuffdiff and cufflinks?
combined.gtf and cuffcmp.combined.gtf |
![]() |
![]() |
![]() |
#8 |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]()
Yes, I'm using different gtf files as the cuffdiff required for me to combined the gtf files by using the cuffmerge command. But technically both gtf files (combined.gtf and cuffcmp.combined.gtf ) have the same information except additional columns in the cuffcmp.combined.gtf added by cuffmerge.
I don't think my way of running the program is the problem, as we have already validate some of the FPKM data, seem that the results are very good. The problem here is only some of the genes which have inconsistent FPKM output from cuffdiff and cufflinks. From my calculation about 50 genes (~1% of total genes) have this kind inconsistency. I attached the simplified version of table I posted on my 1st post (ahh.. should have done it on the 1st place). As an example in row no 9 (genet08), in cufflinks analysis FPKM in both sampleA replicates reported as high as 20000, but in cuffdiffs analysis the FPKM reported as 0 |
![]() |
![]() |
![]() |
#9 |
Member
Location: boston Join Date: May 2012
Posts: 29
|
![]()
Here is my experience, hope it may help a little bit.
The version 1.3 cuffdiff only give FPKM of a condition, but not the FPKMs of individual replicates. I realized that some condition FPKMs from cuffdiff are significantly different from FPKMs of the biological replicates from cufflinks. The version 2.02 cuffdiff will output both condition FPKM and replicate FPKMs. For me, the problem is solved, since I don't need to use cufflinks to get the replicate FPKM any more. I am not sure whether you tried the same gtf files in cufflinks and cuffdiff. Do you think you will get the same FPKM if you do that way? |
![]() |
![]() |
![]() |
#10 |
Senior Member
Location: Baltimore Join Date: Mar 2012
Posts: 120
|
![]()
sqcrft, the FPKM of each replicate that cuffdiff produces (this is the genes.read.group.tracking file) does not even match the FPKM that cufflinks produces. And the FPKM of the genes.read.grouptracking file that cuffdiff produces does not remotely come close to the "Values" that the gene_expr.diff files produce.
I am honestly baffled by all of this. Here is an example for gene GPR89C. My Cufflinks files produce the following FPKM values for Sample 1: 16,16,12 and for Sample 2: 16,21,15. My Cuffdiff produces the genes.readgroup.tracking file that sqcrft is referring to and it produces the following FPKM values for Sample 1: .2,.1,.08 and for Sample 2: .16,.19,.19 Lastly, Cuffdiff produces the gene.expr.diff file, the one that groups the values by condition and the one most people are interested in, produces the following values: Sample 1: 0.12 and Sample 2: 3.44 I haven't checked other genes yet, but I will. I'm thinking the best thing might be to just use the output files from Cufflinks and use that for analysis. |
![]() |
![]() |
![]() |
#11 |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]()
sqcrft
Apology for late reply due to busy with other project. I've tried run according to your suggestion, but the problem still there. No luck I think. Well I think at the moment I just focus on 99% genes which shown to be ok (so far). billsteven, Looks like we are in the same boat. Do tell me if you find solution for this prob Last edited by masterpiece; 09-06-2012 at 04:26 AM. |
![]() |
![]() |
![]() |
#12 |
Senior Member
Location: Baltimore Join Date: Mar 2012
Posts: 120
|
![]()
My solution was switching to DESeq. My two cents are that its much preferable to my needs. Very simple to follow, very logical, and the authors are very active on the forum. Additionally, it has a lot of quality control metrics that I did not find with Cufflinks/CummeRbund, plus Cufflinks really messed me up by having a new version that gives such drastically different results. Again, that's just my two cents.
|
![]() |
![]() |
![]() |
#13 |
Member
Location: Aperture Science Join Date: Mar 2012
Posts: 59
|
![]()
I have had similar experience with Cuffdiff and have as well switched to DESeq as Bill Stevens did. Got completely different results compared to Cuffdiff which makes it pretty much impossible for me to say which method produces the most correct results.
Last edited by glados; 09-12-2012 at 04:31 AM. |
![]() |
![]() |
![]() |
#14 |
Member
Location: Sheffield, UK Join Date: Dec 2011
Posts: 32
|
![]()
We've been seeing this for quite a while now. There is another thread about this here: http://seqanswers.com/forums/showthread.php?t=16962
There was some claim that using 2.0.2 fixed this, although I havn't checked this yet. There was also a dicussion on if it was due to unrealsable high FPKMs on short genes for which there is an explaination, although we see it in all genes. The FPKMs from CuffDiff are much more believable than those from Cufflinks if you ask me. Our pipelines run but DESeq and CuffDiff, and I generally use the CuffDiff quantitiations for everything except differentially expression, for that we choose a CuffDIff, edgeR or DESeq based on the problem, often we use two or three methods and look at the overlap. There is generally a pretty good corrolation in the fold changes between CuffDiff and the count based methods, but quite a big difference in the p-values. Of course if you are interested in doing more complex GLM type designs, you'll need to use something other than CuffDIff. On the other hand, if you are working in a system where you think differential splicing is going to be important, then CuffDiff is pretty much your only choice. |
![]() |
![]() |
![]() |
#15 |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]()
Bill Steven,
Yup, I did tried using the DESeq for analysis and I prefer the way it report the results compared to cuffdiff. They report out "variation between replicates", which is not provided in cuffdiff. IMHO as molecular biologist, I think the stats is important for me to know how different is the expression between replicates, But as mention by Sudders, for my coming project I may need to look into differential splicing, for this I have to go with cuffdiff. I think there is no such things as one software suits all. Galdos, I've tried to compared both software. Yes, the results are different because both using different algorithm. The way DESeq treat the number of read count is also different. DESeq only count reads that only uniquely mapped, Cufflink/cuffdiff they have their own way to count reads that are not uniquely mapped. From my comparison test, they maybe different in term of FPKM/RPKM, but most of them have same profiles. So I think both of them still good. To be sure, you need to validate them with lab work. Sudders, Unfortunately the latest version still didn't solve the problem. At first I'm using 1.4.1, but even after I update to 2.01 and 2.02, the problem is still there. I wish I don't have to use several software to get one job done because I hate when people questioning me "So tell me, which one gives the best result and explain me why (answer not more than 50 words and use only layman term)?" ![]() Last edited by masterpiece; 09-22-2012 at 03:55 PM. |
![]() |
![]() |
![]() |
#16 | |
Senior Member
Location: Baltimore Join Date: Mar 2012
Posts: 120
|
![]() Quote:
There's also DEXSeq, which does splicing, so at least you can have some consistency in your method. Whatever you do, don't stick with 1.4, as even the authors will tell you, that version was not good. One thing you mentioned about lab work, I'm assuming you mean qPCR. How do you handle it with fold changes less than 2? I get many genes that DESeq says are differentially expressed, and the fold change might be as low as 1.25, and no one in my lab seems to think I can validate that with qPCR. |
|
![]() |
![]() |
![]() |
#17 | |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]() Quote:
Normally in my case, apart from using p-value or FDR to get a list of differently expressed genes, I'll also include in log2foldchange set at +/-2. Maybe you should try include log2foldchange and see how many differently expressed genes left. |
|
![]() |
![]() |
![]() |
#18 |
Senior Member
Location: Baltimore Join Date: Mar 2012
Posts: 120
|
![]()
Yes, I definitely have replicates.
If I only kept log2foldchange set at 2, I would lose the majority of my genes, and DESeq becomes much less powerful. I guess the fundamental question I'm asking here is, if you can't do it verify via qPCR, should you call the gene differentially expressed. And if so, then the "power" of all of these software programs is kinda useless. |
![]() |
![]() |
![]() |
#19 |
Member
Location: malaysia Join Date: Mar 2009
Posts: 40
|
![]()
billstevens
Unfortunately I don't think I can help you on this. My task is generating results from sequence reads, run the statistics analysis. Later i pass them to other person in our lab for validation test through qPCR. I my self don't have any experience running the test. And I'm not sure if genes with fold change as low as 1.25 can be validated through qPCR. As in our case we set p-val = 0.05 and log2foldchange =+/-2 for the genes to be differently express, so we don't face the problem that you have mentioned. |
![]() |
![]() |
![]() |
#20 |
Member
Location: Atlanta, US Join Date: Jan 2010
Posts: 59
|
![]()
Hi guys,
After reading through the discussion above I am realizing that I should also try DESeq. I have two multi-replicate samples and ran cuffdiff on them. Cuffdiff produces a read count files in output that has three types of read counts (actual, internally adjusted andexternlly adjusted) for each transcript in each replicate. Which one should I use for DESeq? |
![]() |
![]() |
![]() |
Tags |
cuffdiff, cufflinks |
Thread Tools | |
|
|