SEQanswers

Old 06-06-2011, 08:16 AM   #1
lewewoo
Member
 
Location: Moon

Join Date: Apr 2011
Posts: 60
Cufflinks and CuffDiff bugs?

(I originally posted the following as a reply in another thread, but I am starting a new thread to draw more attention to these problems. Thanks!)

1. In the newly released 1.0.3, after running Cuffcompare, the FPKM column contains all zeros: the FPKM values are missing even though the tracking files have them.

2. In all versions of CuffDiff, if you compare different conditions against the same control samples, the FPKM reported for the control differs between comparisons. For example:
CuffDiff I: condition 1 vs. control
CuffDiff II: condition 2 vs. control

After CuffDiff, when the FPKM values are tracked, the FPKM of gene X in the control condition in CuffDiff I differs from the FPKM of gene X in the control condition in CuffDiff II. Genes that behave like gene X make up roughly 20-30% of all annotated genes; the rest have identical values.
Does anybody have an explanation or suggestions for this? Thanks!
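
One way to quantify the discrepancy in point 2 is to compare the control FPKM gene by gene across the two runs. A minimal sketch in Python, assuming the control FPKM values have already been parsed from each run's tracking file into dictionaries keyed by gene ID. The function name, the tolerance, and the toy values are illustrative and not part of Cufflinks:

```python
def fraction_differing(run1, run2, rel_tol=1e-6):
    """Return the fraction of shared genes whose control FPKM differs between runs.

    run1 and run2 map gene ID -> control FPKM (hypothetical, pre-parsed input).
    """
    shared = sorted(set(run1) & set(run2))
    if not shared:
        raise ValueError("no genes in common between the two runs")
    differing = 0
    for gene in shared:
        a, b = run1[gene], run2[gene]
        scale = max(abs(a), abs(b))
        # Two zeros count as identical; otherwise compare the relative difference.
        if scale > 0 and abs(a - b) / scale > rel_tol:
            differing += 1
    return differing / len(shared)


# Toy values, not real data: one of four genes differs between the runs.
control_in_run1 = {"geneA": 10.0, "geneB": 5.0, "geneC": 0.0, "geneD": 2.5}
control_in_run2 = {"geneA": 10.0, "geneB": 6.1, "geneC": 0.0, "geneD": 2.5}
print(fraction_differing(control_in_run1, control_in_run2))  # 0.25
```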

Last edited by lewewoo; 06-06-2011 at 08:18 AM.
Old 06-20-2011, 12:42 PM   #2
jbrwn
Member
 
Location: Denver, CO

Join Date: Mar 2011
Posts: 37

I've also been hoping for a response to this thread (as well as to the other thread where you posted this question).

Cufflinks 1.0.3 is not giving FPKM values other than zero for paired-end reads from SOLiD. 1.0.3 works fine with single-end data. The same paired-end data runs fine through Cufflinks 0.9.3, and FPKM values are calculated correctly.

Code:
cufflinks --output-dir $out --num-threads 8 --GTF-guide $gtf --multi-read-correct --library-type fr-secondstrand --upper-quartile-norm --label l --frag-bias-correct $hg19All.fa $bam

(Assume my variable references are correct.)
Does anyone have ideas to try as a workaround? Is anyone else having similar issues?
Old 06-21-2011, 08:26 AM   #3
lshen
Member
 
Location: Toronto

Join Date: Jan 2008
Posts: 30

There seem to be a lot of confusing changes in cufflinks. I have not been able to find a fix yet.
Old 06-23-2011, 01:56 PM   #4
jbrwn
Member
 
Location: Denver, CO

Join Date: Mar 2011
Posts: 37

I was having a problem similar to the one stated in #1 by lewewoo: cufflinks was not generating accurate FPKMs. Specifically, they were all zero.

Setup: Cufflinks 1.0.3 with SOLiD paired-end reads (50 bp x 35 bp) mapped using Bioscope.

1. Add the XS tag as per the Cufflinks manual
Code:
samtools view -F 0x04 -h unedited.bam | awk 'BEGIN{OFS="\t"} (!/^@/){minus=and($2, 0x10); print $0"\tXS:A:"(minus ? "-":"+") } (/^@/){ print }' | samtools view -bhS - > xs.bam
The resulting file runs through Cufflinks but gives FPKM = 0 for everything.

2. Increment the NH tag by 1 as per Cufflinks developer Adam Roberts
Code:
samtools view -F 0x04 -h xs.bam | awk 'BEGIN{OFS="\t"}(! /^@/){ split($12,a,":"); $12 = a[1]":"a[2]":"a[3]+1; print $0 } (/^@/){ print }' > xs.nh.sam
This seems to be working, but I don't have the output of a full run yet.
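
For anyone who prefers not to use awk, the two edits above can be sketched in Python on SAM text lines. This is an illustration only, not the method from the Cufflinks manual or from Adam Roberts: the function names are made up, and unlike the awk one-liner it finds the NH tag by name rather than assuming it sits in column 12.

```python
def add_xs_tag(sam_line):
    """Append XS:A:- for reverse-strand alignments (FLAG bit 0x10), XS:A:+ otherwise."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    strand = "-" if int(fields[1]) & 0x10 else "+"
    fields.append("XS:A:" + strand)
    return "\t".join(fields)


def increment_nh(sam_line):
    """Add 1 to the NH:i: tag of an alignment line, wherever the tag appears."""
    if sam_line.startswith("@"):
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at the 12th column (index 11) of a SAM record.
    for i in range(11, len(fields)):
        if fields[i].startswith("NH:i:"):
            fields[i] = "NH:i:" + str(int(fields[i][5:]) + 1)
    return "\t".join(fields)


# Toy record (not real data): reverse-strand alignment with NH:i:0.
record = "read1\t16\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\t!!!!\tNH:i:0"
print(increment_nh(add_xs_tag(record)))
```

Reading from and writing back to BAM would still go through samtools view, as in the commands above.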
Old 07-13-2011, 03:00 PM   #5
jaldrich
Junior Member
 
Location: Arizona

Join Date: Sep 2009
Posts: 5

I'm seeing the same thing as lewewoo's statement #2.

I am getting different FPKM values for the same control used against two different samples in two different cuffdiff runs. Is this expected? Does cuffdiff consider all samples provided when calculating the FPKM? If so, what is the best workflow for getting FPKM values for samples that you want to perform further analysis on outside of the cufflinks suite? Should I run cufflinks on individual samples and work with those FPKMs, or should I put all the samples I'm interested in analyzing into cuffdiff and use those FPKMs, since they might be normalized across samples?

Any suggestions or ideas as to what is happening would be great!!
Thanks!
Old 07-13-2011, 11:24 PM   #6
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

The FPKMs should have confidence ranges included (the conf_lo and conf_hi columns in the tracking files). Do those ranges overlap?
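
The overlap check itself is simple. A minimal sketch in Python, assuming each FPKM comes with a lower and an upper bound; the function name and the numbers are made up for illustration:

```python
def ranges_overlap(lo1, hi1, lo2, hi2):
    """Return True if the closed intervals [lo1, hi1] and [lo2, hi2] share any point."""
    return lo1 <= hi2 and lo2 <= hi1


# Toy bounds for the same gene in the control of two separate runs (not real data).
print(ranges_overlap(8.0, 12.0, 11.0, 15.0))  # True
print(ranges_overlap(8.0, 12.0, 13.0, 15.0))  # False
```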
Old 07-14-2011, 09:11 AM   #7
jaldrich
Junior Member
 
Location: Arizona

Join Date: Sep 2009
Posts: 5

Good point. Thanks, gringer. From a quick look, the ranges do seem to overlap. I did a scatter plot, and the values are concordant, with a very tight spread at the extremes but quite a big spread in the middle. I guess I just expected much more agreement across the range, especially since it is the same sample.

Sorry to ask again, but does this mean that cuffdiff does not consider both samples when calculating FPKM? (I assume so, but I'm not 100% positive that this assumption is correct.) What would be the recommended workflow to just get FPKM values for further analysis? Can I use cuffdiff (maybe with all the samples analyzed together, if some cross-sample normalization is occurring), or should I use cufflinks? BTW, I should mention that I was not using the -N option (upper-quartile normalization) in cuffdiff.

Thanks so much for the help!! Which approach to take to get FPKMs has been a big source of discussion. Really appreciate it!

Last edited by jaldrich; 07-14-2011 at 09:21 AM.
Old 07-14-2011, 11:33 PM   #8
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

I would recommend using cuffdiff for analysing FPKM, because the FPKM calculations may make assumptions that are not obvious to the people who didn't write the cufflinks/cuffdiff code.

It's probably worth having a look at a couple of runs to see the difference with and without upper-quartile normalisation (the -N option). I would expect that cufflinks is "good enough" without this, because they haven't included it as a default option even though it's relatively simple to calculate.

There's a bit of information on how things are calculated on the cufflinks website:
http://cufflinks.cbcb.umd.edu/manual.html#fpkm_track
Quote:
Cuffdiff calculates the FPKM of each transcript, primary transcript, and gene in each sample. Primary transcript and gene FPKMs are computed by summing the FPKMs of transcripts in each primary transcript group or gene group.
http://cufflinks.cbcb.umd.edu/howitworks.html#hdif
Quote:
Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use.... The above attributes, along with the gene_id required by the GTF specification, make each transcript a member of a "gene group", "primary transcript group", and "CDS group".
And later...
Quote:
Cuffdiff pools the fragments before calculating the individual isoform abundances and then examines the likelihood surface of the replicate pool via importance sampling.
Note the magic word right at the end of that: "sampling". This suggests that you should expect slightly different results each time you run cuffdiff on the same data (it is unlikely that the sampling will be done in exactly the same way on each run).
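
To see why sampling alone can produce run-to-run differences, here is a toy importance-sampling estimate in Python. This is not Cuffdiff's algorithm or its likelihood surface, only a generic sketch under made-up assumptions: it estimates the mean of x^2 under a standard normal (true value 1) by drawing from a wider normal and reweighting.

```python
import math
import random


def importance_sampling_estimate(seed, n=2000):
    """Self-normalised importance-sampling estimate of E[x^2] under N(0, 1).

    Samples are drawn from a wider proposal, N(0, 2^2), and reweighted by
    target density / proposal density (normalising constants cancel).
    """
    rng = random.Random(seed)
    proposal_sd = 2.0
    weighted_sum = 0.0
    weight_sum = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, proposal_sd)
        target = math.exp(-0.5 * x * x)                     # unnormalised N(0, 1)
        proposal = math.exp(-0.5 * (x / proposal_sd) ** 2)  # unnormalised N(0, 4)
        weight = target / proposal
        weighted_sum += weight * x * x
        weight_sum += weight
    return weighted_sum / weight_sum


# Different seeds stand in for different runs on the same data.
for seed in (1, 2, 3):
    print(seed, importance_sampling_estimate(seed))
```

Each seed gives an estimate close to 1 but not the same number, and only a fixed seed reproduces a result exactly.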