SEQanswers

Old 10-11-2009, 09:50 PM   #1
jling
Junior Member
 
Location: 21218

Join Date: Oct 2009
Posts: 9
Cufflinks

Hi,

Has anybody tried the new cufflinks package?
http://cufflinks.cbcb.umd.edu/

I tried using it on my .gtf file from tophat but for some reason the conf_lo and conf_hi all end up as 0.0000, leading me to think that I've done something wrong.

Has cufflinks worked for anybody else?

Thanks!
Old 10-12-2009, 07:19 AM   #2
tebuffer
Member
 
Location: Bethesda, USA

Join Date: Jun 2009
Posts: 13
Works for me!

Tophat gave me the alignment file in .sam format (along with the .wig and .bed files). This RNA-seq alignment (.sam) was my input to cufflinks. cufflinks then gives me the .gtf file (along with other files .expr, .refmap, .tmap) that has the transcript data. cuffcompare then takes the .gtf files from two different experiments and computes the differential expression.

It works for me.

TEB
Old 10-12-2009, 07:49 AM   #3
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

The conf_hi and conf_lo will be 0.0 when there is only one isoform in a locus. When a locus has more than one transcript and those transcripts share exons, any alignment in the shared regions could have come from either isoform. The inference assigns the alignments to individual transcripts, and the confidence interval reflects confidence in that assignment. If there is only one transcript in a locus, there is only one transcript to assign that locus' alignments to, so Cufflinks has total confidence in the assignment and the confidence interval has zero length.

Are you sure ALL of the transcripts have conf_hi and conf_lo equal to 0.0? Or is it just those that aren't in alternatively spliced genes?
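
One way to check this is to group the transcripts by locus and see whether any multi-isoform locus also reports zero-length intervals. The sketch below is only an illustration, not part of Cufflinks: it assumes the output .gtf has "transcript" records whose attribute column carries gene_id, transcript_id, conf_lo and conf_hi, and the function names are made up for this example.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "CUFF.1";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_transcripts(gtf_lines):
    """Yield (gene_id, transcript_id, conf_lo, conf_hi) per transcript record.

    Assumption: attribute names follow the Cufflinks GTF output; records
    that lack any of them are skipped rather than guessed at.
    """
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        needed = ("gene_id", "transcript_id", "conf_lo", "conf_hi")
        if any(key not in attrs for key in needed):
            continue
        yield (attrs["gene_id"], attrs["transcript_id"],
               float(attrs["conf_lo"]), float(attrs["conf_hi"]))


def zero_width_report(gtf_lines):
    """Count single- and multi-isoform loci and list the multi-isoform
    loci in which every transcript has conf_lo == conf_hi == 0.0."""
    loci = defaultdict(list)
    for gene_id, _tid, lo, hi in parse_transcripts(gtf_lines):
        loci[gene_id].append((lo, hi))

    report = {"single_isoform_loci": 0,
              "multi_isoform_loci": 0,
              "suspicious_loci": []}
    for gene_id, intervals in sorted(loci.items()):
        if len(intervals) == 1:
            # Zero-length intervals are expected here.
            report["single_isoform_loci"] += 1
            continue
        report["multi_isoform_loci"] += 1
        if all(lo == 0.0 and hi == 0.0 for lo, hi in intervals):
            report["suspicious_loci"].append(gene_id)
    return report
```

Calling `zero_width_report(open("transcripts.gtf"))` (file name assumed) gives the counts. If `multi_isoform_loci` is 0, every locus has a single isoform and the all-zero intervals are expected; anything listed under `suspicious_loci` would be worth a closer look.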
Old 10-12-2009, 08:12 AM   #4
jling
Junior Member
 
Location: 21218

Join Date: Oct 2009
Posts: 9

Yes, all the conf_lo/conf_hi values are equal to 0. I suppose the data could contain only a single isoform for each locus. Thanks for the response, I guess I'll look over the data again.
Old 10-12-2009, 03:03 PM   #5
jling
Junior Member
 
Location: 21218

Join Date: Oct 2009
Posts: 9

Can someone explain what these warning messages refer to?

> Warning: ITERMAX reached in abundance estimation, estimation hasn't fully converged

> Warning: restimation failed, importance samples have zero weight

Last edited by jling; 10-12-2009 at 03:21 PM.
Old 10-12-2009, 03:39 PM   #6
Cole Trapnell
Senior Member
 
Location: Boston, MA

Join Date: Nov 2008
Posts: 212

The first warning means that in that locus the maximum likelihood calculation for the relative abundances of the locus' transcripts hasn't fully converged. However, in my experience, and in looking at a lot of simulation data, etc., this warning is generally safe to ignore - throwing enough CPU cycles to reach convergence generally doesn't improve the accuracy significantly. I will probably expose a few of the parameters, including the convergence threshold, as user-specified parameters for those users who wish to spend more cycles on the MLE calculation.

The second warning is another story. The calculation of relative abundances in a locus relies on a method called "importance sampling" to compute a confidence box around the relative abundances, and this procedure can fail under certain circumstances. One way for this to happen is when you have no more than one or two reads in the salient features of a particular isoform, but lots in the other isoforms for the locus (i.e. one of them is VERY low abundance relative to the others). Cufflinks has a bunch of checks to detect when the estimation in a locus is likely to be unreliable, and these show up as one of several warnings.
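
To give a feel for what "importance samples have zero weight" can mean in general, here is a toy, one-dimensional sketch of self-normalized importance sampling. It is not the Cufflinks implementation; the Gaussian target and proposal, the function names, and all the numbers are assumptions chosen only to show the failure mode, where every sample drawn from the proposal lands somewhere the target density underflows to zero.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution; underflows to 0.0 far from the mean."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def importance_mean(target_pdf, proposal_mean, proposal_sd, n=1000, seed=0):
    """Estimate the mean of `target_pdf` by self-normalized importance sampling.

    Samples come from a normal proposal; each is weighted by
    target density / proposal density. Returns None when every weight is
    zero, i.e. the proposal never reached the region the target occupies.
    """
    rng = random.Random(seed)
    samples = [rng.gauss(proposal_mean, proposal_sd) for _ in range(n)]
    weights = [target_pdf(x) / normal_pdf(x, proposal_mean, proposal_sd)
               for x in samples]
    total = sum(weights)
    if total == 0.0:
        return None  # nothing to normalize by: the estimate is undefined
    return sum(w * x for w, x in zip(weights, samples)) / total


def well_covered(x):
    """Toy target that overlaps the proposal used below."""
    return normal_pdf(x, 0.5, 0.05)


def barely_present(x):
    """Toy target concentrated near zero, standing in for a VERY low
    abundance isoform relative to the others."""
    return normal_pdf(x, 0.001, 0.0001)


ok = importance_mean(well_covered, proposal_mean=0.45, proposal_sd=0.1)
failed = importance_mean(barely_present, proposal_mean=0.5, proposal_sd=0.05)
```

In the first call the proposal covers the target, so the weighted average is usable. In the second, the target sits so far from where the samples fall that every weight is exactly 0.0 and the function returns None, which is the kind of degenerate situation a warning like the one above would be reporting.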

Note that when you get a warning like the importance sampling one, the confidence intervals will be very large - so it should be clear looking at the results where Cufflinks couldn't distinguish the abundances correctly.
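
A quick way to spot those loci in the output is to compare each interval's width with the abundance it surrounds. The helper below is a rough sketch: the function name and the cutoff of 2.0 are arbitrary assumptions for illustration, not values taken from Cufflinks, and it expects (transcript_id, FPKM, conf_lo, conf_hi) tuples that you have already pulled from the output.

```python
def flag_wide_intervals(records, max_relative_width=2.0):
    """Return (transcript_id, relative_width) for transcripts whose
    confidence interval is wide compared with the FPKM estimate.

    relative_width = (conf_hi - conf_lo) / FPKM. The default cutoff is an
    arbitrary illustration, so tune it to your own data.
    """
    flagged = []
    for transcript_id, fpkm, conf_lo, conf_hi in records:
        width = conf_hi - conf_lo
        if fpkm <= 0.0:
            # No abundance to scale by: flag only if the interval is non-empty.
            if width > 0.0:
                flagged.append((transcript_id, float("inf")))
            continue
        relative = width / fpkm
        if relative > max_relative_width:
            flagged.append((transcript_id, relative))
    return flagged
```

Transcripts that come back from this are the ones where the abundances probably could not be distinguished reliably.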
Old 10-12-2009, 04:03 PM   #7
jling
Junior Member
 
Location: 21218

Join Date: Oct 2009
Posts: 9

Thanks so much for the response. I figured out why I was getting all my conf_hi and conf_lo values as 0. I forgot to run the initial cufflinks with the .gtf reference parameter (-G), so duplicate loci were never combined once I ran cuffcompare, even though I used the -r parameter for cuffcompare. Foolish mistake.
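
For anyone scripting this, the corrected order of calls might look something like the sketch below. Only the -G and -r options come from this thread; the helper name, the file names, and the assumption that the assembled output is called transcripts.gtf are placeholders, so check them against the usage message of your installed version.

```python
def build_commands(reference_gtf, alignments_sam, assembled_gtf="transcripts.gtf"):
    """Build argument lists for the two steps, reference annotation included.

    All file names are hypothetical placeholders; nothing is executed here.
    """
    # Step 1: run cufflinks on the alignments WITH the reference annotation (-G).
    cufflinks_cmd = ["cufflinks", "-G", reference_gtf, alignments_sam]
    # Step 2: compare the resulting transcripts against the same reference (-r).
    cuffcompare_cmd = ["cuffcompare", "-r", reference_gtf, assembled_gtf]
    return cufflinks_cmd, cuffcompare_cmd
```

Each list can then be passed to `subprocess.run(cmd, check=True)`, in that order.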

Thanks again!

Edit: This is some great software. It's so nice to have ERANGE and Tophat combined.
Tags
conf_hi, conf_lo, cufflink, cufflinks, output

All times are GMT -8.