SEQanswers

ErikFas 10-27-2014 07:40 AM

Understanding Cufflinks input/output file structure
 
I've been mostly doing differential expression with featureCounts + DESeq2 so far, but now I want to calculate the FPKM of my various samples / replicates as well. I was told I should do this with Cufflinks, so I gave it a go. While I can get some results, I'm a bit unsure exactly how I got them and exactly what they represent...

First, the input files, which come last in the command (I'm running from the Terminal in OS X Mavericks). As far as I understand, I can run a separate cufflinks command for each replicate of each sample, but I should also be able to run everything as a single command in some way. I read the documentation and searched around on SEQanswers, but I didn't find a definitive answer. For example, are these different commands valid?

Code:

cufflinks <options> <input1.bam> <input2.bam>
cufflinks <options> <input1.bam>,<input2.bam>
cufflinks <options> <input*.bam>

... and if so, what's the difference? If I have two samples (two different cell lines) with 3 replicates each, how should I run the command(s)?

Secondly, the output. I've gotten the four files specified in the documentation, but the one I'm supposed to be interested in (genes.fpkm_tracking) only has one "FPKM" column. I suppose this makes sense when you run each replicate as a separate command, but after some trial and error (mostly error, I suppose :p) with the above commands, I still only get one FPKM column (not counting the "FPKM_conf_lo" etc. columns). Is this how the output should be? Should it be different for a single .bam file versus multiple inputs, or am I misunderstanding how the program works in some way?

And, lastly, the --GTF or --GTF-guide options. Which do I use? I'm using Illumina paired-end, stranded data (so I'm using the --library-type fr-firststrand option), and I'm interested in knowing the FPKM for each replicate for each sample. I've thus far used the --GTF option and the same reference annotation as was used in the alignment (Tophat2). Is this the correct thinking?

Thanks in advance!

sdriscoll 10-27-2014 11:48 PM

I think you have to run cufflinks one time per BAM file. That's how I have always done it and that's how it would handle things internally anyways since each set of alignments would be interpreted independently of the others. If you can supply multiple BAMs in a single command then I'd expect the program to pool them and evaluate them as a single sample.
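A minimal sketch of the one-run-per-BAM approach, assuming bash. The annotation path, thread count, and output directory layout are placeholders, not anything taken from the thread; the `-G` and `--library-type fr-firststrand` settings are the ones the original poster said they use, and `-o` / `-p` are the usual Cufflinks output-directory and thread options. Setting `DRY_RUN=1` only prints each command, which is handy for checking before a long run.

```shell
#!/usr/bin/env bash
# Sketch: run Cufflinks once per BAM file, one output directory per replicate.
# GTF path and thread count are hypothetical placeholders; adjust to your setup.
GTF="annotation.gtf"
THREADS=4

run_cufflinks() {
    local bam="$1"
    local sample
    sample="$(basename "$bam" .bam)"   # e.g. cellA_rep1.bam -> cellA_rep1
    # ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set and non-empty,
    # so the command is printed instead of executed.
    ${DRY_RUN:+echo} cufflinks -p "$THREADS" -G "$GTF" \
        --library-type fr-firststrand \
        -o "cufflinks_out/$sample" "$bam"
}

# Process every BAM file given on the command line, one run each.
for bam in "$@"; do
    run_cufflinks "$bam"
done
```

Saved as, say, `run_all.sh` (a made-up name), it could be called as `DRY_RUN=1 bash run_all.sh cellA_rep*.bam cellB_rep*.bam` to preview the six commands, then again without `DRY_RUN` to run them. Each replicate gets its own directory, so the genes.fpkm_tracking files do not overwrite each other.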

--GTF is a mode for quantification only. --GTF-guide is a mode for quantification plus de novo assembly of isoforms from the alignments, using a supplied GTF as a guide (like giving it a starting point for de novo assembly). --GTF-guide uses a slightly different assembly strategy than the default (without --GTF or --GTF-guide).

ErikFas 10-28-2014 01:15 AM

Okay, then I'll continue doing a single command for every file, with the -G flag. Thanks for the clarification!
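Since each run gives its own genes.fpkm_tracking with a single FPKM column, one way to get "FPKM per replicate per sample" side by side is to merge those columns afterwards. The following is only a sketch under some assumptions: the files are tab-separated with a header line containing `tracking_id` and `FPKM` columns, each `tracking_id` occurs once per file, and the function name `merge_fpkm` and the output layout are made up for illustration. Columns are looked up by header name rather than position, and genes missing from a file are filled with `NA`.

```shell
# Sketch: merge the FPKM column of several genes.fpkm_tracking files into
# one tab-separated table (one row per tracking_id, one column per file).
merge_fpkm() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {                          # header line of each input file
            nfiles++
            names[nfiles] = FILENAME
            for (i = 1; i <= NF; i++) {
                if ($i == "tracking_id") id_col = i
                if ($i == "FPKM")        fpkm_col = i
            }
            next
        }
        {
            id = $id_col
            if (!(id in seen)) { seen[id] = 1; order[++n] = id }
            fpkm[id, nfiles] = $fpkm_col    # value for this gene in this file
        }
        END {
            header = "tracking_id"
            for (f = 1; f <= nfiles; f++) header = header OFS names[f]
            print header
            for (k = 1; k <= n; k++) {
                line = order[k]
                for (f = 1; f <= nfiles; f++) {
                    v = ((order[k], f) in fpkm) ? fpkm[order[k], f] : "NA"
                    line = line OFS v
                }
                print line
            }
        }
    ' "$@"
}

# Example call (paths are hypothetical):
# merge_fpkm cufflinks_out/*/genes.fpkm_tracking > fpkm_per_replicate.tsv
```

The column headers in the merged table are simply the input file paths, so keeping one output directory per replicate makes the columns self-describing.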

