I've been mostly doing differential expression with featureCounts + DESeq2 so far, but now I want to calculate the FPKM of my various samples / replicates as well. I was told I should do this with Cufflinks, so I gave it a go. While I can get some results, I'm a bit unsure exactly how I got them and exactly what they represent...
First, the input files, which come last in the command (I'm running from the Terminal in OS X Mavericks). As far as I understand, I can run a separate cufflinks command for each replicate of each sample, but I should also be able to run everything as a single command in some way. I read the documentation and searched around on SEQanswers, but I didn't find any definitive answers. For example, are these different commands valid?
Code:
cufflinks <options> <input1.bam> <input2.bam>
cufflinks <options> <input1.bam>,<input2.bam>
cufflinks <options> <input*.bam>
... and if so, what's the difference? If I have two samples (two different cell lines) with 3 replicates each, how should I run the command(s)?
Secondly, the output. I've gotten the four files specified in the documentation, but the one I'm supposed to be interested in (genes.fpkm_tracking) only has one "FPKM" column. I suppose this makes sense when you run each replicate as a separate command, but having used some trial and error (mostly error, I suppose) with the above commands, I still only get one FPKM column (discounting the "FPKM_conf_lo" etc. columns). Is this how the output should be? Should it be different for a single .bam file and multiple inputs, or am I misunderstanding how the program works in some way?
And, lastly, the --GTF or --GTF-guide options. Which do I use? I'm using Illumina paired-end, stranded data (so I'm using the --library-type fr-firststrand option), and I'm interested in knowing the FPKM for each replicate for each sample. I've thus far used the --GTF option and the same reference annotation as was used in the alignment (Tophat2). Is this the correct thinking?
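To make the question concrete, here is a minimal sketch of the one-command-per-replicate approach as I understand it. The BAM file names, annotation.gtf and the output directory names are placeholders made up for illustration, and the script only prints the commands rather than running them. I'm assuming -o is the right way to give each run its own output directory.

```shell
# Dry-run sketch: print one cufflinks command per replicate BAM.
# All file and directory names below are hypothetical placeholders.
build_cufflinks_cmd() {
    bam="$1"   # alignment file for a single replicate
    gtf="$2"   # reference annotation (the same one used for Tophat2)
    out="cufflinks_$(basename "$bam" .bam)"   # separate output dir per replicate
    printf 'cufflinks --GTF %s --library-type fr-firststrand -o %s %s\n' \
        "$gtf" "$out" "$bam"
}

# Two cell lines with three replicates each.
for bam in lineA_rep1.bam lineA_rep2.bam lineA_rep3.bam \
           lineB_rep1.bam lineB_rep2.bam lineB_rep3.bam; do
    build_cufflinks_cmd "$bam" annotation.gtf
done
```

If this is right, each output directory would get its own genes.fpkm_tracking with a single FPKM column for that replicate.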
Thanks in advance!