  • Understanding Cufflinks input/output file structure

    I've been mostly doing differential expression with featureCounts + DESeq2 so far, but now I want to calculate the FPKM of my various samples / replicates as well. I was told I should do this with Cufflinks, so I gave it a go. While I can get some results, I'm a bit unsure exactly how I got them and exactly what they represent...

    First, the input files, which come last in the command (I'm running from the Terminal in OS X Mavericks). As far as I understand, I can either run a separate cufflinks command for each replicate of each sample, or run everything as a single command in some way. I read the documentation and searched around on SEQanswers, but I didn't find any definitive answers. For example, are these different commands valid?

    Code:
    cufflinks <options> <input1.bam> <input2.bam>
    cufflinks <options> <input1.bam>,<input2.bam>
    cufflinks <options> <input*.bam>
    ... and if so, what's the difference? If I have two samples (two different cell lines) with 3 replicates each, how should I run the command(s)?

    Secondly, the output. I've gotten the four files specified in the documentation, but the one I'm supposed to be interested in (genes.fpkm_tracking) only has one "FPKM" column. I suppose this makes sense when you run each replicate as a separate command, but having used some trial and error (mostly error, I suppose) with the above commands, I still only get one FPKM column (discounting the "FPKM_conf_lo" etc. columns). Is this how the output should be? Should it be different for a single .bam file and multiple inputs, or am I misunderstanding how the program works in some way?

    And, lastly, the --GTF or --GTF-guide options. Which do I use? I'm using Illumina paired-end, stranded data (so I'm using the --library-type fr-firststrand option), and I'm interested in knowing the FPKM for each replicate for each sample. I've thus far used the --GTF option and the same reference annotation as was used in the alignment (Tophat2). Is this the correct thinking?

    Thanks in advance!
    Last edited by ErikFas; 10-27-2014, 06:44 AM.

  • #2
    I think you have to run cufflinks one time per BAM file. That's how I have always done it and that's how it would handle things internally anyways since each set of alignments would be interpreted independently of the others. If you can supply multiple BAMs in a single command then I'd expect the program to pool them and evaluate them as a single sample.

    --GTF is a mode for quantification only. --GTF-guide is a mode for quantification plus de-novo assembly of isoforms from alignments using a supplied GTF as a guide (like giving it a starting point for de-novo assembly). --GTF-guide uses a slightly different assembly strategy than the default (without --GTF or --GTF-guide).
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
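A minimal sketch of the one-run-per-BAM approach described above, written as a dry run that only prints the commands. The BAM names, `genes.gtf`, the `cufflinks_out/` directory, and the helper name `cufflinks_cmd` are hypothetical placeholders; the flags used (`-G`, `--library-type`, `-o`) are the ones discussed in this thread plus the standard output-directory option.

```shell
# Dry run: print one cufflinks command per BAM file instead of executing it.
# All file names below are hypothetical placeholders.
cufflinks_cmd() {
    # $1 = BAM file, $2 = reference annotation (GTF)
    sample=$(basename "$1" .bam)
    # -G quantifies against the annotation only (no assembly of new isoforms);
    # -o gives each replicate its own output directory so results are not overwritten.
    printf 'cufflinks -G %s --library-type fr-firststrand -o cufflinks_out/%s %s\n' \
        "$2" "$sample" "$1"
}

for bam in cellA_rep1.bam cellA_rep2.bam cellA_rep3.bam \
           cellB_rep1.bam cellB_rep2.bam cellB_rep3.bam; do
    cufflinks_cmd "$bam" genes.gtf
done
```

Once the printed commands look right, piping the output to `sh` would run them. The sketch assumes file names without spaces, and swapping `-G` for `-g` would presumably switch to the guided-assembly mode mentioned above.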


    • #3
      Okay, then I'll continue doing a single command for every file, with the -G flag. Thanks for the clarification!
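With one command per file, each run yields its own genes.fpkm_tracking with a single FPKM column. One way to get a per-replicate table is to collect those columns afterwards. The sketch below is an illustration only: the function name `merge_fpkm` and the output layout are assumptions, and it looks up the `tracking_id` and `FPKM` columns by header name rather than by fixed position.

```shell
# Sketch: gather the FPKM column from several per-replicate
# genes.fpkm_tracking files into one tab-separated table.
merge_fpkm() {
    # Arguments: one or more genes.fpkm_tracking paths.
    awk -F'\t' '
        FNR == 1 {                          # header line of each input file
            nfiles++
            label[nfiles] = FILENAME        # column label = path of the file
            for (i = 1; i <= NF; i++) {
                if ($i == "tracking_id") id_col = i
                if ($i == "FPKM") fpkm_col = i
            }
            next
        }
        {
            id = $id_col
            if (!(id in seen)) { seen[id] = 1; order[++n] = id }
            fpkm[id, nfiles] = $fpkm_col    # a repeated id in one file overwrites
        }
        END {
            printf "tracking_id"
            for (f = 1; f <= nfiles; f++) printf "\t%s", label[f]
            printf "\n"
            for (k = 1; k <= n; k++) {
                printf "%s", order[k]
                for (f = 1; f <= nfiles; f++) {
                    # NA marks ids that are absent from a given file
                    v = ((order[k], f) in fpkm) ? fpkm[order[k], f] : "NA"
                    printf "\t%s", v
                }
                printf "\n"
            }
        }
    ' "$@"
}

# Hypothetical usage, assuming one output directory per replicate:
# merge_fpkm cufflinks_out/*/genes.fpkm_tracking > fpkm_table.tsv
```

Rows keep the order in which ids are first seen, and each FPKM column is labelled with the path of the file it came from.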
