  • Understanding Cufflinks input/output file structure

    I've been mostly doing differential expression with featureCounts + DESeq2 so far, but now I want to calculate the FPKM of my various samples / replicates as well. I was told I should do this with Cufflinks, so I gave it a go. While I can get some results, I'm a bit unsure exactly how I got them and exactly what they represent...

    First, the input files, which come last in the command (I'm running from the Terminal in OS X Mavericks). As far as I understand, I can run a separate cufflinks command for each replicate of each sample, but I should also be able to run everything as a single command in some way. I read the documentation and searched around on SEQanswers, but I didn't find a definitive answer. For example, are these different commands valid?

    Code:
    cufflinks <options> <input1.bam> <input2.bam>
    cufflinks <options> <input1.bam>,<input2.bam>
    cufflinks <options> <input*.bam>
    ... and if so, what's the difference? If I have two samples (two different cell lines) with 3 replicates each, how should I run the command(s)?

    Secondly, the output. I've gotten the four files specified in the documentation, but the one I'm supposed to be interested in (genes.fpkm_tracking) only has one "FPKM" column. I suppose this makes sense when you run each replicate as a separate command, but having used some trial and error (mostly error, I suppose) with the above commands, I still only get one FPKM column (discounting the "FPKM_conf_lo" etc. columns). Is this how the output should be? Should it be different for a single .bam file and multiple inputs, or am I misunderstanding how the program works in some way?

    And, lastly, the --GTF or --GTF-guide options. Which do I use? I'm using Illumina paired-end, stranded data (so I'm using the --library-type fr-firststrand option), and I'm interested in knowing the FPKM for each replicate for each sample. I've thus far used the --GTF option and the same reference annotation as was used in the alignment (Tophat2). Is this the correct thinking?

    Thanks in advance!
    Last edited by ErikFas; 10-27-2014, 06:44 AM.

  • #2
    I think you have to run cufflinks one time per BAM file. That's how I have always done it and that's how it would handle things internally anyways since each set of alignments would be interpreted independently of the others. If you can supply multiple BAMs in a single command then I'd expect the program to pool them and evaluate them as a single sample.

    --GTF is a mode for quantification only. --GTF-guide is a mode for quantification plus de novo assembly of isoforms from the alignments, using a supplied GTF as a guide (like giving it a starting point for de novo assembly). --GTF-guide uses a slightly different assembly strategy than the default (without --GTF or --GTF-guide).
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
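The one-run-per-BAM approach with quantification only could be scripted roughly as below. This is a minimal sketch, not a tested pipeline: the annotation path, BAM names, and output directory layout are all hypothetical, and the helper `run_one` is made up for illustration. Only the `-G`, `--library-type`, and `-o` options are taken from Cufflinks itself. Giving each run its own `-o` directory matters because Cufflinks writes to the current directory by default, so consecutive runs would overwrite each other's genes.fpkm_tracking.

```shell
#!/usr/bin/env bash
# Sketch: run cufflinks once per BAM file, quantification only (-G).
# All paths and names below are assumptions for illustration.

GTF="annotation.gtf"           # assumed: same annotation as used for TopHat2
LIB_TYPE="fr-firststrand"      # library type mentioned in the thread
OUT_ROOT="cufflinks_out"       # hypothetical output root directory
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print each command, 0 = run it

# Print (dry run) or execute the cufflinks command for a single BAM file.
run_one() {
    local bam="$1"
    local sample outdir
    sample="$(basename "$bam" .bam)"   # e.g. lineA_rep1.bam -> lineA_rep1
    outdir="$OUT_ROOT/$sample"         # one output directory per replicate
    if [ "$DRY_RUN" = "1" ]; then
        echo "cufflinks -G $GTF --library-type $LIB_TYPE -o $outdir $bam"
    else
        cufflinks -G "$GTF" --library-type "$LIB_TYPE" -o "$outdir" "$bam"
    fi
}

# Apply run_one to every BAM file given as an argument.
run_all() {
    local bam
    for bam in "$@"; do
        run_one "$bam"
    done
}
```

With two cell lines and three replicates each, `run_all lineA_rep1.bam lineA_rep2.bam lineA_rep3.bam lineB_rep1.bam lineB_rep2.bam lineB_rep3.bam` would print six commands in dry-run mode; setting `DRY_RUN=0` would execute them instead.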

    • #3
      Okay, then I'll continue doing a single command for every file, with the -G flag. Thanks for the clarification!
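Since each per-file run yields its own genes.fpkm_tracking with a single FPKM column, getting one table with a column per replicate takes a separate merging step. The sketch below is one possible way to do that with awk; the function name `merge_fpkm` and the file paths in the usage note are hypothetical. It assumes tab-separated files whose header contains `tracking_id` and `FPKM` columns, and that the IDs are comparable across runs because every run used the same annotation with -G.

```shell
#!/usr/bin/env bash
# Sketch: combine the FPKM column of several genes.fpkm_tracking files
# into one tab-separated table (one row per tracking_id, one column per file).
# Columns are located by header name rather than by fixed position.

merge_fpkm() {
    awk -F'\t' -v OFS='\t' '
        FNR == 1 {
            # Header line of the next input file: find the columns we need.
            nfiles++
            for (i = 1; i <= NF; i++) {
                if ($i == "tracking_id") idcol = i
                if ($i == "FPKM") fpkmcol = i
            }
            label[nfiles] = FILENAME   # column label = path of the input file
            next
        }
        {
            id = $idcol
            # Remember first-seen order so the output order is stable.
            if (!(id in seen)) { seen[id] = 1; order[++n] = id }
            value[id, nfiles] = $fpkmcol
        }
        END {
            header = "tracking_id"
            for (f = 1; f <= nfiles; f++) header = header OFS label[f]
            print header
            for (k = 1; k <= n; k++) {
                line = order[k]
                for (f = 1; f <= nfiles; f++) {
                    # IDs missing from a file are reported as NA.
                    v = ((order[k], f) in value) ? value[order[k], f] : "NA"
                    line = line OFS v
                }
                print line
            }
        }
    ' "$@"
}
```

For example, `merge_fpkm cufflinks_out/lineA_rep1/genes.fpkm_tracking cufflinks_out/lineA_rep2/genes.fpkm_tracking > fpkm_table.tsv` would write a table with one FPKM column per replicate, assuming those output directories exist.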
