
  • Can I use FPKM to represent gene expression

    Dear All
    I am a newbie to the RNA-seq data analysis field. Currently, I'm in charge of analyzing some human NGS samples (single-end) in a disease-control comparative setting. I have 10 BAM files (biological replicates) from tophat, each about 4 GB in size.

    I followed the tophat-cufflinks-cuffcompare-cuffdiff pipeline (using
    hg19 reference) to find the differentially expressed genes between experimental and control conditions.

    I have no problem getting assembled results from cufflinks for each sample, but I am stuck at the final cuffdiff step. The problem seems to be insufficient memory, as I constantly receive bad_alloc errors in the shell.

    So I wonder if I can just use the FPKM values from the cufflinks genes.fpkm_tracking file of each sample as the gene expression values and use traditional statistical methods (e.g. multiple t-tests, SAM analysis, etc.) to identify differentially expressed genes between the two groups?

    Thanks in advance

  • #2
    You'd probably want to read here:


    And here:


    My quick-glance summary from that second FAQ is the following:
    Current count-based differential expression tools are poorly suited to differential expression analysis in genomes with alternatively spliced genes. The main reason for this is that when a gene has multiple isoforms, a change in the total number of reads or fragments from that gene doesn't always correspond to a change in expression for that gene. Conversely, a gene's expression may change, but the total number of fragments generated by its isoforms may be very similar. In order to detect changes accurately, it's necessary to estimate how many fragments came from each individual splice variant in each sample. Current count-based tools don't do this (to our knowledge - please send us email if you know of one!). Even if they did, fragments that come from parts of genes that are shared by more than one splice variant can't generally be assigned to a single isoform, so the fragment counts for each isoform are only estimates, and there is some uncertainty in the counts. Isoforms that are very similar will have a great deal of uncertainty surrounding their fragment counts. This uncertainty needs to be accounted for when testing for differential expression. So while you could use Cufflinks to estimate isoform-level counts, you'd be throwing away Cufflinks' uncertainty, and thus have more confidence in the differences you see than you really should. This will probably lead to many false positives in your analysis. Furthermore, we do not normalize simply by the length to calculate FPKM but by an effective length, as explained in our publications. Calculating counts from FPKM by multiplying by the length will give incorrect results. We strongly encourage you to consider using Cuffdiff to find differentially expressed genes and transcripts.
    In other words, if you're using cufflinks, it is also recommended to use cuffdiff. Note that tophat seems to be under somewhat heavy development at the moment. If you're not using the latest versions (cufflinks 1.0.3, tophat 1.3.1), you may be hitting bugs, possibly including the memory issues, that have since been fixed.
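
To make the effective-length point concrete, here is a minimal Python sketch. It assumes the usual FPKM definition (fragments per kilobase of transcript per million mapped fragments). The function names and all numbers are made up for illustration, and the effective length of 800 bp is a hypothetical value, not how Cufflinks actually computes it.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)


def counts_from_fpkm(fpkm_value, length_bp, total_fragments):
    """Invert the FPKM formula for a given length (naive back-calculation)."""
    return fpkm_value * length_bp * total_fragments / 1e9


# Hypothetical transcript: all values below are illustrative assumptions.
true_fragments = 500
annotated_length = 1000      # bp, as listed in the annotation
effective_length = 800       # bp, assumed value standing in for Cufflinks' estimate
total_fragments = 20_000_000

# FPKM as it would be reported if normalized by the effective length.
reported_fpkm = fpkm(true_fragments, effective_length, total_fragments)

# Back-calculating with the annotated length does not recover the true count.
naive_counts = counts_from_fpkm(reported_fpkm, annotated_length, total_fragments)

# Back-calculating with the same effective length does.
consistent_counts = counts_from_fpkm(reported_fpkm, effective_length, total_fragments)

print(reported_fpkm, naive_counts, consistent_counts)
```

In this toy case the naive back-calculation overestimates the count by the ratio of annotated to effective length, and it still carries none of the per-isoform uncertainty the FAQ describes.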

    • #3
      Recently I was running cuffdiff on 6 BAM files mapped with SOLiD BioScope 1.3 (3 control and 3 treatment, about 40.2 GB in total, with the smallest file about 5 GB and the largest about 12 GB) and was getting bad_alloc failures too.

      I took a look at our cluster's swap setup and then made a temporary swap file big enough to let cuffdiff run. The machine I was using has 24 GB of RAM but only a small swap (not sure why; it shipped from Penguin that way), so I created an empty 24 GB file and added it to swap. cuffdiff ran just fine after that: it used all the RAM, of course, and about 13-14 GB of the swap, so I was overly generous, but it worked.

      So, you may be able to run cuffdiff by just creating a nice massive temporary swap file for the run.
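
As a rough aid for sizing such a swap file, the following Python sketch parses Linux `/proc/meminfo`-style text and estimates how much extra swap would be needed to reach a target memory footprint. The helper names, the sample text (with a hypothetical 2 GiB existing swap), and the 38 GiB target (24 GB of RAM plus roughly 14 GB of swap, as described above) are assumptions for illustration. On Linux the swap file itself is typically created as root with `dd`, `mkswap`, and `swapon`.

```python
def parse_meminfo(text):
    """Return a dict of field name -> value in kB from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_shortfall_gib(info, needed_gib):
    """GiB of extra swap needed so RAM + swap covers needed_gib (0.0 if enough)."""
    have_kb = info.get("MemTotal", 0) + info.get("SwapTotal", 0)
    have_gib = have_kb / (1024 ** 2)
    return max(0.0, needed_gib - have_gib)


# Illustrative sample: 24 GiB RAM and a hypothetical 2 GiB swap.
SAMPLE = """MemTotal:       25165824 kB
SwapTotal:       2097152 kB
SwapFree:        2097152 kB
"""

info = parse_meminfo(SAMPLE)
print(swap_shortfall_gib(info, needed_gib=38))
```

On a real machine, the text would come from reading `/proc/meminfo` instead of `SAMPLE`. The target footprint has to be guessed or taken from an earlier failed run, so it is safer to err on the generous side.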
      Last edited by mbblack; 07-01-2011, 08:16 AM.
      Michael Black, Ph.D.
      ScitoVation LLC. RTP, N.C.
