  • PFS
    Member
    • Mar 2010
    • 55

    How do I get one FPKM value per gene?

    I have been running Cufflinks on a set of samples and would like to compare gene expression across them. I am using FPKM values as a measure of gene abundance, but the cuffcompare output provides more than one FPKM value per gene (for genes that have isoforms). So how do I go from two or more FPKM values per gene to a single value?

    Thanks!
  • PFS
    Member
    • Mar 2010
    • 55

    #2
    I should have mentioned in my previous post that I have tried comparing the FPKMs reported by Cufflinks in the *genes.expr files. I was wondering whether cuffcompare is a better way to do that and, if so, how to summarize expression per gene rather than per transcript.
    Thanks


    • Thomas Doktor
      Senior Member
      • Apr 2009
      • 105

      #3
      You should run cuffdiff and look at the gene-level tracking files (e.g. genes.fpkm_tracking). They contain the summed FPKM values of the transcripts belonging to each gene.
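To pull one FPKM per gene and per sample out of such a tracking file, a few lines of Python are enough. This is a minimal sketch under stated assumptions, not a description of the actual Cuffdiff format: it assumes the file is tab-separated with a header row, that the first column holds the gene/tracking id, and that each sample has a column whose name ends in "_FPKM". The function name per_sample_fpkm and the column names in the example (q0_FPKM, q1_FPKM, ...) are hypothetical; check them against the header of your own file.

```python
import csv
import io


def per_sample_fpkm(text):
    """Map each tracking id to {sample_name: FPKM} from a genes.fpkm_tracking-style table.

    Assumptions (not checked against any particular Cuffdiff release):
    - the table is tab-separated with a header row;
    - the first column holds the gene/tracking id;
    - each sample has one column whose name ends in "_FPKM".
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    id_col = reader.fieldnames[0]
    fpkm_cols = [c for c in reader.fieldnames if c.endswith("_FPKM")]
    table = {}
    for row in reader:
        # Strip the "_FPKM" suffix so the inner keys are just the sample labels.
        table[row[id_col]] = {c[: -len("_FPKM")]: float(row[c]) for c in fpkm_cols}
    return table


# Hypothetical two-sample excerpt; column names and values are illustrative only.
example = (
    "tracking_id\tq0_FPKM\tq0_conf_lo\tq0_conf_hi\tq1_FPKM\tq1_conf_lo\tq1_conf_hi\n"
    "GENE_A\t10.5\t8.0\t13.0\t4.2\t3.0\t5.4\n"
)
print(per_sample_fpkm(example))
```

For a real file, read it first, e.g. `with open("genes.fpkm_tracking", newline="") as fh: per_sample_fpkm(fh.read())`.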


      • PFS
        Member
        • Mar 2010
        • 55

        #4
        Thanks! I will do that.

        Just out of curiosity: why do I sometimes get more than one row for the same gene in the Cufflinks output files *_genes.expr (which report gene-level coordinates and expression values)? It looks as if, in some cases (noncoding exons?), the FPKM values of transcripts from the same gene are not summed, even though the transcripts are assigned to that gene.

        Thanks in advance for your help.


        • Cole Trapnell
          Senior Member
          • Nov 2008
          • 213

          #5
          Originally posted by PFS
          ...why do I sometimes get more than one row for the same gene in the Cufflinks output files *_genes.expr (which report gene-level coordinates and expression values)?...
          This is a known bug in Cufflinks and will be fixed in the next release.


          • Kasimir
            Junior Member
            • Aug 2010
            • 1

            #6
            I have been running Cuffdiff on a set of samples using the newest available release (cuffdiff v0.8.3 (1332); 7/2/2010). In some cases, the file genes.fpkm_tracking contains additional FPKM results for the same gene, as described by PFS.
            I have two questions about it.
            Is there a prospective release date for a bug-fixed cuffdiff version?
            Does it influence the subsequent differential expression/splicing analysis?
            Many thanks in advance,
            Kasimir


            • frankyue50
              Member
              • Nov 2008
              • 34

              #7
              I ran into the same problem. I wonder whether I could simply add the two isoforms' values.

              Originally posted by PFS
              ...how do I go from two or more FPKM values per gene to a single value?
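Adding the isoform values is what post #3 describes for the Cuffdiff gene tracking files: the gene-level FPKM is the sum of the FPKMs of the transcripts assigned to the gene. Writing T(g) for the set of transcripts assigned to gene g (notation introduced here only for illustration):

```latex
\mathrm{FPKM}_g = \sum_{t \in T(g)} \mathrm{FPKM}_t
```

This holds for the point estimates only. Adding the per-transcript conf_lo and conf_hi bounds does not in general give a confidence interval for the sum, because the isoform estimates are made jointly from shared reads and are not independent.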


              • mgogol
                Senior Member
                • Mar 2008
                • 197

                #8
                update

                Is this supposed to have been fixed in Cufflinks 0.8.3? It doesn't seem fixed to me... I'm still seeing multiple FPKMs for a single gene in the _genes.expr files.


                • jb2
                  Member
                  • Jun 2010
                  • 25

                  #9
                  I have also been getting some duplicates in the genes.expr file. I aligned to hg19 with TopHat and ran Cufflinks 0.9.2 with the -G option and the Ensembl 59 GTF file.

                  Any ideas?

                  See some examples here:


                  gene_id          bundle_id  chr    left       right      FPKM     FPKM_conf_lo  FPKM_conf_hi  status
                  ENSG00000143198  33524      chr1   165600097  165631033  127.41   0             298.498       FAIL
                  ENSG00000143198  33524      chr1   165614897  165617907  0        0             0             OK
                  ENSG00000162105  36862      chr11  70313960   70963623   9.58183  0             170.285       FAIL
                  ENSG00000162105  36862      chr11  70753739   70754197   0        0             0             OK
                  ENSG00000162105  36862      chr11  70798845   70798972   0        0             0             OK
                  ENSG00000165899  38298      chr12  80633119   80648905   0        0             0             OK
                  ENSG00000165899  38299      chr12  80655759   80672003   0        0             0             OK
                  ENSG00000165899  38300      chr12  80707295   80726842   0        0             0             OK
                  ENSG00000165899  38301      chr12  80730291   80772870   0        0             0             OK
                  ENSG00000211890  40491      chr14  106050068  106058270  259.752  227.422       292.082       OK
                  ENSG00000211890  40491      chr14  106055295  106056387  0        0             0             OK
                  ENSG00000249751  54186      chr5   138784244  138784863  20.4268  11.3876       29.466        OK
                  ENSG00000249751  54187      chr5   138837129  138842328  22.1737  12.7559       31.5915       OK
                  ENSG00000131508  54192      chr5   138906015  139008018  35.9963  23.7319       48.2606       OK
                  ENSG00000131508  54192      chr5   138945438  138946512  0        0             0             OK
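To see how widespread the problem is in a given file, one can list every gene_id that occurs on more than one row. This is a minimal Python sketch, assuming a header line followed by whitespace-separated rows with the genes.expr columns in jb2's excerpt; find_duplicate_genes is a hypothetical helper, not part of Cufflinks.

```python
from collections import defaultdict


def find_duplicate_genes(lines):
    """Return {gene_id: [row, ...]} for every gene_id that occurs on more than one row.

    Assumes the first non-blank line is a header and the remaining lines are
    whitespace-separated rows with the genes.expr columns from the excerpt
    (gene_id, bundle_id, chr, left, right, FPKM, FPKM_conf_lo, FPKM_conf_hi, status).
    """
    header = None
    rows_by_gene = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if header is None:
            header = fields  # first non-blank line names the columns
            continue
        row = dict(zip(header, fields))
        rows_by_gene[row["gene_id"]].append(row)
    return {gene: rows for gene, rows in rows_by_gene.items() if len(rows) > 1}


# A few rows copied from the excerpt in post #9.
example = """\
gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000143198 33524 chr1 165600097 165631033 127.41 0 298.498 FAIL
ENSG00000143198 33524 chr1 165614897 165617907 0 0 0 OK
ENSG00000131508 54192 chr5 138906015 139008018 35.9963 23.7319 48.2606 OK
ENSG00000131508 54192 chr5 138945438 138946512 0 0 0 OK
"""

for gene, rows in find_duplicate_genes(example.splitlines()).items():
    print(gene, [(r["bundle_id"], r["left"], r["right"], r["FPKM"], r["status"]) for r in rows])
```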


                  • middlemale
                    Member
                    • Feb 2010
                    • 16

                    #10
                    duplicate errors

                    jb2, I was getting duplicates too. In my case, when I later ran Cufflinks without the -G option, the duplicates went away. You may want to give that a try.


                    • mgogol
                      Senior Member
                      • Mar 2008
                      • 197

                      #11
                      I ended up writing a script to sum the FPKMs for a given gene ID, which I think is right...

                      Here's my (unpolished) code (a Perl script and a shell script).

                      This botches the confidence intervals, by the way.
                      Last edited by mgogol; 11-05-2010, 05:52 AM.
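mgogol's Perl and shell scripts are not reproduced in this thread. As a rough illustration of the same idea only, here is a minimal Python sketch (the function name sum_fpkm_per_gene is hypothetical) that sums the FPKM column over all rows sharing a gene_id. It assumes the header and whitespace-separated column layout of the genes.expr excerpt in post #9, and it assumes, as mgogol does, that summing is the right way to collapse the duplicate rows. Like the original scripts, it makes no attempt to produce correct confidence intervals.

```python
from collections import defaultdict


def sum_fpkm_per_gene(lines):
    """Collapse a genes.expr-style table to one FPKM value per gene_id by summing rows.

    Assumes the first non-blank line is a header naming the columns (including
    gene_id and FPKM) and that data rows are whitespace-separated. The confidence
    bounds and status are deliberately ignored: summing FPKM_conf_lo/FPKM_conf_hi
    across rows does not in general give a valid interval for the summed FPKM.
    """
    header = None
    totals = defaultdict(float)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if header is None:
            header = fields  # first non-blank line names the columns
            continue
        row = dict(zip(header, fields))
        totals[row["gene_id"]] += float(row["FPKM"])
    return dict(totals)


# Rows taken from the genes.expr excerpt in post #9.
example = """\
gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000249751 54186 chr5 138784244 138784863 20.4268 11.3876 29.466 OK
ENSG00000249751 54187 chr5 138837129 138842328 22.1737 12.7559 31.5915 OK
ENSG00000211890 40491 chr14 106050068 106058270 259.752 227.422 292.082 OK
ENSG00000211890 40491 chr14 106055295 106056387 0 0 0 OK
"""

print(sum_fpkm_per_gene(example.splitlines()))
```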


                      • jb2
                        Member
                        • Jun 2010
                        • 25

                        #12
                        Originally posted by mgogol
                        This botches the confidence intervals, by the way.
                        Yeah, that is what I was worried about, because I was considering taking those into account with my data. I will take a look at your script, though, since it saves me the time of writing my own.

                        Hopefully Cole or others can take a look at this and let us know what the problem might be.


                        • yjlui
                          Junior Member
                          • May 2010
                          • 5

                          #13
                          Cufflinks

                          Does anyone know what the status column in genes.expr and transcripts.expr (Cufflinks output files) means? I can't find it in the manual. One possible meaning is "can be one of OK (test successful), NOTEST (not enough alignments for testing), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing", but that is actually the description of the "test status" column in the Cuffdiff output files.


                          What shall I do with genes (or transcripts) whose status is FAIL? Shall I assume that their FPKM is 0 or take the FPKM of these genes regardless of their status?


                          I used Cufflinks v0.9.1b in my experiments, and the problem of getting multiple FPKM values for some genes still exists. Running Cufflinks without a GTF file seems to solve this problem, but then I don't know how to link the FPKM values to the corresponding Ensembl IDs. If I provide a GTF file when running Cufflinks, I get multiple FPKM values and a FAIL status for some genes.


                          What shall I do with genes that have multiple FPKM values? Shall I add them together, or choose only the FPKM whose start and end positions match those of the gene?


                          Thank you very much for your time.
                          Last edited by yjlui; 11-11-2010, 07:53 AM.
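The two options raised in this post can be compared side by side on one gene from jb2's excerpt (post #9). The sketch below uses a hypothetical helper, pick_row_matching_span, together with a hand-supplied reference span, to return the row whose coordinates match the annotated gene span and, for comparison, the sum over all rows for that gene. Which choice is correct is exactly the open question in this thread. Note also that in this example the matching row is the one flagged FAIL, and that the Cufflinks output and the GTF may not use the same coordinate convention (0- vs 1-based), so an exact comparison is an assumption to verify on your own data.

```python
def pick_row_matching_span(rows, ref_left, ref_right):
    """Return the first row whose left/right coordinates equal the reference span, or None.

    rows: list of dicts with at least "left", "right", "FPKM", "status" keys
    (as parsed from a genes.expr-style table). ref_left/ref_right would come
    from the annotation (e.g. the gene record in the GTF); here they are
    supplied by hand, and an exact coordinate match is assumed.
    """
    for row in rows:
        if int(row["left"]) == ref_left and int(row["right"]) == ref_right:
            return row
    return None


# The two ENSG00000143198 rows from the excerpt in post #9.
rows = [
    {"left": "165600097", "right": "165631033", "FPKM": "127.41", "status": "FAIL"},
    {"left": "165614897", "right": "165617907", "FPKM": "0", "status": "OK"},
]

# Hypothetical reference span; substitute the real gene coordinates from your annotation.
match = pick_row_matching_span(rows, 165600097, 165631033)
summed = sum(float(r["FPKM"]) for r in rows)  # the "add them together" option
if match is not None:
    print("matching row:", match["FPKM"], match["status"])
print("sum over rows:", summed)
```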


                          • adarob
                            Member
                            • Jul 2010
                            • 71

                            #14
                            Does someone have a small example dataset that I can run this on to find the problem?


                            • yjlui
                              Junior Member
                              • May 2010
                              • 5

                              #15
                              Thanks for the prompt reply, Adam! Just emailed you a small dataset built from my SAM file.
