  • How do I get one FPKM value per gene?

    I have been running Cufflinks on a set of samples and would like to compare gene expression across samples. I am using the FPKM values as a measure of gene abundance, but the cuffcompare output provides more than one FPKM value per gene (for genes that have isoforms). So how do I go from two or more FPKM values per gene to a single value?

    Thanks!

  • #2
    I should have mentioned in my previous post that I have tried comparing the FPKMs reported by Cufflinks in the *genes.expr files. I was wondering whether cuffcompare is a better way to do that and, if so, how I can summarize the expression per gene rather than per transcript.
    Thanks


    • #3
      You should run cuffdiff and look at the tracking files for genes. They contain the summed FPKM values of transcripts from the same gene.


      • #4
        Thanks! I will do that.

        Just out of curiosity: why do the Cufflinks output files *_genes.expr (which report the gene-level coordinates and expression values) sometimes contain more than one row for the same gene? It looks as if in some cases (noncoding exons?) the FPKM values of transcripts belonging to the same gene do not get summed, although the transcripts are assigned to that gene.

        Thanks in advance for your help.


        • #5
          Originally posted by PFS:
          Thanks! I will do that.

          Just out of curiosity: why do the Cufflinks output files *_genes.expr (which report the gene-level coordinates and expression values) sometimes contain more than one row for the same gene? It looks as if in some cases (noncoding exons?) the FPKM values of transcripts belonging to the same gene do not get summed, although the transcripts are assigned to that gene.

          Thanks in advance for your help.
          This is a known bug in Cufflinks and will be fixed in the next release.


          • #6
            I have been running Cuffdiff on a set of samples using the newest available release (cuffdiff v0.8.3 (1332); 7/2/2010). In some cases the file genes.fpkm_tracking includes additional FPKM result columns, as described by PFS.
            I have two questions about this:
            Is there an expected release date for a bug-fixed cuffdiff version?
            Does the bug influence the subsequent differential expression/splicing analysis?
            Many thanks in advance,
            Kasimir


            • #7
              I ran into the same problem. I wonder whether I could just add the two isoform values together.

              Originally posted by PFS:
              I have been running Cufflinks on a set of samples and would like to compare gene expression across samples. I am using the FPKM values as a measure of gene abundance, but the cuffcompare output provides more than one FPKM value per gene (for genes that have isoforms). So how do I go from two or more FPKM values per gene to a single value?

              Thanks!


              • #8
                update

                Is this supposed to have been fixed in Cufflinks 0.8.3? It doesn't seem fixed to me... I'm still seeing multiple FPKMs for a single gene in the _genes.expr files.


                • #9
                  I have also been getting some duplicates when examining the genes.expr file. I aligned to hg19 with TopHat and ran Cufflinks 0.9.2 with the -G option and the Ensembl 59 GTF file.

                  Any ideas?

                  See some examples here:


                  gene_id          bundle_id  chr    left       right      FPKM     FPKM_conf_lo  FPKM_conf_hi  status
                  ENSG00000143198  33524      chr1   165600097  165631033  127.41   0             298.498       FAIL
                  ENSG00000143198  33524      chr1   165614897  165617907  0        0             0             OK
                  ENSG00000162105  36862      chr11  70313960   70963623   9.58183  0             170.285       FAIL
                  ENSG00000162105  36862      chr11  70753739   70754197   0        0             0             OK
                  ENSG00000162105  36862      chr11  70798845   70798972   0        0             0             OK
                  ENSG00000165899  38298      chr12  80633119   80648905   0        0             0             OK
                  ENSG00000165899  38299      chr12  80655759   80672003   0        0             0             OK
                  ENSG00000165899  38300      chr12  80707295   80726842   0        0             0             OK
                  ENSG00000165899  38301      chr12  80730291   80772870   0        0             0             OK
                  ENSG00000211890  40491      chr14  106050068  106058270  259.752  227.422       292.082       OK
                  ENSG00000211890  40491      chr14  106055295  106056387  0        0             0             OK
                  ENSG00000249751  54186      chr5   138784244  138784863  20.4268  11.3876       29.466        OK
                  ENSG00000249751  54187      chr5   138837129  138842328  22.1737  12.7559       31.5915       OK
                  ENSG00000131508  54192      chr5   138906015  139008018  35.9963  23.7319       48.2606       OK
                  ENSG00000131508  54192      chr5   138945438  138946512  0        0             0             OK
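
A quick way to list which gene IDs are affected is to count how often each gene_id occurs. The following is only a minimal Python sketch and not part of the original post. It assumes the whitespace-separated genes.expr layout shown in the rows above, with gene_id in the first column; the function name `duplicated_gene_ids` is made up for illustration.

```python
from collections import Counter


def duplicated_gene_ids(lines):
    """Return gene_ids that occur on more than one data row, in first-seen order."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row (assumed to start with "gene_id").
        if not fields or fields[0] == "gene_id":
            continue
        counts[fields[0]] += 1
    return [gene for gene, n in counts.items() if n > 1]


# A few rows copied from the example above.
rows = """gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000143198 33524 chr1 165600097 165631033 127.41 0 298.498 FAIL
ENSG00000143198 33524 chr1 165614897 165617907 0 0 0 OK
ENSG00000211890 40491 chr14 106050068 106058270 259.752 227.422 292.082 OK
""".splitlines()

print(duplicated_gene_ids(rows))
```

For a real file, the same function could be fed an open file handle instead of the `rows` list, since it only iterates over lines.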


                  • #10
                    duplicate errors

                    jb2, I was getting duplicate errors too. In my case, when I later ran Cufflinks without the -G option, the output was fine. You may want to give that a try.


                    • #11
                      I ended up writing a script to sum the FPKMs for a given gene ID, which I think is right...

                      Here's my (unpolished) code (a Perl script and a shell script).

                      This botches the confidence intervals, by the way.
                      Last edited by mgogol; 11-05-2010, 05:52 AM.
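
The attached scripts are not reproduced in this thread, so the following is not mgogol's code. It is a minimal Python sketch of the same idea (summing FPKM per gene ID), assuming a whitespace-separated genes.expr file whose first non-empty line is a header containing `gene_id` and `FPKM` columns, as in the rows posted in #9. The function name `sum_fpkm_per_gene` is hypothetical. Like the original script, it cannot produce meaningful confidence intervals, so it simply drops them.

```python
from collections import defaultdict


def sum_fpkm_per_gene(lines):
    """Sum the FPKM column over all rows that share a gene_id.

    Assumes the first non-empty line is a header naming the columns.
    Confidence-interval columns are ignored: interval bounds of
    separate rows cannot simply be added up.
    """
    totals = defaultdict(float)
    gene_col = fpkm_col = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if gene_col is None:
            # Header row: look up column positions by name.
            gene_col = fields.index("gene_id")
            fpkm_col = fields.index("FPKM")
            continue
        totals[fields[gene_col]] += float(fields[fpkm_col])
    return dict(totals)


# Rows copied from the example posted earlier in the thread.
rows = """gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000143198 33524 chr1 165600097 165631033 127.41 0 298.498 FAIL
ENSG00000143198 33524 chr1 165614897 165617907 0 0 0 OK
ENSG00000249751 54186 chr5 138784244 138784863 20.4268 11.3876 29.466 OK
ENSG00000249751 54187 chr5 138837129 138842328 22.1737 12.7559 31.5915 OK
""".splitlines()

totals = sum_fpkm_per_gene(rows)
for gene, fpkm in totals.items():
    print(gene, fpkm)
```

Note that the sketch sums every row regardless of its status column, so rows marked FAIL contribute to the total as well.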


                      • #12
                        Originally posted by mgogol:
                        This botches the confidence intervals, by the way.
                        Yeah, that is what I was worried about, because I was considering taking those into account with my data. I will take a look at your script though since it saves me the time of writing my own.

                        Hopefully Cole or others can take a look at this and let us know what the problem might be.


                        • #13
                          Cufflinks

                          I was wondering if anyone knows what the status column in genes.expr and transcripts.expr (output files of Cufflinks) means. I can't find its meaning in the manual. A possible meaning is "can be one of OK (test successful), NOTEST (not enough alignments for testing), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing", but this is actually the description of "test status", which is a column in the Cuffdiff output files.


                          What shall I do with genes (or transcripts) whose status is FAIL? Shall I assume that their FPKM is 0 or take the FPKM of these genes regardless of their status?


                          I used Cufflinks v0.9.1b in my experiments, and the problem of getting multiple FPKM values for some genes still exists. Running Cufflinks without a GTF file seems to solve this problem, but then I don't know how to link the FPKM values to the corresponding Ensembl IDs. If I provide a GTF file when running Cufflinks, I get multiple FPKM values and a FAIL status for some genes.


                          What shall I do with genes that have multiple FPKM values? Shall I add the FPKM values together, or choose only the FPKM value of the row that matches the start and end position of the gene?


                          Thank you very much for your time.
                          Last edited by yjlui; 11-11-2010, 07:53 AM.
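
The thread does not settle which of the two options is correct, so the sketch below only puts them side by side for inspection. It is a minimal Python illustration under several assumptions: the nine-column, whitespace-separated genes.expr layout from post #9; "the row that matches the start and end position" is approximated by the row with the widest left-to-right span, which is only a rough stand-in; and the function name `summarize_gene_rows` is hypothetical. It also records whether any row of a gene has status FAIL, without deciding how such rows should be treated.

```python
def summarize_gene_rows(lines):
    """Per gene: summed FPKM, FPKM of the widest row, and whether any row is FAIL."""
    summary = {}
    for line in lines:
        f = line.split()
        # Skip blank lines and the header row.
        if not f or f[0] == "gene_id":
            continue
        gene, left, right = f[0], int(f[3]), int(f[4])
        fpkm, status = float(f[5]), f[8]
        span = right - left
        entry = summary.setdefault(
            gene, {"sum": 0.0, "widest": fpkm, "span": span, "any_fail": False}
        )
        entry["sum"] += fpkm
        if span > entry["span"]:
            # A wider row replaces the current candidate; ties keep the first row.
            entry["widest"], entry["span"] = fpkm, span
        entry["any_fail"] = entry["any_fail"] or status == "FAIL"
    return summary


# Rows copied from the example posted earlier in the thread.
rows = """gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000162105 36862 chr11 70313960 70963623 9.58183 0 170.285 FAIL
ENSG00000162105 36862 chr11 70753739 70754197 0 0 0 OK
ENSG00000162105 36862 chr11 70798845 70798972 0 0 0 OK
ENSG00000249751 54186 chr5 138784244 138784863 20.4268 11.3876 29.466 OK
ENSG00000249751 54187 chr5 138837129 138842328 22.1737 12.7559 31.5915 OK
""".splitlines()

summary = summarize_gene_rows(rows)
for gene, info in summary.items():
    print(gene, info["sum"], info["widest"], info["any_fail"])
```

For genes whose extra rows are all zero, both options give the same value; they differ for genes such as ENSG00000249751, where two non-overlapping rows both carry a nonzero FPKM.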


                          • #14
                            Does someone have a small example dataset that I can run this on to find the problem?


                            • #15
                              Thanks for the prompt reply, Adam! Just emailed you a small dataset built from my SAM file.
