  • Is cufflinks fundamentally flawed?

    I have been running tophat/cufflinks/cuffdiff on fungal and human RNAseq data. Some of the FPKM values seemed high so I decided to look at the alignment files (accepted_hits.bam) to count numbers of reads hitting selected genes. I am unable to come up with anything near the values produced in the cuffdiff output. For example one cufflinks locus (XLOC) had a reported FPKM of 421 yet there were zero reads mapping in the corresponding genomic region.

    A related issue is that some of the reported loci span genomic regions well beyond the borders of transcripts defined in the supplied .gtf file. However, inspection of the alignment file reveals no reads that support the extension of the transcript.

    Am I missing something here? Specifically, does cufflinks use information other than that contained in the accepted_hits.bam file to calculate FPKM and define its (XLOC) loci?
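For anyone repeating this check by hand, here is a minimal sketch of the conventional FPKM definition (fragments per kilobase of feature per million mapped fragments) applied to a raw count. The function name and the example numbers are hypothetical. Cufflinks estimates abundances with its own statistical model, so a hand-computed value is only a rough sanity check and should not be expected to match exactly.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Conventional FPKM from a raw fragment count.

    fragments              -- fragments counted in the feature
    length_bp              -- feature length in base pairs
    total_mapped_fragments -- mapped fragments in the whole library
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # count / (length in kb) / (library size in millions)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical numbers for illustration only (not taken from this thread):
# 500 fragments on a 2,000 bp transcript, 20 million mapped fragments.
print(fpkm(500, 2000, 20_000_000))

# Under this definition a region with zero reads always has an FPKM of 0.
print(fpkm(0, 2000, 20_000_000))
```

Under this plain definition, a non-zero FPKM for a region with no aligned reads cannot arise from counting alone, which is what makes the discrepancy described above worth chasing.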

  • #2
    I'm currently following up on this by generating a control dataset containing known transcript abundances. Stay tuned...

    Comment


    • #3
      Cufflinks IS flawed

      So, using artificially-generated control datasets, I find that cufflinks is flawed in two ways:

      First, its FPKM values are inflated. The problem is that the magnitude of the inflation varies from gene to gene; there is no consistency in the error.

      Second, the "locus" interval defined in the cuffdiff output is often just plain wrong. In many instances, the reported "locus" spans multiple transcripts and intergenic regions, even though the dataset contains reads from only one transcript. In other words, neither the .gtf file nor the input sequence data support expansion of the "locus" to cover multiple genes.

      Comment


      • #4
        Hi, I've run into almost the same problem. In addition, in the GTF file for mm9 from the UCSC annotation I have:

        chr10 unknown exon 80640798 80640979 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
        chr10 unknown CDS 80641426 80641637 . + 2 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
        chr10 unknown exon 80641426 80641637 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
        chr10 unknown exon 80641706 80641758 . + . gene_id "Snord37"; gene_name "Snord37"; transcript_id "NR_028549"; tss_id "TSS16143";
        chr10 unknown CDS 80641826 80642004 . + 0 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
        chr10 unknown exon 80641826 80642004 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
        chr10 unknown CDS 80642091 80642196 . + 1 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
        chr10 unknown exon 80642091 80642196 . + . gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";
        chr10 unknown CDS 80642289 80642541 . + 0 gene_id "Eef2"; gene_name "Eef2"; p_id "P7224"; transcript_id "NM_007907"; tss_id "TSS5168";

        The Snord37 gene lies inside Eef2 and is just 52 bp long, but in the cuffdiff output I got:
        Snord37 Snord37 Snord37 chr10:80639375-80645254 Control IL33 OK 0 8173.79 1.79769e+308 1.79769e+308 0.0786496 0.428305 no

        The locus size is 5879 bp. Also, cufflinks reports 8173.79 FPKM for Snord37, but there are just 2 reads for it in the BAM file.

        I have a couple of other examples. I've tested this on cufflinks versions 1.2.1, 1.3.0 and 2.0.0; the result is the same.
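To look for more cases like this one, a rough sketch along these lines could parse cuffdiff rows and report the span of each reported locus. The column names are an assumption, chosen to match the 14 fields in the gene_exp.diff line quoted above; the helper names are hypothetical.

```python
import re

# Assumed column order, matching the 14 whitespace-separated fields
# of the gene_exp.diff line quoted in this post.
COLUMNS = [
    "test_id", "gene_id", "gene", "locus", "sample_1", "sample_2", "status",
    "value_1", "value_2", "log2_fold_change", "test_stat", "p_value",
    "q_value", "significant",
]


def parse_diff_line(line):
    """Split one gene_exp.diff row into a dict keyed by column name."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return dict(zip(COLUMNS, fields))


def locus_span(locus):
    """Return (chrom, start, end, span) for a 'chrom:start-end' string.

    span is end - start, the figure used in this post; add 1 if the
    coordinates should be treated as inclusive.
    """
    match = re.fullmatch(r"(\S+):(\d+)-(\d+)", locus)
    if match is None:
        raise ValueError(f"unrecognised locus: {locus!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    return chrom, start, end, end - start


row = parse_diff_line(
    "Snord37 Snord37 Snord37 chr10:80639375-80645254 Control IL33 OK "
    "0 8173.79 1.79769e+308 1.79769e+308 0.0786496 0.428305 no"
)
print(row["gene"], locus_span(row["locus"]), row["value_2"])
```

Comparing the reported span against the annotated gene length from the GTF would then flag loci that have been merged with their neighbours.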

        Comment


        • #5
          I'm glad to hear that someone else can verify my suspicions. I have contacted the TopHat/Cufflinks support site about this, but I do not expect a reply because they ignored a previous question I submitted about a month ago.

          Comment


          • #6
            Is there another tool like cufflinks that I could use to compare the results?

            Comment


            • #7
              I haven't tried cufflinks but have heard others complaining at conferences.

              I have been impressed with edgeR and use that in production here.

              Comment


              • #8
                Originally posted by colindaven View Post
                I haven't tried cufflinks but have heard others complaining at conferences.

                I have been impressed with edgeR and use that in production here.

                Yes, edgeR and DESeq work pretty well. But is there a tool that performs reference-based transcriptome assembly (like cufflinks)?

                Comment


                • #9
                  Have you looked at MapSplice? http://www.netlab.uky.edu/p/bioinfo/MapSplice
                  Last edited by GenoMax; 06-11-2012, 04:21 AM.

                  Comment


                  • #10
                    Originally posted by NicoBxl View Post
                    yes edgeR and DESeq work pretty well. But is there a tool to perform a reference-based transcriptome assembly (like cufflinks)
                    Have you tried Scripture?

                    Comment


                    • #11
                      Hi all,
                      I just finished my first cufflinks run on RNAseq data and I also encountered results that make me doubt the validity of cufflinks' and cuffdiff's output.

                      Therefore I'm also considering switching my analysis pipeline and rerunning the analysis. However, during an Agilent seminar last week it was mentioned that Scripture would be an alternative, but that it is heavyweight and requires serious computational resources to perform the assembly. So my question is: does anybody have experience with Scripture, and if so, could you recommend machine specifications?

                      Best regards

                      Comment


                      • #12
                        MapSplice is good for gene structure analysis but doesn't do differential expression analysis.

                        Comment


                        • #13
                          Originally posted by pbluescript View Post

                          Yes but I have several problems to run it. I'll open a new thread now with my problems.

                          edit > here's the thread for scripture : http://seqanswers.com/forums/showthr...5998#post75998
                          Last edited by NicoBxl; 06-12-2012, 12:47 AM.

                          Comment


                          • #14
                            Upon quick inspection, it appears to me that Scripture simply assembles transcripts but does not quantify and compare expression levels. Is that the case?
                            Last edited by drdna; 06-12-2012, 12:49 PM.

                            Comment


                            • #15
                              Originally posted by NicoBxl View Post
                              yes edgeR and DESeq work pretty well. But is there a tool to perform a reference-based transcriptome assembly (like cufflinks)
                              One alternative would be to run cufflinks and then use the transcripts.gtf, or the combined.gtf from cuffcompare, as your input for something like htseq-count. That will give you a list of transcripts with raw read counts, which can then be used in either DESeq or edgeR.

                              This approach would avoid any potential problems with Cufflinks quantification/differential expression while giving the advantage of a reference-based transcriptome assembly.
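As a rough sketch of the glue step, the per-sample count files could be merged into one table before loading it into DESeq or edgeR. This assumes each file is two tab-separated columns (feature ID, count) followed by summary rows such as no_feature, which some versions prefix with a double underscore. The function names, sample names and transcript IDs below are hypothetical.

```python
import csv
import io

# Summary rows that are not features (assumed list, with and without
# the "__" prefix used by some versions).
SUMMARY_ROWS = {
    "no_feature", "ambiguous", "too_low_aQual",
    "not_aligned", "alignment_not_unique",
}


def read_counts(handle):
    """Read one two-column count file into {feature_id: count}."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        feature, value = row
        if feature.startswith("__") or feature in SUMMARY_ROWS:
            continue  # skip summary counters
        counts[feature] = int(value)
    return counts


def merge_counts(samples):
    """Merge {sample_name: counts} into (sample_names, rows).

    Each row is [feature_id, count_in_sample_1, count_in_sample_2, ...];
    features missing from a sample are filled with 0.
    """
    names = list(samples)
    features = sorted(set().union(*(c.keys() for c in samples.values())))
    rows = [[f] + [samples[n].get(f, 0) for n in names] for f in features]
    return names, rows


# Hypothetical in-memory files standing in for real count output.
control = io.StringIO("TCONS_A\t12\nTCONS_B\t0\nno_feature\t340\n")
treated = io.StringIO("TCONS_A\t30\nTCONS_C\t5\n__ambiguous\t7\n")

names, matrix = merge_counts(
    {"control": read_counts(control), "treated": read_counts(treated)}
)
print("\t".join(["feature"] + names))
for row in matrix:
    print("\t".join(str(x) for x in row))
```

With real data, the `io.StringIO` objects would be replaced by opened count files, one per sample.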

                              Comment
