  • Portah
    Member
    • Jan 2012
    • 14

    #16
    After a lot of digging through the forum and the documentation about wrong FPKMs in Cufflinks, I checked cds_exp.diff and was surprised to find that the FPKMs and the gene list there (after filtering out the infinity values, ±1.79E+308) are close to the expected values. Maybe we are misinterpreting how Cufflinks splits reads between overlapping regions, of which there are many in the GTF file (CDS, exons, stop codons...)?
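The infinity filtering mentioned here could be sketched roughly as follows. This is only a minimal sketch, assuming a tab-separated Cuffdiff-style table: the column names (`test_id`, `value_1`, `value_2`, `log2(fold_change)`), the helper names, and the rows are made up for illustration and may not match the output of your Cuffdiff version.

```python
import csv
import io
import math

# Assumption: infinite fold changes appear as the largest double,
# shown rounded as +/-1.79E+308, so anything at or above this is dropped.
HUGE = 1.79e308

def is_finite_value(text):
    """Return True if text parses to a finite number below the sentinel."""
    try:
        value = float(text)
    except ValueError:
        return False
    return math.isfinite(value) and abs(value) < HUGE

def filter_finite(rows, column="log2(fold_change)"):
    """Keep only rows whose `column` holds a finite, non-sentinel number."""
    return [row for row in rows if is_finite_value(row[column])]

# Made-up example rows, not real data.
EXAMPLE = (
    "test_id\tvalue_1\tvalue_2\tlog2(fold_change)\n"
    "geneA\t10.0\t20.0\t1.0\n"
    "geneB\t0.0\t5.0\t1.79769e+308\n"
    "geneC\t5.0\t0.0\t-1.79769e+308\n"
    "geneD\t3.0\t3.0\t0\n"
    "geneE\t0.0\t4.0\tinf\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"))
kept = filter_finite(rows)
print([row["test_id"] for row in kept])
```

Only the rows with an ordinary fold change survive; rows where one condition has zero expression are the ones that get dropped, which is worth keeping in mind when comparing gene lists.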


    • drdna
      Member
      • May 2012
      • 76

      #17
      If you follow this thread, you will see that there is a problem with this approach: cufflinks/cuffmerge produces erroneous .gtf files that contain instances where multiple transcripts are merged into one (despite the lack of any evidence to support such merging).


      • drdna
        Member
        • May 2012
        • 76

        #18
        To Portah. Good find, I'm going to go check the cds_exp.diff file for my runs and see if it makes more sense. Regardless, there is still an issue with transcript/reference annotation merging.


        • chadn737
          Senior Member
          • Jan 2009
          • 392

          #19
          Originally posted by drdna View Post
          If you follow this thread, you will see that there is a problem with this approach: cufflinks/cuffmerge produces erroneous .gtf files that contain instances where multiple transcripts are merged into one (despite the lack of any evidence to support such merging).
          My bad, I thought the primary concern was the FPKM calculation, which I have never trusted; that is why I have always stuck with count-based methods of differential expression. But if there is also a problem with merging the transcripts, then that seems even more fundamental.

          I'm curious, though: how are you guys running cufflinks? I'm assuming you are using the -g/--GTF-guide argument? Or does the problem persist even if you give it the -G/--GTF argument, telling it to stick to the supplied GTF file and not look for novel transcripts?
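To make the two modes concrete, here is a minimal sketch in Python that builds (but does not run) the two command lines. The file and directory names are placeholders and `build_cufflinks_cmd` is a hypothetical helper, not part of Cufflinks.

```python
def build_cufflinks_cmd(bam, gtf, out_dir, allow_novel):
    """Build a cufflinks argument list.

    allow_novel=True  -> --GTF-guide (-g): the annotation guides assembly,
                         novel transcripts may still be reported.
    allow_novel=False -> --GTF (-G): only transcripts in the annotation
                         are quantified.
    """
    mode_flag = "--GTF-guide" if allow_novel else "--GTF"
    return ["cufflinks", mode_flag, gtf, "-o", out_dir, bam]

# Placeholder inputs for illustration only.
guided = build_cufflinks_cmd("accepted_hits.bam", "genes.gtf", "cl_guided", allow_novel=True)
strict = build_cufflinks_cmd("accepted_hits.bam", "genes.gtf", "cl_strict", allow_novel=False)

print(" ".join(guided))
print(" ".join(strict))
# To actually run one of them: subprocess.run(strict, check=True)
```

Running both modes on the same alignment and comparing the outputs would show whether the problem depends on novel-transcript discovery.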


          • drdna
            Member
            • May 2012
            • 76

            #20
            chadn737, Yes and yes. I have been running cuffmerge using a reference gtf and the --no-novel-juncs flag.

            So if you are using a count-based method of DE analysis, do you align your reads with gene sequences, as opposed to a genome assembly? I'd be interested in hearing a little bit more about your approach.


            • drdna
              Member
              • May 2012
              • 76

              #21
              Originally posted by Portah View Post
              After a lot of digging through the forum and the documentation about wrong FPKMs in Cufflinks, I checked cds_exp.diff and was surprised to find that the FPKMs and the gene list there (after filtering out the infinity values, ±1.79E+308) are close to the expected values. Maybe we are misinterpreting how Cufflinks splits reads between overlapping regions, of which there are many in the GTF file (CDS, exons, stop codons...)?
              Portah, my control dataset consists of genes with no introns and was run with the --no-novel-juncs flag. Consequently, the FPKM values for CDSs are exactly the same as those for genes.


              • Portah
                Member
                • Jan 2012
                • 14

                #22
                I'm going to double-check it; maybe I'm wrong.


                • chadn737
                  Senior Member
                  • Jan 2009
                  • 392

                  #23
                  Originally posted by drdna View Post
                  chadn737, Yes and yes. I have been running cuffmerge using a reference gtf and the --no-novel-juncs flag.

                  So if you are using a count-based method of DE analysis, do you align your reads with gene sequences, as opposed to a genome assembly? I'd be interested in hearing a little bit more about your approach.
                  cuffmerge with --no-novel-juncs? That's a TopHat option. What I am talking about is that when you run cufflinks, you have two options if you supply a GTF file. --GTF-guide does reference-guided assembly that uses the GTF as a base but also looks for novel transcripts. There is also a --GTF option that makes cufflinks use only the annotated transcripts in the GTF and ignore any novel transcripts. It's this latter option that I am curious about. If you run cufflinks and restrict it so that it doesn't look for novel transcripts, do you still have the problems afterwards?

                  I've seen both approaches. I've also seen people align to both the CDS and the genome and then integrate the two. The simplest approach really is to just realign to the genome using TopHat or BWA.

                  After that, use something like htseq-count or BEDTools to count the number of reads mapping to each gene (or exon), which then serves as input to any number of count-based DE tools (DESeq, edgeR, baySeq, etc.). Or, if you want to look for differential exon usage, DEXSeq.

                  This approach is good if your primary interest is doing differential expression and if novel unannotated transcripts are not what you are after.
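The counting step described above can be illustrated with a toy example. This is a simplified sketch, not the algorithm that htseq-count or BEDTools actually implement: it assumes closed 1-based intervals, ignores strand and spliced reads, and uses made-up gene and read coordinates. The `__no_feature` and `__ambiguous` keys are just labels chosen for this sketch.

```python
from collections import Counter

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals [start, end] share at least one base."""
    return a_start <= b_end and b_start <= a_end

def count_reads(genes, reads):
    """Count reads per gene.

    genes: dict gene_id -> (chrom, start, end)
    reads: iterable of (chrom, start, end)
    A read overlapping exactly one gene is counted for that gene; reads
    hitting no gene or several genes are tallied under special keys.
    """
    counts = Counter({gene_id: 0 for gene_id in genes})
    for chrom, start, end in reads:
        hits = [
            gene_id
            for gene_id, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

# Made-up coordinates: geneA and geneB overlap between 450 and 500.
genes = {"geneA": ("chr1", 100, 500), "geneB": ("chr1", 450, 900)}
reads = [
    ("chr1", 120, 170),  # geneA only
    ("chr1", 460, 510),  # overlaps both genes
    ("chr1", 800, 850),  # geneB only
    ("chr2", 10, 60),    # no annotated gene
]
counts = count_reads(genes, reads)
print(dict(counts))
```

The resulting per-gene integer counts are the kind of table that count-based DE tools take as input; how overlapping genes are handled is a design choice that differs between counting tools.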


                  • drdna
                    Member
                    • May 2012
                    • 76

                    #24
                    Oops, my bad - I'm getting my analyses mixed up. I only tried cuffmerge with the -G option flag. I'll try the -GTF instead. However, I doubt that it will make any difference because, as I mentioned before, there are no reads in the adjoining regions that cuffmerge merges into the true transcripts.

                    One thing I've noticed is that one of the GTFs I'm working with is discontinuous, in the sense that adjacent genes do not occur sequentially in the file. I don't know why; that's just the way the downloaded file was constructed. I'm beginning to suspect that cuffmerge/cufflinks assumes that GTFs always contain genes in sequential order and hiccups at the discontinuities. I plan to test this by reordering the GTFs sequentially. This might also explain why Portah has a problem with the Snord37 gene: it lies inside another gene. I suspect that cufflinks/cuffmerge doesn't allow for this possibility and gets its locus coordinates confused.
                    Last edited by drdna; 06-12-2012, 05:29 PM.
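The test proposed in this post (reordering the GTF and looking for genes nested inside other genes) could be sketched as below. Whether ordering or nesting actually affects cufflinks/cuffmerge is only the hypothesis raised here, not something this code establishes. The helper names and the three GTF lines are made up for illustration, and the attribute parsing assumes every line carries a `gene_id "..."` attribute.

```python
def parse_gene_spans(gtf_lines):
    """Collapse GTF feature lines into one (chrom, start, end) span per
    gene_id, in order of first appearance in the file."""
    spans = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        gene_id = fields[8].split('gene_id "')[1].split('"')[0]
        if gene_id in spans:
            _, old_start, old_end = spans[gene_id]
            start, end = min(start, old_start), max(end, old_end)
        spans[gene_id] = (chrom, start, end)
    return spans

def out_of_order(spans):
    """Gene ids that start before the previous gene on the same chromosome."""
    flagged, previous_start = [], {}
    for gene_id, (chrom, start, _end) in spans.items():
        if chrom in previous_start and start < previous_start[chrom]:
            flagged.append(gene_id)
        previous_start[chrom] = start
    return flagged

def nested_pairs(spans):
    """(inner, outer) pairs where inner lies entirely inside outer."""
    pairs = []
    for inner, (c1, s1, e1) in spans.items():
        for outer, (c2, s2, e2) in spans.items():
            same_span = (s1, e1) == (s2, e2)
            if inner != outer and c1 == c2 and s2 <= s1 and e1 <= e2 and not same_span:
                pairs.append((inner, outer))
    return pairs

def sort_gtf(gtf_lines):
    """Sort feature lines by chromosome name, then start, then end."""
    records = [l for l in gtf_lines if l.strip() and not l.startswith("#")]
    def key(line):
        fields = line.split("\t")
        return (fields[0], int(fields[3]), int(fields[4]))
    return sorted(records, key=key)

# Made-up annotation: geneC is listed first although it lies downstream,
# and snoGene sits inside hostGene.
GTF = [
    'chr1\tsrc\texon\t5000\t6000\t.\t+\t.\tgene_id "geneC"; transcript_id "tC";',
    'chr1\tsrc\texon\t100\t900\t.\t+\t.\tgene_id "hostGene"; transcript_id "tH";',
    'chr1\tsrc\texon\t400\t470\t.\t+\t.\tgene_id "snoGene"; transcript_id "tS";',
]

spans = parse_gene_spans(GTF)
print(out_of_order(spans))
print(nested_pairs(spans))
print(list(parse_gene_spans(sort_gtf(GTF))))
```

Running the pipeline on both the original and the sorted annotation, and checking whether the nested genes still cause trouble, would separate the ordering hypothesis from the nesting hypothesis.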


                    • Portah
                      Member
                      • Jan 2012
                      • 14

                      #25
                      I was wrong: the numbers in cds_exp and gene_exp are the same. The total gene lists do differ, though; the genes that wrap Snords have disappeared.

                      Looks like there is no other way than to write my own FPKM counter to check myself and others.
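A hand-rolled FPKM check along these lines could start from the textbook definition, fragments per kilobase of feature per million mapped fragments. This is a minimal sketch of that naive formula only. It is not Cufflinks' estimation procedure, which fits a statistical model over transcripts, so some disagreement with Cufflinks output is expected. The function name and the numbers are made up.

```python
def fpkm(fragment_count, feature_length_bp, total_mapped_fragments):
    """Naive FPKM: fragments per kilobase of feature per million mapped fragments.

    FPKM = fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)
    """
    if feature_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total mapped fragments must be positive")
    return fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)

# Made-up numbers: 500 fragments on a 2,000 bp transcript,
# 10 million mapped fragments in the library.
value = fpkm(500, 2000, 10_000_000)
print(value)  # 25.0
```

For single-exon genes, comparing this value against the reported FPKM shows quickly whether the discrepancy is a matter of scale (for example a different normalization total) or something gene-specific.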


                      • swaraj
                        Member
                        • Feb 2012
                        • 50

                        #26
                        A few points I would like to make.

                        1. I have tried both Cufflinks and Scripture to assemble transcripts from RNA-seq data in Tetraodon. Cufflinks outperforms Scripture in terms of assembly quality.

                        2. On average, Scripture produces more transcripts per locus than Cufflinks. Cufflinks is better at building novel intergenic transcripts.

                        3. As written in a previous reply, it is better to use HTSeq, BEDTools coverageBed, and the DESeq R package for differential analysis than Cuffdiff, which gives bloated FPKM values for many transcripts.

                        4. The Rsubread package is a fast, accurate alternative to Cufflinks. (http://www.bioconductor.org/packages.../Rsubread.html)


                        • drdna
                          Member
                          • May 2012
                          • 76

                          #27
                          Originally posted by Portah View Post
                          I was wrong: the numbers in cds_exp and gene_exp are the same. The total gene lists do differ, though; the genes that wrap Snords have disappeared.

                          Looks like there is no other way than to write my own FPKM counter to check myself and others.
                          I'm still trying to figure out why cufflinks FPKMs are so far off. Presumably, the program is making some kind of statistical correction. In my case this doesn't make sense because I have only one sample per condition due to the study being a small-scale pilot project. Does anyone have an idea how cufflinks calculates FPKMs?


                          • jy123
                            Junior Member
                            • Nov 2010
                            • 8

                            #28
                            How do you run Scripture?

                            Hi swaraj,

                            Can you explain the following options in Scripture:
                            -maskFileDir <Mask File directory>
                            -windows <Comma separated list of windows to evaluate>
                            Last edited by jy123; 06-13-2012, 09:30 AM. Reason: need quote


                            • swaraj
                              Member
                              • Feb 2012
                              • 50

                              #29
                              I am afraid I can't help you with this. These options belong to the chipscan task of Scripture, which I have not used before and which the documentation does not cover. I have previously relied on SICER and MACS for analyzing ChIP-seq data.


                              • drdna
                                Member
                                • May 2012
                                • 76

                                #30
                                Originally posted by jy123 View Post
                                Hi swaraj,

                                Can you explain the following options in Scripture:
                                -maskFileDir <Mask File directory>
                                -windows <Comma separated list of windows to evaluate>
                                I suggest you post this in a Scripture-specific thread.
