  • cuffquant count data as input for DESeq/DEXSeq

    Hi community,

    the latest cufflinks release (2.2.0) comes with two novel tools, cuffquant and cuffnorm. The latter can be used to generate expression and count tables at the level of transcripts, primary transcripts and genes, that are normalized for library size.

    I was wondering whether these normalized counts can be used with one of the 'count-based' methods like DESeq/DEXSeq/edgeR, circumventing their normalization methods.

    In other words, can I use e.g. the DESeq nbinomTest() function with these cuffnorm-generated data?

    Thanks.

  • #2
    According to the cuffnorm documentation:
    "Cuffnorm will report both FPKM values and normalized, estimates for the number of fragments that originate from each gene, transcript, TSS group, and CDS group. Note that because these counts are already normalized to account for differences in library size, they should not be used with downstream differential expression tools that require raw counts as input."
    So they specifically warn against using cuffnorm counts with tools that require raw counts.

    I actually have a follow-up question. If these are just normalized counts, why can't they be used? When another tool normalizes them again, wouldn't they come out the same as if they had never been normalized? The initial normalization shouldn't lose any information.

    • #3
      No, this is not an appropriate thing to do for either DESeq or edgeR. They assume raw counts are used as input, and these have a particular distribution that is assumed by the programs. The programs use the assumed distribution to estimate biological variation and determine statistical significance. While your normalised count values may be similar (or the same), the probability calculations will likely be off.
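
To make the distributional point concrete, here is a minimal sketch. It assumes purely Poisson (technical) noise, which is a simplification of the negative binomial model that DESeq and edgeR work with, and the function name is hypothetical. It shows that dividing a count by a size factor s moves the variance-to-mean ratio from 1 to 1/s, so scaled values no longer behave like counts even when their means look reasonable.

```python
def scaled_poisson_moments(mu, size_factor):
    """Mean and variance of k / size_factor when k ~ Poisson(mu).

    For a Poisson count, mean == variance == mu. Dividing by a constant s
    divides the mean by s but the variance by s**2.
    """
    mean = mu / size_factor
    variance = mu / size_factor ** 2
    return mean, variance


for s in (0.5, 1.0, 2.0):
    mean, var = scaled_poisson_moments(100.0, s)
    print(f"size factor {s}: mean={mean:.1f} variance={var:.1f} "
          f"variance/mean={var / mean:.2f}")
```

Only the unscaled case (size factor 1) keeps the variance equal to the mean, which is the relationship a count model relies on when it separates technical noise from biological variation.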

      • #4
        There seems to be a common misconception that tools like DESeq(2) actually store the normalized counts somewhere. They don't, in fact, which is why trying to input normalized counts will lead to no end of problems.
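
As an illustration of why the size factors matter more than a stored normalized table, below is a rough sketch of median-of-ratios size factor estimation, the approach DESeq uses for library size normalization. The function name and the toy matrix are made up for this example, and this is not the package's actual code. It also speaks to the question in #2: re-estimating size factors on already-normalized values returns factors close to 1, so the information about the original library sizes is no longer available to the model.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene), each a list of per-sample counts.
    Genes with a zero in any sample are skipped, because their geometric
    mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(k) for k in row) / n_samples
        for j, k in enumerate(row):
            log_ratios[j].append(math.log(k) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy data: sample 2 was sequenced twice as deeply as sample 1.
raw = [[10, 20], [100, 200], [50, 100], [7, 14]]
sf = size_factors(raw)
normalized = [[k / s for k, s in zip(row, sf)] for row in raw]

print("size factors from raw counts:       ", sf)
print("size factors from normalized counts:", size_factors(normalized))
```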

        • #5
          Following up on the original question:

          If we run cuffnorm with the --library-norm-method parameter set to classic-fpkm, can the count data be used with DESeq/DESeq2?

          "classic-fpkm - Library size factor is set to 1 - no scaling applied to FPKM values or fragment counts. (default for Cufflinks)"
          Does this mean that library size normalization is not applied, and can the count data therefore be treated as raw counts?

          • #6
            Do you think there is a way to get the raw counts in a format that DESeq can read?
            Or a way to read the binary file? I can't find one!

            • #7
              What binary file? If you mean the BAM file, just use featureCounts or htseq-count.

              • #8
                No, I mean the raw count table:

                "Cuffquant writes a single output file, abundances.cxb, into the output directory. CXB files are binary files, and can be passed to Cuffnorm or Cuffdiff for further processing."
                I would like to analyse the raw counts with DESeq2.

                • #9
                  I'm sure it's theoretically possible to read the CXB file, but since its format seems to have never been documented, you'd have to go through the source code and reverse-engineer its format. It'd be faster to just ignore it.

                  • #10
                    Thanks for your answer!
                    So there is no way to get the read counts from the cuff-tools? Maybe I'm missing something here.

                    • #11
                      Hard to say, there are a lot of undocumented areas of those programs. It's quick enough to just use featureCounts.

                      • #12
                        Oh! So featureCounts is a tool that builds a count table from a SAM/BAM file?
                        Thank you!

                        • #13
                          Yes, it's similar to htseq-count, though significantly faster.
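
For anyone who wants to inspect the resulting table outside R, here is a small sketch of reading featureCounts output with the Python standard library. It assumes the usual tab-separated layout: a leading comment line starting with "#", then a header whose first six columns are Geneid, Chr, Start, End, Strand and Length, followed by one count column per BAM file. The function name and the example rows are invented for illustration, and the parser assumes integer counts (fractional counting options would need float instead of int).

```python
import csv
import io


def read_featurecounts(handle):
    """Return (sample_names, {gene_id: [counts...]}) from a featureCounts table."""
    # Skip the "# Program:featureCounts ..." comment line(s).
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = header[6:]  # columns after Geneid, Chr, Start, End, Strand, Length
    counts = {row[0]: [int(v) for v in row[6:]] for row in reader if row}
    return samples, counts


# Invented two-gene, two-sample table in the assumed layout.
example = io.StringIO(
    "# Program:featureCounts (example comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsampleA.bam\tsampleB.bam\n"
    "geneX\tchr1\t100\t900\t+\t801\t12\t30\n"
    "geneY\tchr2\t50\t400\t-\t351\t0\t7\n"
)
samples, counts = read_featurecounts(example)
print(samples)
print(counts)
```

These integer counts, one column per sample, are the kind of raw input that DESeq2 and edgeR expect.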

                          • #14
                            To repeat myself, you shouldn't be using cufflinks output as input to DESeq2, because DESeq2 expects raw count data and its model depends on that.

                            If you want to do isoform-level analysis with a DESeq-like workflow, look at DEXSeq, which has its own method of counting by using raw counts for exon bins.

                            • #15
                              Another option is to use limma/voom, which accepts fractional counts.
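
To show why fractional counts are not a problem there, this is a minimal sketch of the log-CPM transformation that voom applies before fitting: log2((count + 0.5) / (library size + 1) * 1e6). Only this transformation step is shown, not the mean-variance trend or the precision weights that voom goes on to estimate. The function name and the toy numbers are hypothetical, and library sizes are assumed to be plain column sums.

```python
import math


def voom_log_cpm(counts, lib_sizes=None):
    """log2 counts-per-million with voom-style offsets.

    counts: list of rows (one per gene), each a list of per-sample values.
    Values may be fractional; nothing here requires integers.
    """
    n_samples = len(counts[0])
    if lib_sizes is None:
        # Assumption: library size is the column sum of the table.
        lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((k + 0.5) / (lib_sizes[j] + 1.0) * 1e6) for j, k in enumerate(row)]
        for row in counts
    ]


# Toy table with fractional (estimated) counts.
toy = [[0.5, 3.0], [998.5, 1996.0]]
for row in voom_log_cpm(toy):
    print([round(v, 3) for v in row])
```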
