  • middlemale
    Member
    • Feb 2010
    • 16

    #16
    Hi, by chance can anybody clarify my confusion about cuffdiff.

    The input for cuffdiff is <transcripts.gtf> and two or more SAM files. In my case, there are two transcript GTF files (s_7.gtf for wild type and s_8.gtf for treated) and two SAM files (s_7.sam for wild type and s_8.sam for treated). Presumably I should use s_8.gtf to find the differentially expressed genes; is this right? The description of the transcripts.gtf file in the Cufflinks manual is unclear.
    cheers


    • Cole Trapnell
      Senior Member
      • Nov 2008
      • 213

      #17
      Originally posted by middlemale
      Hi, by chance can anybody clarify my confusion about cuffdiff.

      The input for cuffdiff is <transcripts.gtf> and two or more SAM files. In my case, there are two transcript GTF files (s_7.gtf for wild type and s_8.gtf for treated) and two SAM files (s_7.sam for wild type and s_8.sam for treated). Presumably I should use s_8.gtf to find the differentially expressed genes; is this right? The description of the transcripts.gtf file in the Cufflinks manual is unclear.
      cheers
      It's not a good idea to use a GTF that was output by Cufflinks from just one of your samples. Cuffdiff will ignore reads that don't fall on the transcripts in the file you give it, which means that if a transcript was fully assembled in only one of the samples, it could be ignored. This is why we provide, with Cuffcompare's output, a file *.combined.gtf. This file is essentially the "union" of the transfrags in your sample files. It's a good idea, however, to 'curate' this file a bit. We keep any transfrag from this file that is a full-length match to a transcript in UCSC, Ensembl, or VEGA (as determined by Cuffcompare), or that is present in two or more samples. This cleans up the file considerably. Feed the "scrubbed" file into Cuffdiff, and it will perform the differential analysis of your samples against the transcripts in that file.
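The curation rule described above could be sketched roughly as follows. This is only an illustration, not the script the Cufflinks authors use. It assumes that Cuffcompare's .tracking file is tab-separated with the transfrag ID in column 1, the class code in column 4, and one column per sample from column 5 onward (with "-" meaning the transfrag is absent from that sample), and that the combined GTF carries the same ID in its transcript_id attribute. The function names `keep_ids_from_tracking` and `scrub_gtf` are hypothetical.

```python
import re

# Assumed attribute layout in the combined GTF: transcript_id "TCONS_...";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def keep_ids_from_tracking(tracking_lines, min_samples=2):
    """Collect transfrag IDs that either match a reference transcript
    (class code "=") or are present in at least min_samples samples."""
    keep = set()
    for line in tracking_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or empty lines
        transfrag_id, class_code, samples = fields[0], fields[3], fields[4:]
        # Assumption: "-" marks a sample in which the transfrag was not seen.
        n_present = sum(1 for s in samples if s.strip() != "-")
        if class_code == "=" or n_present >= min_samples:
            keep.add(transfrag_id)
    return keep


def scrub_gtf(gtf_lines, keep):
    """Return only the GTF records whose transcript_id is in keep."""
    scrubbed = []
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in keep:
            scrubbed.append(line)
    return scrubbed
```

In practice the two functions would be fed the lines of the .tracking and .combined.gtf files, and the returned records written to a new GTF for Cuffdiff. Check the column layout against the Cuffcompare version you are running before relying on it.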


      • middlemale
        Member
        • Feb 2010
        • 16

        #18
        Excellent, many thanks. It is much clearer to me now.


        • Wei-HD
          Member
          • Oct 2009
          • 59

          #19
          Dear Cole,

          How should we 'curate' this file? Do you mean keeping only the transfrags with the class code "="?


          • paulwood
            Junior Member
            • Aug 2010
            • 3

            #20
            Good
            We will try this!


            • poisson200
              Member
              • Feb 2010
              • 63

              #21
              Just a wacky idea: is there any way Simon Anders (DESeq) and Cole Trapnell (Cufflinks) can collaborate?

              Cufflinks is great because it assigns reads to transcripts/isoforms by maximum likelihood estimation (MLE) using paired-end information, so each read goes to its most likely transcript, if I understand it correctly. A few people have asked for Cufflinks to output a raw file with the reads assigned to each transcript/isoform. As I remember, though, this can't be done because the assignment is probabilistic and some reads map equally well to two transcripts. I don't see why we can't use that information anyway and apply DESeq's negative binomial statistics to these assignments. Or are those already the statistics cuffdiff uses for differential expression?

              To me, combining the Cufflinks MLE assignment of reads to transcripts with the DESeq statistics could make for a powerful differential gene expression analysis.

              Just an idea from someone who does not know too much.
              Last edited by poisson200; 08-04-2010, 12:40 PM. Reason: DEGseq should be DESeq


              • Simon Anders
                Senior Member
                • Feb 2010
                • 995

                #22
                Hi,

                thanks for the praise; though I first need to point out that our package is called "DESeq", and should not be confused with "DEGSeq", another Bioconductor package.

                As you correctly point out, the ambiguity in transcript assignment makes combining the methods of Cufflinks and DESeq quite non-trivial.

                Hence, we don't have any good method at the moment for statistical testing for changes in isoform abundances.

                We have an idea how to achieve this which we are testing at the moment, and which we will hopefully soon be able to present. The cufflinks people have announced elsewhere on SeqAnswers that they are also working on a method. My feeling is that their approach and ours will be quite different, so it will be interesting to see what works better. Stay tuned.

                Simon


                • Wei-HD
                  Member
                  • Oct 2009
                  • 59

                  #23
                  I am a bench worker without much background in statistics or bioinformatics, so I cannot offer an apples-to-apples comparison.

                  But I have tried both of the methods above, and they gave me different results. We have several genes that are expressed at a low level in the basal condition but increase dramatically in the stimulated condition (already confirmed by experiments). With Cufflinks, the FPKM of some of these genes in the basal condition is 0, so I do not know how to handle the fold change, and simply discarding these genes might lose something. DESeq gave me results with a trend similar to the array data.
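One common workaround for a zero FPKM in the denominator is to add a small pseudocount to both values before taking the log ratio. The sketch below illustrates that idea only; it is not what Cufflinks or DESeq does internally, the function name `log2_fold_change` is hypothetical, and the pseudocount of 1.0 is an arbitrary assumption that shrinks fold changes for lowly expressed genes.

```python
import math


def log2_fold_change(fpkm_basal, fpkm_stimulated, pseudocount=1.0):
    """Log2 ratio of stimulated over basal expression.

    The pseudocount (an assumed, arbitrary value) is added to both
    conditions so that a basal FPKM of 0 does not cause a division by zero.
    """
    return math.log2((fpkm_stimulated + pseudocount) / (fpkm_basal + pseudocount))


# A gene that is silent in the basal condition and has FPKM 7 when stimulated.
print(log2_fold_change(0.0, 7.0))  # 3.0, because log2(8 / 1) = 3
```

The reported value depends on the pseudocount, so it is best read as a ranking aid and not as a measured fold change.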

                  But I have to say that I do not have biological replicates so far, so we do not know what to conclude from the statistical significance levels. On the question of biological replicates, I checked some recent publications: they used two biological replicates, but some earlier papers did not repeat the RNA-seq at all. Hopefully some experienced SEQers can give me some advice about p-values and biological replicates, and about how to choose the FDR cut-off...

                  Whichever method I use, I want to thank the many program developers and SEQers for teaching me the analysis and answering my endless questions......
                  Last edited by Wei-HD; 08-04-2010, 01:06 PM.


                  • lpachter
                    Member
                    • Feb 2010
                    • 40

                    #24
                    @poisson200: You are absolutely right that it is desirable to perform differential expression at the transcript level, possibly allowing for more general assumptions about the relationship between variance and expression level than the single-parameter Poisson distribution. It is actually not very difficult to do this in the context of Cufflinks, and we are exploring various alternative approaches. It should be noted that it is by no means obvious that the negative binomial distribution is the right one to use. It allows only for modeling overdispersion, and the recent paper by Srivastava and Chen highlights the inadequacy of the negative-binomial assumption used in DEGseq, edgeR and other programs.
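To make the variance discussion concrete, the sketch below contrasts the mean-variance relationship of the Poisson distribution with that of a negative binomial in its usual mean/dispersion parameterization. It is a generic textbook illustration and does not reproduce how DESeq, edgeR or Cufflinks estimate variance; the function names are hypothetical.

```python
def poisson_variance(mean):
    """Poisson: a single parameter, so the variance equals the mean."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial (mean/dispersion form): variance = mean + dispersion * mean^2.

    With dispersion >= 0 the variance can only be greater than or equal to
    the Poisson variance, which is the sense in which it models overdispersion only.
    """
    return mean + dispersion * mean ** 2


mean_count = 100.0
print(poisson_variance(mean_count))                  # 100.0
print(negative_binomial_variance(mean_count, 0.25))  # 2600.0
```

Setting the dispersion to 0 recovers the Poisson case, so the negative binomial can never describe counts that vary less than Poisson counts would.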

