
  • #16
    Hi, can anybody clarify my confusion about cuffdiff?

    The input for cuffdiff is <transcripts.gtf> and two or more SAM files. In my case, there are two transcript GTF files (s_7.gtf for wild type and s_8.gtf for treated) and two SAM files (s_7.sam for wild type and s_8.sam for treated). Presumably I should use s_8.gtf to find the differentially expressed genes? Is this right? The description of the transcripts.gtf file in the Cufflinks manual is unclear.
    cheers



    • #17
      Originally posted by middlemale
      Hi, can anybody clarify my confusion about cuffdiff?

      The input for cuffdiff is <transcripts.gtf> and two or more SAM files. In my case, there are two transcript GTF files (s_7.gtf for wild type and s_8.gtf for treated) and two SAM files (s_7.sam for wild type and s_8.sam for treated). Presumably I should use s_8.gtf to find the differentially expressed genes? Is this right? The description of the transcripts.gtf file in the Cufflinks manual is unclear.
      cheers
      It's not a good idea to use a GTF that Cufflinks produced from just one of your samples. Cuffdiff ignores reads that don't fall on the transcripts in the file you give it, so a transcript that was fully assembled in only one of the samples could be ignored. This is why we provide, with Cuffcompare's output, a *.combined.gtf file. This file is essentially the "union" of the transfrags in your sample files. It's a good idea, however, to 'curate' this file a bit. We keep any transfrag from this file that is a full-length match to a transcript in UCSC, Ensembl, or VEGA (as determined by Cuffcompare), or that is present in two or more samples. This cleans up the file considerably. Feed the "scrubbed" file into Cuffdiff, and it will perform the differential analysis of your samples against that file.
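The curation rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual script: it assumes the Cuffcompare .tracking file is tab-delimited with the transfrag ID in column 1, the class code in column 4, and one column per sample from column 5 onward (with "-" meaning the transfrag is absent from that sample), and it assumes that a full-length reference match corresponds to class code "=". The function names and the example rows are made up for the illustration.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def curated_ids(tracking_lines, min_samples=2):
    """Return transfrag IDs that match a reference ('=') or occur in >= min_samples samples.

    Assumed layout (an assumption, check your own file): tab-delimited,
    column 1 = transfrag ID, column 4 = class code, columns 5+ = one entry
    per sample, with '-' when the transfrag is absent from that sample.
    """
    keep = []
    for line in tracking_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        transfrag_id, class_code, samples = fields[0], fields[3], fields[4:]
        n_present = sum(1 for entry in samples if entry != "-")
        if class_code == "=" or n_present >= min_samples:
            keep.append(transfrag_id)
    return keep


def filter_gtf(gtf_lines, keep_ids):
    """Keep only the GTF records whose transcript_id is in keep_ids."""
    kept = []
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in keep_ids:
            kept.append(line)
    return kept


# Made-up example rows: two samples, three transfrags.
example_tracking = [
    "TCONS_00000001\tXLOC_000001\tGeneA|TxA\t=\tq1:CUFF.1|CUFF.1.1\t-",
    "TCONS_00000002\tXLOC_000002\t-\tu\tq1:CUFF.2|CUFF.2.1\tq2:CUFF.5|CUFF.5.1",
    "TCONS_00000003\tXLOC_000003\t-\tu\t-\tq2:CUFF.6|CUFF.6.1",
]
example_gtf = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001";',
    'chr1\tCufflinks\texon\t500\t700\t.\t+\t.\tgene_id "XLOC_000003"; transcript_id "TCONS_00000003";',
]

ids = set(curated_ids(example_tracking))
scrubbed = filter_gtf(example_gtf, ids)
```

Here the first transfrag is kept because of its "=" class code, the second because it appears in both samples, and the third is dropped because it is an unmatched transfrag seen in only one sample. The resulting `scrubbed` lines would be written out and passed to Cuffdiff as the transcripts file.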



      • #18
        Excellent, many thanks. It's much clearer to me now.



        • #19
          Dear Cole,

          How should we 'curate' this file? Do you mean keeping only the transfrags with the class code "="?



          • #20
            Good
            We will try this!



            • #21
              Just a wacky idea: is there any way Simon Anders (DESeq) and Cole Trapnell (Cufflinks) could collaborate?

              Cufflinks is great because it assigns reads to transcripts/isoforms by maximum likelihood estimation (MLE), using paired-end information, if I understand it correctly. A few people have asked for Cufflinks to output a raw file with the reads assigned to each transcript/isoform. As I remember, though, this can't be done because the assignment is a mathematical estimate and some reads map equally well to two transcripts. I don't see why we can't use that information anyway and apply DESeq's negative binomial statistics to these assignments. Or are those already the statistics cuffdiff uses for differential expression?
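To make the ambiguity concrete, here is a toy expectation-maximization (EM) sketch of the general idea of splitting ambiguous reads by maximum likelihood. It is not Cufflinks' actual algorithm: it ignores transcript length, fragment-length distributions and paired-end information, and the function name and read data are invented for the illustration.

```python
def em_abundances(compat, n_transcripts, n_iter=200):
    """Toy EM for relative transcript abundances.

    compat: one tuple per read listing the transcript indices that read is
    compatible with. Returns (theta, expected_counts), where theta[t] is the
    estimated fraction of reads from transcript t and expected_counts[t] is
    the (possibly fractional) number of reads credited to it.
    """
    theta = [1.0 / n_transcripts] * n_transcripts
    counts = [0.0] * n_transcripts
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts in
        # proportion to the current abundance estimates.
        counts = [0.0] * n_transcripts
        for transcripts in compat:
            total = sum(theta[t] for t in transcripts)
            for t in transcripts:
                counts[t] += theta[t] / total
        # M-step: re-estimate abundances from the expected counts.
        theta = [c / len(compat) for c in counts]
    return theta, counts


# Invented data: 6 reads unique to transcript 0, 2 unique to transcript 1,
# and 4 reads that fit both transcripts equally well.
reads = [(0,)] * 6 + [(1,)] * 2 + [(0, 1)] * 4
theta, expected_counts = em_abundances(reads, n_transcripts=2)
```

In this example the estimates settle at abundances of 0.75 and 0.25, so the four ambiguous reads are credited as three reads to transcript 0 and one to transcript 1. No individual read is actually assigned to either transcript; only the expected counts are defined, and they need not be whole numbers, which is part of what makes handing them to a count-based test non-trivial.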

              To me, combining Cufflinks' MLE assignment of reads to transcripts with DESeq's statistics could make for a powerful differential gene expression analysis.

              Just an idea from someone who does not know too much.
              Last edited by poisson200; 08-04-2010, 12:40 PM. Reason: DEGseq should be DESeq



              • #22
                Hi,

                thanks for the praise; though I first need to point out that our package is called "DESeq", and should not be confused with "DEGSeq", another Bioconductor package.

                As you correctly point out, the ambiguity in transcript assignment makes combining the methods of cufflinks and DESeq quite non-trivial.

                Hence, we don't have any good method at the moment for statistical testing for changes in isoform abundances.

                We have an idea how to achieve this which we are testing at the moment, and which we will hopefully soon be able to present. The cufflinks people have announced elsewhere on SeqAnswers that they are also working on a method. My feeling is that their approach and ours will be quite different, so it will be interesting to see what works better. Stay tuned.

                Simon



                • #23
                  I am a bench worker without much background in statistics or bioinformatics, so I cannot make an apples-to-apples comparison.

                  But I have tried both of the methods above, and they gave me different results. We have several genes that are expressed at a low level in the basal condition but increase dramatically in the stimulated condition (already confirmed by experiments). From Cufflinks, the FPKM of some of these genes in the basal condition is 0, so I do not know how to handle the fold change; simply discarding these genes might lose something. DESeq gave me results with a trend similar to the array data.
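One common workaround for a zero FPKM in one condition, not something proposed in this thread, is to add a small pseudocount to both values before taking the ratio. A minimal sketch, where the function name, the pseudocount of 1.0 and the example values are all assumptions chosen for illustration:

```python
import math


def log2_fold_change(fpkm_basal, fpkm_stimulated, pseudocount=1.0):
    """log2 ratio of stimulated to basal expression.

    The pseudocount is added to both values so that a basal FPKM of 0
    gives a large but finite fold change instead of a division by zero.
    """
    return math.log2((fpkm_stimulated + pseudocount) / (fpkm_basal + pseudocount))


# Made-up values: a gene that is silent in the basal condition.
induced = log2_fold_change(0.0, 31.0)    # log2(32 / 1) = 5.0
unchanged = log2_fold_change(0.0, 0.0)   # log2(1 / 1) = 0.0
```

The choice of pseudocount is arbitrary and shrinks the fold changes of weakly expressed genes toward zero, so the resulting values are best used for ranking or plotting, not as a substitute for a statistical test.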

                  But I have to say that I do not have biological replicates so far, so we do not know what to conclude from the statistical significance levels. As for biological replicates, I checked some recent publications: they used two biological replicates, but some earlier papers did not repeat the RNA-seq at all. Hopefully some experienced SEQers can give me advice about the p-value, biological replicates, and how to choose the FDR cut-off.

                  Whichever method I use, I want to thank the many program developers and SEQers for teaching me the analysis and answering my endless questions.
                  Last edited by Wei-HD; 08-04-2010, 01:06 PM.



                  • #24
                    @poisson200: You are absolutely right that it is desirable to perform differential expression at the transcript level, possibly allowing for more general assumptions about the relationship between variance and expression level than the single-parameter Poisson distribution. It is actually not very difficult to do this in the context of Cufflinks, and we are exploring various alternative approaches. It should be noted that it is by no means obvious that the negative binomial distribution is the right one to use. It allows only for modeling overdispersion, and the recent paper by Srivastava and Chen highlights the inadequacy of the negative-binomial assumption used in DEGseq, edgeR and other programs.
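The point about variance and expression level can be illustrated with the usual mean-variance relationships. This sketch assumes the common parameterization of the negative binomial with a dispersion parameter, where the variance is the mean plus the dispersion times the squared mean; the function names and numbers are illustrative only.

```python
def poisson_variance(mean):
    """Poisson: the variance is fixed by the mean (single parameter)."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial: variance = mean + dispersion * mean**2.

    With dispersion >= 0 the variance can only be equal to or larger than
    the Poisson variance, i.e. the model covers overdispersion only.
    """
    return mean + dispersion * mean ** 2


# Illustrative values: expected count of 100 reads, dispersion of 0.25.
var_poisson = poisson_variance(100.0)                 # 100.0
var_negbin = negative_binomial_variance(100.0, 0.25)  # 100 + 0.25 * 10000 = 2600.0
```

Setting the dispersion to zero recovers the Poisson case. No non-negative dispersion gives a variance below the mean, which is the sense in which the negative binomial handles only overdispersion.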
