
**middlemale** · 03-09-2010, 07:24 AM

Hi, can anybody clarify my confusion about cuffdiff?

The input for cuffdiff is <transcripts.gtf> and two or more SAM files. In my case, there are two transcripts GTF files (s_7.gtf for wild type and s_8.gtf for treated) and two SAM files (s_7.sam for wild type and s_8.sam for treated). Presumably I should use s_8.gtf to get the differentially expressed genes? Is this right? The description of the transcripts.gtf file in the Cufflinks manual is unclear to me.
cheers

**Cole Trapnell** · 03-09-2010, 08:02 AM

It's not a good idea to use a GTF that was output by Cufflinks from just one of your samples. Cuffdiff will ignore reads that don't fall on the transcripts in the file you give it, which means that if a transcript was fully assembled in only one of the samples, it could be ignored. This is why we provide, with Cuffcompare's output, a file *.combined.gtf. This file is essentially the "union" of the transfrags in your sample files. It's a good idea, however, to 'curate' this file a bit. We keep any transfrag from this file that is a full-length match to a transcript in UCSC, Ensembl, or VEGA (as determined by Cuffcompare), or that is present in two or more samples. This cleans up the file considerably. Feed the "scrubbed" file into Cuffdiff, and it will perform the differential analysis of your samples on the transcripts in that file.
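The curation rule above can be sketched as a filter over Cuffcompare's outputs. This is a minimal sketch, not the Cufflinks authors' own script. It assumes the `.tracking` file is tab-separated with the transfrag id (TCONS_...) in column 1, the class code in column 4, and one column per input sample from column 5 on, where "-" means the transfrag was not seen in that sample; check this layout against your Cuffcompare version. "Full-length match" is taken to mean class code `=`, Cuffcompare's code for a complete intron-chain match to a reference transcript. The function names are hypothetical.

```python
import re

# Matches the transcript_id attribute in GTF column 9, e.g. transcript_id "TCONS_00000001";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def ids_to_keep(tracking_text, min_samples=2):
    """Return transfrag ids that are '=' matches or seen in >= min_samples samples.

    Assumed .tracking layout: id, locus, reference gene|transcript, class code,
    then one column per sample ("-" = absent in that sample).
    """
    keep = set()
    for line in tracking_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        transfrag_id, class_code, samples = fields[0], fields[3], fields[4:]
        n_present = sum(1 for s in samples if s.strip() != "-")
        if class_code == "=" or n_present >= min_samples:
            keep.add(transfrag_id)
    return keep


def filter_gtf(gtf_text, keep):
    """Keep only GTF records whose transcript_id is in `keep` (comment lines are dropped)."""
    kept = []
    for line in gtf_text.splitlines():
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in keep:
            kept.append(line)
    return "\n".join(kept)
```

In practice you would read `*.tracking` and `*.combined.gtf` from disk, pass their contents through these two functions, and write the result to a new GTF for Cuffdiff. Raising `min_samples` makes the filter stricter for novel transfrags without affecting reference matches.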

**middlemale** · 03-09-2010, 08:44 AM

Excellent, many thanks. Much clearer for me now.

**Wei-HD** · 05-17-2010, 09:22 AM

Dear Cole,

How do you 'curate' this file? Do you mean keeping only the transfrags with the class code "="?

**paulwood** · 08-02-2010, 01:21 PM

Good

We will try this!

**poisson200** · 08-02-2010, 11:42 PM

Just a wacky idea: is there any way Simon Anders (DESeq) and Cole Trapnell (Cufflinks) could collaborate?

Cufflinks is great because it assigns reads to transcripts/isoforms based on paired-end maximum likelihood calculations (so, if I understand it correctly, reads are assigned to the most likely transcript by MLE using the paired-end information). A few people have asked for Cufflinks to output a raw file with the reads assigned to each transcript/isoform. As I remember, though, this can't be done because the assignment is a mathematical estimate, and some reads can map equally well to two transcripts. I don't see why we can't use that information anyway and apply DESeq's negative binomial statistics to these assignments. Or is that already the statistic cuffdiff uses for differential expression?
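The idea of assigning ambiguous reads by maximum likelihood can be illustrated with a toy expectation-maximization loop. This is a simplified sketch that assumes equal-length transcripts and ignores fragment lengths and paired-end information, so it is not Cufflinks' actual model. It only shows how a read compatible with two transcripts is split fractionally in proportion to the estimated abundances, which is why per-transcript "counts" come out as expected values rather than integers. The function name is hypothetical.

```python
def em_abundances(read_sets, transcripts, n_iter=200):
    """Toy EM for transcript abundances.

    read_sets: list of sets, each holding the transcripts a read is compatible with.
    Returns (theta, expected_counts); equal transcript lengths are assumed.
    """
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n_reads = len(read_sets)
    expected = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its compatible transcripts
        # in proportion to their current abundance estimates.
        expected = {t: 0.0 for t in transcripts}
        for compat in read_sets:
            total = sum(theta[t] for t in compat)
            for t in compat:
                expected[t] += theta[t] / total
        # M-step: new abundance = expected share of all reads.
        theta = {t: expected[t] / n_reads for t in transcripts}
    return theta, expected


# 3 reads unique to A, 1 unique to B, 4 compatible with both.
reads = [{"A"}] * 3 + [{"B"}] * 1 + [{"A", "B"}] * 4
theta, counts = em_abundances(reads, ["A", "B"])
print(theta, counts)
```

Here the shared reads end up split 3:1 between A and B, giving expected counts of about 6 and 2. Because those counts are fractional estimates, feeding them into a count-based test such as DESeq's is not straightforward.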

To me, combining Cufflinks' MLE assignment of reads to transcripts with DESeq's statistics could make for a powerful differential gene expression analysis.

Just an idea from someone who does not know too much.

**Simon Anders** · 08-04-2010, 11:58 AM

Hi,

thanks for the praise; though I first need to point out that our package is called "DESeq", and should not be confused with "DEGSeq", another Bioconductor package.

As you correctly point out, the ambiguity in transcript assignment makes combining the methods of Cufflinks and DESeq quite non-trivial.

Hence, we don't have any good method at the moment for statistical testing for changes in isoform abundances.

We have an idea how to achieve this which we are testing at the moment, and which we will hopefully soon be able to present. The cufflinks people have announced elsewhere on SeqAnswers that they are also working on a method. My feeling is that their approach and ours will be quite different, so it will be interesting to see what works better. Stay tuned.

Simon

**Wei-HD** · 08-04-2010, 12:56 PM

I am a bench worker and do not have much background in statistics or bioinformatics, so I cannot make an apples-to-apples comparison.

But I have tried both of the methods above, and they gave me different results. We have several genes expressed at a low level in the basal condition but dramatically increased under the stimulated condition (already confirmed by experiments). From Cufflinks, the FPKM of some genes in the basal condition is 0, so I do not know how to deal with the fold change; simply discarding these genes might lose something. DESeq gave me results with a trend similar to the array data.
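One common workaround for a zero FPKM in one condition (not something Cufflinks or DESeq prescribes) is to add a small pseudocount to both values before taking the log ratio, so the fold change stays finite. A minimal sketch; the pseudocount of 1 is an arbitrary assumption, and because it shrinks fold changes for low-expressed genes, the ranking of exactly those genes depends on the value chosen. The function name is hypothetical.

```python
import math


def log2_fold_change(basal, stimulated, pseudocount=1.0):
    """log2((stimulated + c) / (basal + c)); finite even when basal == 0."""
    return math.log2((stimulated + pseudocount) / (basal + pseudocount))


print(log2_fold_change(0.0, 15.0))   # gene off in basal, on after stimulation
print(log2_fold_change(3.0, 3.0))    # unchanged gene
```

With a pseudocount of 1, a gene going from FPKM 0 to 15 gets a log2 fold change of 4; a larger pseudocount would report a smaller change for the same gene.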

But I have to say I do not have biological replicates so far, so we do not know what to conclude from the statistical significance level. Regarding biological replicates, I checked some recent publications: they used two biological replicates, but some earlier papers did not repeat the RNA-seq. Hopefully some experienced SEQers can give me some advice about p-values and biological replicates, and how to choose the FDR cut-off...
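On the FDR question: per-gene p-values are usually converted to false discovery rate estimates with the Benjamini-Hochberg procedure, and genes are then called significant below a chosen cut-off (0.05 and 0.1 are common conventions; the procedure itself does not pick one). The sketch below is a plain implementation of Benjamini-Hochberg; R's `p.adjust(p, method = "BH")` computes the same quantity. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
padj = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(padj) if q < 0.05]
print(padj, significant)
```

Note that the adjustment only controls the FDR if the underlying p-values are valid, which for count data generally requires an estimate of biological variability, i.e. replicates.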

Whichever method I end up using, I want to thank the many program developers and SEQers for teaching me the analysis and answering my endless questions......

**lpachter** · 08-05-2010, 04:04 AM

@poisson200: You are absolutely right that it is desirable to perform differential expression at the transcript level, possibly allowing for more general assumptions about the relationship between variance and expression level than the single-parameter Poisson distribution does. It is actually not very difficult to do this in the context of Cufflinks, and we are exploring various alternative approaches. It should be noted that it is by no means obvious that the negative binomial distribution is the right one to use. It allows only for modeling overdispersion, and the recent paper by Srivastava and Chen

http://nar.oxfordjournals.org/cgi/content/full/gkq670

highlights the inadequacy of the negative-binomial assumption used in DEGseq, edgeR and other programs.
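To make "allows only for modeling overdispersion" concrete: in the Poisson model the variance equals the mean, whereas a negative binomial with mean mu and dispersion alpha has variance mu + alpha * mu^2, which can never fall below the Poisson variance. The sketch below checks this numerically by summing each probability mass function. The (mu, alpha) parameterization is one common choice and an assumption here; it does not reproduce any particular package's internals, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def nb_pmf(k, mu, alpha):
    """Negative binomial with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    return math.exp(math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log(1.0 - p))


def mean_var(pmf, kmax=2000):
    """Mean and variance of a count distribution, summing the pmf up to kmax."""
    probs = [pmf(k) for k in range(kmax + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


pois = mean_var(lambda k: poisson_pmf(k, 10.0))
nb = mean_var(lambda k: nb_pmf(k, 10.0, 0.1))
print(pois, nb)
```

At mean 10 the Poisson variance is 10 while the negative binomial with alpha = 0.1 has variance 20; as alpha approaches 0 the two coincide. Whether this one-parameter mean-variance relationship fits real data is the question the cited paper raises.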
