Unconfigured Ad

**RockChalkJayhawk** · 12-01-2010, 08:01 AM

Found the same thing

Hey All,

I noticed the same thing as konrad98 a while ago. I ran CuffDiff and DESeq on 2x36 bp PE-Reads from replicate samples. Cuffdiff says that ~30% are differentially expressed, whereas DESeq found 0. To find out why, I made MA plots for my samples. The X-axis is the average RPKM for the two samples and the Y-axis is the log2 Fold Change.

As you can see, CuffDiff (for some reason) is much more variable in its calculation of RPKM. Note, these are the RPKM values from the genes.expr file and gene-level counting in exons.

Attached Files

Comp.pdf (955.0 KB, 266 views)

**lpachter** · 12-01-2010, 11:07 AM

Dear RCJ,

Thanks for attaching the MA plot. Do you recall what version of Cuffdiff you were running? And also whether you compared on the same gene sets? Which organism was this from? It would also be really useful for us (Cufflinks people) if you could share the actual data so we can take a closer look at what is going on.
I'm just a bit confused because although I would expect differences in the M axis, I don't understand differences in the A axis. Also, I don't understand the raw x-axis values. Was the RPKM for most genes in your experiment less than 1?

Thanks!

**RockChalkJayhawk** · 12-01-2010, 01:42 PM

Originally posted by lpachter View Post

Dear RCJ,

Thanks for attaching the MA plot. Do you recall what version of Cuffdiff you were running? And also whether you compared on the same gene sets? Which organism was this from? It would also be really useful for us (Cufflinks people) if you could share the actual data so we can take a closer look at what is going on.
I'm just a bit confused because although I would expect differences in the M axis, I don't understand differences in the A axis. Also, I don't understand the raw x-axis values. Was the RPKM for most genes in your experiment less than 1?

Thanks!

These are from human data with the newest version of Cufflinks/Cuffdiff. I left out the transformation step for the last MA plot. I am posting a new one along with the R code I used to generate it. Sorry about that mess up.

Notice, since these are replicates, no genes (or very few) should be differentially expressed. Since DESeq uses raw counts, the x-axis units will be larger than FPKM of CuffDiff.

Red areas are those that were called as significant in the gene_exp.diff file (CuffDiff) or padj (DESeq). No genes were differentially expressed by DESeq.

Code:

DMSO_1vsDMSO_2.Gx<-read.table("DMSO_1vsDMSO_2.Gx",header=T)
CuffDiff<-read.table("CuffDiff/DMSO1vDMSO2/gene_exp.diff",header=T)
jpeg(file="Comparison.jpeg",width = 620, height = 280,quality = 70)

par(mfrow=c(1,2))
M<-log2(CuffDiff$value_2)-log2(CuffDiff$value_1)
A<-.5*(log2(CuffDiff$value_2)+log2(CuffDiff$value_1))
plot(A,M,main="CuffDiff",col=ifelse(CuffDiff$significant == "yes" ,"white","black"))
text(A,M,col=c("red"),cex=.8,label=ifelse(CuffDiff$significant == "yes" ,CuffDiff$gene,""))

M<-log2(DMSO_1vsDMSO_2.Gx$baseMeanB)-log2(DMSO_1vsDMSO_2.Gx$baseMeanA)
A<-.5*(log2(DMSO_1vsDMSO_2.Gx$baseMeanB)+log2(DMSO_1vsDMSO_2.Gx$baseMeanA))
plot(A,M,main="DESeq",col=ifelse(DMSO_1vsDMSO_2.Gx$padj <.05 ,"white","black"))
text(A,M,col=c("red"),cex=.8,label=ifelse(DMSO_1vsDMSO_2.Gx$padj <.05 ,DMSO_1vsDMSO_2.Gx$id,""))
dev.off()

Attached Files

Comparison.jpeg (18.6 KB, 232 views)

**adarob** · 12-01-2010, 01:48 PM

By "newest version" do you mean 0.9.2? There was a small bug in that release which in some cases caused some of the alignments to be ignored. If it is not too much trouble, could you rerun with 0.9.3 (released yesterday)?

**RockChalkJayhawk** · 12-01-2010, 01:51 PM

My bad. It is 0.9.2. I'll have to update next week.

**RockChalkJayhawk** · 12-06-2010, 02:59 PM

Update on Cufflinks v0.9.3

Originally posted by adarob View Post

By "newest version" do you mean 0.9.2? There was a small bug in that release which in some cases caused some of the alignments to be ignored. If it is not too much trouble, could you rerun with 0.9.3 (released yesterday)?

This figure was generated using the same protocol as above.
Using v0.9.3 really made things better. Now I have a FDR of 2.13% (427/20014), much better than the 30% I had previously. There is still a bit more variation than from the counting methods, but at least I can work with this.

Thanks for the updates!

Attached Files

CuffDiff.v0.9.3.jpeg (8.0 KB, 229 views)

**lahoman** · 02-07-2011, 09:31 AM

How did you count the reads from the assembly file?

Hi, xinchen,

Once I get the sam file using TopHat, how should be we count the reads for each gene? You know, I am new to RNA-Seq analysis. My question might be naive

Thank you so much,

Lahoman

**nkwuji** · 02-08-2011, 02:23 AM

Originally posted by lahoman View Post

Hi, xinchen,

Once I get the sam file using TopHat, how should be we count the reads for each gene? You know, I am new to RNA-Seq analysis. My question might be naive

Thank you so much,

Lahoman

Hi, Lahoman,

Here is the introduction for cufflinks.

404 Not Found

http://cufflinks.cbcb.umd.edu/manual.html

It is better to read through it. I thought there might be some shortcuts or easy ways to use it. But later on, I found it really necessary to read all the instructions carefully, and make sure about the parameters you should use.

Cheers,
Jun

**lahoman** · 02-08-2011, 12:47 PM

Cufflink results filtering?

Hi,RockChalkJayhawk,

After I use tophat to map Human RNA-Seq to the genome, then cufflinks for the transcript analysis, I checked the file of transcripts.expr. There are
263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What criteria should I use? 1.0 or 0.5? I have no idea about it.

Thanks,

Lahoman

**RockChalkJayhawk** · 02-08-2011, 01:24 PM

Originally posted by lahoman View Post

Hi,RockChalkJayhawk,

After I use tophat to map Human RNA-Seq to the genome, then cufflinks for the transcript analysis, I checked the file of transcripts.expr. There are
263,506 transcripts. That's a lot. Do I need to filter the results based on FPKM? What criteria should I use? 1.0 or 0.5? I have no idea about it.

Thanks,

Lahoman

Unfortunately, there is no right answer. Everything is subjective. Just use good judgement...

**chrisbala** · 02-09-2011, 05:58 AM

I think this issue of excessive cufflinks predicted transcripts is one many people are wrestling with. I have not figured out a satisfactory way to deal with this other than just using ONLY existing Ensembl gene models (boo!). My interpretation of the problem is that pre-mRNA reads (and perhaps other forms of "background") give cufflinks trouble (understandably). I don't know if using paired-end data (which I did not do) and 100s of millions of reads (like the cufflinks paper did) as opposed to 10s of millions of read would help. If others have thoughts I would love to hear them..

**lpachter** · 02-10-2011, 05:15 AM

The reason for excessive transcripts is that in genes with low coverage, Cufflinks will likely predict multiple disjoint transcripts- that is in fact the only thing it can do because it is not a gene finder and has no reason to glue them together (its also true that "background" causes problems and makes assembly difficult).

To address these issues, and also the fact that many users want to discover novel isoforms with respect to a known annotation (e.g. ENSEMBL), we have developed a new reference annotation guided assembly mode that will take as input your annotation and assemble with respect to that. It results in much cleaner assemblies by merging partial transcript fragments into the known annotations, yet is able to reveal novel isoforms. It sounds like it addresses the issue (and desired feature) you are reporting on.

It will be released in the next week or two as soon as we finish some more testing, so please check back with us before the end of the month.

**lahoman** · 02-10-2011, 07:46 AM

the mapping position of RNA-Seq spanning junction

Hi,
I checked the accepted_hits.sam generated by TopHat. For one reads pair, their mapping results are:

USI-EAS376_0001:3:112:7215:15873#0 129 chr1 14790 255 40M140N35M = 155253852 0 CCGGGCCCCTCACCAGCCCCAGGTCCTTTCCCAGAGATGCCCTTGCGCCTCATGACCAGCTTGTTGAAGAGATCC bbbbbabbbbbbbbbbbbbbbababbbbbbbb^bbbbcbbbb`cR`\aaY`\a`bbbcbb_\_^baa[`Wa]X_a NM:i:0 XS:A:- NS:i:0

The CIGAR code is 40M140N35M. We can see it spans two exons. 14790 on chr1 is the position in the first exon. How can we know the information about the second exon?

Thanks a lot,

Lahoman

**lahoman** · 02-10-2011, 07:58 AM

Hi,

Suddenly I realize that the second exon is from the same chromosome. The distance between the first and second exon is 140bp.
The one more question is, if a RNA-Seq reads is spliced and aligned to different chromsome, TopHat won't report it. Am I right? In other words, TopHat can't be used for genefusion analysis.

Thanks,

Lahoman

Topics	Statistics	Last Post
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 33 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 97 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 117 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 111 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM

Unconfigured Ad

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News