Seqanswers Leaderboard Ad

**natstreet** · 12-30-2011, 12:23 AM

I have understood the cuffdiff documentation the same way - each comma separated list of files represent replicates (not necessarily technical) of a sample group.

To me, cummerbund is giving you what I would expect it to - a box plot per condition that you input to cuffdiff and an indication of the variation within each of those groups. If you want to see the variation between the individual replicates I think you will need to run cuffdiff again treating each replicate as a separate sample. This meakes sense when you look at the output files produced by cuffdiff as these are what cummerbund is reading in.

**turnersd** · 12-30-2011, 03:51 AM

@natstreet: Thanks. I believe I can get what I want by doing as you say - cuffdiff with each bam file separate. I had in mind something that I would get from arrayQualityMetrics for array data - distribution info for each sample.

**lgoff** · 12-30-2011, 12:13 PM

You are correct about the output of cuffdiff and cummeRbund. cuffdiff takes the replicate information and uses this in it's modeling of the sequencing data to more accurately represent the FPKM and confidence intervals for an entire condition (not sample). In the case of biological replicates (which I might add, you are adding correctly in the cuffdiff arguments) the resulting conf_hi and conf_lo intervals represent biological variability for FPKM values from a given condition. In the absense of biological replicates (in which case the boxplot and heatmap would show you individual samples) the conf_hi and conf_lo values would represent estimate of the variability of the given gene/feature based on the fitting of the model, but would not have any measure of biological variability. Hope this helps...

FYI, I am interested in generating/aggregating sample-specific statistics to present to the user in future versions of cummeRbund to try and include more QC-type information.

Cheers,
Loyal

**Carlos Borroto** · 01-03-2012, 01:23 PM

Originally posted by lgoff View Post

FYI, I am interested in generating/aggregating sample-specific statistics to present to the user in future versions of cummeRbund to try and include more QC-type information.

This would be great, at the moment I'm running cufflinks twice, once with biological replicates as a group, what I really want, and then with each sample as a group to be able to do QC.

Thanks,
Carlos

**turnersd** · 01-03-2012, 01:37 PM

Originally posted by Carlos Borroto View Post

This would be great, at the moment I'm running cufflinks twice, once with biological replicates as a group, what I really want, and then with each sample as a group to be able to do QC.

Thanks,
Carlos

Yes, I'll second that. I'm doing the same thing now.

**vyellapa** · 03-05-2012, 10:56 AM

Cuff-diff with different samples as replicates

Code:

cuffdiff ref.gtf \
c1.bam,c2.bam,c3.bam,c4.bam \
m1.bam,m2.bam,m3.bam,m4.bam

Some of the groups working with cuff-diff are using using samples in sub-phenotype as biological replicates. Is this acceptable?
For example, if AMD wet and dry are two sub-phenotypes of AMD, and there are 4 aligned bams from different individuals available for each sub-phenotype, would the above code work. In other words, will the above code produce differential expression values between the two sub-phenotypes?

**dejavu2010** · 05-22-2012, 08:49 AM

I had the same question. First time, I treated 3 biological replicates as a group with "," to separate, with -L KO, WT. Second time, I treated every single sample with a condition, but labeled -L KO, KO, KO, WT, WT, WT.
The results are very different. For the KO gene in gene_exp.diff from the first analysis, it has 2 rows (2 different ids, XLOC_016359 and XLOC_016932), no significance. The second analysis, it has 30 rows (the same 2 ids repeated 15 times). It did 15 comparisons, KO1 vs KO2...WT2 vs WT3. all the comparisons between KO and WT are significant (supposed to be, right since it is KO). I wonder how cuffdiff works?
In the first setting, -L KO, WT ko1,ko2,ko3 wt1,wt2,wt3, did it only take the ko1 and ko2? Probably not, as when i compared values (value1 and 2) to the second analysis, no matches. In the second analysis, it seemed right and all comparsions are between 2 samples. Should I take the value from single sample, treat it as readout and do some statistics on those? like, median, rank test, p value, q- value?

I wonder the difference from 2 analysis, which one is more accurate?

**zhuya5607** · 06-04-2012, 01:26 AM

strange Boxplot in CummeRbund

I got a weird thing with the Boxplot. No box with medians displayed, only 'dotsplot'. I tried with or without replicates= T option, both produced the exactly same image as the attached image. Weird, isn't it? at least show me the 'dotsplot' of the replicates. = =!
I have three replicates at four time points of a cell line after drug treatment,3+3+3+3. I ran cuffdiff in this way, T0(3 replicates at time point 0) versus T1(3 replicates), T0(3 replicates)versus T2(3 re), T0(3 replicates)versus T3(3 re). I did't use time series option, which seems a better way to do it after I checked on this forum.
I use CummeRbund analyze the three comparison seperately, all three boxplots I got are like the attached image without box and medians.

another problem with gene expression barplot of EGFR, see the attached file.

Can anyone tell me where is wrong or I did some mistakes? Thank you in advance!

Attached Files

**lgoff** · 06-04-2012, 04:04 AM

Originally posted by zhuya5607 View Post

I got a weird thing with the Boxplot. No box with medians displayed, only 'dotsplot'. I tried with or without replicates= T option, both produced the exactly same image as the attached image. Weird, isn't it? at least show me the 'dotsplot' of the replicates. = =!
I have three replicates at four time points of a cell line after drug treatment,3+3+3+3. I ran cuffdiff in this way, T0(3 replicates at time point 0) versus T1(3 replicates), T0(3 replicates)versus T2(3 re), T0(3 replicates)versus T3(3 re). I did't use time series option, which seems a better way to do it after I checked on this forum.
I use CummeRbund analyze the three comparison seperately, all three boxplots I got are like the attached image without box and medians.

another problem with gene expression barplot of EGFR, see the attached file.

Can anyone tell me where is wrong or I did some mistakes? Thank you in advance!

It looks like the boxes are there, but most of the signal is compressed around log fpkm of -4. And looking at the plot of EGFR, something is not correct with how the gene was quantified. I wonder if it is a malformed GTF file? Can you send me your gtf and the cuffData.db file created by cummeRbund via email, and I will try to take a look at it and see if I can figure out what is going on.

-Loyal

**zhuya5607** · 06-04-2012, 05:44 AM

Originally posted by lgoff View Post

It looks like the boxes are there, but most of the signal is compressed around log fpkm of -4. And looking at the plot of EGFR, something is not correct with how the gene was quantified. I wonder if it is a malformed GTF file? Can you send me your gtf and the cuffData.db file created by cummeRbund via email, and I will try to take a look at it and see if I can figure out what is going on.

-Loyal

Thank you, Loyal!
I sent you a message, Please check.

**adumitri** · 06-19-2012, 01:28 PM

Hi,

I was wondering if anyone had more thoughts about dejavu2010's comment... It is still not clear to me if Cuffdiff is built to analyze differential expression for groups of biological replicates.

More specifically:
We are interested in comparing a group of 10 RNA-Seq samples from healthy individuals with a group of 10 RNA-Seq samples from diseased ones (in the initial phase without accounting for covariates). We do expect the samples to have significant expression variability within the two groups. While the Cuffdiff manual seemed to imply that one should use the "," notation to separate technical samples, we used this notation for the several biological replicates available for each of the two conditions.
The result was shocking - there was no gene/transcript that had a non-adjusted p-value < 0.07! This result simply does not make sense, especially since we have prior information about the used samples in terms of expression (we assayed them with microarrays previously and saw a lot of differential expression). Therefore, I believe that we did not use Cuffdiff properly and I am trying to figure out what to do differently to use Cuffdiff with biological replicates. Or maybe Cuffdiff is not really meant for this type of analysis and we should just go with gene-centric differential expression analyses, for example DESeq?

Please advise if you have a better understanding of what is going on.

Thank you,
Alexandra

**ftorri** · 07-20-2012, 10:17 AM

Hi all,

I had exactly the same problems you described.
The expression plot I gor looks weird: the histograms of the four samples are stacked one on the top of the other (with confusing colors), while on the manual there were N (N=number of replicates) histograms for each gene.

How is possible to get a plot similar to the one on the manual?

Thanks,
Fed

**fpenagarican** · 11-17-2012, 02:31 PM

Alexandra,

Did you find any good answer for your question?

I want to compare 8 RNA-seq samples for treatment A with 8 RNA-seq samples for treatment B. I am performing these analyses using Cuffidiff and also edgeR. Surprisingly, using Cuffdiff, I do not find any gene with a nominal P-value < 0.01. However, using edgeR, I find more than 200 genes with FDR < 0.05.

Does someone have any idea of what is going on?

Any comments or suggestions are welcome.

Originally posted by adumitri View Post

Hi,

I was wondering if anyone had more thoughts about dejavu2010's comment... It is still not clear to me if Cuffdiff is built to analyze differential expression for groups of biological replicates.

More specifically:
We are interested in comparing a group of 10 RNA-Seq samples from healthy individuals with a group of 10 RNA-Seq samples from diseased ones (in the initial phase without accounting for covariates). We do expect the samples to have significant expression variability within the two groups. While the Cuffdiff manual seemed to imply that one should use the "," notation to separate technical samples, we used this notation for the several biological replicates available for each of the two conditions.
The result was shocking - there was no gene/transcript that had a non-adjusted p-value < 0.07! This result simply does not make sense, especially since we have prior information about the used samples in terms of expression (we assayed them with microarrays previously and saw a lot of differential expression). Therefore, I believe that we did not use Cuffdiff properly and I am trying to figure out what to do differently to use Cuffdiff with biological replicates. Or maybe Cuffdiff is not really meant for this type of analysis and we should just go with gene-centric differential expression analyses, for example DESeq?

Please advise if you have a better understanding of what is going on.

Thank you,
Alexandra

**adumitri** · 11-19-2012, 06:32 AM

Hi fpenagarican,

Unfortunately, I did not receive any relevant answer to my question. Still, I can confirm your results - with Cufflinks (and the corresponding Cuffdiff) v.2.0.0, there were no differentially expressed genes/transcripts in our analyses, while the comparison performed with DESeq showed a fair number of significant genes. In addition, Cuffdiff required a long time to run, which did not help us troubleshoot the problem.

Since we wanted to use covariates for our analyses, we ended up focusing our efforts on DESeq (I think edgeR can also accommodate covariates), although this meant we could only perform a gene-centric initial analysis. With the updated versions of Cufflinks, things might be different - this is something I will need to work on in the near future.

Sorry I could not help more. If you find additional details, please post them to this thread.

Alexandra

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 37 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 41 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 35 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 54 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Biological replicates with cuffdiff, cummeRbund

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News