SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Problem with cummeRbund and RSQLite noplacetoh1de Bioinformatics 9 11-03-2014 01:33 PM
cummeRbund get isoforms paolo.kunder Bioinformatics 6 04-23-2014 07:59 AM
Problem with cummeRbund and RSQLite :( dadada4ever Bioinformatics 3 07-31-2013 09:13 AM
cummeRbund installation m_elena_bioinfo Bioinformatics 3 08-02-2012 01:42 AM
CummeRbund error without CDS dvanic Bioinformatics 5 03-12-2012 09:24 AM

Reply
 
Thread Tools
Old 04-17-2012, 04:30 AM   #1
glados
Member
 
Location: Aperture Science

Join Date: Mar 2012
Posts: 59
Default CummeRbund questions

Hi all.
I'm new to R and really enjoying the cummerbund package. I have three questions regarding the graphs cummerbund produces.

1) Even though I have several biological replicates, my error bars in the expression plots for genes are huge. When looking at a specific gene, having run cuffdiff with 4+4 samples, the error bars are really long, all the way to the ground and high up. Why is this and why is there no difference when I have several replicates as opposed to none? Shouldn't the error bars decrease? How do I edit the graph so the error bars accurately can display the variance between replicates?

2) I cannot find the argument for y limits in the csVolcano plot. I have used the argument: xlimits = c(-10, 10) and it works perfectly but what is the equivalent for y? I have tried ylimits, ymin, ymax and nothing works.

3) When testing 3 conditions against each other, I get different number of differential expressed genes depending on if I use the getSig() function or counting the number of rows as they did in the Protocol paper. Which one is more correct?

Code:
diffGeneIDs <- getSig(cuff, level="genes", alpha=0.05)
gives 315 genes, while
Code:
gene_diff_data <- diffData(genes(cuff))
sig_gene_data <- subset(gene_diff_data, (significant == 'yes'))
nrow(sig_gene_data)
gives 372 genes.

Thank you for any help!
glados is offline   Reply With Quote
Old 04-17-2012, 12:29 PM   #2
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

For 3, the getSig way is the preferred method now. You can set your own significance, while the old method had cuffdiff do it.

Loyal explains it here:
http://seqanswers.com/forums/showthr...ghlight=getSig

Following up on this, does CummeRbund account for the selection bias the way GoSeq does?
billstevens is offline   Reply With Quote
Old 04-17-2012, 05:01 PM   #3
lgoff
Member
 
Location: Cambridge, MA

Join Date: Feb 2008
Posts: 82
Default

[QUOTE=glados;70775]Hi all.
I'm new to R and really enjoying the cummerbund package. I have three questions regarding the graphs cummerbund produces.

1) Even though I have several biological replicates, my error bars in the expression plots for genes are huge. When looking at a specific gene, having run cuffdiff with 4+4 samples, the error bars are really long, all the way to the ground and high up. Why is this and why is there no difference when I have several replicates as opposed to none? Shouldn't the error bars decrease? How do I edit the graph so the error bars accurately can display the variance between replicates?
[\QUOTE]
This is very dependent on the depth of your library and the sequencing quality in general. What sort of libraries are you using and how deep?

[QUOTE=glados;70775]
2) I cannot find the argument for y limits in the csVolcano plot. I have used the argument: xlimits = c(-10, 10) and it works perfectly but what is the equivalent for y? I have tried ylimits, ymin, ymax and nothing works.
[\QUOTE]
This was not included in the arguments. The reason that the x-limits argument is included is because this is a filtering step on the database backend. For the y-axis rescaling try this:

Code:
cuff<-readCufflinks()
v<-csVolcano(genes(cuff),'Sample_A','Sample_B')
v+ scale_y_continuous(limits = c(0, 20))
#This is the 'ggplot2' way to control the limits.  This is one of the conveniences of making these plotting methods using ggplot2 is that you have complete control over the returned object.
Quote:
Originally Posted by glados View Post
3) When testing 3 conditions against each other, I get different number of differential expressed genes depending on if I use the getSig() function or counting the number of rows as they did in the Protocol paper. Which one is more correct?

Code:
diffGeneIDs <- getSig(cuff, level="genes", alpha=0.05)
gives 315 genes, while
Code:
gene_diff_data <- diffData(genes(cuff))
sig_gene_data <- subset(gene_diff_data, (significant == 'yes'))
nrow(sig_gene_data)
gives 372 genes.

Thank you for any help!
bill is correct, the 'appropriate' way to do this now is to use the getSig function. This gives you more control and conducts an appropriate multiple-testing-correction depending on the specific comparisons you are making for the differential test.

Hope this helps!

Cheers,
Loyal
lgoff is offline   Reply With Quote
Old 04-18-2012, 06:49 AM   #4
glados
Member
 
Location: Aperture Science

Join Date: Mar 2012
Posts: 59
Default

Thank you Loyal! This is very helpful. My Volcano plot looks much better now.

Quote:
Originally Posted by lgoff View Post
This is very dependent on the depth of your library and the sequencing quality in general. What sort of libraries are you using and how deep?
I think I will have to wait with this question nr 1. After a day of testing and comparing with 2+2 samples and receiving an awful lot more significant genes with fewer samples (blue volcano instead of red), it is apparent that I have made some mistake somewhere and I will try to look into this more.
glados is offline   Reply With Quote
Old 04-24-2012, 04:56 AM   #5
glados
Member
 
Location: Aperture Science

Join Date: Mar 2012
Posts: 59
Default

Loyal, if you have time I would appreciate some help. I've been trying to figure out this thing for the last few days. I have looked carefully through every script and everything has been run exactly the same with only different input and output files. I have 2 conditions which I want to see differences between. When I run CuffDiff with 4+4 samples (4 biological replicates in each condition) I get a total of only 140 significant genes with function getSig() in cummerbund. When I decrease the replicates to 3+3 samples I get 11 200 sig. genes. When I decrease it one more step to 2+2 samples I get 20 000 sig. genes. A dramatic difference to say the least. Do you have any idea of why this is?

Additionally, the volcano plots look very different depending on the run. For the 4+4 run, the "limit" on the y-axis where the dots become blue (significant) is very high, 3.5 (which equals to p much less than 0.001). For the 2+2 run the same value on the y-axis is 1.5. Another 2+2 run I did with different samples gave a volcano plot with y-value of precisely 2. I'm using the same R code for every run (simply: csVolcano(genes(cuff), 'x-value', 'y-value') so I'm not sure why I get so drastically different number of sig. genes and why the volcano plots y-axis significance limit is very different between different runs. I have attached 2 volcano plots if you wonder what I mean that are zoomed in somewhat (with cropped out header), one from 2+2 samples, one from 4+4 samples . edit: every sample has approximately around 15-20 million reads mapped (in pairs). Thank you in advance.
Attached Images
File Type: png 4samples.png (117.5 KB, 105 views)
File Type: png 8samples.png (79.4 KB, 72 views)

Last edited by glados; 04-24-2012 at 05:20 AM.
glados is offline   Reply With Quote
Old 06-01-2012, 05:12 AM   #6
zhuya5607
Member
 
Location: Stockholm

Join Date: Jun 2012
Posts: 18
Default

Quote:
Originally Posted by glados View Post
Hi all.
I'm new to R and really enjoying the cummerbund package. I have three questions regarding the graphs cummerbund produces.

1) Even though I have several biological replicates, my error bars in the expression plots for genes are huge. When looking at a specific gene, having run cuffdiff with 4+4 samples, the error bars are really long, all the way to the ground and high up. Why is this and why is there no difference when I have several replicates as opposed to none? Shouldn't the error bars decrease? How do I edit the graph so the error bars accurately can display the variance between replicates?
Have you solved the question 1, I am having the same problem as you have, I have 3 replicates at each condiction, but my error bars is huge. whats the problem?
zhuya5607 is offline   Reply With Quote
Old 06-01-2012, 05:39 AM   #7
lgoff
Member
 
Location: Cambridge, MA

Join Date: Feb 2008
Posts: 82
Default

Hi All,
In general, these remaining questions pertain to cuffdiff and the statistical model that it uses to determine variance, and should probably be asked of the cufflinks people. The error bars plotted by cummeRbund directly reflect the variability of gene:condition as determined by cuffdiff and cannot be changed from within cummeRbund. I would suggest however, that with the newest versions of cuffdiff and cummeRbund, you can expose replicate fpkm values on most plots but using the replicates=T argument where applicable. This should be a good place to start to find out what the distribution of FPKM values for your replicates looks like for a given gene:condition. If the distribution of FPKM values is broad, then the error bars would reflect this variability, if not, then perhaps the cufflinks variance model is a poor fit (and is perhaps reflected in the 'status' field) for a particular gene:condition.

Cheers,
Loyal
lgoff is offline   Reply With Quote
Old 06-02-2012, 05:39 AM   #8
glados
Member
 
Location: Aperture Science

Join Date: Mar 2012
Posts: 59
Default

Thank you Loyal. Yes I understand this is more of a cufflinks problem, not so much cummerbund. I still have not solved it though. I get huge variances in cufflinks 2 also even with many replicates. I don't know why. Will update here if I find out.
glados is offline   Reply With Quote
Old 06-04-2012, 12:59 AM   #9
zhuya5607
Member
 
Location: Stockholm

Join Date: Jun 2012
Posts: 18
Default Boxplot in CummeRbund

I got a weird thing with the Boxplot. No box with medians displayed, only dots. I tried with or without replicates= T option, both produced the exactly same image as the attached image.
I have three replicates at four time points of a cell line after drug treatment,3+3+3+3. I ran cuffdiff in this way, T0(3 replicates at time point 0) versus T1(3 replicates), T0(3 replicates)versus T2(3 re), T0(3 replicates)versus T3(3 re). I did't use time series option, which seems a better way to do it after I checked on this forum.
I use CummeRbund analyze the three comparison seperately, all three boxplots I got are like the attached image without box and medians.
Can you tell me where is wrong or I did some mistakes?


#########################################################

I move this question to cuffdiff thread.
Attached Images
File Type: png Rplot_H0-H24.png (13.4 KB, 57 views)

Last edited by zhuya5607; 06-04-2012 at 01:20 AM. Reason: Wrong place to post the question
zhuya5607 is offline   Reply With Quote
Old 06-16-2012, 09:50 AM   #10
pravee1216
Member
 
Location: India

Join Date: Aug 2010
Posts: 35
Default ExpressionBarPlot in cummeRbund

BTW, what do the error bars in the expressionBar plot represent for each gene? I don't find any documentation specific to this? Any help?
pravee1216 is offline   Reply With Quote
Old 08-10-2012, 03:48 AM   #11
glados
Member
 
Location: Aperture Science

Join Date: Mar 2012
Posts: 59
Default

Quote:
Originally Posted by pravee1216 View Post
BTW, what do the error bars in the expressionBar plot represent for each gene? I don't find any documentation specific to this? Any help?
Based on the Nature Protocol paper I believe it is confidence intervals, but I agree it is not very clear. Also, I don't know if it's possible to change the error bars to represent something else, like standard error, etc.
glados is offline   Reply With Quote
Old 04-02-2013, 10:34 AM   #12
haggardd
Junior Member
 
Location: Oregon

Join Date: Apr 2013
Posts: 5
Default

I know this thread has been dead for a while...here goes nothing.

I'm relatively new to R and to the fantastic cummeRbund package but I have a question regarding the diffData() and getSig() functions.
As stated:
Quote:
3) When testing 3 conditions against each other, I get different number of differential expressed genes depending on if I use the getSig() function or counting the number of rows as they did in the Protocol paper. Which one is more correct?

Code:
diffGeneIDs <- getSig(cuff, level="genes", alpha=0.05)
gives 315 genes, while
Code:
gene_diff_data <- diffData(genes(cuff))
sig_gene_data <- subset(gene_diff_data, (significant == 'yes'))
nrow(sig_gene_data)
gives 372 genes.
Based on the replies, the getSig() function is more accurate due to multiple-testing corrections, etc. However, the getSig() function only returns a vector of the genes rather than a data.frame that includes the log fold change, q-value, FPKM values, etc. (which is very useful and is what diffData() returns). The question I have is if a way exists to get that information from the list of getSig() genes?

Thanks,
Derik
haggardd is offline   Reply With Quote
Old 04-03-2013, 12:16 AM   #13
sBeier
Member
 
Location: Germany

Join Date: Jan 2013
Posts: 42
Default

There is a getGenes-Funktion that will give you a subset of the whole database for only those genes. If cuff is you dataset and sigGenes the list you got with getSig use getGenes(cuff, sigGenes).
sBeier is offline   Reply With Quote
Old 07-24-2013, 09:13 PM   #14
jp.
Senior Member
 
Location: NikoNarita.jp

Join Date: Jul 2013
Posts: 142
Question

I have similar problem, getGenes(cuff, sigGenes) produced sig_diff_genes but the error bars are very high and FPKM value is very low.
In my attached file;
1st group has low error bar and high fpkm value clearly indicates diff_gene. but this was not identified by getGenes(cuff, sigGenes). I manually got it.
2nd and remaining all have high error bars identified by getGenes(cuff, sigGenes), however, does not seems to be really important.
3rd How much the error bar should be ? Can I select diff_genes with high error bars ? How much FPKM should be ?
please do reply, if possible.
Thank you in advanced.
seq.png
jp. is offline   Reply With Quote
Old 02-24-2015, 02:27 PM   #15
pepe84
Junior Member
 
Location: Canada

Join Date: Jul 2014
Posts: 2
Default Heatmap scale

Hi guys,

I am trying to get a heatmap for all my differentially expressed genes (500 total). However I am experiencing difficulties with this. I tried to manually input all gene codes to R, use the function getGenes and then find the heatmap, but I was hopeless. The problem is with the scale of the heat map. I don't know how to modify the scale and it gives me 3 ranges (from 0 to 0.3, 0.3 to 2 and 2 to anything above that). Most of my genes have FPKM values higher than 2, so my heatmap has only one color all across the map

Any idea on how to change the scale of my heatmap?
pepe84 is offline   Reply With Quote
Old 07-10-2017, 02:13 AM   #16
AkilaRanjith
Junior Member
 
Location: India

Join Date: Jul 2017
Posts: 1
Angry cummeRbund- Expression bar plot

Hi All,

I have used cufflink/cuffdiff for the differential gene expression analysis which includes 3 biological replicates per condition. In the expression bar plot error bar is so high. How to calculate the variance between replicates. help me
AkilaRanjith is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 08:56 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2017, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO