
**roseadele** · 11-08-2011, 05:46 PM

DESeq

I have a table of counts in Excel containing the names of the genes, the conditions and the number of reads for each gene in each condition. How can I use this data with the DESeq package? How do I get the table into R?

**Gators** · 11-08-2011, 08:11 PM

Originally posted by cascoamarillo

So it has been removed from the new version, or what does it mean?

Thanks

Yes, it has been replaced.

**arvid** · 11-09-2011, 01:08 AM

Originally posted by roseadele

I have a table of counts in Excel containing the names of the genes, the conditions and the number of reads for each gene in each condition. How can I use this data with the DESeq package? How do I get the table into R?

Googling "read data from Excel R" gives me 136 million results, and the first ten looked clear and simple. The rest of the information is in the DESeq manual (search for "Analysing RNA-Seq data with the "DESeq" package"), which is nicely written and has clear examples.
You might need an R tutorial if you are not familiar with it; you could start here: http://cran.r-project.org/doc/manuals/R-intro.html.
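As a minimal sketch of that route, assuming the Excel sheet has one row per gene and one column per sample and has been saved from Excel as tab-delimited text: the file name, gene names, sample names and counts below are hypothetical, and the DESeq calls appear only as comments because they need the Bioconductor DESeq package.

```r
# Minimal sketch (hypothetical file, gene and sample names): get a gene-by-sample
# count table exported from Excel into the shape DESeq's newCountDataSet() expects.

# With a real file saved from Excel as tab-delimited text you would use e.g.
#   countsTable <- read.delim("counts.txt", header = TRUE, row.names = 1)
# The same format is read from an inline string here so the example runs as-is.
txt <- paste("gene\tctrl_1\tctrl_2\ttreated_1\ttreated_2",
             "geneA\t13\t17\t174\t160",
             "geneB\t19\t22\t227\t201",
             "geneC\t59\t64\t593\t610",
             sep = "\n")

# First column (gene names) becomes the row names; one column per sample remains
countsTable <- as.matrix(read.delim(text = txt, header = TRUE, row.names = 1))

# DESeq works on raw counts: every value should be a non-negative whole number
stopifnot(all(countsTable == round(countsTable)), all(countsTable >= 0))

# One condition label per column, in column order; derived here by stripping
# the replicate suffix ("_1", "_2") from the column names
conds <- factor(sub("_[0-9]+$", "", colnames(countsTable)))
print(table(conds))

# With DESeq installed, the analysis would continue along these lines (not run):
#   library(DESeq)
#   cds <- newCountDataSet(countsTable, conds)
#   cds <- estimateSizeFactors(cds)
```

If the sheet is in long format instead (one row per gene and condition), something like `xtabs(reads ~ gene + condition, data = longTable)` would pivot it into a gene-by-condition table first (column and object names hypothetical). The readxl package's `read_excel()` can also read .xlsx files directly if you prefer not to export.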

**crh** · 12-22-2011, 09:12 AM

DESeq w/o replicates - padj

Hi, I'm new to RNA-Seq.

We are looking at data without replicates (bad, I know, but the cost was prohibitive).
Can someone explain how to interpret padj values of 1? I believe this is a measure of the FDR (type I error)?

In the data below, do we appear to have 4 genes that are significantly differentially expressed (DE)?
I know that without replicates we are underestimating the true DE discovery.

Charles

| deseq_id | gene_counts(nano) | gene_counts(ctrl) | baseMean | baseMeanA | baseMeanB | foldChange | log2FoldChange | pval | padj |
|---|---|---|---|---|---|---|---|---|---|
| 9600 | 174 | 13 | 83.74641874 | 16.32121019 | 151.1716273 | 9.262280527 | 3.211367453 | 0.000890382 | 0.740886757 |
| 10604 | 227 | 19 | 110.5361169 | 23.85407643 | 197.2181574 | 8.267692025 | 3.047484649 | 0.001206005 | 0.771936026 |
| 8063 | 593 | 59 | 294.6365205 | 74.07318469 | 515.1998562 | 6.955281569 | 2.79810892 | 0.001591703 | 0.88218547 |
| 9821 | 245 | 23 | 120.8662944 | 28.87598725 | 212.8566016 | 7.371405167 | 2.881939658 | 0.001793433 | 0.88218547 |
| 680 | 61 | 4 | 29.00943031 | 5.021910827 | 52.9969498 | 10.55314434 | 3.399601013 | 0.002231307 | 1 |
| 8550 | 402 | 44 | 202.2498031 | 55.24101909 | 349.2585872 | 6.322450109 | 2.660483748 | 0.002796612 | 1 |

**Simon Anders** · 12-22-2011, 11:41 AM

No, you have nothing.

An FDR of 0.1 (i.e., 10%), for example, means that your gene list is estimated to contain at most 10% false positives. To get such a list, you take all genes with padj < 0.1.

Thus, padj=1 means that you cannot include the gene even if you are willing to accept 99% false positives.
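A small illustration of why raw p-values around 0.001 can still end up with a padj near or at 1. It assumes that padj is the Benjamini-Hochberg adjustment (which, as far as I know, is what DESeq's nbinomTest() reports) and pads the six p-values from the table with 994 made-up p-values standing in for the other genes, so the exact padj values differ from the table's.

```r
# Sketch: Benjamini-Hochberg adjustment applied to the six raw p-values from the
# table, padded with hypothetical p-values for the remaining genes.

# The six raw p-values from the question ...
top6 <- c(0.000890382, 0.001206005, 0.001591703,
          0.001793433, 0.002231307, 0.002796612)
# ... plus 994 made-up, evenly spaced p-values standing in for the other genes
# (illustrative only; a real run tests many thousands of genes)
others <- seq(0.003, 1, length.out = 994)
pval <- c(top6, others)

padj <- p.adjust(pval, method = "BH")

# A gene list with an estimated FDR of at most 10% = all genes with padj < 0.1
hits <- which(padj < 0.1)
length(hits)          # 0
round(padj[1:6], 3)   # each about 0.429
```

BH scales each sorted p-value by (number of tests / rank) and then makes the result monotone, so with 1000 genes all six land at about 0.43, far above 0.1. With the much larger number of genes in a real run the adjustment is typically harsher still, which fits the padj values of 0.74 to 1 in the table.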

What I never understand is why people claim that lack of money prevented them from doing replicates. First, you have now wasted all the money you paid for the sequencing run, because without replicates you are highly unlikely ever to get useful results.

Second, while it may have been expensive to obtain replicate samples, it is not expensive to sequence additional samples. After all, having twice as many samples does not mean that you need twice as many lanes. You simply use multiplexing to sequence each sample to only half the depth, and you still get more statistical power than with fewer samples sequenced more deeply. The only extra expense is the additional library prep kits, not the sequencing itself.

**crh** · 12-22-2011, 12:44 PM

No replicates

Duly noted.

c

**vyellapa** · 02-16-2012, 12:56 AM

Originally posted by Simon Anders

The purpose of the 'blind' method was never to offer a proper analysis method for experiments without replication, because it is simply not possible (not just "dangerous") to draw conclusions from them. The whole point of replicates is to allow you to draw the line for significance, i.e., to know how large a fold change you need to see to consider an effect real. Without replicates, you can guess, of course, but it has to be a wild guess, unless you are happy with the extremely over-careful guess that, e.g., the "blind" method gives you.

Is this "guesswork" similar to what Cuffdiff does when replicates are not provided? Cuffdiff seems to do some mathematical modeling that I don't completely understand. Is the 'blind' method explained anywhere in a way that a non-statistician can understand?

**Simon Anders** · 02-16-2012, 02:05 AM

If no replicates are provided, there is no way to know the real biological variability, and hence there are at least two options:

(i) You can ignore the issue by (implicitly) postulating the biological variance to be zero. Unfortunately, this is the option most commonly chosen in the literature, despite the fact that it is clearly untenable and, if you have sequenced deeply, will lead to nearly all strongly expressed genes being called differentially expressed. Cuffdiff, in the versions described in the papers, also suffered from this flaw, but I don't know what the current version does. One way to find out might be to apply the tool of your choice first to a dataset with replicates and then to only two samples from this dataset, one from each treatment group, and compare whether you get more or fewer hits. If you get more significant hits with less data, this would hint that biological variation is not being properly accounted for.

(ii) If you think that only very few genes are differentially expressed, you can pretend that your two samples are replicates with respect to the majority of genes, and use this to assess variability. You may strongly overestimate the variance that way and lose a great deal of power. In other words, you only consider those genes as differentially expressed that differ between the two samples so much more than nearly all other genes that they "stick out" very prominently. This is what DESeq's "blind" approach attempts. Obviously, you typically get only very few hits this way, and even these could be just flukes. See the vignette and the paper for details.
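To make option (i) concrete, here is a rough sketch that uses a normal approximation rather than DESeq's actual negative binomial test. It compares one hypothetical gene with 20% more reads in one sample, first assuming zero biological variance (Poisson) and then with a made-up dispersion of 0.04 (a biological coefficient of variation of about 20%), at increasing sequencing depth.

```r
# Sketch (normal approximation, NOT DESeq's exact test): two samples, one gene,
# equal library sizes assumed. dispersion = 0 means zero biological variance.

two_sample_p <- function(a, b, dispersion) {
  mu <- (a + b) / 2                # pooled mean count
  v  <- mu + dispersion * mu^2     # negative binomial variance; Poisson if dispersion = 0
  z  <- (b - a) / sqrt(2 * v)      # variance of a difference of two samples is 2 * v
  2 * pnorm(-abs(z))               # two-sided p-value
}

depth <- c(1, 10, 100)             # relative sequencing depth
a <- 1000 * depth                  # hypothetical counts, sample 1
b <- 1200 * depth                  # 20% higher in sample 2 at every depth

p_poisson <- two_sample_p(a, b, dispersion = 0)     # option (i)
p_nb      <- two_sample_p(a, b, dispersion = 0.04)  # hypothetical biological variance
print(signif(rbind(p_poisson, p_nb), 2))
```

Under the Poisson assumption the p-value collapses toward zero as depth increases, although the relative difference never changes, while with biological variance included it stays around 0.5. Overestimating the dispersion, as can happen with option (ii), would push the p-values up further, which is the loss of power described above.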

Wu et al. (BMC Bioinformatics 2010, 11:564) tried to find a middle ground here, but I have not heard of any practical experience with their approach. Has anybody here tried it?

**gstitan** · 03-01-2012, 01:55 AM

Problem with DESeq's estimateVarianceFunctions

Originally posted by Simon Anders

Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'method', you have an old DESeq version.

Simon

Hey Simon,

I am trying to use DESeq. "?estimateVarianceFunctions" gives me:
...
"Usage:

estimateVarianceFunctions(cds, method = c( "normal", "blind", "pooled" ),
pool = NULL, locfit_extra_args = list(), lp_extra_args = list(),
modelFrame = NULL )"

but when I use it, I get an error message:
"cds <- estimateVarianceFunctions(cds,method="blind")
Error: attempt to apply non-function"

I don't understand why. Before this line, I run
cds <- newCountDataSet(countsTable,conds)

cds <- estimateSizeFactors(cds)

and these work, but estimateVarianceFunctions does not.

Can you help me, please?

Thanks

**Simon Anders** · 03-01-2012, 02:21 AM

You must have managed to override the definition of estimateVarianceFunctions further up in your session. Independent of that, please update to a current version of R and Bioconductor.
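One way to check Simon's diagnosis is to ask R where a name is defined. The sketch below demonstrates this with `sd` from the stats package so that it runs without DESeq; the masking definition is deliberately made up. For the case in this thread you would look up "estimateVarianceFunctions" instead.

```r
# Sketch: detect (and undo) a workspace object that masks a package function.
# Demonstrated with stats::sd; the masking definition below is made up.

sd <- function(x) "oops"            # accidentally masks stats::sd
where_before <- find("sd")          # search-path entries defining "sd"
print(where_before)                 # ".GlobalEnv" is listed first

rm(sd)                              # remove the masking definition
where_after <- find("sd")
print(where_after)

# Alternatively, bypass any masking by naming the package explicitly
stats::sd(c(1, 2, 3))               # 1
```

If nothing masks it, `find("estimateVarianceFunctions")` should list only "package:DESeq", and calling `DESeq::estimateVarianceFunctions(...)` bypasses any copy in the workspace. As far as I know, newer DESeq versions replaced this function with `estimateDispersions()`, so updating, as suggested, changes the call as well.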

**vyellapa** · 06-22-2012, 03:51 PM

I am trying to find differentially expressed genes between, say, tumor and relapse samples, and I have 3 samples each from tumor and relapse patients. Can I group the 3 tumor patients as replicates, and do the same for the relapse samples, to get the differentially expressed genes between tumor and relapse cases?

Would such grouping cause weird results due to inaccurate variance estimation arising from (1) biological noise or (2) between-sample variation?

**Simon Anders** · 06-24-2012, 01:39 AM

Sure, it is correct to group in this manner. Of course, you will not get any results due to the high variance between patients within each group, but I guess you know that there is no chance of finding differences between tumour types with so few samples.
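A sketch of the grouping described here, with made-up normalised counts for a single gene: each patient becomes one column, and the condition factor is what tells DESeq which columns are replicates of each other. The DESeq calls are left as comments since they need the package and a full count table.

```r
# Sketch: three tumour and three relapse patients grouped as replicates of two
# conditions. Counts are hypothetical, already-normalised values for one gene.
conds <- factor(c("tumor", "tumor", "tumor", "relapse", "relapse", "relapse"),
                levels = c("tumor", "relapse"))
counts <- c(120, 310, 80, 400, 150, 260)

group_means <- tapply(counts, conds, mean)   # tumor 170, relapse 270
group_sds   <- tapply(counts, conds, sd)     # patient-to-patient spread per group
print(rbind(mean = group_means, sd = group_sds))

# The group difference (100) is smaller than the spread between patients in
# either group, which leaves a 3-vs-3 comparison with little power.
diff_means <- unname(group_means["relapse"] - group_means["tumor"])

# With DESeq, the same factor is what you would pass on (not run):
#   cds <- newCountDataSet(countsTable, conds)
#   cds <- estimateSizeFactors(cds)
#   cds <- estimateDispersions(cds)
#   res <- nbinomTest(cds, "tumor", "relapse")
```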

**BFM** · 07-15-2014, 02:07 PM

Hi, I am using DESeq with no replicates:

conds <- factor( c( "A-mock", "A-infect", "B-mock", "B-infect" ) )

I need to compare differential expression between A-mock and A-infect, and similarly between B-mock and B-infect. It doesn't seem to work. I am using:

res <- nbinomTest( cds, "A-mock", "A-infect" )
res <- nbinomTest( cds, "B-mock", "B-infect" )

but in the end I get only one set of p-values. How can I solve this problem? Please help.

**Jeremy** · 07-15-2014, 06:38 PM

Those are two different tests, but you are overwriting the first result object (res) with the second.

res <- nbinomTest( cds, "A-mock", "A-infect" )
res <- nbinomTest( cds, "B-mock", "B-infect" ) # replaces the above result with the new one

Call one resA and the other resB, for example. In the end you might want to merge the two for comparison, or just write out both as separate tables.

resA <- nbinomTest( cds, "A-mock", "A-infect" )
resB <- nbinomTest( cds, "B-mock", "B-infect" )
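A possible way to do the merge, sketched with small made-up data frames that carry a few of the columns nbinomTest() returns (assumed here to be named id, log2FoldChange, pval and padj), so it runs without DESeq. The output file names are hypothetical and the files go to a temporary directory.

```r
# Sketch: combine two nbinomTest()-style result tables by gene ID.
# The data frames below are mock stand-ins with made-up values.
resA <- data.frame(id = c("g1", "g2", "g3"),
                   log2FoldChange = c(1.2, -0.4, 3.1),
                   pval = c(0.20, 0.70, 0.001),
                   padj = c(0.60, 0.90, 0.05))
resB <- data.frame(id = c("g1", "g2", "g3"),
                   log2FoldChange = c(0.8, 0.1, 2.5),
                   pval = c(0.30, 0.95, 0.004),
                   padj = c(0.70, 1.00, 0.09))

# One row per gene; columns from each comparison get a ".A" or ".B" suffix
both <- merge(resA, resB, by = "id", suffixes = c(".A", ".B"))
print(both)

# Write the combined table and the two individual result tables
out_dir <- tempdir()
write.csv(both, file.path(out_dir, "A_and_B_combined.csv"), row.names = FALSE)
write.csv(resA, file.path(out_dir, "A-mock_vs_A-infect.csv"), row.names = FALSE)
write.csv(resB, file.path(out_dir, "B-mock_vs_B-infect.csv"), row.names = FALSE)
```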
