Unconfigured Ad

**crazyhottommy** · 08-20-2014, 10:16 AM

Have a look at my post http://crazyhottommy.blogspot.com/20...-sort-and.html

Originally posted by Grigory View Post

Hello,

I am a bench developmental biologist with essentially 0 bioinformatics background, so please excuse my stupidity.

I generated 12 HTSeq count files (4 experimental conditions - 1 control experiment and 3 gene knockdowns, 3 biological replicates for each condition, single end 50 bp reads, not strand specific) from my tophat alignments. I would like to use DESeq to analyze differential gene expression in my samples. My problems start with me not understanding the explanation of how I should create a CountDataSet from my 12 files, and the pasilla example on the DESeq web page is not helping. Can someone please help me with that?

I would also be grateful for a link to a detailed step-by-step protocol for DESeq on HTSeq counts (not the one from the DESeq page - I have that already, and it is not "for dummies" enough for me).

Thanks!

Grigory

**Grigory** · 08-20-2014, 10:48 AM

Originally posted by crazyhottommy View Post

Have a look at my post http://crazyhottommy.blogspot.com/20...-sort-and.html

Hi,
Thanks! There must have been some misunderstanding. I am already as far as the end of the page you referred to. I have counts for each gene in a table generated by htseq. What I did was the following:

samtools view mapped_reads_sample1.bam | htseq-count -s no - transcriptome.gtf > mapped_reads_sample1_htseq_out

So, I ended up with 12 such files. The question is what I should do now to be able to load this data into DESeq, normalize, merge biological replicates and compare expression in 3 experiments to control.

**kopi-o** · 08-20-2014, 12:32 PM

Use the DESeqDataSetFromHTSeq function from DESeq2, or repeatedly use the "join" Unix/Linux command line tool, or use something from this thread

DE Seq - SEQanswers

http://seqanswers.com/forums/showthread.php?t=22936&highlight=htseq

Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

**Wallysb01** · 08-20-2014, 03:11 PM

since htseq is always going to be in the same order, I just use paste (which can takes a space seperated list of your files) and awk. Paste will print the row names for each file, so that's what awk is taking out.

So its:

[CODE]
paste htseq_outputs* | awk '{print $1,$2,$4,$6,$8}' > htseq_all.txt
[\CODE]

just mind the seperators, after that awk command it will be a single space.

**Grigory** · 08-21-2014, 01:45 AM

Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
Let's see if I can get any further...

**crazyhottommy** · 08-22-2014, 05:25 AM

Originally posted by Grigory View Post

Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
Let's see if I can get any further...

you should get rid of the last five lines:

Just use this single command to merge the HTSeq count results:
paste *txt | cut -f1,2,4,6,8,10,12,14,16,18,20,22,24,26 | head -n -5 > counts.txt
Note: we assume the gene_id (column 1) are in the same sequence, otherwise use command like "join" . we paste the txt files side by side using paste. The gene_id column will be duplicated, just keep the first column, and cut out the counts column by "cut", then remove the last five lines by head -n -5 and redirect it to a new file. Now,you can feed DESeq this counts.txt directly.

**dpryan** · 08-22-2014, 05:39 AM

If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.

**Grigory** · 08-22-2014, 08:58 AM

Originally posted by dpryan View Post

If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.

Thank you! I have installed DESeq2, and after several blunders managed to make the table from HTseq data.
However, at this point I fell into the next pit hole:
I could define which parts of the data table I wanted to compare as:

> ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morphant-1"))

I only included one of my 3 morphant conditions for comparison with control, as the comparison is pairwise, if I understand it correctly. This worked, and the output looked reasonable - R excluded counts from 2 remaining morphants:

> ddsHTSeq$condition
[1] control control control morphant-1 morphant-1 morphant-1 <NA> <NA> <NA> <NA> <NA> <NA>
Levels: control morphant-1

Here, the DESeq2 manual starts with differential expression analysis as follows:
dds <- DESeq(dds)
res <- results(dds)
resOrdered <- res[order(res$padj),]

For that I obviously need to define dds. The DESeq2 manual says:

Here we show the most basic steps for a differential expression analysis. These steps imply you have a SummarizedExperiment object se with a column condition in colData(se).

dds <- DESeqDataSet(se = se, design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)

but here comes the error:
> dds <- DESeqDataSet(se = se, design = ~ condition)
Error in assays(se) : error in evaluating the argument 'x' during method selection for function 'assays': Error: object se not found.

Can't find my way around this one yet.

Grigory

**dpryan** · 08-22-2014, 10:39 AM

Please show the code you used to get "ddsHTSeq". I'm guessing that that's already a DESeqDataSet object, so you'd just use it as "dds".

**Grigory** · 08-22-2014, 10:58 AM

Hi, thanks for you help! I basically followed the DESeq2 manual.

> sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
> sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
> sampleTable<-data.frame(sampleName=sampleFiles,
+ fileName=sampleFiles,
+ condition=sampleCondition)
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
+ directory = directory,
+ design = ~ condition)
> ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morhant-1"))
> dds <- DESeq(dds)
Error in attr(object, "betaPrior") <- betaPrior : object 'dds' not found

I then tried to do the following based on this protocol (which is for DESeq, not DESeq2):

> dds <- DESeqDataSet(ddsHTSeq, design = ~ condition)
> dds <- estimateSizeFactors(dds)
> sizeFactors(dds)
> head(counts(dds))
> head(counts(dds,normalized=TRUE))
> dds <- estimateDispersions(dds)
gene-wise dispersion estimates
Error in t(hatmatrix %*% t(y)) : non-conformable arguments

Size factors and normalized counts I could see, but estimating dispersion did not work.

Thanks again!

G.

**dpryan** · 08-22-2014, 11:09 AM

It would seem much easier to just:

Code:

>sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
> sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
> sampleTable<-data.frame(sampleName=sampleFiles,
+ fileName=sampleFiles,
+ condition=sampleCondition)
> IDX <- which(sampleCondition %in% c("control", "morphant-1"))
> sampleCondition <- sampleCondition[IDX,]
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
+ directory = directory,
+ design = ~ condition)
> dds <- DESeq(ddsHTSeq)

You could also just subset ddsHTSeq.

**Grigory** · 08-22-2014, 11:20 AM

Originally posted by dpryan View Post

It would seem much easier to just:

Code:

>sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
> sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
> sampleTable<-data.frame(sampleName=sampleFiles,
+ fileName=sampleFiles,
+ condition=sampleCondition)
> IDX <- which(sampleCondition %in% c("control", "morphant-1"))
> sampleCondition <- sampleCondition[IDX,]
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
+ directory = directory,
+ design = ~ condition)
> dds <- DESeq(ddsHTSeq)

You could also just subset ddsHTSeq.

> sampleCondition <- sampleCondition[IDX,]
Error in sampleCondition[IDX, ] : incorrect number of dimensions

**dpryan** · 08-22-2014, 11:27 AM

Oops, the line in question should be:

Code:

sampleTable <- sampleTable[IDX,]

**Grigory** · 08-22-2014, 12:19 PM

Originally posted by dpryan View Post

Oops, the line in question should be:

Code:

sampleTable <- sampleTable[IDX,]

Thank you! Worked.

Topics	Statistics	Last Post
Single-Cell Atlases Skew Toward European Ancestry, Analysis Finds by SEQadmin2 Started by SEQadmin2, 07-20-2026, 11:10 AM	0 responses 14 views 0 reactions	Last Post by SEQadmin2 07-20-2026, 11:10 AM
UC San Diego Bioengineers Map Gene Function in Human Stem Cells by SEQadmin2 Started by SEQadmin2, 07-13-2026, 10:26 AM	0 responses 31 views 0 reactions	Last Post by SEQadmin2 07-13-2026, 10:26 AM
New Analysis Splits Leukemia Into 16 Epigenomic Subgroups by SEQadmin2 Started by SEQadmin2, 07-09-2026, 10:04 AM	0 responses 42 views 0 reactions	Last Post by SEQadmin2 07-09-2026, 10:04 AM
Genome-Wide CRISPR Screen Uncovers Unlikely Psoriasis Target by SEQadmin2 Started by SEQadmin2, 07-08-2026, 10:08 AM	0 responses 28 views 0 reactions	Last Post by SEQadmin2 07-08-2026, 10:08 AM

Unconfigured Ad

DESeq on 4 conditions, 3 replicates each - how?

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News