SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
cuffdiff p value for 2 conditions without replicates hug2001 Bioinformatics 17 01-24-2013 04:59 AM
Specifing conditions and normaliztion in DESeq? tamari Bioinformatics 0 10-31-2012 01:04 AM
DESeq: defining conditions mouchkam Bioinformatics 2 04-10-2012 04:45 PM
Is more than two conditions possible in DESEQ? greener RNA Sequencing 5 05-09-2011 03:10 PM
RNASeq experiments with 2 conditions and more than 3 replicates Jouneau Luc RNA Sequencing 1 04-05-2011 03:00 AM

Reply
 
Thread Tools
Old 08-20-2014, 08:32 AM   #1
Grigory
Junior Member
 
Location: Vienna

Join Date: Aug 2014
Posts: 7
Default DESeq on 4 conditions, 3 replicates each - how?

Hello,

I am a bench developmental biologist with essentially 0 bioinformatics background, so please excuse my stupidity.

I generated 12 HTSeq count files (4 experimental conditions - 1 control experiment and 3 gene knockdowns, 3 biological replicates for each condition, single end 50 bp reads, not strand specific) from my tophat alignments. I would like to use DESeq to analyze differential gene expression in my samples. My problems start with me not understanding the explanation of how I should create a CountDataSet from my 12 files, and the pasilla example on the DESeq web page is not helping. Can someone please help me with that?

I would also be grateful for a link to a detailed step-by-step protocol for DESeq on HTSeq counts (not the one from the DESeq page - I have that already, and it is not "for dummies" enough for me).

Thanks!

Grigory
Grigory is offline   Reply With Quote
Old 08-20-2014, 10:16 AM   #2
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Have a look at my post http://crazyhottommy.blogspot.com/20...-sort-and.html



Quote:
Originally Posted by Grigory View Post
Hello,

I am a bench developmental biologist with essentially 0 bioinformatics background, so please excuse my stupidity.

I generated 12 HTSeq count files (4 experimental conditions - 1 control experiment and 3 gene knockdowns, 3 biological replicates for each condition, single end 50 bp reads, not strand specific) from my tophat alignments. I would like to use DESeq to analyze differential gene expression in my samples. My problems start with me not understanding the explanation of how I should create a CountDataSet from my 12 files, and the pasilla example on the DESeq web page is not helping. Can someone please help me with that?

I would also be grateful for a link to a detailed step-by-step protocol for DESeq on HTSeq counts (not the one from the DESeq page - I have that already, and it is not "for dummies" enough for me).

Thanks!

Grigory
crazyhottommy is offline   Reply With Quote
Old 08-20-2014, 10:48 AM   #3
Grigory
Junior Member
 
Location: Vienna

Join Date: Aug 2014
Posts: 7
Default

Quote:
Originally Posted by crazyhottommy View Post
Hi,
Thanks! There must have been some misunderstanding. I am already as far as the end of the page you referred to. I have counts for each gene in a table generated by htseq. What I did was the following:

Quote:
samtools view mapped_reads_sample1.bam | htseq-count -s no - transcriptome.gtf > mapped_reads_sample1_htseq_out
So, I ended up with 12 such files. The question is what I should do now to be able to load this data into DESeq, normalize, merge biological replicates and compare expression in 3 experiments to control.
Grigory is offline   Reply With Quote
Old 08-20-2014, 12:32 PM   #4
kopi-o
Senior Member
 
Location: Stockholm, Sweden

Join Date: Feb 2008
Posts: 319
Default

Use the DESeqDataSetFromHTSeq function from DESeq2, or repeatedly use the "join" Unix/Linux command line tool, or use something from this thread

http://seqanswers.com/forums/showthr...ighlight=htseq
kopi-o is offline   Reply With Quote
Old 08-20-2014, 03:11 PM   #5
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286
Default

since htseq is always going to be in the same order, I just use paste (which can takes a space seperated list of your files) and awk. Paste will print the row names for each file, so that's what awk is taking out.

So its:

[CODE]
paste htseq_outputs* | awk '{print $1,$2,$4,$6,$8}' > htseq_all.txt
[\CODE]

just mind the seperators, after that awk command it will be a single space.

Last edited by Wallysb01; 08-21-2014 at 06:26 AM.
Wallysb01 is offline   Reply With Quote
Old 08-21-2014, 01:45 AM   #6
Grigory
Junior Member
 
Location: Vienna

Join Date: Aug 2014
Posts: 7
Default

Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
Let's see if I can get any further...
Grigory is offline   Reply With Quote
Old 08-22-2014, 05:25 AM   #7
crazyhottommy
Senior Member
 
Location: Gainesville

Join Date: Apr 2012
Posts: 140
Default

Quote:
Originally Posted by Grigory View Post
Thanks Wallysb01! This was helpful. Got my table now. Gave me a bit of learning too since I did not understand in the beginning why I am printing only even columns after the first 2.
Let's see if I can get any further...
you should get rid of the last five lines:

Just use this single command to merge the HTSeq count results:
paste *txt | cut -f1,2,4,6,8,10,12,14,16,18,20,22,24,26 | head -n -5 > counts.txt
Note: we assume the gene_id (column 1) are in the same sequence, otherwise use command like "join" . we paste the txt files side by side using paste. The gene_id column will be duplicated, just keep the first column, and cut out the counts column by "cut", then remove the last five lines by head -n -5 and redirect it to a new file. Now,you can feed DESeq this counts.txt directly.
crazyhottommy is offline   Reply With Quote
Old 08-22-2014, 05:39 AM   #8
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.
dpryan is offline   Reply With Quote
Old 08-22-2014, 08:58 AM   #9
Grigory
Junior Member
 
Location: Vienna

Join Date: Aug 2014
Posts: 7
Default

Quote:
Originally Posted by dpryan View Post
If you're using DESeq2, there's no reason to manually merge the files. You can just use a list of files and it'll merge everything (and remove the last 5 lines) for you. It'll also sanity check that things are in the same order and contain the same IDs.
Thank you! I have installed DESeq2, and after several blunders managed to make the table from HTseq data.
However, at this point I fell into the next pit hole:
I could define which parts of the data table I wanted to compare as:

> ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morphant-1"))

I only included one of my 3 morphant conditions for comparison with control, as the comparison is pairwise, if I understand it correctly. This worked, and the output looked reasonable - R excluded counts from 2 remaining morphants:

> ddsHTSeq$condition
[1] control control control morphant-1 morphant-1 morphant-1 <NA> <NA> <NA> <NA> <NA> <NA>
Levels: control morphant-1

Here, the DESeq2 manual starts with differential expression analysis as follows:
dds <- DESeq(dds)
res <- results(dds)
resOrdered <- res[order(res$padj),]

For that I obviously need to define dds. The DESeq2 manual says:

Quote:
Here we show the most basic steps for a differential expression analysis. These steps imply you have a SummarizedExperiment object se with a column condition in colData(se).

dds <- DESeqDataSet(se = se, design = ~ condition)
dds <- DESeq(dds)
res <- results(dds)
but here comes the error:
> dds <- DESeqDataSet(se = se, design = ~ condition)
Error in assays(se) : error in evaluating the argument 'x' during method selection for function 'assays': Error: object se not found.

Can't find my way around this one yet.

Grigory
Grigory is offline   Reply With Quote
Old 08-22-2014, 10:39 AM   #10
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Please show the code you used to get "ddsHTSeq". I'm guessing that that's already a DESeqDataSet object, so you'd just use it as "dds".
dpryan is offline   Reply With Quote
Old 08-22-2014, 10:58 AM   #11
Grigory
Junior Member
 
Location: Vienna

Join Date: Aug 2014
Posts: 7
Default

Hi, thanks for you help! I basically followed the DESeq2 manual.

Quote:
> sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
> sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
> sampleTable<-data.frame(sampleName=sampleFiles,
+ fileName=sampleFiles,
+ condition=sampleCondition)
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
+ directory = directory,
+ design = ~ condition)
> ddsHTSeq$condition <- factor(ddsHTSeq$condition, levels=c("control","morhant-1"))
> dds <- DESeq(dds)
Error in attr(object, "betaPrior") <- betaPrior : object 'dds' not found
I then tried to do the following based on this protocol (which is for DESeq, not DESeq2):

Quote:
> dds <- DESeqDataSet(ddsHTSeq, design = ~ condition)
> dds <- estimateSizeFactors(dds)
> sizeFactors(dds)
> head(counts(dds))
> head(counts(dds,normalized=TRUE))
> dds <- estimateDispersions(dds)
gene-wise dispersion estimates
Error in t(hatmatrix %*% t(y)) : non-conformable arguments
Size factors and normalized counts I could see, but estimating dispersion did not work.

Thanks again!

G.

Last edited by Grigory; 08-22-2014 at 11:12 AM.
Grigory is offline   Reply With Quote
Old 08-22-2014, 11:09 AM   #12
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

It would seem much easier to just:
Code:
>sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
> sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
> sampleTable<-data.frame(sampleName=sampleFiles,
+ fileName=sampleFiles,
+ condition=sampleCondition)
> IDX <- which(sampleCondition %in% c("control", "morphant-1"))
> sampleCondition <- sampleCondition[IDX,]
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
+ directory = directory,
+ design = ~ condition)
> dds <- DESeq(ddsHTSeq)
You could also just subset ddsHTSeq.
dpryan is offline   Reply With Quote
Old 08-22-2014, 11:20 AM   #13
Grigory
Junior Member
 
Location: Vienna

Join Date: Aug 2014
Posts: 7
Default

Quote:
Originally Posted by dpryan View Post
It would seem much easier to just:
Code:
>sampleFiles <- grep("MO_", list.files(directory), value=TRUE)
> sampleCondition <-sub("(.*MO).*","\\1",sampleFiles)
> sampleTable<-data.frame(sampleName=sampleFiles,
+ fileName=sampleFiles,
+ condition=sampleCondition)
> IDX <- which(sampleCondition %in% c("control", "morphant-1"))
> sampleCondition <- sampleCondition[IDX,]
> ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
+ directory = directory,
+ design = ~ condition)
> dds <- DESeq(ddsHTSeq)
You could also just subset ddsHTSeq.
> sampleCondition <- sampleCondition[IDX,]
Error in sampleCondition[IDX, ] : incorrect number of dimensions
Grigory is offline   Reply With Quote
Old 08-22-2014, 11:27 AM   #14
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Oops, the line in question should be:
Code:
sampleTable <- sampleTable[IDX,]
dpryan is offline   Reply With Quote
Old 08-22-2014, 12:19 PM   #15
Grigory
Junior Member
 
Location: Vienna

Join Date: Aug 2014
Posts: 7
Default

Quote:
Originally Posted by dpryan View Post
Oops, the line in question should be:
Code:
sampleTable <- sampleTable[IDX,]
Thank you! Worked.
Grigory is offline   Reply With Quote
Reply

Tags
deseq, htseq

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 09:21 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO