SEQanswers

06-08-2014, 12:42 AM   #1
JonB
Member

Location: Norway

Join Date: Jan 2010
Posts: 83
DESeq: question about using HTSeq counts

In the count files from HTSeq there are a few lines at the end:
__no_feature
__ambiguous
__too_low_aQual
__not_aligned
__alignment_not_unique

Should these lines be removed before loading into DESeq? I seem to get slightly different normalized counts when I create a count data set using newCountDataSet on a count table or newCountDataSetFromHTSeqCount directly on the HTSeq counts. Could this be due to these last lines?
06-08-2014, 01:12 AM   #2
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The functions in DESeq2 that load those files actually remove those lines for you.
06-08-2014, 01:16 AM   #3
JonB
Member

Location: Norway

Join Date: Jan 2010
Posts: 83

When I do

tail(counts(cds, normalized=TRUE))

I see that these lines are there, but maybe they are not taken into account when doing analyses in DESeq?
06-08-2014, 01:21 AM   #4
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

What function did you use to load the files and what's the output of sessionInfo()?
06-08-2014, 01:29 AM   #5
JonB
Member

Location: Norway

Join Date: Jan 2010
Posts: 83

> library("DESeq")
> sampleTable = read.csv(file="Gene_count_files/sampletable.txt", header=TRUE, sep="\t")
> cds = newCountDataSetFromHTSeqCount(sampleTable, directory="Gene_count_files/")
> cds = estimateSizeFactors(cds)

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] DESeq_1.14.0 lattice_0.20-29 locfit_1.5-9.1 Biobase_2.22.0
[5] GenomicRanges_1.14.4 XVector_0.2.0 IRanges_1.20.7 BiocGenerics_0.8.0

loaded via a namespace (and not attached):
[1] AnnotationDbi_1.24.0 DBI_0.2-7 RColorBrewer_1.0-5 RSQLite_0.11.4
[5] XML_3.95-0.2 annotate_1.40.1 genefilter_1.44.0 geneplotter_1.40.0
[9] grid_3.0.2 splines_3.0.2 stats4_3.0.2 survival_2.37-7
[13] tools_3.0.2 xtable_1.7-3
06-08-2014, 02:07 AM   #6
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Try using the DESeqDataSetFromHTSeqCount() function. At least in the most recent version it strips those lines.
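
For reference, a minimal sketch of what that call can look like. The file names, gene IDs, counts and the `condition` column are all invented for illustration, and the loading step only runs if DESeq2 happens to be installed. The row names of the resulting count matrix can then be checked for any remaining `__` rows.

```r
# Write two tiny htseq-count style files to a temporary directory.
# File names, gene IDs and counts are invented for this sketch.
count_dir <- file.path(tempdir(), "htseq_demo")
dir.create(count_dir, showWarnings = FALSE)

write_htseq_file <- function(file_name, gene_counts, special_counts) {
  all_counts <- c(gene_counts, special_counts)
  write.table(
    data.frame(names(all_counts), all_counts),
    file = file.path(count_dir, file_name),
    sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE
  )
}

special <- c("__no_feature" = 400L, "__ambiguous" = 20L, "__too_low_aQual" = 0L,
             "__not_aligned" = 900L, "__alignment_not_unique" = 35L)
write_htseq_file("ctrl.txt",  c(geneA = 100L, geneB = 50L,  geneC = 30L), special)
write_htseq_file("treat.txt", c(geneA = 200L, geneB = 110L, geneC = 55L), special)

# Sample table layout: sample name, file name, then any covariates.
sample_table <- data.frame(
  sampleName = c("ctrl", "treat"),
  fileName   = c("ctrl.txt", "treat.txt"),
  condition  = factor(c("control", "treated"))
)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  suppressPackageStartupMessages(library(DESeq2))
  dds <- DESeqDataSetFromHTSeqCount(
    sampleTable = sample_table,
    directory   = count_dir,
    design      = ~ condition
  )
  # Inspect the row names to see whether any "__" rows survived loading.
  print(rownames(counts(dds)))
} else {
  message("DESeq2 is not installed; skipping the loading step.")
}
```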
06-08-2014, 02:18 AM   #7
JonB
Member

Location: Norway

Join Date: Jan 2010
Posts: 83

Ok, thanks!

Is it also safe to remove these lines from the raw count files or will this mess up the normalization later?
06-08-2014, 02:43 AM   #8
gringer
David Eccles (gringer)

Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

Unless you have a specific reason not to, you should probably be using DESeq2 rather than DESeq -- it has better statistical models, is more flexible, and makes the process a bit easier.

That said, I would expect that removing the lines will be fine, given that other ways of getting counts into a DESeq structure don't require unmapped read counts to be specified.
06-08-2014, 04:28 AM   #9
dpryan
Devon Ryan

Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Go ahead and remove them; they should be removed prior to normalization anyway. And as David said, switch to DESeq2, which has a number of improvements.
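
To illustrate why leaving those rows in can shift the normalized counts, here is a rough base-R sketch. The count matrix is made up, and `size_factors()` is a hand-rolled median-of-ratios estimate written for this example, not the package's own code. It compares the size factors computed with and without the `__` rows.

```r
# Toy count matrix: rows are features, columns are samples.
# All values are made up for illustration.
counts <- matrix(
  c(100, 200,
     50, 110,
     30,  55,
    400, 100,   # __no_feature
    900,  10),  # __not_aligned
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("geneA", "geneB", "geneC", "__no_feature", "__not_aligned"),
    c("sample1", "sample2")
  )
)

# Drop every row whose name starts with the "__" prefix used by htseq-count.
drop_special_rows <- function(m) {
  m[!grepl("^__", rownames(m)), , drop = FALSE]
}

# Median-of-ratios size factors (hand-rolled sketch, not the package code).
size_factors <- function(m) {
  log_geo_means <- rowMeans(log(m))
  apply(m, 2, function(col) {
    usable <- is.finite(log_geo_means) & col > 0
    exp(median((log(col) - log_geo_means)[usable]))
  })
}

gene_counts <- drop_special_rows(counts)

sf_with_special <- size_factors(counts)
sf_genes_only   <- size_factors(gene_counts)

print(rbind(with_special = sf_with_special, genes_only = sf_genes_only))
```

In this toy case the two sets of size factors differ, because the summary rows take part in the median alongside the real genes.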
06-08-2014, 02:09 PM   #10
JonB
Member

Location: Norway

Join Date: Jan 2010
Posts: 83

Thanks guys,
I actually didn't know there was a DESeq2. I will check it out ASAP.
06-09-2014, 03:20 AM   #11
super0925
Senior Member

Location: UK

Join Date: Feb 2014
Posts: 206

I manually remove these lines. A short script should be enough for you.
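
One way such a script could look in R, as a sketch only. `strip_htseq_summary()` is a hypothetical helper, and it assumes tab-separated files with no header in which every summary line starts with `__` (older HTSeq versions wrote these names without the prefix, which this does not handle).

```r
# Remove htseq-count summary lines from a count file and write a cleaned copy.
# Assumes tab-separated "feature<TAB>count" lines with no header.
strip_htseq_summary <- function(in_file, out_file) {
  lines <- readLines(in_file)
  keep  <- !startsWith(lines, "__")
  writeLines(lines[keep], out_file)
  invisible(sum(!keep))  # number of lines dropped
}

# Demo on a throwaway file; the contents are made up.
raw_file   <- tempfile(fileext = ".txt")
clean_file <- tempfile(fileext = ".txt")
writeLines(c("geneA\t100", "geneB\t50", "__no_feature\t400",
             "__ambiguous\t20", "__not_aligned\t900"), raw_file)

n_dropped <- strip_htseq_summary(raw_file, clean_file)
cleaned   <- read.table(clean_file, sep = "\t", col.names = c("feature", "count"))
print(cleaned)
```

Writing to a separate output file keeps the original htseq-count output intact in case the summary numbers are needed later.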

Tags
deseq, htseq, htseq count
