SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Bead Enrichment - infinite washes Aniki 454 Pyrosequencing 24 05-03-2012 10:41 AM
DESeq DiffExpress Vs. Fold Change KellerMac Bioinformatics 3 06-10-2011 10:49 AM
fold change value-cuffdiff madsaan Bioinformatics 4 02-10-2011 06:51 AM
fold change for genes with 0 reads rnaseq RNA Sequencing 15 11-05-2010 01:23 AM
fold change for genes with 0 reads rnaseq General 2 11-04-2010 07:44 PM

Reply
 
Thread Tools
Old 08-02-2011, 12:22 PM   #1
biofreak
Member
 
Location: USA

Join Date: Jun 2011
Posts: 44
Default infinite fold change RNAseq

Hi all,
I am analyzing my RNAseq data using DEseq package. Some of the read counts in the data are as follows. (2 groups of 3 samples each)
CTL CTL CTL HF HF HF
gene1 144 0 0 0 0 0
gene2 45 57 0 0 0 0

Now, after running DEseq NB test, fold change for both gene1 and gene2 is 'inf'. which is okay because the read counts for the 3 samples in group2 is 0. However, I am not sure about gene1 data. Only 1 sample showing reads and other two being zero is unreliable.
Should I remove such cases and keep only genes like gene2?
Also, should I remove the genes with all read counts zero before the analysis? (no data for all 6 samples in this case)
thanks,
biofreak is offline   Reply With Quote
Old 08-03-2011, 12:36 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by biofreak View Post
Hi all,
I am analyzing my RNAseq data using DEseq package. Some of the read counts in the data are as follows. (2 groups of 3 samples each)
CTL CTL CTL HF HF HF
gene1 144 0 0 0 0 0
gene2 45 57 0 0 0 0

Now, after running DEseq NB test, fold change for both gene1 and gene2 is 'inf'. which is okay because the read counts for the 3 samples in group2 is 0. However, I am not sure about gene1 data. Only 1 sample showing reads and other two being zero is unreliable.
Should I remove such cases and keep only genes like gene2?
The problem has little to do with the values being zero, you also get genes with values like '200, 1, 3; 2, 3, 2', i.e., one large value and all other values small.

We recently changed how DESeq deals with this kind of data, so I have to give you two answers.

for the released version of DESeq (v1.4.1): The data frame returned by 'nbinomTest' contains two columns called 'resVarA' and 'resVarB'. The values in their should be around 1 and for genes with unusually large estimated within-group variance ("variance outliers"), the value will be much higher. We recommend discounting such genes in the downstream analysis. See vignette for details

for the development version (v1.5.23): We removed the 'resVar' columns as it turned out to be difficult to recommend a good cut-off value. Instead, the test is now informed about high estimated variance and becomes more conservative in this cases, i.e., genes as in your example should no longer reach a significant p value. Again, see the new (extensively revised) vignette for details.

Quote:
Also, should I remove the genes with all read counts zero before the analysis? (no data for all 6 samples in this case)
thanks,
No need to do that, as DESeq skips such genes anyway.

Simon
Simon Anders is offline   Reply With Quote
Old 08-03-2011, 08:47 AM   #3
biofreak
Member
 
Location: USA

Join Date: Jun 2011
Posts: 44
Default

thanks a lot Simon. This information is tremendously helpful.
biofreak is offline   Reply With Quote
Old 08-03-2011, 09:56 AM   #4
biofreak
Member
 
Location: USA

Join Date: Jun 2011
Posts: 44
Default

Hi Simon,
I am using 1.4 version if DESeq. I tried filtering differentially expressed genes based on within group variance values from 'resVarA' and 'resVarB'. As you have mentioned, it is indeed difficult to determine a good cutoff level. I tried everything from very strict (smaller) cutoffs to the cutoff of even 20!
But every time you think that you could have kept some genes (or otherwise).
For example with cutoff of 20 (in either groupA or groupB), following are some of the
discarded genes: (6 samples)

Z-1 Z-2 Z-3 Z-4 Z-5 Z-6
703 531 7 0 0 1
158 328 2 4 0 1
213 284 1 2 3 5
186 240 4 6 6 2
382 194 143 2634 951 4336
229 119 105 2206 543 1837
1646 120 67 129 167 159
137 60 46 1147 321 990
144 64 35 1195 273 955
249 39 24 19 22 8
210 83 40 1418 303 952
46 63 28 417 178 704
49 42 23 381 156 598
498 831 42 247 115 88
261 329 19 82 57 29
72 29 26 471 137 506
279 341 24 70 55 70
101 51 34 364 182 850
345 389 37 84 78 95

My data is already sparse and I am worried that so many genes with good read counts are being discarded. (for good reasons ofcourse!)
Based on your experience, do you have any number which people find reasonable to use as a cut off? or is it very specific to the data?
thanks a lot in advance.

Last edited by biofreak; 08-03-2011 at 09:59 AM.
biofreak is offline   Reply With Quote
Old 08-03-2011, 11:30 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Worries like yours were what prompted us to re-address this whole issue and make the changes to DESeq mentioned above. Maybe you should give the devel version a try. It should be stable now (and you can manually install it on top of a current release Bioc installation.)
Simon Anders is offline   Reply With Quote
Old 07-02-2012, 09:14 AM   #6
aslihan
Member
 
Location: USA

Join Date: Jun 2011
Posts: 23
Default HELP Count Table for DESeq

cat Join_counts_trimmed.txt | head

1/2-SBSRNA4 13 12 12 48
A1BG 12 2 7 12
A1BG-AS1 8 3 12 12
A1CF 2 3 0 1
A2LD1 29 33 47 47
A2M 21 36 19 6
A2ML1 8 11 6 16
A2MP1 0 0 1 0
A4GALT 74 47 11 94
A4GNT 0 0 0 0

I would like to load into R for DESeq analysis. WHY IT doesnt show 5 columns?

> countsTable <- read.table("/countstable/Join_counts_trimmed.txt", sep='\t', header=TRUE, row.names=1)
> countsTable
data frame with 0 columns and 23227 rows


Then I tried differently

> exampleFile = system.file( "/countstable/Join_counts.txt", package="DESeq" )
> exampleFile
[1] ""
> countsTable <- read.delim( exampleFile, header=FALSE, stringsAsFactors=TRUE )

Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
In addition: Warning message:
In file(file, "rt") :
file("") only supports open = "w+" and open = "w+b": using the former


I dont know how to load this joined.txt file?

Thank you so much for any help
aslihan is offline   Reply With Quote
Old 07-02-2012, 10:02 AM   #7
aslihan
Member
 
Location: USA

Join Date: Jun 2011
Posts: 23
Default

I just opened the excel file and import counts_table.txt file and saved as csv file.


> countsTable <- read.table("/countstable/count_table_last.csv", sep=',', header=FALSE, row.names=1)
> head (countsTable) V2 V3 V4 V5
1/2-SBSRNA4 13 12 12 48
A1BG 12 2 7 12
A1BG-AS1 8 3 12 12
A1CF 2 3 0 1
A2LD1 29 33 47 47
A2M 21 36 19 6
aslihan is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:08 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO