Seqanswers Leaderboard Ad

**frymor** · 07-31-2011, 05:56 AM

Hi,

I have the same problem.

I am using a small subset of my complete bam files. I have two samples, no replicates.

when running the script according to the DESeq manual I get a long list of genes with no results.
I took a sub set of this list, where at least one of the gene counts in not '0'.

Code:

geneCounts <- cbind(as.data.frame(mutGeneCounts), as.data.frame(wtGeneCounts))
temp <- which(geneCounts$mutGeneCounts != 0  | geneCounts$wtGeneCounts !=0)
geneCounts_small <- geneCounts[temp,]
conds <- c(rep("mut", 1), rep("wt", 1)) # conditions

cds <- newCountDataSet( geneCounts_small, conds )

res <- nbinomTest(cds, "mut", "wt") # performing the deregulation test

plot(res$baseMean, res$log2FoldChange, log="x", pch=20, cex=.1, 
     col=ifelse(res$padj<=0.7, "red", "blue"))

resSig <- res[ res_small$padj < .1, ]

When I am looking for the most DE genes, I get this table:

Code:

             id  baseMean baseMeanA baseMeanB foldChange log2FoldChange         pval         padj resVarA resVarB
369 FBgn0031637 1037.5715         0 2075.1430        Inf            Inf 9.848705e-14 1.176920e-11      NA      NA
89  FBgn0022153  903.7494         0 1807.4988        Inf            Inf 3.820621e-13 3.652513e-11      NA      NA
4   FBgn0000286  833.5378         0 1667.0756        Inf            Inf 8.324344e-13 6.631728e-11      NA      NA
365 FBgn0031633  439.2726         0  878.5452        Inf            Inf 3.161212e-10 1.888824e-08      NA      NA
373 FBgn0031643  323.4535         0  646.9069        Inf            Inf 5.330071e-09 2.547774e-07      NA      NA
371 FBgn0031639  308.4510         0  616.9020        Inf            Inf 8.562510e-09 3.720800e-07      NA      NA

1. I understand that the last two are NA because I don't have replicates, but what about the inf values? How do I interprate these values?

2. For this table I get inf or -inf when searching for the most strongly up- and down-regulated genes respectively.

Can it be because I am using only a small portion of the complete data set?
I do get a lot of genes with '0'.

I would appreciate any suggestions

Thanks
Assa

**Simon Anders** · 08-04-2011, 07:59 AM

Originally posted by silverlining View Post

1. what exactly my "id" refers to in the output - ie. transcripts/exons/refgenes?

This is the row number in the count table that you used to create the CountDataSet. If you count table had row names, you would see these instead.

2. how to obtain corresponding refgene for the output so I can look to see if there are changes in genes of interest

Before you call 'makeCountDataSet', set the rownames to the IDs of your genes. The example in the vignette assumes that the file with your counts had the gene IDs in the first columns.

By the way, make sure you count for genes, not for transcript. As RefSeq, IIRC, assigns different IDs to the different transcripts of a gene, your annotation might not be suitable.

Simon

**Simon Anders** · 08-04-2011, 08:02 AM

Originally posted by frymor View Post

1. I understand that the last two are NA because I don't have replicates, but what about the inf values? How do I interprate these values?

Have a look at the raw counts for these genes. It seems you have several thousand counts in sample B and zero counts in sample A. Hence, you have an extreme upregulation. (It is not really 'infinite', of course, but this is what you get from a division by zero.)

It seems suspicious to me that you have so many zeroes. Maybe you made a mistake subsampling your SAM file. (Why did you subsample at all?)

S

**silverlining** · 08-04-2011, 02:41 PM

Originally posted by Simon Anders View Post

Maybe you made a mistake subsampling your SAM file. (Why did you subsample at all?)

S

Thanks for the input. Here is the txt file with counts that I read in -

head(ch3counts)
PBS x5.nm x50.nm x500.nm
100038246 0 2 0 2
10006 1783 2909 2376 1456
100126326 0 0 0 0
100126327 0 0 0 0
100127889 8 13 17 4
100128054 10 22 7 6

the row IDs are the UCSC IDs - the data is not subsampled - it is for all chromosomes. I am not sure why -

1. why these IDs disappear after DESeq analysis?
2. why I get the Inf values - not all are 0 for sure but some are

Again, to remind you, while I await replicates this analysis was done without any replicates.

Thanks!

**silverlining** · 08-04-2011, 02:50 PM

Originally posted by Simon Anders View Post

As RefSeq, IIRC, assigns different IDs to the different transcripts of a gene, your annotation might not be suitable.

Simon

What do you mean by that? To be honest, I am unsure as to whether I have gene or transcript IDs in my analysis right now but am working on finding out. The basic script I used for counts was -

txdb <- makeTranscriptDbFromUCSC(genome = "hg19", tablename = "refGene")
exonRangesList <- exonsBy(txdb, "gene")

Thanks!

**silverlining** · 08-04-2011, 02:55 PM

Finally, this is how a larger set of my data looks - not all are Inf. There are 256 supposedly differentially expressed and the majority are not Inf.

id baseMean baseMeanA baseMeanB foldChange log2FoldChange
22410 653048 1820.96844 0.000000 3641.93688 Inf Inf
22411 653067 1820.96844 0.000000 3641.93688 Inf Inf
22412 653219 1820.96844 0.000000 3641.93688 Inf Inf
22413 653220 1820.96844 0.000000 3641.93688 Inf Inf
22638 9503 1820.96844 0.000000 3641.93688 Inf Inf
22554 8277 1587.34635 2.304498 3172.38821 1376.6068354 10.4269009
1434 283120 2777.58744 118.681666 5436.49321 45.8073550 5.5175074
21708 100132399 1687.38186 21.892735 3352.87098 153.1499401 7.2588010
22485 729422 1682.97984 21.892735 3344.06694 152.7477958 7.2550078
22487 729431 1682.97984 21.892735 3344.06694 152.7477958 7.2550078
22398 645073 1407.01651 17.283738 2796.74927 161.8139148 7.3381919
22489 729447 1192.57795 8.065744 2377.09015 294.7142954 8.2031732
21977 2574 1097.04333 9.217993 2184.86866 237.0221529 7.8888781
21996 26749 1086.03828 9.217993 2162.85857 234.6344213 7.8742709
21676 100101629 1069.16388 9.217993 2129.10976 230.9732329 7.8515819
22488 729442 1092.58725 14.979239 2170.19527 144.8802042 7.1787167
2873 3855 2276.15738 185.512119 4366.80264 23.5391772 4.5569920
22484 729408 853.78004 6.913495 1700.64659 245.9894114 7.9424524
21995 26748 827.05284 9.217993 1644.88769 178.4431386 7.4793206
22486 729428 805.61887 10.370243 1600.86750 154.3712671 7.2702604
21674 100008586 689.12292 9.217993 1369.02785 148.5169030 7.2144833
21981 2578 588.92525 6.913495 1170.93700 169.3697587 7.4040325
22147 4103 435.64232 1.152249 870.13238 755.1599014 9.5606384
21969 2543 480.49966 5.761246 955.23808 165.8040795 7.3733357
21805 114824 494.28184 6.913495 981.65019 141.9904368 7.1496500
14024 7345 318.41269 0.000000 636.82539 Inf Inf

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 25 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 24 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 52 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Re-extracting refgene names after DESeq Analysis

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News