Old 05-11-2015, 05:36 PM   #1
bjackson
Junior Member
 
Location: Denver, CO

Join Date: May 2015
Posts: 6
CummeRbund shows 95% NAs for gene name lists

Code:
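library(cummeRbund)

# load the cuffdiff output, annotated with the merged GTF from cuffmerge
cuff_data <- readCufflinks(dir = paste0(getwd(), "/../cuffdiff/"),
                           gtfFile = paste0(getwd(), "/../cuffmerge/merged.gtf"),
                           genome = "hg19", rebuild = FALSE)

# get significant IDs
diffGeneIDs <- getSig(cuff_data, level = "genes", alpha = 0.05)

# pull the gene-level data for the significant IDs
diffGenes <- getGenes(cuff_data, diffGeneIDs)

featureNames(diffGenes)[1:40,]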
tracking_id gene_short_name
1 XLOC_000265 CROCC
2 XLOC_000328 ALPL
3 XLOC_000443 SYTL1
4 XLOC_000679 ARTN
5 XLOC_000938 RNU6-387P
6 XLOC_001003 FAM73A,RNA5SP21
7 XLOC_001244 GSTM1,GSTM2
8 XLOC_001530 ADAMTSL4,AL356356.1,MIR4257
9 XLOC_001735 FCER1A
10 XLOC_001988 AXDND1
11 XLOC_002089 RGS1
12 XLOC_002211 NFASC,RP11-494K3.2
13 XLOC_002282 HHAT,KCNH1
14 XLOC_002522 RYR2
15 XLOC_002961 PLA2G2A
16 XLOC_003294 SLC2A1
17 XLOC_003772 MIR137HG
18 XLOC_004252 ASH1L,ASH1L-IT1,MIR555
19 XLOC_004467 RP1-117P20.3,SELE
20 XLOC_004799 CR1L
21 XLOC_004802 CD34
22 XLOC_004843 <NA>
23 XLOC_005255 <NA>
24 XLOC_005256 <NA>
25 XLOC_005257 <NA>
26 XLOC_005263 <NA>
27 XLOC_005264 <NA>
28 XLOC_005265 <NA>
29 XLOC_005273 <NA>
30 XLOC_005276 <NA>
31 XLOC_005277 <NA>
32 XLOC_005288 <NA>
33 XLOC_005291 <NA>
34 XLOC_005292 <NA>
35 XLOC_005293 <NA>
36 XLOC_005294 <NA>
37 XLOC_005322 <NA>
38 XLOC_005331 <NA>
39 XLOC_005332 <NA>
40 XLOC_005333 <NA>
41 XLOC_005336 <NA>

Code:
# fraction of significant genes whose gene_short_name is NA
mean(is.na(featureNames(diffGenes)$gene_short_name))
[1] 0.9691279

This is human data, and I used the same GTF and genome throughout, so why are so few genes labeled? Only about 3% of the 6284 differentially expressed genes have a gene_short_name; the rest are NA.
I followed the protocol in the Nature paper (TopHat -> Cufflinks -> Cuffmerge -> Cuffdiff).
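
In case it helps anyone reproduce the count, here is a minimal base-R sketch of how the labeled/unlabeled split can be tallied. It assumes the table has the two columns shown above (tracking_id, gene_short_name); the small `feature_names` data frame is a hypothetical stand-in built from four rows of the output, not the real cummeRbund object.

```r
# Hypothetical stand-in for the featureNames(diffGenes) table,
# using four rows copied from the output above.
feature_names <- data.frame(
  tracking_id     = c("XLOC_004799", "XLOC_004802", "XLOC_004843", "XLOC_005255"),
  gene_short_name = c("CR1L", "CD34", NA, NA),
  stringsAsFactors = FALSE
)

# TRUE for every row that has no gene name
is_unlabeled <- is.na(feature_names$gene_short_name)

n_unlabeled    <- sum(is_unlabeled)    # number of NA gene names
frac_unlabeled <- mean(is_unlabeled)   # fraction of NA gene names

# tracking IDs of the unlabeled genes, to look up in merged.gtf
unlabeled_ids <- feature_names$tracking_id[is_unlabeled]

print(n_unlabeled)
print(frac_unlabeled)
print(unlabeled_ids)
```

With the real table in place of the stand-in, `unlabeled_ids` would be the XLOC IDs to check against the merged GTF.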

Thanks

Last edited by bjackson; 05-11-2015 at 05:55 PM.