Hi there,
I ran STAR and featureCounts on RNA-seq data from 20 mouse samples.
With a lot of help from the forums, I now have lists of DEGs.
I noticed that there are quite a few lincRNAs and pseudogenes in my DEG lists.
I was planning to exclude them from the downstream network and enrichment analyses (using the X2K package and DAVID; any other suggestions?). My plan is to analyze the lincRNAs separately from the protein-coding genes and to discard the pseudogenes. What do people in the RNA-seq field usually do with these kinds of genes?
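To make the planned split concrete, here is a minimal sketch in base R. It assumes the DEG table already carries a biotype column (for example, taken from the gene_biotype or gene_type attribute of the GTF used for counting). The data frame, its column names, and all values are hypothetical, and the exact biotype labels depend on the annotation release (newer annotations may use lncRNA rather than lincRNA).

```r
# Hypothetical DEG table: gene IDs, biotype labels, and p-values are made up.
deg <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  biotype = c("protein_coding", "lincRNA", "processed_pseudogene",
              "protein_coding", "unprocessed_pseudogene"),
  padj = c(0.001, 0.02, 0.03, 0.004, 0.01),
  stringsAsFactors = FALSE
)

# Any biotype containing "pseudogene" is treated as a pseudogene (assumption).
is_pseudogene <- grepl("pseudogene", deg$biotype, fixed = TRUE)
is_lincrna <- deg$biotype == "lincRNA"

# Protein-coding genes go to network/enrichment analysis,
# lincRNAs are kept in their own table, pseudogenes are set aside.
coding_degs <- deg[deg$biotype == "protein_coding", ]
lincrna_degs <- deg[is_lincrna, ]
dropped_degs <- deg[is_pseudogene, ]

print(coding_degs$gene_id)
print(lincrna_degs$gene_id)
print(dropped_degs$gene_id)
```

Keeping the dropped rows in their own table, rather than deleting them, makes it easy to report how many DEGs were excluded and why.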
Separately, I also have lists of DEGs from a BWA/DESeq2 analysis. These lists use RefSeq IDs as the primary keys. I used biomaRt in Bioconductor to annotate them with MGI symbols and descriptions. Even so, some RefSeq IDs still map to NA in the symbol column. I manually searched MGI for one RefSeq ID with an NA symbol and did find a gene name associated with it, so it seems that BioMart/Ensembl is not fully up to date. How can I fill in these missing symbols?
I was following the instructions at http://www.bioconductor.org/help/wor...Gene/#annotate to update the annotation in my DEG files. I used the code provided there:
convertIDs <- function( ids, from, to, db, ifMultiple=c("putNA", "useFirst")) {
   stopifnot( inherits( db, "AnnotationDb" ) )
   ifMultiple <- match.arg( ifMultiple )
   # Look up every ID; the result has one row per (from, to) pair
   suppressWarnings( selRes <- AnnotationDbi::select(
      db, keys=ids, keytype=from, columns=c(from,to) ) )
   if ( ifMultiple == "putNA" ) {
      # IDs that appear in more than one row are dropped, so they come back as NA
      duplicatedIds <- selRes[ duplicated( selRes[,1] ), 1 ]
      selRes <- selRes[ ! selRes[,1] %in% duplicatedIds, ]
   }
   # One value per input ID, in input order; match() picks the first matching row
   return( selRes[ match( ids, selRes[,1] ), 2 ] )
}
I first loaded my DEG CSV file into deg_01. Its id column contains all the RefSeq IDs.
Then I ran:
> deg_01$newMGI_symbol <- convertIDs("deg_01$id", "REFSEQ", "SYMBOL", org.Mm.eg.db)
Error in .testForValidKeys(x, keys, keytype) :
None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.
I checked, and the values in deg_01$id are all RefSeq IDs:
> deg_01$id
[1] NM_026268 NM_025777 NM_013478 NR_015524 NM_001099297 NM_177610 NM_001122954
[8] NM_144834 NM_011110 NM_016668 NM_013820 NM_001145874 NM_001081060 NM_080457
Any suggestions? Thanks.
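One thing worth checking, offered as a sketch rather than a confirmed diagnosis: in the call above, "deg_01$id" is in quotes, so R passes the literal text deg_01$id as the only key instead of the column's contents, and that text is not a RefSeq accession. The unquoted printout also suggests the column may be stored as a factor, so converting it with as.character() is a reasonable safeguard. The toy data frame below is a stand-in for the real CSV. The lookup uses AnnotationDbi::mapIds as an alternative to convertIDs, and it only runs if org.Mm.eg.db and AnnotationDbi are installed. Whether every accession resolves depends on the installed annotation version.

```r
# Toy stand-in for the DEG table; the real deg_01 is read from the CSV file.
# stringsAsFactors = TRUE mimics text columns being stored as factors.
deg_01 <- data.frame(
  id = c("NM_026268", "NM_025777", "NR_015524"),
  stringsAsFactors = TRUE
)

# Quoted: a single literal string, not the contents of the column.
quoted_keys <- "deg_01$id"

# Unquoted and converted: a character vector with one accession per row.
real_keys <- as.character(deg_01$id)

print(quoted_keys)
print(real_keys)

# Only attempt the lookup if the annotation packages are installed.
if (requireNamespace("org.Mm.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  symbols <- AnnotationDbi::mapIds(
    org.Mm.eg.db::org.Mm.eg.db,
    keys = real_keys,
    keytype = "REFSEQ",
    column = "SYMBOL",
    multiVals = "first"  # keep the first symbol when an ID maps to several
  )
  deg_01$newMGI_symbol <- unname(symbols)
  print(deg_01)
}
```

With the original helper, the equivalent change would be to call convertIDs(as.character(deg_01$id), "REFSEQ", "SYMBOL", org.Mm.eg.db), with no quotes around deg_01$id.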