Dear all,
I am having problems annotating results from a table of counts obtained using the following R-script code:
library(Rsamtools)
library(GenomicFeatures)
bamlist=list()
src_files=list.files(pattern="*.bam")
for(filename in src_files)
{
tmp=readBamGappedAlignments(filename)
bamlist[[length(bamlist)+1]]=GRanges(seqnames=rname(tmp),
ranges=IRanges(start=start(tmp),end=end(tmp)),
strand=rep("*",length(tmp)))
}
names(bamlist)=src_files
#--------------------Fetching gene information
txdb=makeTranscriptDbFromUCSC(genome='mm9',tablename='knownGene')
#--------------------GRangesList object
tx_by_gene=transcriptsBy(txdb,'gene')
#--------------------include reads that overlap exons
ex_by_gene=exonsBy(txdb,'gene')
#------------------Finally, we get the number of reads that overlap with each gene
#Initialize table of counts
toc=data.frame(rep(NA,length(tx_by_gene)))
for(i in 1:length(bamlist))
{
toc[,i]=countOverlaps(tx_by_gene,bamlist[[i]])
}
#Fix up labels
rownames(toc)=names(tx_by_gene)
print(ncol(toc))
print(length(bamlist))
colnames(toc)=names(bamlist)
dim(toc)
head(toc)
write.table(toc, file="mm9_ToC.tsv",quote=FALSE,sep="\t")
The thing is that using this reference genome:
genome='mm9',tablename='knownGene'
The resulting table of counts shows up this IDs format:
IDs s1.bam s2.bam
100009600 121 83
100009609 0 3
100009614 0 0
100012 0 0
100017 121 79
100019 37 24
I can't explain myself which kind of identifier is this and how to convert it to other format that may allow me to infer the Gene Function related to the Gene Id...
May there be any problem with the code so that the Table of Counts isn't
generated correctly?
I'll appreciate any help and/or comments, because it's been a real headache trying to correlate such Ids to any G.O.
I am having problems annotating results from a table of counts obtained using the following R-script code:
library(Rsamtools)
library(GenomicFeatures)
bamlist=list()
src_files=list.files(pattern="*.bam")
for(filename in src_files)
{
tmp=readBamGappedAlignments(filename)
bamlist[[length(bamlist)+1]]=GRanges(seqnames=rname(tmp),
ranges=IRanges(start=start(tmp),end=end(tmp)),
strand=rep("*",length(tmp)))
}
names(bamlist)=src_files
#--------------------Fetching gene information
txdb=makeTranscriptDbFromUCSC(genome='mm9',tablename='knownGene')
#--------------------GRangesList object
tx_by_gene=transcriptsBy(txdb,'gene')
#--------------------include reads that overlap exons
ex_by_gene=exonsBy(txdb,'gene')
#------------------Finally, we get the number of reads that overlap with each gene
#Initialize table of counts
toc=data.frame(rep(NA,length(tx_by_gene)))
for(i in 1:length(bamlist))
{
toc[,i]=countOverlaps(tx_by_gene,bamlist[[i]])
}
#Fix up labels
rownames(toc)=names(tx_by_gene)
print(ncol(toc))
print(length(bamlist))
colnames(toc)=names(bamlist)
dim(toc)
head(toc)
write.table(toc, file="mm9_ToC.tsv",quote=FALSE,sep="\t")
The thing is that using this reference genome:
genome='mm9',tablename='knownGene'
The resulting table of counts shows up this IDs format:
IDs s1.bam s2.bam
100009600 121 83
100009609 0 3
100009614 0 0
100012 0 0
100017 121 79
100019 37 24
I can't explain myself which kind of identifier is this and how to convert it to other format that may allow me to infer the Gene Function related to the Gene Id...
May there be any problem with the code so that the Table of Counts isn't
generated correctly?
I'll appreciate any help and/or comments, because it's been a real headache trying to correlate such Ids to any G.O.
Comment