Hi all,
I am new to RNA-seq and have only some experience with R. I am having trouble preparing the read count matrix for import into edgeR. I used coverageBed to convert 6 BAM files (2 conditions, 3 replicates each) into the corresponding coverage files (an example of one of these .cov files is attached as test.txt).
I need to sum the read counts of exons that belong to the same gene. For example, in the attached file the gene_id "FusR_00001" has 2 exons, on lines 3 and 5, so their read counts in column j have to be summed: 43 + 5 = 48.
I am trying to run the R code below, but when I execute it line by line in RStudio I get this error: Error in data.frame(NA_integer_, NA_integer_, NA_integer_, NA_integer_, : row names contain missing values
The error appears after I run line 28, before I reach the following line of code:
colnames(dat_tab) <- unlist((strsplit(fnames,"/")))[seq(1,41,2)]
### R code###
fnames <- system("ls *.cov", intern = TRUE)
count_list <- list()
for (i in 1:length(fnames))
{ print(i)
  tt <- read.table(fnames[i], sep = "\t", as.is = TRUE)
  tt_e <- tt[tt[, 3] == "exon", ]
  gids <- apply(tt_e, 1, function(x) {strsplit(x[9], ";")[[1]][5]})
  gids2 <- apply(tt_e, 1, function(x) {strsplit(x[9], ";")[[1]][4]})
  gids[is.na(gids)] <- gids2[is.na(gids)]
  counts <- c()
  for (j in unique(gids))
  {
    counts <- c(counts, sum(tt_e[gids == j, 10]))
  }
  names(counts) <- unique(gids)
  count_list[[i]] <- counts
}
un_names <- unique(unlist(lapply(count_list, names)))
dat_tab <- as.data.frame(lapply(count_list, function(x) {
  x[un_names]
}
)
)
###Error in data.frame(NA_integer_, NA_integer_, NA_integer_, NA_integer_, : row names contain missing values###
colnames(dat_tab) <- unlist((strsplit(fnames,"/")))[seq(1,41,2)]
dat_sum <- cbind( rowSums(dat_tab[,1:2]), rowSums(dat_tab[,3:4]), rowSums(dat_tab[,5:6]), rowSums(dat_tab[,7:8]), rowSums(dat_tab[,9:10]), rowSums(dat_tab[,11:12]), rowSums(dat_tab[,13:14]), rowSums(dat_tab[,15:16]), rowSums(dat_tab[,17:18]), dat_tab[,19:21])
colnames(dat_sum) <- colnames(dat_tab)[c(seq(1,18,2),19:21)]
dat_sum <- as.matrix(dat_sum)
###end###
Since I am new to R, I am struggling to find where the problem lies. I suspect that rows with 0 values in column j are causing this error. Any help resolving this issue would be much appreciated.
Thanks in advance.
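
For reference, here is a minimal sketch of the per-gene summing described above. The call shown in the error, data.frame(NA_integer_, NA_integer_, ...), suggests that each element of count_list is a single NA, which is what happens when the gene ID extraction returns NA for every exon row (for example, if column 9 has fewer than four ";"-separated fields). Zero counts in column j would not produce missing row names by themselves. The sketch assumes a 10-column coverageBed output with GTF-style attributes in column 9 (gene_id "FusR_00001"; ...) and the read count in column 10. The function names gene_counts and build_count_matrix, the sample names, and the toy transcript IDs are hypothetical and not part of the original script.

```r
# Hypothetical helper: sum the read counts (column 10) of all exon rows
# that share a gene_id. Assumes GTF-style attributes in column 9.
gene_counts <- function(tt) {
  tt_e <- tt[which(tt[[3]] == "exon"), , drop = FALSE]
  # Extract gene_id by pattern rather than by position in the ";" split,
  # so a different attribute order does not silently give NA.
  has_id <- grepl('gene_id "?[^";]+', tt_e[[9]])
  gids <- sub('.*gene_id "?([^";]+).*', "\\1", tt_e[[9]])
  gids[!has_id] <- NA
  if (anyNA(gids)) {
    warning(sum(is.na(gids)), " exon rows without a gene_id were dropped")
  }
  keep <- !is.na(gids)
  # One summed count per gene, returned as a named numeric vector.
  vapply(split(tt_e[[10]][keep], gids[keep]), sum, numeric(1))
}

# Hypothetical helper: combine per-sample vectors into a genes x samples
# matrix, using 0 for genes that are absent from a sample.
build_count_matrix <- function(count_list) {
  all_genes <- sort(unique(unlist(lapply(count_list, names))))
  mat <- matrix(0, nrow = length(all_genes), ncol = length(count_list),
                dimnames = list(all_genes, names(count_list)))
  for (s in seq_along(count_list)) {
    x <- count_list[[s]]
    mat[names(x), s] <- x
  }
  mat
}

# Toy data standing in for one .cov file (values are made up, except the
# 43 + 5 example for FusR_00001 taken from the post).
toy <- data.frame(
  V1 = "chr1", V2 = "src",
  V3 = c("gene", "exon", "exon", "exon"),
  V4 = 1L, V5 = 100L, V6 = ".", V7 = "+", V8 = ".",
  V9 = c('gene_id "FusR_00001";',
         'gene_id "FusR_00001"; transcript_id "t1";',
         'gene_id "FusR_00001"; transcript_id "t1";',
         'gene_id "FusR_00002"; transcript_id "t2";'),
  V10 = c(0L, 43L, 5L, 7L),
  stringsAsFactors = FALSE
)

count_list <- list(sampleA = gene_counts(toy),
                   sampleB = gene_counts(toy[1:3, ]))
counts <- build_count_matrix(count_list)
print(counts)

# With real files, something along these lines could replace the toy data:
# fnames <- list.files(pattern = "\\.cov$")
# count_list <- lapply(fnames, function(f)
#   gene_counts(read.table(f, sep = "\t", as.is = TRUE, quote = "")))
# names(count_list) <- sub("\\.cov$", "", fnames)
```

Before running this on the real files, it is worth checking head(tt[, 9]) to confirm that the attribute column really contains gene_id in this form. If it uses another layout (for example gene_id=...), the pattern in gene_counts would need to be adjusted.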