Does anybody know of a string compression technique (in R or Perl)?
In particular, we would like to compress tags of length 35 and above
so that we can store them in a data structure for further processing.
Note also that we are talking about ~10 million tags to process.
I have the following implementation in R that converts a tag to a numerical value,
but it gives an overflow error when handling tags of length > 30.
Code:
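tagsequence2tagnum <- function (tags, length) {
  # Split all tags into single lower-case characters.
  new.tags <- tolower(unlist(strsplit(as.character(tags), "")))
  # Any character outside the recognised alphabet is treated as "n".
  new.tags[!(new.tags == "a" | new.tags == "g" | new.tags == "c" |
             new.tags == "t" | new.tags == "s" | new.tags == "y" |
             new.tags == "b" | new.tags == "k")] <- "n"
  # Map characters to base-4 digits: one column per tag, one row per position.
  new.tags <- matrix(as.numeric(chartr("acgtnsybk", "012301112", new.tags)),
                     nrow = length)
  # Read each column as a base-4 number (plus an offset of 1).
  colSums(new.tags * 4^((length - 1):0)) + 1
}

tagnum2tagsequence <- function (tags, length) {
  # Undo the offset of 1, then extract the base-4 digit at each position.
  new.tags <- t(matrix((rep(tags - 1, each = length) %/% 4^((length - 1):0)) %% 4,
                       nrow = length))
  new.tags <- apply(new.tags, 1, paste, collapse = "")
  chartr("0123", "acgt", new.tags)
}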
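A likely contributor to the problem with long tags: R numerics are doubles, which hold integers exactly only up to 2^53, while a tag of length 27 already needs values up to 4^27 = 2^54. One possible workaround, given here only as a minimal sketch, is to encode each tag in several smaller base-4 chunks instead of one big number. The function names `encode_tags_chunked` and `decode_tags_chunked` and the chunk width of 15 are assumptions made for illustration (15 is chosen so that each chunk value stays below 4^15 = 2^30 and fits in an R integer). The character mapping is copied from the code above, so ambiguity codes are still collapsed and only a/c/g/t round-trip exactly.

```r
# Hypothetical helper: encode fixed-length tags as an integer matrix,
# one row per tag and one column per chunk of up to `chunk` bases.
encode_tags_chunked <- function(tags, tag_length, chunk = 15L) {
  tags <- tolower(as.character(tags))
  stopifnot(all(nchar(tags) == tag_length))
  # Same lossy mapping as tagsequence2tagnum: unknown characters become "n".
  tags <- gsub("[^acgtsybk]", "n", tags)
  digits <- chartr("acgtnsybk", "012301112", tags)
  starts <- seq(1L, tag_length, by = chunk)
  ends <- pmin(starts + chunk - 1L, tag_length)
  # strtoi() reads each chunk of digits as a base-4 integer.
  codes <- vapply(seq_along(starts), function(i) {
    strtoi(substring(digits, starts[i], ends[i]), base = 4L)
  }, integer(length(tags)))
  matrix(codes, nrow = length(tags))
}

# Hypothetical helper: invert encode_tags_chunked (for a/c/g/t input).
decode_tags_chunked <- function(codes, tag_length, chunk = 15L) {
  starts <- seq(1L, tag_length, by = chunk)
  widths <- pmin(starts + chunk - 1L, tag_length) - starts + 1L
  parts <- lapply(seq_along(widths), function(i) {
    powers <- 4^((widths[i] - 1L):0L)
    # Rows are tags, columns are positions within this chunk.
    d <- outer(codes[, i], powers, function(x, p) (x %/% p) %% 4)
    apply(d, 1, function(r) paste(c("a", "c", "g", "t")[r + 1], collapse = ""))
  })
  do.call(paste0, parts)
}

tags <- c(paste0(strrep("ACGT", 8), "ACG"), strrep("T", 35))
codes <- encode_tags_chunked(tags, 35L)
decoded <- decode_tags_chunked(codes, 35L)
```

With this layout a 35-base tag takes three integers, so ~10 million tags would need roughly 120 MB as an integer matrix. Whether that counts as enough compression depends on what the further processing needs.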