**dpryan** · 07-30-2015, 09:15 AM

If you ask people to help you with an error message then you need to post the error message. I see a number of problems with your code, from using non-existent variables ("wtf") to incorrect syntax ("wt$[j]").

**chudar** · 07-31-2015, 02:15 AM

Dear DpRyan,

Thank you very much for your concern. I have updated my R script and edited my question as well. When I ran the updated script I expected it to give the number of mismatches, but it gave me only 0 for all my sequences. I don't know whether my comparison code is correct. Kindly take a look and guide me, please. Thanks in advance.

**dpryan** · 07-31-2015, 03:38 AM

Grr, the site ate my initial reply!

I assume you meant to write this, since otherwise you're trying to compare a character and a list (wt[i] == "m"), which will always be false and thus yield counts of 0:

Code:

for (i in 1:length(names(wt)))
{
  MT_count=0
  IH_count=0
  #for(j in 1:length(wt$seq1))
  for(j in 1:length(wt[[i]]))
  {
    if(wt[[i]][j]=="m" && mt[[i]][j]=="t")
    {
      MT_count=MT_count+1
    }
    else if(wt[[i]][j]=="i" && mt[[i]][j]=="h")
    {
      IH_count=IH_count+1
    }
  }
  print(names(wt[i]))
  print(MT_count)
  print(IH_count)
}
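
As a small illustration of the bracket issue, the sketch below uses a made-up one-element list (the name seq1 and its residues are placeholders, not data from the thread) to show that single brackets return a list while double brackets return the character vector that can be compared position by position:

```r
# Toy stand-in for the list returned by read.fasta(); values are invented
wt <- list(seq1 = c("m", "i", "k"))

class(wt[1])       # single brackets keep the list wrapper
class(wt[[1]])     # double brackets extract the character vector itself
wt[[1]][1] == "m"  # comparing one residue of the extracted vector
```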

This could be done more succinctly (and probably with better performance) with:

Code:

for (i in 1:length(names(wt))) {
    print(names(wt)[i])
    print(length(intersect(which(wt[[i]] == "m"), which(mt[[i]] == "t"))))
    print(length(intersect(which(wt[[i]] == "i"), which(mt[[i]] == "h"))))
}
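
To check what the which()/intersect() idiom counts, here is a minimal sketch on two invented, equal-length vectors (wt_seq and mt_seq are hypothetical names). It also shows an equivalent form using sum() on a logical vector, which should give the same counts as long as both sequences have the same length:

```r
# Invented residues for illustration only
wt_seq <- c("m", "i", "m", "k", "i")
mt_seq <- c("t", "h", "m", "k", "h")

# Positions where wt is "m" AND mt is "t" (and likewise for i -> h)
mt_count <- length(intersect(which(wt_seq == "m"), which(mt_seq == "t")))
ih_count <- length(intersect(which(wt_seq == "i"), which(mt_seq == "h")))

# Equivalent for equal-length vectors: count TRUE values directly
mt_count_alt <- sum(wt_seq == "m" & mt_seq == "t")
ih_count_alt <- sum(wt_seq == "i" & mt_seq == "h")

print(c(MT = mt_count, IH = ih_count))
```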

**chudar** · 08-03-2015, 02:15 AM

Hi Dpryan,

Thank you very much for the code. It works fine. I would like to output the data into a data frame so that it looks like this:

Code:

ID     MT  IH
seq1   2    5
seq2   4    7
seq3   6    9

So I edited the code as shown below: I created a data frame filled with NA and used it inside the for loop.

Code:

mismatch=function(wt,mt)
{
df=data.frame(ID=NA,MT=NA,IH=NA)
for (i in 1:length(names(wt))) {
  df$ID[i]=names(wt)[i]
  df$MT[i]= length(intersect(which(wt[[i]] == "m"), which(mt[[i]] == "t")))
  df$IH[i]=length(intersect(which(wt[[i]] == "i"), which(mt[[i]] == "h")))
}
 return (df)
}

but it gives me an error as follows

Code:

Error in `$<-.data.frame`(`*tmp*`, "ID", value = c("seq1", "seq2")) : 
  replacement has 2 rows, data has 1

I have no clue how I am ending up with this.

**dpryan** · 08-03-2015, 02:25 AM

All columns of a data frame must have the same length, so you can't add an element to ID without also adding values to MT and IH. The actual R way to do this would be something like:

Code:

library(seqinr)
wt=read.fasta("C:/Users/tsekaran/Documents/sample_ref_protein.fasta")
mt=read.fasta("C:/Users/tsekaran/Documents/sample_mut_protein.fasta")

# Count m->t and i->h substitutions for one pair of sequences
getDiffs <- function(wt, mt) {
    c(length(intersect(which(wt == "m"), which(mt == "t"))),
      length(intersect(which(wt == "i"), which(mt == "h"))))
}

# mapply() labels each result with the sequence name, so the IDs end up as row names
res <- as.data.frame(t(mapply(getDiffs, wt, mt)))
names(res) <- c("MT","IH")
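
The mapply() pattern can be tried without the FASTA files. The sketch below is only an illustration under assumptions: the two named lists are invented stand-ins for the read.fasta() output, and count_diffs is a hypothetical helper name. It also moves the row names into an ID column to match the layout asked for earlier in the thread:

```r
# Invented stand-ins for the lists returned by read.fasta()
wt <- list(seq1 = c("m", "i", "k"), seq2 = c("i", "m", "i"))
mt <- list(seq1 = c("t", "h", "k"), seq2 = c("h", "m", "i"))

# Hypothetical helper: counts for a single wild-type/mutant pair
count_diffs <- function(w, m) {
    c(MT = length(intersect(which(w == "m"), which(m == "t"))),
      IH = length(intersect(which(w == "i"), which(m == "h"))))
}

# One column per sequence from mapply(); transpose to get one row per sequence
res <- as.data.frame(t(mapply(count_diffs, wt, mt)))

# Turn the row names into an explicit ID column
res <- data.frame(ID = rownames(res), res,
                  row.names = NULL, stringsAsFactors = FALSE)
print(res)
```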

**dpryan** · 08-03-2015, 03:01 AM

BTW, you could also just use rbind() if you want to keep the for loop.
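
A rough sketch of what the rbind() variant might look like, again on invented toy lists rather than the real FASTA data. Each pass builds a complete one-row data frame, so all three columns grow together:

```r
# Invented stand-ins for the lists returned by read.fasta()
wt <- list(seq1 = c("m", "i", "k"), seq2 = c("i", "m", "i"))
mt <- list(seq1 = c("t", "h", "k"), seq2 = c("h", "m", "i"))

res <- data.frame()  # start empty and append one full row per sequence
for (i in seq_along(wt)) {
    row <- data.frame(
        ID = names(wt)[i],
        MT = length(intersect(which(wt[[i]] == "m"), which(mt[[i]] == "t"))),
        IH = length(intersect(which(wt[[i]] == "i"), which(mt[[i]] == "h"))),
        stringsAsFactors = FALSE
    )
    res <- rbind(res, row)
}
print(res)
```

Growing a data frame with rbind() copies it on every iteration, which is fine for a handful of sequences but can become slow for large inputs.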

Finding mismatches between two sequences using R
