SEQanswers Can I always get the net gain/loss of nts from an indel?

 08-02-2011, 01:03 PM #1 sulicon Member   Location: Los Angeles Join Date: Aug 2010 Posts: 41

Can I always get the net gain/loss of nts from an indel?

Hi, I'm analyzing SNPs and indels called from alignments. The indels are always reported as in the following example: Code: `chr6 111368147 . ATCT ATCTTCT 184 . INDEL;DP=119;AF1=1;CI95=1,1;DP4=0,0,45,58;MQ=20;FQ=-290 GT:PL:GQ 1/1:225,255,0:99`

The wild-type sequence here is "ATCT" and the alternative sequence is "ATCTTCT". The net effect of this indel is the insertion of "TCT", although it's hard to say which "TCT" element, the first one or the second one, was inserted.

Another example: Code: `chrX 66765190 . cagcagcagcagcagcagcagcagcagcagcagcagca cagcagca 133 . INDEL;DP=31;AF1=1;CI95=1,1;DP4=0,0,4,17;MQ=20;FQ=-97.5 GT:PL:GQ 1/1:174,63,0:99`

The net effect here is the deletion of several 'cag' repeats. Again, it may be impossible to determine exactly which 'cag' elements were deleted.

It would be great to know the net gain/loss of nucleotides for each indel. The question is: can I always get such a net effect (deletion or insertion) for an indel? Is it possible to get an indel composed of both a deletion and an insertion? Also, can a SNP site be treated as a combination of a one-nt deletion and a one-nt insertion?

Thanks in advance,
Shuli
 08-02-2011, 02:37 PM #2 swbarnes2 Senior Member   Location: San Diego Join Date: May 2008 Posts: 912

You aren't going to be able to figure out which individual nucleotides are "new" and which are old. The question is meaningless. The mutation would have happened in one cell (either in an ancestor gamete stem cell, or in one cancer cell, or one cell of the developing zygote, or in one resistant bacterium, etc.), and you are looking at the DNA from many descendant cells that have undergone many rounds of DNA replication since the mutation event. If I had to guess, I'd say that the issue is polymerase slippage, but is it more likely for the polymerase to slip on the TCT as opposed to an AGA? I don't know, and I doubt anyone else knows either.

The net change can just be calculated from the variants. I'm not sure what summing that up over lots of variants would tell you.

Why would you treat a SNP as an insertion and deletion, unless you had evidence that the strain you were looking at was a revertant of a previous indel? Indels are rarer than simple SNPs, so how would you distinguish which SNPs were caused by a base change, and which were caused by the lightning strike of two sequential indels at the same locus?
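A minimal sketch of the "net change calculated from the variants" idea, assuming the REF and ALT columns hold explicit sequences (not symbolic alleles such as `<DEL>`) and that multiple ALT alleles are comma-separated. The function name `net_length_change` is made up for illustration; the two records are the ones quoted in the first post.

```python
from typing import List


def net_length_change(ref: str, alt: str) -> List[int]:
    """Return len(ALT) - len(REF) for each comma-separated ALT allele.

    Positive values are net insertions, negative values net deletions.
    Assumes explicit sequences in both columns (an assumption, not
    something every VCF guarantees).
    """
    return [len(allele) - len(ref) for allele in alt.split(",")]


# (chrom, pos, REF, ALT) taken from the two records in the first post.
records = [
    ("chr6", 111368147, "ATCT", "ATCTTCT"),
    ("chrX", 66765190, "cagcagcagcagcagcagcagcagcagcagcagcagca", "cagcagca"),
]

for chrom, pos, ref, alt in records:
    for delta in net_length_change(ref, alt):
        if delta > 0:
            kind = "net insertion"
        elif delta < 0:
            kind = "net deletion"
        else:
            kind = "no length change"
        print(f"{chrom}:{pos}\t{delta:+d} nt\t{kind}")
```

This only reports how many nucleotides were gained or lost; as noted above, it says nothing about which copy of a repeat was affected.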
 08-02-2011, 03:14 PM #3 sulicon Member   Location: Los Angeles Join Date: Aug 2010 Posts: 41

Thanks, swbarnes2. These indels are called from the sequencing data of some clones. Each clone corresponds to one mRNA molecule, and there is at most one clone per locus in the library. What we are interested in is how the indels would influence the final protein products. So I hope to be able to calculate the net change for each indel: if it's a loss/gain of 3, 6, 9, ... nts, then the reading frame is kept. Otherwise, the frame is shifted and, in most cases, a prematurely truncated protein would be produced.

It would be great if I could split all the indels into two categories, deletions and insertions, and the variant sites in each category could be further split into 'in-frame' and 'out-of-frame' ones. Although it's not strictly necessary to distinguish the direction of the net change (deletion or insertion) here, I think it would be easier for others to understand than just telling them they are indels.

What I'm worried about is whether it's possible to get a complex indel. For example, the wild-type (reference) sequence is 'ACTG' and the observed alternative sequence is 'ACGGG'. In this case, the 'T' is deleted and two 'G's are inserted. This indel could also be interpreted as a 'T'->'G' mutation plus an insertion of a 'G'. So I'm curious whether we can decide which interpretation is better?
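The categories described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a method proposed in the thread: the function name `classify_indel` and its output keys are hypothetical, and "complex" is taken here to mean that, after trimming the bases REF and ALT share at their start and then at their end, something is left on both sides. The in-frame test only looks at whether the net change is a multiple of 3 and assumes the variant lies within coding sequence.

```python
from typing import Dict


def classify_indel(ref: str, alt: str) -> Dict[str, object]:
    """Classify one REF/ALT pair by net length change (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()

    # Trim the shared prefix first ...
    p = 0
    while p < min(len(ref), len(alt)) and ref[p] == alt[p]:
        p += 1
    ref_core, alt_core = ref[p:], alt[p:]

    # ... then the shared suffix of what remains.
    s = 0
    while (s < min(len(ref_core), len(alt_core))
           and ref_core[-1 - s] == alt_core[-1 - s]):
        s += 1
    if s:
        ref_core, alt_core = ref_core[:-s], alt_core[:-s]

    delta = len(alt) - len(ref)
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    else:
        kind = "substitution"

    return {
        "net_change": delta,
        "type": kind,
        "in_frame": delta % 3 == 0,  # multiple of 3 keeps the reading frame
        # Bases remain on both sides: a deletion and an insertion together.
        "complex": bool(ref_core) and bool(alt_core) and delta != 0,
        "deleted": ref_core,
        "inserted": alt_core,
    }


for ref, alt in [("ATCT", "ATCTTCT"), ("ACTG", "ACGGG")]:
    print(ref, alt, classify_indel(ref, alt))
```

For 'ACTG' -> 'ACGGG' this flags the record as complex with 'T' removed and 'GG' added, which is the first of the two interpretations in the post. It does not settle which interpretation is better, and trimming the prefix before the suffix is only one convention; in repeats the trimmed position is arbitrary.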