Go Back   SEQanswers > Bioinformatics > Bioinformatics

Old 08-02-2011, 02:03 PM   #1
Location: Los Angeles

Join Date: Aug 2010
Posts: 41
Can I always get the net gain/loss of nts from an indel?


I'm analyzing SNPs and indels called from alignments. The indels are always reported as in the following example:
chr6    111368147   .   ATCT    ATCTTCT 184 .   INDEL;DP=119;AF1=1;CI95=1,1;DP4=0,0,45,58;MQ=20;FQ=-290 GT:PL:GQ    1/1:225,255,0:99
The wild-type sequence here is "ATCT" and the alternative sequence is "ATCTTCT". The net effect of this indel is the insertion of "TCT", although it's hard to say which "TCT" element, the first one or the second one, is inserted.

Another example:
chrX    66765190    .   cagcagcagcagcagcagcagcagcagcagcagcagca  cagcagca    133 .   INDEL;DP=31;AF1=1;CI95=1,1;DP4=0,0,4,17;MQ=20;FQ=-97.5  GT:PL:GQ    1/1:174,63,0:99
The net effect here is the deletion of several 'cag' repeats. Again, maybe it's impossible to determine which 'cag' elements have been deleted exactly.

It would be great to know the net gain/loss of nucleotides from each indel. The question is, can I always get such a net effect (deletion or insertion) for an indel? Is it possible to get an indel consisting of both a deletion and an insertion?

Besides, can a SNP site be treated as a combination of one nt deletion and one nt insertion?

Thanks in advance
sulicon
Old 08-02-2011, 03:37 PM   #2
Senior Member
Location: San Diego

Join Date: May 2008
Posts: 912

You aren't going to be able to figure out which individual nucleotides are "new" and which are old. The question is meaningless. The mutation would have happened in one cell (either in an ancestor gamete stem cell, or in one cancer cell, or one cell of the developing zygote, or in one resistant bacterium, etc.), and you are looking at the DNA from many descendant cells that have undergone many rounds of DNA replication since the mutation event. If I had to guess, I'd say that the issue is polymerase slippage, but is it more likely for the polymerase to slip on the TCT as opposed to an AGA? I don't know, and I doubt anyone else knows either.

The net change can just be calculated from the variants: it is the length of the alternative allele minus the length of the reference allele. I'm not sure what summing that up over lots of variants would tell you.
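As a minimal sketch of that calculation, assuming the lines are standard VCF records with REF in the fourth column and ALT in the fifth, and that the alleles are written out as explicit sequences: the function name `net_length_changes` is made up for illustration. It splits on any whitespace because the lines pasted above use spaces rather than tabs.

```python
def net_length_changes(vcf_line):
    """Return [(alt, len(alt) - len(ref)), ...] for one VCF data line.

    Positive values are net insertions, negative values net deletions.
    Assumes REF and ALT are explicit sequences, not symbolic alleles.
    """
    # Only the first five columns are needed; leave the rest unsplit.
    fields = vcf_line.split(None, 5)
    ref = fields[3]
    alts = fields[4].split(",")  # ALT may list several alleles
    return [(alt, len(alt) - len(ref)) for alt in alts]


line = "chr6 111368147 . ATCT ATCTTCT 184 . INDEL;DP=119 GT:PL:GQ 1/1:225,255,0:99"
for alt, delta in net_length_changes(line):
    print(alt, delta)  # prints: ATCTTCT 3
```

The sign of the result gives the direction (gain or loss) and its absolute value the number of nucleotides.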

Why would you treat a SNP as an insertion and deletion, unless you had evidence that the strain you were looking at was a revertant of a previous indel? Indels are rarer than simple SNPs, so how would you distinguish which SNPs were caused by a base change, and which were caused by the lightning strike of two sequential indels at the same locus?
swbarnes2
Old 08-02-2011, 04:14 PM   #3
Location: Los Angeles

Join Date: Aug 2010
Posts: 41

Thanks, swbarnes2.

These indels are called from the sequencing data of some clones. Each clone corresponds to one mRNA molecule and there is at most one clone for a locus in the library. What we are interested in is how the indels would influence the final protein products. So I hope I'm able to calculate the net change for each indel: if it's a loss/gain of 3, 6, 9, ... nts, then the reading frame would be kept. Otherwise, the frame would be shifted and a prematurely truncated protein would be generated in most cases.

It would be great if I could split all the indels into two categories, deletions and insertions, and the variant sites in each category could be further split into 'in-frame' and 'out-of-frame' ones. Although it's not necessary to distinguish the direction of the net change (deletion or insertion) here, I think it would be easier for others to understand than just telling them they are indels.
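A minimal sketch of that two-level split, assuming each indel is given as its reference and alternative sequences and that it lies inside a coding region (the frame label means nothing elsewhere). The function name `classify_indel` and the label strings are made up for illustration.

```python
def classify_indel(ref, alt):
    """Return (kind, net_change, frame) for one REF/ALT pair.

    kind  : "insertion", "deletion", or "no net change"
    frame : "in-frame" if the net change is a multiple of 3,
            "out-of-frame" otherwise, "n/a" if the length is unchanged
    """
    delta = len(alt) - len(ref)
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    else:
        kind = "no net change"

    if delta == 0:
        frame = "n/a"
    elif delta % 3 == 0:
        frame = "in-frame"
    else:
        frame = "out-of-frame"
    return kind, delta, frame


print(classify_indel("ATCT", "ATCTTCT"))  # ('insertion', 3, 'in-frame')
print(classify_indel("ACTG", "ACG"))      # ('deletion', -1, 'out-of-frame')
```

Only the net length difference enters the frame call, so it does not matter which copy of a repeat was gained or lost.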

What I'm worried about is whether it's possible to get a complex indel. For example, the wild-type (reference) sequence is 'ACTG' and the alternative sequence observed is 'ACGGG'. In this case, the 'T' is deleted, and two 'G's are inserted. This indel could also be interpreted as a 'T'->'G' mutation and an insertion of a 'G'. So I'm curious whether we can decide which interpretation is better.
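One way to at least flag such cases is to trim the bases that the reference and alternative sequences share at both ends and look at what is left. The sketch below does that; the function names and labels are made up for illustration. It cannot say which interpretation is biologically right, it only reports the smallest differing segment. Trimming the prefix before the suffix is an arbitrary choice that matters only inside repeats.

```python
def trim_shared_ends(ref, alt):
    """Strip the common prefix, then the common suffix, of ref and alt."""
    ref, alt = ref.upper(), alt.upper()
    p = 0
    while p < len(ref) and p < len(alt) and ref[p] == alt[p]:
        p += 1
    ref, alt = ref[p:], alt[p:]
    s = 0
    while s < len(ref) and s < len(alt) and ref[-1 - s] == alt[-1 - s]:
        s += 1
    if s:
        ref, alt = ref[:-s], alt[:-s]
    return ref, alt


def describe_variant(ref, alt):
    """Return (label, removed_bases, added_bases) after trimming."""
    removed, added = trim_shared_ends(ref, alt)
    if not removed and not added:
        label = "no change"
    elif not removed:
        label = "insertion"
    elif not added:
        label = "deletion"
    elif len(removed) == 1 and len(added) == 1:
        label = "substitution"
    else:
        label = "complex"  # bases both removed and added
    return label, removed, added


print(describe_variant("ATCT", "ATCTTCT"))  # ('insertion', '', 'TCT')
print(describe_variant("ACTG", "ACGGG"))    # ('complex', 'T', 'GG')
```

For the frame question the label does not change anything: the 'ACTG' to 'ACGGG' case is a net gain of one nucleotide under either interpretation.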
sulicon
