  • Can I always get the net gain/loss of nts from an indel?

    Hi,

    I'm analyzing SNPs and Indels called from alignments. I always observed indels described as in the following example:
    Code:
    chr6    111368147   .   ATCT    ATCTTCT 184 .   INDEL;DP=119;AF1=1;CI95=1,1;DP4=0,0,45,58;MQ=20;FQ=-290 GT:PL:GQ    1/1:225,255,0:99
    The wild-type sequence here is "ATCT" and the alternative sequence is "ATCTTCT". The net effect of this indel is the insertion of "TCT", although it's hard to say which "TCT" element, the first one or the second one, is inserted.

    Another example:
    Code:
    chrX    66765190    .   cagcagcagcagcagcagcagcagcagcagcagcagca  cagcagca    133 .   INDEL;DP=31;AF1=1;CI95=1,1;DP4=0,0,4,17;MQ=20;FQ=-97.5  GT:PL:GQ    1/1:174,63,0:99
    The net effect here is the deletion of several 'cag' repeats. Again, maybe it's impossible to determine which 'cag' elements have been deleted exactly.

    It would be great to know the net gain/loss of nucleotides from each indel. The question is, can I always get such a net effect (deletion or insertion) for an indel? Is it possible to get an indel composed of both a deletion and an insertion?

    Besides, can a SNP site be treated as a combination of one nt deletion and one nt insertion?

    Thanks in advance
    Shuli

  • #2
    You aren't going to be able to figure out which individual nucleotides are "new" and which are old. The question is meaningless. The mutation would have happened in one cell, (either in an ancestor gamete stem cell, or in one cancer cell, or one cell of the developing zygote, or in one resistant bacterium, etc) and you are looking at the DNA from many descendant cells, that have undergone many rounds of DNA replication since the mutation event. If I had to guess, I'd say that the issue is polymerase slippage, but is it more likely for the polymerase to slip on the TCT as opposed to an AGA? I don't know, and I doubt anyone else knows either.

    The net change can just be calculated from the variants. I'm not sure what summing that up over lots of variants would tell you.
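As a rough illustration of "calculated from the variants", here is a minimal Python sketch that takes the length difference between the ALT and REF columns (columns 5 and 4) of a tab-separated VCF record. The function name `net_changes_from_vcf_line` is made up for this example, and the record is the chr6 line from the first post, shortened and re-typed with tabs.

```python
def net_changes_from_vcf_line(line):
    """Return len(ALT) - len(REF) for every ALT allele of one VCF record.

    Positive values are a net gain of nucleotides, negative values a net loss.
    Assumes a tab-separated record with plain sequence alleles (no symbolic
    alleles such as <DEL>).
    """
    fields = line.rstrip("\n").split("\t")
    ref = fields[3]
    alts = fields[4].split(",")  # a record may list several ALT alleles
    return [len(alt) - len(ref) for alt in alts]


record = "chr6\t111368147\t.\tATCT\tATCTTCT\t184\t.\tINDEL;DP=119\tGT:PL:GQ\t1/1:225,255,0:99"
print(net_changes_from_vcf_line(record))  # [3]
```

This only reports the net length change; it says nothing about which copy of a repeat was gained or lost.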

    Why would you treat a SNP as an insertion and deletion, unless you had evidence that the strain you were looking at was a revertant of a previous indel? Indels are rarer than simple SNPs, so how would you distinguish which SNPs were caused by a base change, and which were caused by the lightning strike of two sequential indels at the same locus?

    • #3
      Thanks, swbarnes2.

      These indels are called from the sequencing data of some clones. Each clone corresponds to one mRNA molecule and there is at most one clone per locus in the library. What we are interested in is how the indels would influence the final protein products. So I hope I'm able to calculate the net change for each indel: if it's a loss/gain of 3, 6, 9, ... nts, then the reading frame would be kept. Otherwise, the frame would be shifted and a prematurely truncated protein would be produced in most cases.

      It would be great if I could split all the indels into two categories, deletions and insertions, and the variant sites in each category could be further split into 'in-frame' and 'out-of-frame' ones. Although it's not strictly necessary to distinguish the direction of the net change (deletion or insertion) here, I think it would be easier for others to understand than just telling them they are indels.
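The two-level split described above could be sketched roughly like this in Python. The function name `classify_indel` and the category labels are only illustrative, and the frame call assumes the whole indel lies inside a coding sequence, which the REF/ALT strings alone cannot tell you.

```python
def classify_indel(ref, alt):
    """Classify an indel by direction and by its effect on the reading frame.

    Only the net length change is used, so a complex event is labeled by its
    net effect. Assumes the indel sits entirely within coding sequence.
    """
    delta = len(alt) - len(ref)
    if delta > 0:
        kind = "insertion"
    elif delta < 0:
        kind = "deletion"
    else:
        kind = "no net change"
    # A net change that is a multiple of 3 keeps the downstream reading frame.
    frame = "in-frame" if delta % 3 == 0 else "out-of-frame"
    return kind, frame


print(classify_indel("ATCT", "ATCTTCT"))  # ('insertion', 'in-frame')
print(classify_indel("ACTG", "ACGGG"))    # ('insertion', 'out-of-frame')
```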

      What I'm worried about is whether it's possible to get a complex indel. For example, the wild-type (reference) sequence is 'ACTG' and the alternative sequence observed is 'ACGGG'. In this case, the 'T' is deleted, and two 'G's are inserted. This indel could also be interpreted as a 'T'->'G' mutation plus an insertion of a 'G'. So I'm curious whether there is a way to decide which interpretation is better?
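One way to at least flag such cases is to trim the bases that REF and ALT share at both ends and look at what is left. The Python sketch below does that; `trim_shared_ends` and `describe_variant` are hypothetical helper names, and this simple trimming is not the same as the left-alignment that dedicated VCF normalization tools perform. It also cannot say which interpretation is "better", only that the event is not a pure insertion or deletion.

```python
def trim_shared_ends(ref, alt):
    """Strip the common prefix, then the common suffix, of REF and ALT."""
    start = 0
    while start < len(ref) and start < len(alt) and ref[start] == alt[start]:
        start += 1
    ref, alt = ref[start:], alt[start:]

    end = 0
    while end < len(ref) and end < len(alt) and ref[-1 - end] == alt[-1 - end]:
        end += 1
    if end:
        ref, alt = ref[:-end], alt[:-end]
    return ref, alt


def describe_variant(ref, alt):
    """Label a REF/ALT pair by what remains after trimming shared ends."""
    r, a = trim_shared_ends(ref, alt)
    if not r and not a:
        return "no change"
    if not r:
        return "pure insertion of " + a
    if not a:
        return "pure deletion of " + r
    if len(r) == len(a):
        return "substitution " + r + "->" + a
    return "complex: " + r + " replaced by " + a


print(describe_variant("ATCT", "ATCTTCT"))  # pure insertion of TCT
print(describe_variant("ACTG", "ACGGG"))    # complex: T replaced by GG
```

In a repeat such as the first example, trimming the prefix first reports the rightmost possible copy as inserted; that placement is arbitrary, as discussed earlier in the thread.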
