  • Can I always get the net gain/loss of nts from an indel?

    Hi,

    I'm analyzing SNPs and Indels called from alignments. I always observed indels described as in the following example:
    Code:
    chr6    111368147   .   ATCT    ATCTTCT 184 .   INDEL;DP=119;AF1=1;CI95=1,1;DP4=0,0,45,58;MQ=20;FQ=-290 GT:PL:GQ    1/1:225,255,0:99
    The wild-type sequence here is "ATCT" and the alternative sequence is "ATCTTCT". The net effect of this indel is the insertion of "TCT", although it's hard to say which "TCT" element, the first one or the second one, is inserted.

    Another example:
    Code:
    chrX    66765190    .   cagcagcagcagcagcagcagcagcagcagcagcagca  cagcagca    133 .   INDEL;DP=31;AF1=1;CI95=1,1;DP4=0,0,4,17;MQ=20;FQ=-97.5  GT:PL:GQ    1/1:174,63,0:99
    The net effect here is the deletion of several 'cag' repeats. Again, maybe it's impossible to determine which 'cag' elements have been deleted exactly.

    It would be great to know the net gain/loss of nucleotides from each indel. The question is, can I always get such a net effect (deletion or insertion) for an indel? Is it possible to get an indel composed of both a deletion and an insertion?

    Besides, can a SNP site be treated as a combination of one nt deletion and one nt insertion?

    Thanks in advance
    Shuli

  • #2
    You aren't going to be able to figure out which individual nucleotides are "new" and which are old. The question is meaningless. The mutation would have happened in one cell (either in an ancestor gamete stem cell, or in one cancer cell, or one cell of the developing zygote, or in one resistant bacterium, etc.), and you are looking at the DNA from many descendant cells that have undergone many rounds of DNA replication since the mutation event. If I had to guess, I'd say that the issue is polymerase slippage, but is it more likely for the polymerase to slip on the TCT as opposed to an AGA? I don't know, and I doubt anyone else knows either.

    The net change can just be calculated from the variants. I'm not sure what summing that up over lots of variants would tell you.
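A minimal sketch of that calculation, assuming a record with a single ALT allele and the standard VCF column order (REF in column 4, ALT in column 5). The function name `net_length_change` is made up for illustration; splitting on whitespace works for the records shown in this thread, but a real VCF is tab-delimited.

```python
def net_length_change(vcf_line: str) -> int:
    """Return len(ALT) - len(REF) for one single-ALT VCF record.

    Positive means a net gain of nucleotides, negative a net loss.
    """
    fields = vcf_line.split()
    ref, alt = fields[3], fields[4]
    if "," in alt:
        # Several ALT alleles: each one needs its own comparison with REF.
        raise ValueError("multiple ALT alleles; handle each one separately")
    return len(alt) - len(ref)


record = "chr6 111368147 . ATCT ATCTTCT 184 . INDEL;DP=119 GT:PL:GQ 1/1:225,255,0:99"
print(net_length_change(record))  # 3
```

The sign alone says whether the net effect is an insertion or a deletion; it does not say which copy of a repeat was gained or lost.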

    Why would you treat a SNP as an insertion and deletion, unless you had evidence that the strain you were looking at was a revertant of a previous indel? Indels are rarer than simple SNPs, so how would you distinguish which SNPs were caused by a base change, and which were caused by the lightning strike of two sequential indels at the same locus?

    • #3
      Thanks, swbarnes2.

      These indels are called from the sequencing data of some clones. Each clone corresponds to one mRNA molecule and there is at most one clone per locus in the library. What we are interested in is how the indels would influence the final protein products. So I hope to calculate the net change for each indel: if it's a loss/gain of 3, 6, 9, ... nts, the reading frame is kept. Otherwise, the frame is shifted and, in most cases, a prematurely truncated protein would be produced.

      It would be great if I could split all the indels into two categories, deletions and insertions, and the variant sites in each category could be further split into 'in-frame' and 'out-of-frame' ones. Although it's not strictly necessary to distinguish the direction of the net change (deletion or insertion) here, I think it would be easier for others to understand than just telling them these are indels.
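The two-level split described above could be sketched as follows. This is only an illustration: `classify_indel` is a hypothetical helper, it looks at nothing but the net length change, and it assumes the indel lies entirely within the coding sequence. An in-frame indel can still create or destroy a stop codon, so the frame label is a first-pass filter and not a prediction of the protein product.

```python
from typing import Tuple


def classify_indel(ref: str, alt: str) -> Tuple[str, str]:
    """Classify an indel by the sign and the frame of its net length change."""
    delta = len(alt) - len(ref)
    if delta == 0:
        # Same length: a substitution, not something with a net gain or loss.
        raise ValueError("no net length change; not an indel by this definition")
    kind = "insertion" if delta > 0 else "deletion"
    # Python's % returns a non-negative result for a positive modulus,
    # so this also works for negative deltas (deletions).
    frame = "in-frame" if delta % 3 == 0 else "out-of-frame"
    return kind, frame


print(classify_indel("ATCT", "ATCTTCT"))  # ('insertion', 'in-frame')
print(classify_indel("ACTG", "ACG"))      # ('deletion', 'out-of-frame')
```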

      What I'm worried about is whether it's possible to get a complex indel. For example, the wild-type (reference) sequence is 'ACTG' and the observed alternative sequence is 'ACGGG'. In this case, the 'T' is deleted and two 'G's are inserted. This indel could also be interpreted as a 'T'->'G' mutation plus an insertion of a 'G'. So I'm curious whether we can decide which interpretation is better.
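One way to at least flag such cases is to trim the bases that REF and ALT share at both ends and look at what is left. The sketch below does that; `trim_shared_ends` and `describe_variant` are illustrative names, not part of any tool mentioned in this thread. If either residue is empty the variant is a plain insertion or deletion; if both are non-empty and differ in length it is complex. The trimming cannot say which decomposition actually happened biologically, but the net change is the same under either interpretation.

```python
def trim_shared_ends(ref: str, alt: str):
    """Strip the common prefix, then the common suffix, from REF and ALT."""
    ref, alt = ref.upper(), alt.upper()
    p = 0
    while p < min(len(ref), len(alt)) and ref[p] == alt[p]:
        p += 1
    ref, alt = ref[p:], alt[p:]
    s = 0
    while s < min(len(ref), len(alt)) and ref[-1 - s] == alt[-1 - s]:
        s += 1
    if s:
        ref, alt = ref[:-s], alt[:-s]
    return ref, alt


def describe_variant(ref: str, alt: str) -> str:
    """Describe a variant by what remains after trimming shared ends."""
    r, a = trim_shared_ends(ref, alt)
    if not r and not a:
        return "no change"
    if not r:
        return f"insertion of {len(a)} nt"
    if not a:
        return f"deletion of {len(r)} nt"
    if len(r) == len(a):
        return f"substitution of {len(r)} nt"
    return f"complex: {len(r)} nt replaced by {len(a)} nt (net {len(a) - len(r):+d})"


print(describe_variant("ACTG", "ACGGG"))   # complex: 1 nt replaced by 2 nt (net +1)
print(describe_variant("ATCT", "ATCTTCT")) # insertion of 3 nt
```

For the 'ACTG' -> 'ACGGG' example the residue is 'T' versus 'GG', so the frame calculation still sees a net gain of one nucleotide regardless of how the event is described.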
