IUPAC ambiguous bases in vcf file?

    Dear all,

    I have been calling variants with samtools/bcftools against a reference that contains IUPAC ambiguity codes (such as K, W, etc.).
    In the mpileup output, these ambiguity codes are preserved in the reference column, while the read bases are always one of the four standard bases (A, C, G, T).
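For reference, the standard IUPAC nucleotide ambiguity codes can be written as a small lookup table, together with a check of whether an unambiguous read base is consistent with an ambiguous reference position (e.g. K = G or T, W = A or T). This is a minimal sketch; the helper names `compatible` and `is_ambiguous` are hypothetical and not part of samtools/bcftools.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def compatible(ref_code: str, read_base: str) -> bool:
    """Hypothetical helper: True if an unambiguous read base is one of the
    bases that the (possibly ambiguous) reference code stands for."""
    return read_base.upper() in IUPAC.get(ref_code.upper(), "")


def is_ambiguous(code: str) -> bool:
    """Hypothetical helper: True for codes standing for more than one base."""
    return len(IUPAC.get(code.upper(), "")) > 1


if __name__ == "__main__":
    # K and W are the two codes mentioned in the post.
    for ref, base in [("K", "G"), ("K", "A"), ("W", "T"), ("A", "A")]:
        print(ref, base, compatible(ref, base))
```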
    However, in the vcf file, two things happen:

    1. At SNP positions that coincide with ambiguous bases in the reference, the VCF file reports "N" in the REF field and A, C, G or T (or a comma-separated combination of these) in the ALT field. This seems to agree with the VCF 4.1 specification, which says that the REF field may contain only A, C, G, T and N.

    2. At indel positions that include an ambiguous base, however, the ambiguity codes are written in the REF field (and also within the ALT sequence, if the indel includes that position), for example:

    Code:
    Lg10    29366679        .       K       KCG,KG  49.5    PASS    INDEL;DP=19;VDB=0.0318;AF1=1;AC1=2;DP4=0,0,6,9;MQ=20;FQ=-58.5;MPB=U;   GT:PL:DP:SP:GQ  1/1:154,88,64,104,0,83:15:0:45
    Lg10    29832925        .       TTAWAKWTATA     TTA     98.5    mrd15   INDEL;DP=17;VDB=0.0404;AF1=1;AC1=2;DP4=0,0,4,6;MQ=20;FQ=-64.5;MPB=U     GT:PL:DP:SP:GQ  1/1:139,30,0:10:0:57
    How is this possible? Is the original reference.fasta read directly for indel positions? Or does the VCF 4.1 restriction of REF to A, C, G, T and N not apply to indels?
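To see which records are affected, one could list the REF/ALT alleles that fall outside the VCF 4.1 base alphabet (A, C, G, T, N, case-insensitive; ALT may also be missing, symbolic `<ID>` or a breakend string). The sketch below is a hypothetical helper, not part of samtools/bcftools; it assumes tab-separated data lines and inspects only the REF and ALT columns, so the two indel records above are truncated to their first five columns.

```python
import re

# VCF 4.1 base alphabet for REF and for ALT base strings.
BASES = re.compile(r"[ACGTN]+", re.IGNORECASE)


def invalid_alleles(vcf_line: str) -> list[str]:
    """Hypothetical helper: return the REF/ALT alleles of one tab-separated
    VCF data line containing characters outside A, C, G, T, N. Missing ('.'),
    symbolic (<ID>) and breakend ALT alleles are skipped."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    bad = [] if BASES.fullmatch(ref) else [ref]
    for alt in alts:
        if alt == "." or alt.startswith("<") or "[" in alt or "]" in alt:
            continue
        if not BASES.fullmatch(alt):
            bad.append(alt)
    return bad


# The two indel records from the post, truncated to CHROM..ALT.
records = [
    "Lg10\t29366679\t.\tK\tKCG,KG",
    "Lg10\t29832925\t.\tTTAWAKWTATA\tTTA",
]
for rec in records:
    print(rec.split("\t")[1], invalid_alleles(rec))
# 29366679 ['K', 'KCG', 'KG']
# 29832925 ['TTAWAKWTATA']
```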

    Any comments would be much appreciated.

    cheers,
    Sophia
