SEQanswers





07-04-2012, 02:47 AM   #1
sdvie
Member
 
Location: Spain

Join Date: Jul 2010
Posts: 68
IUPAC ambiguous bases in vcf file?

Dear all,

I have been calling variants with samtools/bcftools against a reference that contains IUPAC ambiguity codes (such as K, W, etc.).
In the mpileup output, these ambiguity codes are preserved in the reference column, while the read bases are always one of the four standard bases, ACGT.
In the VCF file, however, two different things happen:

1. At SNP positions that coincide with an ambiguous base in the reference, the VCF file shows "N" in the REF field and one of A, C, G or T (or a comma-separated combination of those) in the ALT field. This seems to agree with the VCF 4.1 format specification, which says that the REF field may contain only A, C, G, T or N.

2. At INDEL positions that include an ambiguous base, however, the ambiguity codes are kept in the REF field (and also within the ALT sequence, if the indel spans that position), for example:

Code:
Lg10    29366679        .       K       KCG,KG  49.5    PASS    INDEL;DP=19;VDB=0.0318;AF1=1;AC1=2;DP4=0,0,6,9;MQ=20;FQ=-58.5;MPB=U;   GT:PL:DP:SP:GQ  1/1:154,88,64,104,0,83:15:0:45
Lg10    29832925        .       TTAWAKWTATA     TTA     98.5    mrd15   INDEL;DP=17;VDB=0.0404;AF1=1;AC1=2;DP4=0,0,4,6;MQ=20;FQ=-64.5;MPB=U     GT:PL:DP:SP:GQ  1/1:139,30,0:10:0:57
How is this possible? Is the REF sequence for INDEL records read directly from the original reference.fasta? Does the VCF 4.1 restriction of REF to ACGTN not apply to INDEL positions?
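
For anyone wanting to check how many records are affected, here is a minimal Python sketch that lists VCF data lines whose REF or ALT alleles contain IUPAC ambiguity codes other than N. It is not part of samtools/bcftools; the function names are made up for illustration, and it assumes a plain-text, tab-separated VCF with the usual column order (CHROM, POS, ID, REF, ALT, ...).

```python
# IUPAC ambiguity codes other than N, which VCF 4.1 allows in REF.
IUPAC_AMBIGUOUS = set("RYSWKMBDHV")


def ambiguous_alleles(vcf_line):
    """Return the REF/ALT alleles on one VCF data line that contain
    IUPAC ambiguity codes outside ACGTN (hypothetical helper)."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    alleles = [ref] + alt.split(",")
    return [
        a for a in alleles
        # Skip symbolic alleles such as <DEL>; their letters are not bases.
        if not a.startswith("<") and IUPAC_AMBIGUOUS & set(a.upper())
    ]


def scan_vcf(lines):
    """Yield (chrom, pos, offending_alleles) for each affected data line."""
    for line in lines:
        # Header lines start with '#'; blank lines carry no record.
        if line.startswith("#") or not line.strip():
            continue
        bad = ambiguous_alleles(line)
        if bad:
            chrom, pos = line.split("\t")[:2]
            yield chrom, int(pos), bad
```

It could be run as `for hit in scan_vcf(open("calls.vcf")): print(hit)`, where `calls.vcf` is a placeholder file name. Both example records above would be reported, whereas a SNP record with REF "N" would not.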

Any comments would be very much appreciated.

cheers,
Sophia

Tags
indel, iupac, reference, vcf
