SEQanswers

Old 07-08-2014, 07:36 AM   #1
erichpeterson
Junior Member
 
Location: Little Rock

Join Date: Nov 2012
Posts: 4
Adding GT tag to Strelka Output

Does anyone know of a way to add the GT tag to the VCF output of Strelka? There is an SGT tag, described as "Most likely somatic genotype excluding normal noise states," but it is not in the standard GT format. Is there a program that will convert it to GT format?

Thanks,
Erich
Old 08-09-2018, 08:33 AM   #2
GobiJerboa
Junior Member
 
Location: California

Join Date: Aug 2013
Posts: 1

I came across this issue while trying to run Annovar on Strelka2 output.
Annovar throws an error when the GT tag is missing, so I did the following at the command line to add GT information:

Set filenames
Code:
strelka_output_file="somatic.indels.passed.vcf"
strelka_mod="somatic.indels.passed.GTmod.vcf"
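Add the GT FORMAT line to the VCF header
Find the line number of the first ##FORMAT line in the header: grep -n prints line "n"umbers and -m 1 stops after the first "m"atch.
sed with a leading "number"i inserts text before the specified line, e.g. a leading 8i inserts before the current 8th line, so the new text becomes line 8. The one-line form of i used here is a GNU sed extension.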
Code:
first_format_num=$(grep -n -m 1 '##FORMAT' "$strelka_output_file" | cut -d : -f 1)
sed "$first_format_num"'i##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' "$strelka_output_file" > "$strelka_mod"
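The sed call above depends on GNU sed and on grep finding at least one ##FORMAT line. As a minimal sketch of an alternative, assuming only a POSIX awk, the following inserts the same header line before the first ##FORMAT line and falls back to the #CHROM line if no ##FORMAT line exists. The demo file names and the demo VCF content are made up for illustration.

```shell
# Hypothetical demo input so the sketch can be run on its own.
demo_in="demo.header.vcf"
demo_out="demo.header.GTmod.vcf"
printf '##fileformat=VCFv4.1\n' > "$demo_in"
printf '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">\n' >> "$demo_in"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n' >> "$demo_in"

gt_line='##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'

awk -v gt="$gt_line" '
  # Print the GT header once: before the first ##FORMAT line,
  # or before the #CHROM line if no ##FORMAT line was seen.
  !done && (/^##FORMAT/ || /^#CHROM/) { print gt; done = 1 }
  { print }
' "$demo_in" > "$demo_out"
```

To use it on real data, point demo_in and demo_out at the Strelka file and the modified copy instead of the demo files.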
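Use sed with extended "r"egular expression support (-r) to edit the file "i"n place (-i)
All lines of my strelka output have the format string BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR
Find BCN50: and put "GT:" in front of it; the \1 capture group writes the matched BCN50: back
Find the tab following TOR and insert 0/0: after it, at the start of the Normal sample column. sed only has backreferences \1 to \9, so \10 is read as \1 followed by a literal 0
From the same TOR starting point, skip the Normal column (anything except tabs) and the tab after it, then insert 0/1: at the start of the Tumor column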
Code:
sed -ri 's|(BCN50:)|GT:\1|g' "$strelka_mod"
sed -ri 's|(:TOR\t)|\10/0:|g' "$strelka_mod"
sed -ri 's|(:TOR\t[^\t]*\t)|\10/1:|g' "$strelka_mod"

This changes the lines of strelka output from this:
Code:
chr1	803750	.	TA	T	.	PASS	IC=10;IHP=12;MQ=54.88;MQ0=0;NT=ref;QSI=35;QSI_NT=35;RC=11;RU=A;SGT=ref->het;SOMATIC;SomaticEVS=6.65;TQSI=2;TQSI_NT=2	BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR	0.06:38:38:35.07:2.02:0.00:31,40:0,0:7,4	0.09:25:25:23.33:2.12:0.00:17,22:4,4:4,2
To this:
Code:
chr1	803750	.	TA	T	.	PASS	IC=10;IHP=12;MQ=54.88;MQ0=0;NT=ref;QSI=35;QSI_NT=35;RC=11;RU=A;SGT=ref->het;SOMATIC;SomaticEVS=6.65;TQSI=2;TQSI_NT=2	GT:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR	0/0:0.06:38:38:35.07:2.02:0.00:31,40:0,0:7,4	0/1:0.09:25:25:23.33:2.12:0.00:17,22:4,4:4,2
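The sed substitutions are tied to the indel FORMAT string (BCN50 through TOR). A minimal sketch of a column-based variant follows: it prepends to columns 9, 10 and 11 of every non-header line, so it does not depend on which FORMAT keys are present. It assumes the sample columns are ordered normal then tumor, as in the example above, and it hard-codes the same placeholder 0/0 and 0/1 genotypes. It only edits the data lines, so the ##FORMAT header line for GT still has to be added separately. The demo file names and shortened demo record are hypothetical.

```shell
# Hypothetical demo input so the sketch can be run on its own.
demo_in="demo.body.vcf"
demo_out="demo.body.GTmod.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n' > "$demo_in"
printf 'chr1\t803750\t.\tTA\tT\t.\tPASS\tSGT=ref->het;SOMATIC\tDP:TAR:TIR\t38:31,40:0,0\t25:17,22:4,4\n' >> "$demo_in"

awk 'BEGIN { FS = OFS = "\t" }
  /^#/ { print; next }      # header lines pass through unchanged
  {
    $9  = "GT:"  $9         # FORMAT column
    $10 = "0/0:" $10        # normal sample (placeholder genotype)
    $11 = "0/1:" $11        # tumor sample (placeholder genotype)
    print
  }' "$demo_in" > "$demo_out"
```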

Admittedly, the inserted 0/0 and 0/1 do not necessarily reflect the true homozygous / heterozygous status of each sample, but it was enough to get Annovar to run.
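One way to make the placeholder genotypes less arbitrary might be to derive them from the SGT value in the INFO column. The sketch below assumes SGT has the form normal->tumor with the states ref, het and hom, as in the SGT=ref->het example above, and maps those states to 0/0, 0/1 and 1/1. That mapping is an assumption made for illustration, not a documented Strelka equivalence. Any SGT value it does not recognize falls back to the 0/0 and 0/1 placeholders. The demo file names and demo records are hypothetical.

```shell
# Hypothetical demo input so the sketch can be run on its own.
demo_in="demo.sgt.vcf"
demo_out="demo.sgt.GTmod.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n' > "$demo_in"
printf 'chr1\t803750\t.\tTA\tT\t.\tPASS\tNT=ref;SGT=ref->het;SOMATIC\tDP:TAR\t38:31,40\t25:17,22\n' >> "$demo_in"
printf 'chr1\t900000\t.\tG\tGA\t.\tPASS\tNT=ref;SGT=ref->hom;SOMATIC\tDP:TAR\t30:30,30\t20:1,1\n' >> "$demo_in"

awk 'BEGIN {
    FS = OFS = "\t"
    # Assumed mapping from SGT states to GT strings.
    gt["ref"] = "0/0"; gt["het"] = "0/1"; gt["hom"] = "1/1"
  }
  /^#/ { print; next }                     # header lines pass through unchanged
  {
    normal_gt = "0/0"; tumor_gt = "0/1"    # fallback placeholders
    n = split($8, info, ";")
    for (i = 1; i <= n; i++) {
      if (info[i] ~ /^SGT=/) {
        # Split the text after "SGT=" into normal and tumor states.
        split(substr(info[i], 5), state, "->")
        if ((state[1] in gt) && (state[2] in gt)) {
          normal_gt = gt[state[1]]; tumor_gt = gt[state[2]]
        }
      }
    }
    $9 = "GT:" $9; $10 = normal_gt ":" $10; $11 = tumor_gt ":" $11
    print
  }' "$demo_in" > "$demo_out"
```

As with the hard-coded version, the GT line in the header still has to be added separately, and the result is only as meaningful as the assumed mapping.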

Tags
genotype, strelka, variant analysis, variant calling, vcf
