  • erichpeterson
    Junior Member
    • Nov 2012
    • 4

    Adding GT tag to Strelka Output

    Does anyone know of a way to add the GT tag to the VCF output of Strelka? There is an SGT tag, described as "Most likely somatic genotype excluding normal noise states," but it is not in the typical GT format. Maybe there is a program that will convert it to GT format?

    Thanks,
    Erich
  • GobiJerboa
    Junior Member
    • Aug 2013
    • 2

    #2
    I came across this issue while trying to run Annovar on Strelka2 output.
    Annovar throws an error without the GT tag so I did the following at the command line to add in GT information:

    Set filenames
    Code:
    strelka_output_file="somatic.indels.passed.vcf"
    strelka_mod="somatic.indels.passed.GTmod.vcf"
    Add GT FORMAT in VCF header
    Find the first ##FORMAT line in the header:
    grep -n prints the line "n"umber and -m 1 stops after the first "m"atch; cut keeps only the number.
    sed with a leading "number"i inserts text before the specified line, e.g. a leading 8i makes the new text line 8.
    Code:
    first_format_num=$(grep -n -m 1 '##FORMAT' "$strelka_output_file" | cut -d : -f 1)
    sed "$first_format_num"'i##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' "$strelka_output_file" > "$strelka_mod"
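One pitfall with the step above: if the file has no ##FORMAT line, the variable is empty and sed would insert the header before every line. A minimal sketch of a guarded version, assuming GNU sed (as the rest of this post does); the function name add_gt_header is hypothetical and not part of Strelka or Annovar:

```shell
# Hypothetical helper: insert the GT header before the first ##FORMAT line,
# and refuse to run if the input has no ##FORMAT line at all.
add_gt_header() {
  in_vcf=$1
  out_vcf=$2
  # Line number of the first header line that starts with ##FORMAT.
  first_format_num=$(grep -n -m 1 '^##FORMAT' "$in_vcf" | cut -d : -f 1)
  if [ -z "$first_format_num" ]; then
    echo "no ##FORMAT line found in $in_vcf" >&2
    return 1
  fi
  # GNU sed one-line form of the "i" command, same as in the post.
  sed "${first_format_num}"'i##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' "$in_vcf" > "$out_vcf"
}
```

It would be called as add_gt_header "$strelka_output_file" "$strelka_mod".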
    Use sed with extended "r"egular expression support to edit "i"nplace
    All lines of my strelka output have the format string BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR
    Find "BCN50:" and replace it with "GT:" followed by the \1 capture group (the matched text)
    Find the tab following TOR and insert "0/0:" after it, at the start of the Normal column
    From the same TOR starting point, also match the Normal column (anything except tabs) and the next tab, then insert "0/1:" at the start of the Tumor column
    Code:
    sed -ri 's|(BCN50:)|GT:\1|g' "$strelka_mod"
    sed -ri 's|(:TOR\t)|\10/0:|g' "$strelka_mod"
    sed -ri 's|(:TOR\t[^\t]*\t)|\10/1:|g' "$strelka_mod"
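The sed commands above depend on the exact FORMAT string (BCN50 ... TOR). As an alternative sketch, not what I originally ran, the same edit can be done in one pass with awk by column position, assuming FORMAT is column 9, Normal is column 10 and Tumor is column 11. The sample input below is a made-up minimal file with a trimmed copy of the record from this post:

```shell
# Hypothetical sample input: a minimal header plus one trimmed indel record.
tmpdir=$(mktemp -d)
in_vcf="$tmpdir/in.vcf"
out_vcf="$tmpdir/out.vcf"
printf '##fileformat=VCFv4.1\n' > "$in_vcf"
printf '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Sample header line">\n' >> "$in_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n' >> "$in_vcf"
printf 'chr1\t803750\t.\tTA\tT\t.\tPASS\tSGT=ref->het;SOMATIC\tBCN50:DP\t0.06:38\t0.09:25\n' >> "$in_vcf"

awk 'BEGIN { FS = OFS = "\t" }
  # Insert the GT header line just before the first ##FORMAT line.
  /^##FORMAT/ && !done { print "##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">"; done = 1 }
  # Pass all header lines through unchanged.
  /^#/ { print; next }
  # Data lines: prepend GT to FORMAT (col 9), Normal (col 10) and Tumor (col 11).
  { $9 = "GT:" $9; $10 = "0/0:" $10; $11 = "0/1:" $11; print }
' "$in_vcf" > "$out_vcf"

# Show the modified record.
tail -n 1 "$out_vcf"
```

Working by column should also cope with files whose FORMAT string differs, but the 0/0 and 0/1 values are still fixed placeholders.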

    This changes the lines of strelka output from this:
    Code:
    chr1	803750	.	TA	T	.	PASS	IC=10;IHP=12;MQ=54.88;MQ0=0;NT=ref;QSI=35;QSI_NT=35;RC=11;RU=A;SGT=ref->het;SOMATIC;SomaticEVS=6.65;TQSI=2;TQSI_NT=2	BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR	0.06:38:38:35.07:2.02:0.00:31,40:0,0:7,4	0.09:25:25:23.33:2.12:0.00:17,22:4,4:4,2
    To this:
    Code:
    chr1	803750	.	TA	T	.	PASS	IC=10;IHP=12;MQ=54.88;MQ0=0;NT=ref;QSI=35;QSI_NT=35;RC=11;RU=A;SGT=ref->het;SOMATIC;SomaticEVS=6.65;TQSI=2;TQSI_NT=2	GT:BCN50:DP:DP2:DP50:FDP50:SUBDP50:TAR:TIR:TOR	0/0:0.06:38:38:35.07:2.02:0.00:31,40:0,0:7,4	0/1:0.09:25:25:23.33:2.12:0.00:17,22:4,4:4,2

    Admittedly, the inserted 0/0 and 0/1 are placeholders and don't necessarily reflect the true homozygous / heterozygous status, but it was enough to get Annovar to run.
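If something closer to the real call is wanted, one option might be to derive GT from the SGT value in the INFO column, as the original question asked. The sketch below is untested against real Strelka output and rests on assumptions: that indel SGT values look like "ref->het" (normal state, then tumor state), and that ref, het and hom can be read as 0/0, 0/1 and 1/1. The function name sgt_to_gt is hypothetical. It does not add the ##FORMAT header line, and it will not handle SNV-style SGT values, which fall back to the fixed placeholders.

```shell
# Hypothetical helper: read a VCF on stdin, write it to stdout with GT values
# derived from INFO/SGT. Assumes FORMAT, Normal, Tumor are columns 9, 10, 11.
sgt_to_gt() {
  awk 'BEGIN { FS = OFS = "\t"; gt["ref"] = "0/0"; gt["het"] = "0/1"; gt["hom"] = "1/1" }
    # Header lines pass through unchanged.
    /^#/ { print; next }
    {
      # Fallback when SGT is missing or not in the assumed form.
      normal_gt = "0/0"; tumor_gt = "0/1"
      if (match($8, /SGT=[a-z]+->[a-z]+/)) {
        # Strip the leading "SGT=" (4 characters), leaving e.g. ref->het.
        sgt = substr($8, RSTART + 4, RLENGTH - 4)
        split(sgt, states, "->")
        if ((states[1] in gt) && (states[2] in gt)) {
          normal_gt = gt[states[1]]
          tumor_gt = gt[states[2]]
        }
      }
      $9 = "GT:" $9; $10 = normal_gt ":" $10; $11 = tumor_gt ":" $11
      print
    }'
}

# Example call on a trimmed, made-up record.
printf 'chr1\t803750\t.\tTA\tT\t.\tPASS\tSGT=ref->hom;SOMATIC\tBCN50:DP\t0.06:38\t0.09:25\n' | sgt_to_gt
```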
