Hi,
I am currently using snpEff to annotate VCF files. The output is also a VCF, with the INFO field populated by several annotations, including the gene names.
As a sample, here are the annotations for five consecutive rows:
DOWNSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000469563|),DOWNSTREAM(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000487214|),INTRON(LOW||||NOC2L|processed_transcript|CODING|ENST00000327044|),TRANSCRIPT(MODIFIER||||NOC2L|processed_transcript|CODING|ENST00000477976|)
INTRON(LOW||||KLHL17|protein_coding|CODING|ENST00000455747|),INTRON(LOW||||KLHL17|protein_coding|CODING|ENST00000540863|),NON_SYNONYMOUS_CODING(MODERATE|MISSENSE|Gcc/Acc|A111T|KLHL17|protein_coding|CODING|ENST00000338591|exon_1_896673_896932),TRANSCRIPT(MODIFIER||||KLHL17|protein_coding|CODING|ENST00000463212|),TRANSCRIPT(MODIFIER||||KLHL17|protein_coding|CODING|ENST00000473277|),UPSTREAM(LOW||||KLHL17|protein_coding|CODING|ENST00000466300|),UPSTREAM(LOW||||KLHL17|protein_coding|CODING|ENST00000481067|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000327044|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000469563|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000477976|),UPSTREAM(LOW||||NOC2L|processed_transcript|CODING|ENST00000487214|),UPSTREAM(LOW||||PLEKHN1|protein_coding|CODING|ENST00000379407|),UPSTREAM(LOW||||PLEKHN1|protein_coding|CODING|ENST00000379409|),UPSTREAM(LOW||||PLEKHN1|protein_coding|CODING|ENST00000379410|)
DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000263743|),DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000345100|),DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000378888|),DOWNSTREAM(MODIFIER||||DVL1|protein_coding|CODING|ENST00000378891|),DOWNSTREAM(MODIFIER||||GLTPD1|processed_transcript|CODING|ENST00000343938|),DOWNSTREAM(MODIFIER||||GLTPD1|processed_transcript|CODING|ENST00000464957|),SYNONYMOUS_CODING(LOW|SILENT|ggG/ggA|G384|TAS1R3|protein_coding|CODING|ENST00000339381|exon_1_1267404_1268186)
TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000317673|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000340677|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000341832|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000407249|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000513088|)
TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000317673|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000340677|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000341832|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000407249|),TRANSCRIPT(MODIFIER||||AL691432.2|unprocessed_pseudogene|NON_CODING|ENST00000513088|)
My question is: how do I extract specific fields, such as the gene names, for all the rows? vcftools doesn't help, as it can only extract the whole INFO field with all these annotations.
Thanks
-Kasthuri
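
One possible approach is a small script that splits each annotation on its parentheses and pipes. The sketch below is only an illustration based on the sample rows above, where the gene name appears as the fifth pipe-separated sub-field inside each `EFFECT(...)` entry. The function names, the `GENE_FIELD` index, and the assumption that the annotations sit under an `EFF=` key in the INFO column are mine and should be checked against your own files and snpEff version.

```python
import re

# One effect entry looks like: EFFECT_NAME(sub|fields|separated|by|pipes)
EFFECT_RE = re.compile(r"([A-Z0-9_]+)\(([^)]*)\)")

# Assumption taken from the sample rows: the gene name is the fifth sub-field.
GENE_FIELD = 4


def gene_names(annotation):
    """Return the unique gene names in an annotation string, in order of first appearance."""
    genes = []
    for _effect, body in EFFECT_RE.findall(annotation):
        fields = body.split("|")
        if len(fields) > GENE_FIELD:
            gene = fields[GENE_FIELD]
            if gene and gene not in genes:
                genes.append(gene)
    return genes


def genes_per_record(vcf_lines, key="EFF"):
    """Yield (chrom, pos, gene list) for each data line of a VCF.

    `key` is the INFO key holding the annotations; "EFF" is an assumption.
    """
    prefix = key + "="
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        genes = []
        for item in cols[7].split(";"):  # column 8 is INFO
            if item.startswith(prefix):
                genes = gene_names(item[len(prefix):])
        yield cols[0], cols[1], genes
```

To run it over a file, something like `for chrom, pos, genes in genes_per_record(open("annotated.vcf")): print(chrom, pos, ",".join(genes))` should print one line per variant (the file name here is a placeholder). Changing `GENE_FIELD` would pull out a different sub-field, such as the transcript ID.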