SEQanswers > Introductions

shis 02-20-2017 01:15 PM

Counting of SNPs in different genomic regions
After annotating the SNP.vcf files using SnpEff v4.2, I would like to count how many SNPs are found in different genomic regions such as upstream, gene, exon, intron, CDS, 5' UTR, 3' UTR, downstream and intergenic. For a few days I have been searching different forums for a program or a Perl or Python script that performs this count. I have the annotated SNP.vcf file and the gene.txt file produced by SnpEff. Can anyone help me with how to count the number of SNPs in each genomic region? I would highly appreciate your help. Thanks

gringer 02-20-2017 05:03 PM

I've forgotten how SnpEff outputs information, but in general 'grep -c' is a quick (but mostly manual) way to count the lines that mention a given category.
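A rough Python equivalent of the 'grep -c' idea, as a minimal sketch: it skips the '#' header lines of the VCF and counts each data line (one SNP record) once if it contains the search term anywhere. The function name is hypothetical. Like grep, this is plain substring matching, so a short term such as "exon" would also match longer terms like "non_coding_transcript_exon_variant" or gene names containing that text; use the full effect name.

```python
def count_matching_records(vcf_lines, term):
    """Count VCF data lines containing `term`, roughly like
    `grep -v '^#' file.vcf | grep -c term`.

    Each matching line (one SNP record) is counted once, no matter how
    many of its annotations mention the term.
    """
    return sum(
        1
        for line in vcf_lines
        if not line.startswith("#") and term in line
    )


if __name__ == "__main__":
    import sys

    # Usage: python count_term.py annotated.vcf intron_variant
    vcf_path, term = sys.argv[1], sys.argv[2]
    with open(vcf_path) as handle:
        print(count_matching_records(handle, term))
```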

If you want to count lots of things at once, isolate the categories (e.g. using 'cut'), then run 'sort' and 'uniq -c'. Alternatively, import the data into R and use 'table'.
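The "isolate the categories, then sort | uniq -c" approach can be sketched in Python with collections.Counter. This assumes the SnpEff 4.x ANN annotation in the INFO column: annotations separated by ',', fields within each annotation separated by '|', the effect in the second field, and combined effects joined by '&' (e.g. "splice_region_variant&intron_variant"). Like 'uniq -c' on the extracted column, it counts every annotation, so one SNP overlapping several transcripts contributes several counts. The function names are hypothetical.

```python
from collections import Counter


def ann_effects(info):
    """Yield every effect term found in the ANN entry of a VCF INFO column."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            for annotation in item[len("ANN="):].split(","):
                fields = annotation.split("|")
                if len(fields) > 1:
                    # Second field holds the effect; '&' joins combined effects.
                    yield from fields[1].split("&")


def count_effects(vcf_lines):
    """Tally effect terms over all annotations, like cut | sort | uniq -c."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) > 7:  # INFO is the 8th VCF column
            counts.update(ann_effects(columns[7]))
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_effects.py annotated.vcf
    with open(sys.argv[1]) as handle:
        for effect, n in count_effects(handle).most_common():
            print(f"{n}\t{effect}")
```

The output lists the count first, followed by the effect term, mirroring 'uniq -c'. Terms such as intron_variant or 5_prime_UTR_variant still have to be grouped by hand into the broader regions from the question.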

shis 02-21-2017 12:34 PM

Thanks a lot, Gringer. I used SnpSift extractFields to get the effect of each SNP, then counted the number of SNPs in each region in Excel. Finally, I got the number of SNPs in each genomic region.
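The Excel step could also be scripted. The sketch below assumes a tab-separated table from something like `java -jar SnpSift.jar extractFields -s "," -e "." annotated.vcf CHROM POS "ANN[*].EFFECT"`, with a header row, one SNP per row, and all effects of that SNP in one column separated by ','. The column name "ANN[*].EFFECT" is an assumption; adjust `effect_column` to match the header in your file. Unlike the per-annotation tally, each effect term is counted at most once per SNP, which answers "how many SNPs fall in each category". The function name is hypothetical.

```python
import csv
from collections import Counter


def count_snps_per_effect(tsv_lines, effect_column="ANN[*].EFFECT", sep=","):
    """Count how many SNPs (table rows) carry each effect term.

    Effects within the column are split on `sep`, combined effects joined
    by '&' are split into individual terms, and empty or '.' placeholders
    are ignored. A term is counted at most once per row.
    """
    counts = Counter()
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        value = row.get(effect_column) or ""
        terms = {
            term
            for effect in value.split(sep)
            for term in effect.split("&")
            if term and term != "."
        }
        counts.update(terms)
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python count_snpsift.py effects.tsv
    with open(sys.argv[1], newline="") as handle:
        for effect, n in count_snps_per_effect(handle).most_common():
            print(f"{effect}\t{n}")
```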

All times are GMT -8.