SEQanswers

Old 05-15-2012, 05:41 AM   #1
bioinfun
Junior Member
 
Location: London

Join Date: Jun 2011
Posts: 4
Default synonymous snps from vcf

Hi

Does anyone have ideas on how to determine (programmatically) which SNPs in a VCF file are synonymous and which are non-synonymous? I ran mpileup on several hundred bacterial genomes to produce the VCF file.

Thanks
Old 05-15-2012, 05:58 AM   #2
ymc
Senior Member
 
Location: Hong Kong

Join Date: Mar 2010
Posts: 493
Default

Well, you can either write your own tool to do that or try ANNOVAR.
Old 05-15-2012, 07:03 AM   #3
iansealy
Member
 
Location: Hitchin, UK

Join Date: Oct 2010
Posts: 15
Default

Or Ensembl's VEP (http://www.ensembl.org/tools.html) or snpEff (http://snpeff.sourceforge.net/) or...
Old 05-15-2012, 07:40 AM   #4
bioinfun
Junior Member
 
Location: London

Join Date: Jun 2011
Posts: 4
Default

Thanks guys, but...

I am trying to program it myself, and I thought I could get some leads on how to do this from a VCF file.

What do you think of this quick way of doing it:

1- get the nucleotide sequence of the CDS that contains the SNP
2- perform 6-frame translation
3- compare with the translated reference sequence
4- if the sequences differ, then the SNP from step (1) is non-synonymous; if they are the same, then it's synonymous.

It's not accurate, but it will give you an idea. What do you guys think?
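
For illustration, the steps above can be sketched in Python. This is only a minimal sketch under stated assumptions: the CDS is given 5'->3' in its own reading frame (so a single in-frame translation of the affected codon is enough and no 6-frame translation is needed), the SNP is given as a 0-based offset into the CDS, and the standard codon table applies. The function name classify_snp is made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(cds, offset, alt):
    """Classify a SNP at 0-based `offset` of an in-frame CDS.

    Only the codon containing the SNP is translated, before and after
    substituting the alternate base.
    """
    codon_start = offset - offset % 3
    ref_codon = cds[codon_start:codon_start + 3].upper()
    if len(ref_codon) < 3:
        raise ValueError("SNP falls in an incomplete trailing codon")
    alt_codon = list(ref_codon)
    alt_codon[offset % 3] = alt.upper()
    alt_codon = "".join(alt_codon)
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "nonsynonymous"


if __name__ == "__main__":
    cds = "ATGGCTAAA"  # toy CDS: Met-Ala-Lys
    print(classify_snp(cds, 5, "C"))  # GCT -> GCC, third codon position
    print(classify_snp(cds, 3, "A"))  # GCT -> ACT, first codon position
```

Comparing single codons like this treats each SNP independently; two SNPs in the same codon would need to be applied together before translating.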
Old 05-15-2012, 08:15 AM   #5
SeekAnswers
Member
 
Location: USA

Join Date: Mar 2012
Posts: 21
Default

You can try comparing the variant coordinates in the VCF with the coding region starts/ends in RefSeq to see which coding region, if any, your variant falls in, and make a determination based on that.
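
A coordinate lookup of this kind might look like the Python sketch below. The Cds record and the locate_variant function are hypothetical names for this example, and the coordinates are assumed to be 1-based and inclusive (as in the VCF POS column) with no introns, which is reasonable for bacterial genes.

```python
from collections import namedtuple

# Hypothetical record for one coding region; start/end are 1-based and
# inclusive on the forward strand, matching the POS column of a VCF file.
Cds = namedtuple("Cds", "name start end strand")


def locate_variant(pos, features):
    """Return (gene name, 0-based offset within the CDS) for every CDS
    that contains `pos`. Overlapping genes yield several hits."""
    hits = []
    for cds in features:
        if cds.start <= pos <= cds.end:
            # On the reverse strand the CDS is read from `end` backwards.
            offset = pos - cds.start if cds.strand == "+" else cds.end - pos
            hits.append((cds.name, offset))
    return hits


if __name__ == "__main__":
    features = [Cds("geneA", 100, 399, "+"), Cds("geneB", 500, 799, "-")]
    for pos in (100, 450, 799):
        print(pos, locate_variant(pos, features))
```

The offset divided by 3 gives the codon index and the remainder gives the position within the codon. A linear scan is used for clarity; for many thousands of variants an interval index would be faster.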
Old 05-15-2012, 08:38 AM   #6
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912
Default

What I've done is use the coordinate from the VCF to get the sequence around and including the SNP. Then I blastx those sequences against a database of the proteins from that bacterium. Then I parse the blastx output to find out which changes cause amino acid differences.

But yes, ANNOVAR is easier, if you can get a file for ANNOVAR to compare against.
Old 07-06-2013, 12:47 PM   #7
fanx
Member
 
Location: USA

Join Date: Sep 2012
Posts: 18
Default

bioinfun, I have a similar problem. Are there any solutions now? Thanks.
Old 07-06-2013, 02:01 PM   #8
JackieBadger
Senior Member
 
Location: Halifax, Nova Scotia

Join Date: Mar 2009
Posts: 381
Default

Look at this publication: "De novo Transcriptome Assembly and SNP Discovery in the Wing Polymorphic Salt Marsh Beetle Pogonus chalceus (Coleoptera, Carabidae)"

Below is a quote from the primary author. Please reference their paper if you use the script.

"The script for finding amino acid changes uses several data files.

- I searched the ORFs in the unigenes with this program: http://proteomics.ysu.edu/tools/OrfPredictor.html

Output: a CDS file (DNA sequences of the ORFs) and a PEP file (AA sequences of the ORFs, and also contains START, STOP and READINGFRAME of the ORFs)



- SNP calling with SAMtools

Output: VCF file (SNP and positions of SNP)



- Perl script (SNP_in_ORF_nonsyn.pl) infers whether SNPs are located within an ORF and whether the SNP results in an amino acid change. The script gets the SNP position from the VCF file, mutates the position in the original sequence in the unigene fasta file, then translates that sequence according to its ORF (from the PEP file) and then checks whether the original translation differs from the mutated translation. The script uses BioPerl.

Output: each line in the VCF file that contains a nonsynonymous SNP. At the end, the number of synonymous and nonsynonymous SNPs is also reported.



I made the script and data available here: http://users.ugent.be/~slvbelle/NGS/

(I added an example PEP and VCF file which should work)



The script should be used as follows:

./SNP_in_ORF_nonsyn.pl Trinity_GC018ALL_unique.fasta PEP.fasta SNP.vcf > output"
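
The workflow described in the quote (read the SNP from the VCF, mutate the sequence, translate the ORF, compare) can be sketched in Python as follows. This is not the author's Perl script, only a rough sketch of the same idea. It assumes one ORF per contig, supplied as a hypothetical dict mapping contig name to (start, stop, strand) with 1-based inclusive forward-strand coordinates, and it assumes the standard codon table.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def orf_protein(seq, start, stop, strand):
    """Translate the ORF at 1-based inclusive [start, stop] of `seq`."""
    sub = seq[start - 1:stop]
    if strand == "-":
        sub = sub.translate(COMPLEMENT)[::-1]  # reverse complement
    return translate(sub)


def count_snp_effects(vcf_lines, contigs, orfs):
    """Count synonymous and nonsynonymous SNPs that fall inside an ORF."""
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        pos = int(pos)
        # Keep only simple single-base, single-allele substitutions.
        if len(ref) != 1 or len(alt) != 1 or alt == "." or chrom not in orfs:
            continue
        start, stop, strand = orfs[chrom]
        if not start <= pos <= stop:
            continue
        seq = contigs[chrom].upper()
        if seq[pos - 1] != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        mutated = seq[:pos - 1] + alt.upper() + seq[pos:]
        same = (orf_protein(seq, start, stop, strand)
                == orf_protein(mutated, start, stop, strand))
        counts["synonymous" if same else "nonsynonymous"] += 1
    return counts


if __name__ == "__main__":
    contigs = {"c1": "CCATGGCTAAATAGCC"}  # toy contig, ORF at 3..14
    orfs = {"c1": (3, 14, "+")}
    vcf = ["##fileformat=VCFv4.1", "c1\t8\t.\tT\tC", "c1\t6\t.\tG\tA"]
    print(count_snp_effects(vcf, contigs, orfs))
```

Translating the whole ORF for every SNP is wasteful but keeps the logic close to the description; translating only the affected codon would give the same answer.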
Old 07-06-2013, 04:53 PM   #9
fanx
Member
 
Location: USA

Join Date: Sep 2012
Posts: 18
Default

I haven't tried it, but I think the script at the end of your post should work (and I will definitely cite that PLoS paper). I also wonder whether there are alternative approaches, because my case is much simpler. I sequenced a long, heterogeneous viral ORF on a HiSeq 2000, so ORF prediction is unnecessary. My goal is to calculate dS and dN across the viral ORF using a sliding window. The only tool I am aware of is CLC's SNP analysis tool, from a publication (http://www.ncbi.nlm.nih.gov/pubmed/22278255). There may be other tools that can do this job too. Thanks in advance.
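
As a rough illustration of the sliding-window part only, here is a Python sketch that tallies already-classified SNPs per window. It assumes each SNP is given as a (0-based ORF offset, effect) pair, and the function name and default window sizes are made up for this example. Note that these are raw counts: proper dN and dS values also require normalizing by the number of nonsynonymous and synonymous sites in each window, which this sketch does not do.

```python
def sliding_counts(snps, orf_length, window=300, step=150):
    """Count synonymous / nonsynonymous SNPs in sliding windows.

    `snps` is an iterable of (offset, effect) pairs, where offset is the
    0-based position in the ORF and effect is "synonymous" or
    "nonsynonymous". Returns rows of (start, end, n_syn, n_nonsyn) with
    half-open windows [start, end).
    """
    snps = list(snps)
    rows = []
    last_start = max(orf_length - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, orf_length)
        in_window = [eff for off, eff in snps if start <= off < end]
        rows.append((start, end,
                     in_window.count("synonymous"),
                     in_window.count("nonsynonymous")))
    return rows


if __name__ == "__main__":
    snps = [(10, "synonymous"), (20, "nonsynonymous"),
            (160, "nonsynonymous"), (400, "synonymous")]
    for row in sliding_counts(snps, orf_length=600):
        print(*row, sep="\t")
```

Choosing a window length that is a multiple of 3 keeps window boundaries on codon boundaries.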
Old 07-28-2013, 01:20 PM   #10
fanx
Member
 
Location: USA

Join Date: Sep 2012
Posts: 18
Default

JackieBadger, I tried the script. It produced these warnings:

Use of uninitialized value $countSyn in concatenation (.) or string at SNP_in_ORF_nonsyn.pl line 101, <GEN0> line 39393.
Use of uninitialized value $countNonSyn in concatenation (.) or string at SNP_in_ORF_nonsyn.pl line 102, <GEN0> line 39393.

Any advice, please?
Old 07-28-2013, 03:19 PM   #11
d1antho
Member
 
Location: Ireland

Join Date: Mar 2012
Posts: 15
Default SNPdat

SNPdat can be used for this

http://www.biomedcentral.com/1471-2105/14/45

http://code.google.com/p/snpdat/

(there is also a short tutorial in the downloads section)

You only need a VCF file as input, plus an annotation file (GTF) and a reference sequence (FASTA file). The annotation and sequence information can come from your own assembly and don't require any preprocessing.
Old 07-31-2013, 06:14 AM   #12
Steven VB
Junior Member
 
Location: Belgium

Join Date: Jul 2013
Posts: 2
Default

Quote:
Originally Posted by fanx View Post
JackieBadger, I tried the script. It produced these warnings:

Use of uninitialized value $countSyn in concatenation (.) or string at SNP_in_ORF_nonsyn.pl line 101, <GEN0> line 39393.
Use of uninitialized value $countNonSyn in concatenation (.) or string at SNP_in_ORF_nonsyn.pl line 102, <GEN0> line 39393.

Any advice, please?
Please check again: http://users.ugent.be/~slvbelle/NGS/

I made some modifications, it should work now...
Old 08-13-2013, 08:05 AM   #13
maoshigua
Junior Member
 
Location: UK

Join Date: Aug 2013
Posts: 3
Default

JackieBadger, I tried the script. It failed with:
Error, Reference nucleotide does not equal the one in the original sequence at ./SNP_in_ORF_nonsyn_multiSNP.pl line 85, <GEN0> line 6.

Any suggestions, please?

Maoshigua
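
The error message suggests that the REF allele in a VCF record does not match the base in the FASTA file at that position. Without seeing the data the cause is unknown, but a check of that kind can be sketched in Python as below, assuming 1-based VCF positions and sequences already loaded into a dict keyed by contig name (ref_mismatches is a hypothetical helper, not part of the script).

```python
def ref_mismatches(vcf_lines, contigs):
    """List VCF records whose REF allele disagrees with the FASTA.

    Returns (chrom, pos, ref, found) tuples; `found` is None when the
    contig name is missing from `contigs`.
    """
    bad = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        pos = int(pos)
        seq = contigs.get(chrom)
        # VCF POS is 1-based; Python slicing is 0-based.
        found = None if seq is None else seq[pos - 1:pos - 1 + len(ref)].upper()
        if found != ref.upper():
            bad.append((chrom, pos, ref, found))
    return bad


if __name__ == "__main__":
    contigs = {"c1": "ACGTAC"}
    vcf = ["c1\t2\t.\tC\tT", "c1\t3\t.\tA\tT", "c2\t1\t.\tA\tG"]
    for record in ref_mismatches(vcf, contigs):
        print(record)
```

If every record mismatches, the VCF and FASTA probably come from different references or use different contig names; if only some do, an off-by-one or strand issue is worth checking.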
Old 08-13-2013, 08:13 AM   #14
Steven VB
Junior Member
 
Location: Belgium

Join Date: Jul 2013
Posts: 2
Default

Hi maoshigua,

can you send me ([email protected]) a sample of your data? I will try to fix it.

Cheers,
Steven
Old 08-13-2013, 08:32 AM   #15
maoshigua
Junior Member
 
Location: UK

Join Date: Aug 2013
Posts: 3
Default

Hi Steven,
I have sent you the three input files. Thanks a lot.

Maoshigua
Old 09-19-2017, 05:41 AM   #16
MGCBrown
Member
 
Location: Canada

Join Date: May 2017
Posts: 15
Default

Does anyone have the SNP_in_ORF_nonsyn.pl script that was described in this thread? The link to the script (http://users.ugent.be/~slvbelle/NGS/) no longer works.

Any info would be great!
Tags
mpileup, synonymous snps, vcf
