SEQanswers

Old 07-31-2012, 05:21 PM   #1
DNAase
Junior Member
 
Location: Los Angeles

Join Date: Dec 2009
Posts: 5
Bis-SNP: An accurate SNP and methylation calling tool for Bisulfite-Seq/NOMe-seq/RRBS

Why did we develop Bis-SNP?

Identification and proper handling of SNPs in Bisulfite-seq are important for accurate quantification of methylation levels, especially given that C>T is the most common substitution in the human population (65% of all SNPs in dbSNP), and these usually occur in the CpG context. Identifying SNPs is also required for sequence-dependent allele-specific methylation analysis.
SNP calling from bisulfite sequencing data has significant complications. First, reads from the two genomic strands are no longer complementary after bisulfite conversion, yet complementarity is an assumption made by all conventional SNP calling algorithms. Second, true (evolutionary) C>T SNPs in the sample cannot be distinguished from C>T substitutions caused by bisulfite conversion, and can thus be misidentified as unmethylated Cs.
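
To make the strand problem concrete, here is a minimal Python sketch (not part of Bis-SNP; the function names and the toy sequence are made up for illustration). It converts every unmethylated C to T on each strand separately and shows that the two converted strands are no longer reverse complements of each other, while the bottom strand still carries the original C information at top-strand cytosines.

```python
# Toy illustration only: in-silico bisulfite conversion of both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def bisulfite_convert(strand, methylated=frozenset()):
    """Turn every C into T unless its index is listed in `methylated`."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )

top = "ACGTCA"                          # hypothetical top-strand sequence
bottom = reverse_complement(top)        # the untreated bottom strand

top_bs = bisulfite_convert(top)         # fully unmethylated top strand
bottom_bs = bisulfite_convert(bottom)   # fully unmethylated bottom strand

# After conversion the strands are no longer complementary.
strands_still_complementary = reverse_complement(top_bs) == bottom_bs

# Bottom-strand read expressed in top-strand coordinates: the Cs of `top`
# are still visible there, because conversion hit the Gs' partner Cs instead.
bottom_in_top_coords = reverse_complement(bottom_bs)

print(top_bs, bottom_bs, strands_still_complementary, bottom_in_top_coords)
```

In this toy case a true C>T SNP would show a T in both `top_bs` and `bottom_in_top_coords`, whereas an unmethylated C shows a T only in `top_bs`, which is why strand-aware handling matters.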

Currently, there is no other publicly available tool for SNP calling in Bisulfite-seq data. We implemented and tested all the methods described in currently published methylome papers. None of their SNP detection methods works well without additional matched non-bisulfite sequencing data from the same sample/strain.

We have therefore developed a new tool, called Bis-SNP, for the accurate SNP and methylation analysis of BS-seq data. Bis-SNP is a software package mainly written in Java that is based on the GATK map-reduce framework. All associated files can be downloaded at:

http://epigenome.usc.edu/publication...011/index.html

How does it work?

Bis-SNP uses Bayesian inference to evaluate a model of strand-specific base calls and base call quality scores, along with prior information on population SNP frequencies, experiment-specific bisulfite conversion efficiency, and site-specific DNA methylation estimates.
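
As a rough illustration of how such a strand-aware Bayesian genotype call can be set up, here is a simplified Python sketch. It is not the Bis-SNP model or code. Everything in it is an assumption made for the example: the function names, the fixed methylation level (0.5), the conversion efficiency (0.99), the heterozygous prior (0.001), and the strand labels ('+' for reads from the original top strand, '-' for reads from the original bottom strand, all bases given in forward-reference coordinates).

```python
import math
from itertools import combinations_with_replacement

BASES = "ACGT"

def phred_to_error(q):
    """Phred score -> error probability, e.g. Q20 -> 0.01."""
    return 10 ** (-q / 10)

def base_likelihood(obs, allele, strand, q, meth=0.5, conv=0.99):
    """P(observed base | true allele) for one read (illustrative model).

    On '+' reads an unmethylated C may read as T; on '-' reads an
    unmethylated G may read as A. Other alleles are unaffected.
    """
    e = phred_to_error(q)
    p_conv = (1 - meth) * conv          # chance the base gets converted
    if strand == "+" and allele == "C":
        after = {"C": 1 - p_conv, "T": p_conv}
    elif strand == "-" and allele == "G":
        after = {"G": 1 - p_conv, "A": p_conv}
    else:
        after = {allele: 1.0}
    # Mix in sequencing error: a wrong call is spread over the 3 other bases.
    return sum(p * ((1 - e) if obs == b else e / 3) for b, p in after.items())

def genotype_posteriors(reads, het_prior=0.001, **model):
    """reads: list of (base, strand, phred). Returns {genotype: posterior}."""
    log_post = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        # Unnormalized toy prior: heterozygotes are rarer than homozygotes.
        log_p = math.log(het_prior if a1 != a2 else 1.0)
        for obs, strand, q in reads:
            log_p += math.log(
                0.5 * base_likelihood(obs, a1, strand, q, **model)
                + 0.5 * base_likelihood(obs, a2, strand, q, **model)
            )
        log_post[a1 + a2] = log_p
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {g: math.exp(v - top) / total for g, v in log_post.items()}

def best_genotype(reads, **kwargs):
    post = genotype_posteriors(reads, **kwargs)
    return max(post, key=post.get)

# Unmethylated C: T on '+' reads, but still C on '-' reads.
unmethylated_c = [("T", "+", 30)] * 10 + [("C", "-", 30)] * 10
# Homozygous C>T SNP: T on both strands.
hom_snp = [("T", "+", 30)] * 10 + [("T", "-", 30)] * 10
# Heterozygous C/T: '-' reads split between C and T.
het_snp = [("T", "+", 30)] * 10 + [("C", "-", 30)] * 5 + [("T", "-", 30)] * 5

print(best_genotype(unmethylated_c), best_genotype(hom_snp), best_genotype(het_snp))
```

The point of the sketch is only that reads from the opposite strand carry the information needed to separate a converted C from a real T allele; a real caller would also use reference-aware priors and estimate the methylation level per site rather than fixing it.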

It also enables base call quality score recalibration for Bisulfite-seq, an addition that has greatly improved SNP calling in the non-bisulfite context. Because very few bisulfite mapping tools can currently perform gapped alignment to detect indels, which leads to many false SNPs around indels, Bis-SNP also provides local indel realignment for Bisulfite-seq. Bis-SNP is open-source and based on the Genome Analysis Toolkit (GATK) framework, in order to take advantage of the parallel Map-Reduce computation strategy and provide practical execution times.

Bis-SNP accepts either single-end or paired-end mapped Bisulfite-seq/NOMe-seq/RRBS data in the form of BAM files, and outputs SNP and methylation information using standard VCF formats and bed/bedDetail/bedGraph/wig formats.

Bis-SNP can call and summarize methylation for any user-provided cytosine context (CpG, CHH, CHG, GCH, etc.), which makes it widely adaptable to different kinds of bisulfite-treated sequencing data, e.g. Bisulfite-seq/NOMe-seq/RRBS.
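
For readers unfamiliar with the context notation: H is the IUPAC code for A, C or T, so CHG means a C followed by a non-G base and then a G, and GCH (used for NOMe-seq) puts the cytosine in the second position. The Python sketch below shows one way to classify forward-strand cytosines by such patterns. It is an illustration only, not how Bis-SNP implements it; the function names and the way contexts are passed in are assumptions.

```python
# Subset of IUPAC nucleotide codes needed for the contexts above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "H": "ACT", "N": "ACGT"}

def matches_context(seq, c_pos, pattern, c_index=0):
    """True if the cytosine at seq[c_pos] sits at pattern[c_index]
    and every surrounding base agrees with the IUPAC pattern."""
    start = c_pos - c_index
    if start < 0 or start + len(pattern) > len(seq):
        return False
    return all(seq[start + i] in IUPAC[code] for i, code in enumerate(pattern))

def classify_cytosines(seq, contexts):
    """Map each forward-strand C position to the context names it matches.

    contexts: {name: (pattern, index of the cytosine inside the pattern)}
    """
    calls = {}
    for pos, base in enumerate(seq):
        if base == "C":
            calls[pos] = [
                name
                for name, (pattern, c_index) in contexts.items()
                if matches_context(seq, pos, pattern, c_index)
            ]
    return calls

contexts = {
    "CG": ("CG", 0),
    "CHG": ("CHG", 0),
    "CHH": ("CHH", 0),
    "GCH": ("GCH", 1),   # cytosine is the second base of the pattern
}

calls = classify_cytosines("GCATCGCCTG", contexts)
print(calls)
```

Note that a cytosine can match more than one pattern (here CHH and GCH overlap), so a pipeline has to decide whether contexts are reported independently or made mutually exclusive. Reverse-strand cytosines would be handled by running the same function on the reverse complement.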

Bis-SNP provides a set of Perl scripts that make it easy to convert output file formats and to run the whole genotyping and methylation calling pipeline.


Bis-SNP performance?

We validated the sensitivity and specificity of SNP detection by comparing Bisulfite-seq calls against an Illumina 1M SNP array run on the same sample. At the default threshold (Phred-scaled score > 20) and the test sample's sequencing depth (30X), Bis-SNP detected 92.21% of heterozygous SNPs with a 0.14% false positive rate (90.88% sensitivity for C/T SNPs with a 0.16% false positive rate; 98.51% sensitivity for non-C/T SNPs with a 0.16% false positive rate). At 10X depth in a single sample, it still detected 80% of heterozygous SNPs and 98% of homozygous cytosines at FDR < 0.05.

We show that Bis-SNP is a practical tool that can both (1) improve DNA methylation calling accuracy by detecting SNPs at cytosines and adjacent positions and (2) identify heterozygous SNPs that can be used to investigate mono-allelic DNA methylation and polymorphisms in cis-regulatory sequences.

Publication:

Liu Y, Siegmund KD, Laird PW, Berman BP. Bis-SNP: combined DNA methylation and SNP calling for Bisulfite-seq data. Genome Biology 2012 Jul 11;13(7):R61.
__________________
USC Epigenome Center
Old 08-06-2012, 05:44 AM   #2
yog77
Member
 
Location: London

Join Date: Jun 2011
Posts: 18
Default

I have some PCR/amplicon-derived NGS data for a single small gene region. It is 100bp PE Illumina data for only the original top bisulphite-converted strand.

I assume I am unable to use Bis-SNP to call CpG SNPs, as I have no data on the original bottom bisulphite-converted strand, but am I able to use Bis-SNP to call SNPs at non-CpG bases? If not, can you suggest any apps that would let me get this non-CpG SNP data at the same time as the methylation data?

Regards
Old 08-09-2012, 03:19 PM   #3
DNAase
Junior Member
 
Location: Los Angeles

Join Date: Dec 2009
Posts: 5
Default

Yes, in your case you can call all non-C/T SNPs and some of the C/T SNPs (those where the C is on the reverse strand).
Old 10-29-2012, 05:43 AM   #4
ermelin
Junior Member
 
Location: Leipzig

Join Date: Oct 2012
Posts: 1
Default

Hi,
I have some huge BAM files I want to process using bissnp. I set the nt flag to 2 in order to use only 2 CPUs. Running bissnp as shown in 3.3 of the manual with the additional flag -nt 2, bissnp still uses all available CPUs.
Is there another way to reduce the CPU usage?

Regards
Old 10-29-2012, 02:53 PM   #5
DNAase
Junior Member
 
Location: Los Angeles

Join Date: Dec 2009
Posts: 5
Default

Hi,
That sounds odd.
Could you please post your log output in our Google group? I will look at it and give you the solution there. Thanks!

https://groups.google.com/forum/?hl=...um/bissnp-help
Old 03-31-2018, 02:21 AM   #6
alim123
Junior Member
 
Location: delhi

Join Date: Apr 2016
Posts: 8
Default

Hello,

I am getting two different sets of SNP results from the same BAM file when I use two different versions. I am very confused about which one I should use for my further analysis.

Thank You
Old 03-31-2018, 05:33 AM   #7
DNAase
Junior Member
 
Location: Los Angeles

Join Date: Dec 2009
Posts: 5
Default

Apologies. Do you mean the newest 1.0 version? I just migrated it to the new GATK framework and have not had time to benchmark the performance yet.
Old 03-31-2018, 06:02 AM   #8
alim123
Junior Member
 
Location: delhi

Join Date: Apr 2016
Posts: 8
Default

Yes. And when I intersect the files generated by the two different versions, I get only 5700 common SNPs.
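
For anyone wanting to reproduce this kind of comparison, here is a minimal Python sketch of one way to intersect two VCF files by (chromosome, position, REF, ALT). It is not the method used in this thread, just an illustration; the function names are made up, and it only relies on the standard VCF column order. Records whose ALT is "." (no alternate allele) are skipped by default, so that files which also list non-variant sites are compared on variant calls only.

```python
def snp_keys(vcf_lines, variants_only=True):
    """Collect (chrom, pos, ref, alt) for every data record in a VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and headers
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        if variants_only and alt == ".":
            continue                      # site without an alternate allele
        keys.add((chrom, int(pos), ref, alt))
    return keys

def compare_calls(old_lines, new_lines, variants_only=True):
    """Count calls shared by both files and calls unique to each."""
    old = snp_keys(old_lines, variants_only)
    new = snp_keys(new_lines, variants_only)
    return {
        "shared": len(old & new),
        "only_old": len(old - new),
        "only_new": len(new - old),
    }

# Tiny made-up example standing in for the two output files.
old_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tC\tT\t50\tPASS\t.\n",
    "chr1\t200\t.\tG\tA\t40\tPASS\t.\n",
]
new_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tC\tT\t55\tPASS\t.\n",
    "chr1\t300\t.\tA\tG\t30\tPASS\t.\n",
    "chr1\t400\t.\tC\t.\t99\tPASS\t.\n",
]

summary = compare_calls(old_vcf, new_vcf)
print(summary)
```

With real files, pass open file handles instead of the lists, e.g. `compare_calls(open("old.vcf"), open("new.vcf"))` (file names hypothetical). The sketch ignores the QUAL and FILTER columns, so it may also be worth checking whether both versions were compared at the same filtering stage.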
Old 03-31-2018, 06:09 AM   #9
alim123
Junior Member
 
Location: delhi

Join Date: Apr 2016
Posts: 8
Default

Also, the latest version's output contains more SNPs than the previous one. I don't know the reason, and I am very confused about which file I should use for further analysis.
Old 03-31-2018, 06:28 AM   #10
alim123
Junior Member
 
Location: delhi

Join Date: Apr 2016
Posts: 8
Default

One more thing: the raw VCF file generated by the latest version is very large (gigabytes in size), while the previous version gave raw VCF files of 80-90 MB.

Thank You
Old 04-30-2018, 11:37 AM   #11
seqlad
Junior Member
 
Location: Usa

Join Date: Mar 2018
Posts: 2
Default

Hello,

I ran Bis-SNP 1.0.0 from the command line using the following command:
(Java version: JDK8)
java -Xmx10g -jar BisSNP-1.0.0.jar \
    -R mm10.fa \
    -I file.bam \
    -T BisulfiteCountCovariates \
    -knownSites dbSNP-150.vcf \
    -cov ReadGroupCovariate \
    -cov QualityScoreCovariate \
    -cov CycleCovariate \
    -recalFile File.csv

and got the following result:
ERROR A USER ERROR has occurred (version 3.8-1-0-gf15c1c3ef):
ERROR
ERROR This means that one or more arguments or inputs in your command are incorrect.
ERROR The error message below tells you what is the problem.
ERROR
ERROR If the problem is an invalid argument, please check the online documentation guide
ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
ERROR
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
ERROR
ERROR MESSAGE: Invalid command line: Malformed walker argument: Could not find walker with name: BisulfiteCountCovariates
ERROR

I can't find the BisulfiteCountCovariate.java walker in the BisSNP files. Could that be at least part of the problem?

Thanks for helping

Tags
bisulfite snp calling, methylation calling
