SEQanswers

Old 05-04-2012, 10:26 AM   #1
gap
Junior Member
 
Location: New York

Join Date: May 2012
Posts: 2
VCF file for the Mouse genome (mm9) used for GATK

I need an mm9 dbSNP128 VCF file (v4 or above) to integrate into our whole-genome mouse sequencing pipeline, which uses the GATK.
Has anybody been lucky enough to generate this file? The Broad Institute only provides human dbSNP VCF files. The Sanger Institute does provide VCF 3.3 files for the mouse strains they sequenced, but no VCF file is provided for mouse dbSNP128.

NCBI/UCSC only has mouse dbSNP128 in text format, and it is not easy to convert it to a "workable" VCF file.
Old 05-04-2012, 11:48 AM   #2
lynnco2008
Junior Member
 
Location: UNMC

Join Date: Feb 2012
Posts: 4
Default

Hi, what do you mean by a "workable" VCF format? Have you tried that dbSNP128 file (from NCBI/UCSC) in GATK?
Old 05-04-2012, 01:29 PM   #3
gap
Junior Member
 
Location: New York

Join Date: May 2012
Posts: 2
Default

I did the following to get a mouse dbSNP128 VCF file:
wget http://hgdownload.cse.ucsc.edu/golde.../snp128.txt.gz
gunzip snp128.txt.gz
vcfutils.pl ucscsnp2vcf snp128.txt >snp128.vcf

Then I ran GATK and got:
##### ERROR MESSAGE: We saw a record with a start of chr1:21250573 after a record with a start of chr1:21250574, for input source: /mm9/snp128.vcf

I sorted the mouse VCF file into snp128_sorted.vcf and re-ran GATK:
##### ERROR MESSAGE: The provided VCF file is malformed at line number 2: Unparsable vcf record with allele NLENGTHTOOLONG

more snp128_sorted.vcf
##fileformat=VCFv4.0
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 3000248 rs32640266 G T 0 . molType=genomic;class=single;valid=by-frequency
chr1 3000289 rs32137367 T G 0 . molType=genomic;class=single;valid=by-frequency
chr1 3000353 rs31719101 C T 0 . molType=genomic;class=single;valid=by-frequency
chr1 3000355 rs31443144 T C 0 . molType=genomic;class=single;valid=by-frequency
chr1 3000424 rs32793820 TTTTTTTCTTGGGTTTCTGATATTCTTTAAAGGATTTATTGATTTCCTCCAATTTTTAATTTGCTTTTTTCTTGATTTCTTTAGGATATTTCTTTTTCATTTTCCTTT A,T 0 . molType=genomic;class=single;valid=by-frequency
chr1 3001066 rs49746803 G T 0 . molType=genomic;class=single
Old 05-07-2012, 10:44 AM   #4
SeekAnswers
Member
 
Location: USA

Join Date: Mar 2012
Posts: 21
Default

For the first error: you will have to sort the VCF file in chromosome coordinate order for it to work with GATK.

A simple unix sort command should do the trick for you.
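One caveat, offered as a hedged note rather than a confirmed diagnosis: a plain lexicographic sort puts chr10 before chr2, while GATK generally expects contigs in the same order as the reference. A minimal Python sketch of a reference-order sort follows. The function name `sort_vcf_lines` and the `contig_order` argument are hypothetical; the contig list is assumed to be taken from your own reference (for example its .fai or .dict file), and the records are assumed to be tab-separated.

```python
def sort_vcf_lines(lines, contig_order):
    """Return header lines followed by records sorted by (contig rank, POS).

    contig_order is an assumption: the contig names in the same order
    as the reference genome used with GATK.
    """
    rank = {name: i for i, name in enumerate(contig_order)}
    header = [ln for ln in lines if ln.startswith("#")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("#")]

    def key(line):
        # CHROM and POS are the first two tab-separated VCF columns.
        chrom, pos = line.split("\t", 2)[:2]
        # A contig missing from contig_order raises KeyError on purpose.
        return (rank[chrom], int(pos))

    return header + sorted(records, key=key)


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.0",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr10\t5\t.\tA\tG\t0\t.\t.",
        "chr1\t21250574\t.\tA\tG\t0\t.\t.",
        "chr1\t21250573\t.\tC\tT\t0\t.\t.",
    ]
    for line in sort_vcf_lines(demo, ["chrM", "chr1", "chr2", "chr10"]):
        print(line)
```

Header lines keep their original order and stay at the top; only the records are reordered.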

However, I have run into other issues, like the second error, while using the dbSNP128 file from UCSC with GATK, and I was hoping to build a VCF from the XML files supplied by NCBI. I will post here if I have any success that way.

I *think* the snp128.txt file has variants other than small indels and SNPs, which are not being handled properly by GATK.
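If that guess is right, one way to test it is to set aside records whose alleles are not plain short sequences and see whether GATK accepts the rest. The sketch below is only an illustration under stated assumptions: `is_simple_variant` is a hypothetical helper, the 50-base cutoff is arbitrary, records are assumed to be tab-separated, and nothing here reflects GATK's own validation rules.

```python
def is_simple_variant(record_line, max_allele_len=50):
    """Return True if REF and every ALT allele look like a short plain sequence.

    max_allele_len is an arbitrary, assumed cutoff; adjust it to taste.
    """
    fields = record_line.rstrip("\n").split("\t")
    ref = fields[3]              # REF is the 4th VCF column
    alts = fields[4].split(",")  # ALT is the 5th, comma-separated
    valid_bases = set("ACGTN")
    return all(
        0 < len(allele) <= max_allele_len
        and set(allele.upper()) <= valid_bases
        for allele in [ref] + alts
    )


if __name__ == "__main__":
    ok = "chr1\t3000248\trs32640266\tG\tT\t0\t.\tmolType=genomic"
    odd = "chr1\t1\trs1\tNLENGTHTOOLONG\tA\t0\t.\tmolType=genomic"
    print(is_simple_variant(ok), is_simple_variant(odd))
```

Records that fail the check can be written to a separate file for inspection instead of being silently discarded.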

Last edited by SeekAnswers; 05-07-2012 at 10:50 AM.
Old 05-23-2012, 12:39 AM   #5
wanguan2000
Member
 
Location: shanghai

Join Date: Nov 2010
Posts: 24
Default

http://gene-seq.vicp.net/2012/05/mou...2-vcf-version/
Maybe you can try this database:
We are pleased to announce the release of a VCF version of dbSNP build 132, available on the mouse assembly (UCSC/mm9). dbSNP build 132 is available at NCBI.

This dbSNP_132 VCF version can be used in a GATK pipeline.
Many thanks to dbSNP at NCBI for the data. This version was produced at WuXi Genome Center by Guan Wang and Qin Luo.
mm9_karyosort = ['chrM', 'chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chrX', 'chrY', 'chr1_random', 'chr3_random', 'chr4_random', 'chr5_random', 'chr7_random', 'chr8_random', 'chr9_random', 'chr13_random', 'chr16_random', 'chr17_random', 'chrX_random', 'chrY_random', 'chrUn_random']

Last edited by wanguan2000; 05-28-2012 at 06:40 PM.
Old 05-23-2012, 08:40 AM   #6
SeekAnswers
Member
 
Location: USA

Join Date: Mar 2012
Posts: 21
Default

^Great!

As for working with the snp128.txt file from UCSC, I was able to convert it to a workable VCF by using GATK VariantsToVCF and then using GATK's liftOverVCF.pl to convert it to the mm10 reference.

This seems to be working fine with GATK so far.
Old 05-23-2014, 01:10 PM   #7
JPreston
Member
 
Location: Eugene, OR

Join Date: Jun 2013
Posts: 17
Default

Can you tell me the command you used to convert the .txt file to .vcf with GATK? When I try this, I get an error that the text file isn't sorted correctly, but I'm not sure how to sort a plain text file.
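For the sorting part of the question, here is a rough Python sketch, not the command used earlier in the thread. It assumes the UCSC snp table layout in which the tab-separated columns begin with bin, chrom, chromStart, so chrom is column index 1 and chromStart is index 2; check this against your own file. `sort_ucsc_rows` and `contig_order` are hypothetical names, and the contig order is assumed to match the reference.

```python
def sort_ucsc_rows(rows, contig_order, chrom_col=1, start_col=2):
    """Sort UCSC snp table rows by reference contig order, then by start.

    chrom_col and start_col are assumptions about the column layout
    (bin, chrom, chromStart, ...). Contigs missing from contig_order
    are placed last, ordered by name.
    """
    rank = {name: i for i, name in enumerate(contig_order)}

    def key(row):
        fields = row.rstrip("\n").split("\t")
        chrom = fields[chrom_col]
        return (rank.get(chrom, len(rank)), chrom, int(fields[start_col]))

    return sorted((r for r in rows if r.strip()), key=key)


if __name__ == "__main__":
    demo = [
        "585\tchr2\t100\t101\trsB",
        "585\tchr1\t3000288\t3000289\trs32137367",
        "585\tchr1\t3000247\t3000248\trs32640266",
    ]
    for row in sort_ucsc_rows(demo, ["chrM", "chr1", "chr2"]):
        print(row)
```

This loads every row into memory, which may be heavy for a full snp128.txt; for very large files an external sort may be the better choice.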