SEQanswers

Old 04-13-2015, 08:07 AM   #1
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27
mini tutorial: build reference mouse genome and SNP for GATK

GATK is a standard tool for calling SNPs; however, its authors did not provide reference genomes or reference SNPs for non-human organisms such as mouse. Here is my quick tutorial for building an mm10 reference mouse genome and a dbSNP reference SNP file from scratch. It's not automated. I'd appreciate any input to make this workflow more efficient.

1. Build the mm10 reference genome.
1.1 Download the reference here: http://ccb.jhu.edu/software/tophat/igenomes.shtml. Make sure you download the "Mus musculus UCSC MM10" reference.
1.2 Untar the file and find the directory that contains the sequence of each individual chromosome. The directory looks like this: "Mus_musculus_UCSC_mm10/Mus_musculus/UCSC/mm10/Sequence/Chromosomes"
Enter the directory.
1.3 Change the chromosome headers (remove the "chr" prefix):
sed -i -- "s/chr//g" *.fa
1.4 Combine the chromosomes into a full genome:
cat chr1.fa chr2.fa...chrX.fa chrY.fa > mm10.fa #Make sure you combine the chromosomes in karyotypic order and do not include random or unmapped chromosomes.
1.5 Index the genome and build the dictionary file:
samtools faidx mm10.fa
java -jar CreateSequenceDictionary.jar R=mm10.fa O=mm10.dict
1.6 Create the BWA index:
bwa index -a bwtsw mm10.fa
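
Steps 1.3 and 1.4 could also be done in one pass. The following is only a minimal sketch, not part of the original workflow: the function name build_genome is hypothetical, and it assumes the per-chromosome files are named chr1.fa through chr19.fa, chrX.fa and chrY.fa. It strips the "chr" prefix from header lines only and writes the chromosomes in karyotypic order, leaving the input files untouched.

```shell
# Hypothetical helper: concatenate per-chromosome FASTA files in karyotypic
# order (1..19, X, Y) and strip the "chr" prefix from header lines only.
# Assumes files named chr1.fa ... chr19.fa, chrX.fa, chrY.fa in directory $1.
build_genome() {
    dir=$1
    out=$2
    : > "$out"                      # start with an empty output file
    for c in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 X Y; do
        f="$dir/chr$c.fa"
        if [ ! -f "$f" ]; then
            echo "missing $f" >&2
            return 1
        fi
        # Rewrite only the ">chr..." header; sequence lines pass through.
        sed 's/^>chr/>/' "$f" >> "$out"
    done
}
```

From inside the Chromosomes directory this would be called as: build_genome . mm10.fa. Listing the chromosomes explicitly avoids the shell's alphabetical glob order, which would put chr10 before chr2.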

2. Build the reference mouse SNP file
2.1 Download the VCF files (reference mouse SNPs):
wget ftp://ftp.ncbi.nih.gov/snp/organisms...f_chr_*.vcf.gz
#Discard the Un, MT and random chromosomes, then unzip
#Remove the redundant header in place (delete the first 14 rows):
sed -i "1,14d" chr2.vcf #do this for all files except chr1
#Merge all VCF files
cat chr1.vcf chr2.vcf... chrX.vcf chrY.vcf > dbsnp.vcf
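
Deleting a fixed number of rows only works if every file has a header of exactly that length. As a hedged alternative, the sketch below relies on the fact that VCF header lines start with "#": it keeps the header of the first file and drops "#" lines from all the others. The function name merge_vcfs is hypothetical, and the files must still be passed in karyotypic order.

```shell
# Hypothetical helper: merge per-chromosome VCFs into one file, keeping the
# header (lines starting with "#") from the first input file only.
# Usage: merge_vcfs out.vcf first.vcf second.vcf ...
merge_vcfs() {
    out=$1
    shift
    first=$1
    shift
    cat "$first" > "$out"           # header plus records of the first file
    for f in "$@"; do
        # grep exits 1 when a file has no records; do not treat that as fatal.
        grep -v '^#' "$f" >> "$out" || true
    done
}
```

A call would look like: merge_vcfs dbsnp.vcf chr1.vcf chr2.vcf, followed by the remaining chromosomes in order. Unlike the sed step, this leaves the per-chromosome files unmodified.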

Now you can use BWA to align the raw reads first, and then use GATK to call the SNPs.
Old 04-13-2015, 08:34 AM   #2
id0
Senior Member
 
Location: USA

Join Date: Sep 2012
Posts: 130

I am not sure why you are editing the chromosome names or merging multiple files. iGenomes already comes with a combined genome FASTA file (Sequence/WholeGenomeFasta) that is already indexed.
Old 04-13-2015, 08:47 AM   #3
mediator
Member
 
Location: New England

Join Date: Nov 2010
Posts: 27

Quote:
Originally Posted by id0 View Post
I am not sure why you are editing the chromosome names or merging multiple files. iGenomes already comes with a combined genome FASTA file (Sequence/WholeGenomeFasta) that is already indexed.
That genome is not sorted in karyotypic order. Here is its .fai index:
chr10 130694993 7 50 51
chr11 122082543 133308907 50 51
chr12 120129022 257833108 50 51
chr13 120421639 380364718 50 51
chr14 124902244 503194797 50 51
chr15 104043685 630595093 50 51
chr16 98207768 736719659 50 51
chr17 94987271 836891590 50 51
chr18 90702639 933778614 50 51
chr19 61431566 1026295313 50 51
chr1 195471971 1088955517 50 51
chr2 182113224 1288336934 50 51
chr3 160039680 1474092429 50 51
chr4 156508116 1637332909 50 51
chr5 151834684 1796971194 50 51
chr6 149736546 1951842578 50 51
chr7 145441459 2104573861 50 51
chr8 129401213 2252924156 50 51
chr9 124595110 2384913400 50 51
chrM 16299 2512000419 50 51
chrX 171031299 2512017050 50 51
chrY 91744698 2686468981 50 51
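
To check the order of any reference quickly, the chromosome names can be pulled from the first column of its .fai index. This is only a small sketch; the function name fai_order is hypothetical.

```shell
# Hypothetical check: print the sequence names from a .fai index (column 1)
# on a single line, in file order, so the ordering is easy to inspect.
fai_order() {
    awk '{ printf "%s%s", (NR > 1 ? " " : ""), $1 } END { print "" }' "$1"
}
```

Called as fai_order genome.fa.fai, it prints the names in the order GATK will see them. For the index above the list starts with chr10, which shows it is sorted alphabetically rather than karyotypically.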