Thread | Thread Starter | Forum | Replies | Last Post |
How to build reference genome for targeted region(from a BED file) for alignment? | serenaliao | Epigenetics | 6 | 08-07-2015 02:44 PM |
RNA-Seq and mouse reference genome | ChristmasSunflower | Bioinformatics | 3 | 06-26-2014 12:23 AM |
VCF file for the Mouse genome (mm9) used for GATK | gap | Bioinformatics | 6 | 05-23-2014 02:10 PM |
Reference genome-SNP calling | melNGS | Bioinformatics | 0 | 09-27-2012 04:32 AM |
BOWTIE - Build Indexes - tutorial | andrehorta | Bioinformatics | 0 | 02-07-2011 04:52 AM |
#1 |
Member
Location: New England Join Date: Nov 2010
Posts: 27
|
GATK is a standard tool for calling SNPs; however, its authors do not provide reference genomes or reference SNPs for non-human organisms such as mouse. Here is my quick tutorial for building an mm10 mouse reference genome and a dbSNP reference SNP file from scratch. It's not automated. I'd appreciate any input to make this workflow more efficient.
1. Build the mm10 reference genome.

1.1 Download the reference here: http://ccb.jhu.edu/software/tophat/igenomes.shtml. Make sure you are downloading the "Mus musculus UCSC MM10" reference.

1.2 Untar the file and find the directory that contains the sequence for each individual chromosome. The directory looks like this: "Mus_musculus_UCSC_mm10\Mus_musculus\UCSC\mm10\Sequence\Chromosomes". Enter the directory.

1.3 Change the chromosome headers (strip the "chr" prefix):
    sed -i -- "s/chr//g" *.fa

1.4 Combine the chromosomes into a full genome:
    cat chr1.fa chr2.fa...chrX.fa chrY.fa > mm10.fa
    #Make sure you are combining the chromosomes in karyotypic order and you are not including random or unmapped chromosomes.

1.5 Index the genome and build the dictionary file:
    samtools faidx mm10.fa
    java -jar CreateSequenceDictionary.jar R=mm10.fa O=mm10.dict

1.6 Create the BWA index:
    bwa index -a bwtsw mm10.fa

2. Build the reference mouse SNP file.

2.1 Download the VCF files (reference mouse SNPs):
    wget ftp://ftp.ncbi.nih.gov/snp/organisms...f_chr_*.vcf.gz
    #Discard the Un, MT and random chromosomes, then unzip
    #Remove the excess header (delete the first 14 rows):
    sed -i "1,14d" chr2.vcf #do all except chr1
    #Merge all VCFs
    cat chr1.vcf chr2.vcf... chrX.vcf chrY.vcf > dbsnp.vcf

Now you can use BWA to align the raw reads first, and then use GATK to call the SNPs.
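Since the workflow is described as not automated, here is a minimal shell sketch of how the FASTA concatenation (steps 1.3 and 1.4) and the VCF merge (step 2.1) could be scripted. The function names `mouse_chroms`, `build_genome` and `merge_vcfs` are hypothetical, and the sketch assumes the per-chromosome files are named `chr1.fa` ... `chrY.fa` and `chr1.vcf` ... `chrY.vcf` as in the commands above. It differs from the post in two ways: the "chr" prefix is stripped only at the start of header lines, and VCF headers are dropped by matching lines that start with `#` rather than by deleting a fixed 14 rows.

```shell
#!/bin/sh
# Print the mouse chromosome names in karyotypic order: 1..19, X, Y.
mouse_chroms() {
    i=1
    while [ "$i" -le 19 ]; do
        echo "$i"
        i=$((i + 1))
    done
    echo X
    echo Y
}

# build_genome SRC_DIR OUT_FASTA
# Concatenate SRC_DIR/chr1.fa ... chrY.fa into OUT_FASTA in karyotypic
# order, rewriting ">chrN" headers to ">N". Random and unmapped
# chromosomes are never listed, so they are left out.
build_genome() {
    src_dir=$1
    out=$2
    : > "$out"
    for c in $(mouse_chroms); do
        sed 's/^>chr/>/' "$src_dir/chr$c.fa" >> "$out"
    done
}

# merge_vcfs OUT_VCF FIRST_VCF [MORE_VCFS...]
# Copy the first VCF whole (header included), then append only the
# non-header lines (those not starting with '#') of the remaining files.
merge_vcfs() {
    out=$1
    shift
    first=$1
    shift
    cat "$first" > "$out"
    for f in "$@"; do
        sed '/^#/d' "$f" >> "$out"
    done
}
```

A possible call, run from the Chromosomes directory, would be `build_genome . mm10.fa`, followed by `merge_vcfs dbsnp.vcf chr1.vcf chr2.vcf` with the remaining VCFs listed in the same karyotypic order.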
#2 |
Senior Member
Location: USA Join Date: Sep 2012
Posts: 130
|
I am not sure why you are editing the chromosome names or merging multiple files. iGenomes already comes with a combined genome FASTA file (Sequence/WholeGenomeFasta) that is already indexed.
#3 | |
Member
Location: New England Join Date: Nov 2010
Posts: 27
|
Quote:
chr10 130694993 7 50 51
chr11 122082543 133308907 50 51
chr12 120129022 257833108 50 51
chr13 120421639 380364718 50 51
chr14 124902244 503194797 50 51
chr15 104043685 630595093 50 51
chr16 98207768 736719659 50 51
chr17 94987271 836891590 50 51
chr18 90702639 933778614 50 51
chr19 61431566 1026295313 50 51
chr1 195471971 1088955517 50 51
chr2 182113224 1288336934 50 51
chr3 160039680 1474092429 50 51
chr4 156508116 1637332909 50 51
chr5 151834684 1796971194 50 51
chr6 149736546 1951842578 50 51
chr7 145441459 2104573861 50 51
chr8 129401213 2252924156 50 51
chr9 124595110 2384913400 50 51
chrM 16299 2512000419 50 51
chrX 171031299 2512017050 50 51
chrY 91744698 2686468981 50 51
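The listing above has the shape of a samtools `.fai` index: sequence name, length, byte offset, bases per line, and bytes per line. It starts at chr10 and reaches chr1 only after chr19, so the indexed FASTA is not in karyotypic order. Presumably that is the point being made here, although the post does not say so. A minimal sketch of a check for this follows. The function names `expected_order` and `fai_is_karyotypic` are hypothetical, the sketch assumes UCSC-style names (chr1 ... chr19, chrX, chrY), and it ignores chrM and any other sequences.

```shell
#!/bin/sh
# Print the expected karyotypic order for mouse with UCSC-style names.
expected_order() {
    i=1
    while [ "$i" -le 19 ]; do
        echo "chr$i"
        i=$((i + 1))
    done
    echo chrX
    echo chrY
}

# fai_is_karyotypic FAI_FILE
# Succeeds (exit 0) if the autosomes, chrX and chrY appear in the index
# in karyotypic order; chrM and any other names are skipped.
fai_is_karyotypic() {
    actual=$(awk '$1 ~ /^chr([0-9]+|X|Y)$/ { print $1 }' "$1")
    [ "$actual" = "$(expected_order)" ]
}
```

For example, `fai_is_karyotypic genome.fa.fai || echo "not karyotypic"` (with `genome.fa.fai` standing in for whatever index file is being checked) would report the ordering problem for an index like the one quoted above.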