View Single Post
Old 10-11-2011, 04:58 AM   #7
Senior Member
Location: St. Louis

Join Date: Dec 2010
Posts: 535

I actually haven't done this for awhile so hopefully I'm not forgetting anything.

1. Take all of your targeted regions, extend each region by X base pairs, where X is determined on your method of generating data. Put all of this in fasta format.

2. When you have your fasta file, you will want to generate a couple other files. One is the novoindex file ( The others are for use with GATK (

3. Then, align your data using novoalign, remove duplicates with Picard, realign around indels and do base quality recalibration if desired.

4. Then, call SNPs with mpileup.

I can give more details if desired. If you want to check how mpileup is progressing, assuming you are using a command such as this:

samtools-0.1.18/samtools mpileup -AB -ugf [reference_seq.fa] [aligned_file.bam] | samtools-0.1.18/bcftools/bcftools view -bvcg -> [intermediate_file] &

Go ahead and use the following command before the first one finishes. If your reference sequence and aligned files are sorted, then looking at the end of the following file that is generated will let you know how far the variant calling has progressed.

samtools-0.1.18/bcftools/bcftools view [intermediate_file] | samtools-0.1.18/bcftools/ varFilter -D99999 >[list_of_variants] &
Heisman is offline   Reply With Quote