View Single Post
Old 10-11-2011, 11:47 AM   #8
Location: India

Join Date: Sep 2011
Posts: 16
Thumbs up

Originally Posted by Heisman View Post
I actually haven't done this for awhile so hopefully I'm not forgetting anything.

1. Take all of your targeted regions, extend each region by X base pairs, where X is determined on your method of generating data. Put all of this in fasta format.

2. When you have your fasta file, you will want to generate a couple other files. One is the novoindex file ( The others are for use with GATK (

3. Then, align your data using novoalign, remove duplicates with Picard, realign around indels and do base quality recalibration if desired.

4. Then, call SNPs with mpileup.

I can give more details if desired. If you want to check how mpileup is progressing, assuming you are using a command such as this:

samtools-0.1.18/samtools mpileup -AB -ugf [reference_seq.fa] [aligned_file.bam] | samtools-0.1.18/bcftools/bcftools view -bvcg -> [intermediate_file] &

Go ahead and use the following command before the first one finishes. If your reference sequence and aligned files are sorted, then looking at the end of the following file that is generated will let you know how far the variant calling has progressed.

samtools-0.1.18/bcftools/bcftools view [intermediate_file] | samtools-0.1.18/bcftools/ varFilter -D99999 >[list_of_variants] &
Thanks Heisman, I got your point. I am doing that part and going fine with it. My question was like when we do dbSNP annotation, if our seq is mapped to ref genome, program will annotate dbSNPs on the basis of location on genome. But when we have seq reads alligned to custom reference, than how do we use dbSNP annotations?
By the way, really thans for the reply. Now I got to know how much mpileup process is completed. Can you suggest your or some other articles with similar strategy.

PratikC is offline   Reply With Quote