SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
GATK: -glm DINDEL question Michael.James.Clark Bioinformatics 36 04-25-2012 12:39 PM
one question in GATK evonne16 General 4 02-01-2012 05:47 PM
DSN experts? whw Sample Prep / Library Generation 3 08-14-2011 06:29 PM
GATK help! adaptivegenome Bioinformatics 0 01-17-2011 08:01 PM
question about GATK? loveseq Bioinformatics 1 11-29-2010 08:29 AM

Reply
 
Thread Tools
Old 05-07-2011, 07:28 PM   #1
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56
Question Question for GATK experts.....

I called SNPs using GATK. I have a question.

If i call SNPs using a single bam file and i get a set of SNPs (850 SNPs).

Now i called SNPs from multiple bam alignments (not merged bam files, but listing them in consecutive order to get a single VCF file), i get more SNPs for the same sample (2598 SNPs). How is this possible? What am i going wrong? I am using the same filter conditions etc.
newbietonextgen is offline   Reply With Quote
Old 05-08-2011, 11:41 AM   #2
d17
Member
 
Location: United States

Join Date: Sep 2008
Posts: 27
Default

What is probably happening is that the 2600 minus 850 SNPs in your sample (call it sample #1) that are only called in the multi-sample SNP calling run are SNPs that didn't have enough evidence to be called as SNPs in sample #1 alone, but that did show evidence of being SNPs in other samples. Seeing the site as a SNP in other samples affects the probability that it is called as a SNP in sample #1.

Intuitively, the situation is as follows: If we run SNP calling on sample #1 alone and see a site that has a modest amount of evidence that it is a SNP, it will probably not pass the filtering thresholds. If we run SNP calling on a bunch of samples and see that the same site has strong evidence of being a SNP in a different sample we can be more confident that the site is truly a SNP in sample #1. This is I believe one of the main advantages of multi-sample calling.
d17 is offline   Reply With Quote
Old 05-08-2011, 03:18 PM   #3
newbietonextgen
Member
 
Location: USA

Join Date: Nov 2010
Posts: 56
Default

Thanks d17. I understand as you say the depth of coverage increases when you use multi-sample for a location, thus increasing the number of SNPS in the resulting VCF file. Now the question is which one is correct (single sample or multi sample (I know both are correct, but what would one use for ASE)?
newbietonextgen is offline   Reply With Quote
Old 06-06-2011, 08:34 AM   #4
prisnirath
Member
 
Location: Newcastle upon Tyne

Join Date: May 2011
Posts: 19
Default

Hi there,
I am using GATK to call SNPs from my sam files (from 454 data).
I am using he following pipeline::
Quote:
SAM to BAM
samtools import BRCA1_coding.fasta out1FR_bwasw.sam new_out1FR_bwasw.bam

Sort BAM
samtools sort new_out1FR_bwasw.bam new_out1FR_bwasw.sorted

Index BAM
samtools index new_out1FR_bwasw.sorted.bam new_out1FR_bwasw.sorted.bam.bai

Identify target regions for realignment
java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T RealignerTargetCreator -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -o new_out1FR_bwasw.intervals
And I get a an interval file that has 4 locations which is a subset of the regions i identified using tablet viewer.
Following note from command line.
Quote:
14:53:16,100 TraversalEngine - 0 reads were filtered out during traversal out of 565 total (0.00%)
Then,
Quote:
Realign BAM to get better Indel calling
java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T IndelRealigner -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -targetIntervals new_out1FR_bwasw.intervals -o new_out1FR_bwasw.sorted.realigned.bam
Add or Replace read group
java -jar ~/bin/picard-tools-1.45/AddOrReplaceReadGroups.jar I= new_out1FR_bwasw.sorted.realigned.bam O= new_out1FR_bwasw_new.sorted.realigned.bam SORT_ORDER=coordinate RGID=foo RGLB=bar RGPL=illumina RGSM=DePristo RGPU= GGDP4G001BFFBZ CREATE_INDEX=True
Reindex the realigned BAM
java -jar ~/bin/picard-tools-1.45/ReorderSam.jar I=new_out1FR_bwasw_new.sorted.realigned.bam O= new_out1FR_bwasw.resorted.realigned.bam REFERENCE= BRCA1_coding.fasta
samtools index new_out1FR_bwasw.resorted.realigned.bam new_out1FR_bwasw.resorted.realigned.bam.bai
Call SNPs
java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T UnifiedGenotyper -R BRCA1_coding.fasta -I new_out1FR_bwasw.resorted.realigned.bam -o new_out1FR_bwasw.vcf.calls -stand_call_conf 30.0 -stand_emit_conf 10.0
And I am getting only 1 SNP called of a very low quality and in the region of read depth 1.
This region doesn't coincide with the intervals identified before.
Also, when I compare the SNP called with my results from VarScan, there is no similarity.

Can anyone please suggest how to improve SNP calling?
Or is GATK not suitable for SNP calling in long read data from 454?
prisnirath is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:33 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO