#1
Member
Location: USA Join Date: Nov 2010
Posts: 56
I called SNPs using GATK and have a question.
When I call SNPs from a single BAM file, I get 850 SNPs. When I call SNPs from multiple BAM alignments (not merged BAM files, but listed one after another to produce a single VCF file), I get more SNPs for the same sample (2598). How is this possible? What am I doing wrong? I am using the same filter conditions in both runs.
#2
Member
Location: United States Join Date: Sep 2008
Posts: 27
What is probably happening is that the 1,748 additional SNPs (2,598 minus 850) called for your sample (call it sample #1) only in the multi-sample run are sites that did not have enough evidence to be called in sample #1 alone, but that did show evidence of being SNPs in other samples. Seeing the site as a SNP in other samples affects the probability that it is called as a SNP in sample #1.
Intuitively, the situation is as follows. If we run SNP calling on sample #1 alone and see a site with only modest evidence of being a SNP, it will probably not pass the filtering thresholds. If we run SNP calling on a group of samples and see that the same site has strong evidence of being a SNP in a different sample, we can be more confident that the site is truly a SNP in sample #1. This is, I believe, one of the main advantages of multi-sample calling.
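A toy calculation can make this intuition concrete. The sketch below is not GATK's actual model; it is a simplified Bayesian illustration in which a site is either monomorphic (every sample is homozygous reference) or polymorphic (each sample is independently heterozygous with probability `q`). The prior `theta`, the rate `q`, the error rate `err`, the read counts, and the function names are all assumptions chosen for illustration.

```python
from math import comb, prod


def binom(k, n, p):
    """Probability of k alt reads out of n when each read is alt with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def posterior_het(samples, target=0, theta=0.001, q=0.3, err=0.01):
    """Posterior probability that samples[target] is heterozygous.

    samples: list of (alt_reads, depth) tuples, one per sample.
    theta:   assumed prior probability that the site is polymorphic.
    q:       assumed probability that a sample is het at a polymorphic site.
    err:     assumed per-read sequencing error rate.
    """
    # Per-sample likelihoods of the reads under hom-ref and het genotypes.
    l_ref = [binom(alt, depth, err) for alt, depth in samples]
    l_het = [binom(alt, depth, 0.5) for alt, depth in samples]
    # At a polymorphic site each sample is a mixture of het and hom-ref.
    mix = [q * h + (1 - q) * r for h, r in zip(l_het, l_ref)]

    p_data_mono = prod(l_ref)
    p_data_poly = prod(mix)

    # Joint probability: site polymorphic, target het, others unconstrained.
    others = prod(m for i, m in enumerate(mix) if i != target)
    numerator = theta * q * l_het[target] * others
    evidence = theta * p_data_poly + (1 - theta) * p_data_mono
    return numerator / evidence


# Sample #1: weak evidence (2 alt reads out of 6).
# Sample #2: strong evidence (10 alt reads out of 20).
alone = posterior_het([(2, 6)])
joint = posterior_het([(2, 6), (10, 20)])
print(f"sample #1 alone:         {alone:.3f}")
print(f"sample #1 with sample 2: {joint:.3f}")
```

Under these assumed parameters, the posterior for sample #1 alone stays well below 0.5, while adding sample #2 pushes it close to 1, because the second sample makes the "site is monomorphic" explanation very unlikely.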
#3
Member
Location: USA Join Date: Nov 2010
Posts: 56
Thanks d17. As I understand it, using multiple samples increases the depth of coverage at a location, which increases the number of SNPs in the resulting VCF file. My question now is which call set to use: single-sample or multi-sample? I know both are valid, but which would one use for ASE?
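One way to inspect the difference between the two call sets is to list the sites where the sample carries a non-reference genotype in each VCF and compare them. The sketch below assumes standard tab-separated VCF text with a `GT` entry in the FORMAT column; the sample names, the records, and the `nonref_sites` helper are made up for illustration.

```python
def nonref_sites(vcf_lines, sample):
    """Return the set of (chrom, pos, ref, alt) where `sample` has a non-reference genotype."""
    sites = set()
    col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            col = fields.index(sample)  # locate the sample's genotype column
            continue
        if col is None:
            raise ValueError("no #CHROM header line before the first record")
        fmt = fields[8].split(":")
        gt = fields[col].split(":")[fmt.index("GT")]
        alleles = gt.replace("|", "/").split("/")
        # Any allele other than reference ("0") or missing (".") counts as non-reference.
        if any(a not in ("0", ".") for a in alleles):
            sites.add((fields[0], int(fields[1]), fields[3], fields[4]))
    return sites


def make_vcf(samples, rows):
    """Build minimal VCF lines from (chrom, pos, ref, alt, [genotypes]) rows (toy data only)."""
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"] + samples
    lines = ["##fileformat=VCFv4.1", "\t".join(header)]
    for chrom, pos, ref, alt, gts in rows:
        lines.append("\t".join([chrom, str(pos), ".", ref, alt, ".", "PASS", ".", "GT"] + gts))
    return lines


# Hypothetical single-sample and multi-sample call sets.
single_vcf = make_vcf(["sample1"], [
    ("chr1", 100, "A", "G", ["0/1"]),
    ("chr1", 200, "C", "T", ["1/1"]),
])
multi_vcf = make_vcf(["sample1", "sample2"], [
    ("chr1", 100, "A", "G", ["0/1", "0/0"]),
    ("chr1", 200, "C", "T", ["1/1", "0/1"]),
    ("chr1", 300, "G", "A", ["0/1", "1/1"]),
    ("chr1", 400, "T", "C", ["0/0", "0/1"]),
])

single_sites = nonref_sites(single_vcf, "sample1")
multi_sites = nonref_sites(multi_vcf, "sample1")
only_multi = multi_sites - single_sites
print(sorted(only_multi))
```

In this toy data the multi-sample VCF has four records, but `sample1` is non-reference at only three of them, and only one of those is absent from the single-sample calls. Counting per-sample genotypes rather than VCF records keeps sites that vary only in other samples out of the comparison.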
#4
Member
Location: Newcastle upon Tyne Join Date: May 2011
Posts: 19
Hi there,
I am using GATK to call SNPs from my SAM files (from 454 data). I am using the following pipeline: Quote:
I get the following note on the command line: Quote:
Quote:
This region doesn't coincide with the intervals identified before. Also, when I compare the SNPs called with my results from VarScan, there is no overlap. Can anyone please suggest how to improve SNP calling? Or is GATK not suitable for SNP calling in long-read data from 454?