Thread | Thread Starter | Forum | Replies | Last Post |
large samples calling indel and snp with GATK | jchoo | Bioinformatics | 0 | 06-24-2012 11:13 PM |
SNP base calling for multiple samples | shuang | Bioinformatics | 2 | 09-07-2011 03:06 PM |
tools for SNP calling in pooled samples | gfmgfm | Bioinformatics | 0 | 12-30-2010 10:57 AM |
SNP calling software in pooled samples | mrxcm3 | Bioinformatics | 3 | 11-03-2010 10:38 PM |
#1 |
Member
Location: US Join Date: Dec 2012
Posts: 16
A reference genome is available for subspecies A. We resequenced subspecies B and subspecies C on a HiSeq 2000, and we'd like to measure the SNP diversity (difference in allele frequency) between subspecies B and subspecies C.
We know there are substantial differences among subspecies A, B and C. So what is the best way to find the SNP diversity between B and C? Should we call SNPs for subspecies B and C separately against the subspecies A reference genome (I used samtools) and then merge the results? It seems to me there should be a more efficient way to do this, but I don't know much about the alternatives (cortex?). Thanks in advance.
#2 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
Align both samples to your best reference, then use samtools mpileup on both .bams together.
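As a rough sketch of what "mpileup on both .bams together" could look like, the snippet below only assembles the two command lines and does not run them. The file names are hypothetical placeholders, and the flags are the ones used by the samtools/bcftools 0.1.x releases of that period (`samtools mpileup -u -f` piped into `bcftools view -v -c -g`); newer releases moved this step to `bcftools mpileup` and `bcftools call`, so check the installed version before copying it.

```python
def build_joint_calling_commands(reference_fasta, bam_paths):
    """Return argv lists for a joint pileup and the caller it is piped into."""
    if len(bam_paths) < 2:
        raise ValueError("joint calling needs at least two BAM files")
    # -u: uncompressed BCF output, -f: indexed reference FASTA (samtools 0.1.x)
    mpileup = ["samtools", "mpileup", "-u", "-f", reference_fasta] + list(bam_paths)
    # -v: variant sites only, -c: call variants, -g: call genotypes (bcftools 0.1.x)
    call = ["bcftools", "view", "-v", "-c", "-g", "-"]
    return mpileup, call


# Hypothetical file names: reads from B and C, both aligned to the A reference.
mpileup_cmd, call_cmd = build_joint_calling_commands(
    "subspeciesA.fa", ["subspeciesB.bam", "subspeciesC.bam"]
)
print(" ".join(mpileup_cmd), "|", " ".join(call_cmd))
```

Both BAM files go to a single mpileup invocation, so every site is evaluated in both samples at once and each sample gets its own genotype column in the output.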
#3 |
Member
Location: US Join Date: Dec 2012
Posts: 16
Thanks swbarnes2.
Could you explain a little more why "use samtools mpileup on both .bams together" will work? In that case we still need the reference genome from subspecies A, right? Another thing: if more than one non-reference allele is reported, samtools only gives the depth of the first non-reference allele (as listed in DP4). Also, although 1/1 indicates homozygous alternate, I don't understand the meaning of the PL value "131,59,26,91,0,85" (shown below). How can we get the depth and other information for the second alternate allele?
chr2 213263 . A C,T 72 . DP=14;VDB=0.0355;AF1=1;AC1=2;DP4=0,0,9,4;MQ=56;FQ=-60 GT:PL:GQ 1/1:131,59,26,91,0,85:63
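On the PL question: PL holds Phred-scaled genotype likelihoods, one value per possible diploid genotype, scaled so that the best-supported genotype gets 0. With one reference and two alternate alleles there are six genotypes, which matches the six values in the record. The sketch below labels them using the genotype ordering from the VCF specification (genotype j/k with j <= k sits at index k*(k+1)/2 + j). The function names are made up for illustration, and it is an assumption that the samtools release that produced this record follows the same ordering.

```python
def genotype_order(n_alleles):
    """Diploid genotypes in VCF order: j/k (j <= k) sits at index k*(k+1)/2 + j."""
    order = []
    for k in range(n_alleles):
        for j in range(k + 1):
            order.append((j, k))
    return order


def label_pl(ref, alts, pl_values):
    """Pair each PL value with the genotype it describes."""
    alleles = [ref] + list(alts)
    order = genotype_order(len(alleles))
    if len(order) != len(pl_values):
        raise ValueError("PL length does not match the number of genotypes")
    return {f"{alleles[j]}/{alleles[k]}": pl for (j, k), pl in zip(order, pl_values)}


# REF, ALT and PL values taken from the record quoted in this post.
labelled = label_pl("A", ["C", "T"], [131, 59, 26, 91, 0, 85])
for genotype, pl in labelled.items():
    print(genotype, pl)
```

Under that ordering the value 0 lands on C/T rather than on C/C, which does not match the reported GT of 1/1. That may reflect how this samtools version calls genotypes at multi-allelic sites, so it is worth checking against the documentation for the installed release rather than taking either field at face value.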
#4 |
Member
Location: US Join Date: Dec 2012
Posts: 16
Can anyone help?
#5 |
Member
Location: Las Vegas Join Date: Mar 2012
Posts: 11
Calling subspecies B and C against the reference together just saves space. And yes, you will still need the reference genome. The output is slightly different, however: after the FORMAT column you get one genotype column per sample:
Code:
chr2 213263 . A C,T 72 . DP=14;VDB=0.0355;AF1=1;AC1=2;DP4=0,0,9,4;MQ=56;FQ=-60 GT:PL:GQ <genotype of B> <genotype of C>
Also, I find that the Broad Institute does a much better job with documentation than SourceForge or 1000genomes.org. Since samtools and GATK both use VCF as their standard output, you might want to start with the GATK documentation, if not switch to GATK altogether. Hope this helps.
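To make the column layout concrete, here is a small sketch that splits a VCF data line and pairs the FORMAT keys with the values in each sample column. `parse_sample_fields` is a hypothetical helper, not part of samtools or GATK, and it assumes the first eight fixed columns contain no embedded whitespace. The record is the single-sample line quoted earlier in the thread; a joint call on B and C would simply yield two dictionaries instead of one.

```python
def parse_sample_fields(vcf_line):
    """Map each FORMAT key to its value, returning one dict per sample column."""
    columns = vcf_line.split()
    if len(columns) < 10:
        raise ValueError("expected a FORMAT column plus at least one sample column")
    format_keys = columns[8].split(":")  # column 9 is FORMAT
    # Columns 10 onward hold one colon-separated value set per sample.
    return [dict(zip(format_keys, sample.split(":"))) for sample in columns[9:]]


record = (
    "chr2 213263 . A C,T 72 . "
    "DP=14;VDB=0.0355;AF1=1;AC1=2;DP4=0,0,9,4;MQ=56;FQ=-60 "
    "GT:PL:GQ 1/1:131,59,26,91,0,85:63"
)
samples = parse_sample_fields(record)
print(samples)
```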
#6 |
Member
Location: US Join Date: Dec 2012
Posts: 16
Thanks Khen.
If I understand correctly, samtools is designed for diploid genomes. If your sample has two alleles other than the reference allele (for example, the reference genome has a T and your sample has an A and a G), samtools might not work well. Is there any tool specifically designed to find allele frequencies in your own samples regardless of what is in the reference genome?
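This does not name a tool, but the calculation being asked for can be sketched directly: count the bases observed at a position in each sample and compare the frequencies, without ever consulting the reference base. The function names and the base strings below are hypothetical illustration data, not results from this project. Raw counts like these ignore base quality, mapping quality and sequencing error, which a real caller would model.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def allele_frequencies(bases):
    """Return {allele: frequency} from observed base calls, ignoring the reference."""
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def frequency_difference(bases_b, bases_c):
    """Per-allele frequency in sample B minus frequency in sample C."""
    freq_b = allele_frequencies(bases_b)
    freq_c = allele_frequencies(bases_c)
    alleles = sorted(set(freq_b) | set(freq_c))
    return {a: freq_b.get(a, 0.0) - freq_c.get(a, 0.0) for a in alleles}


# Hypothetical base calls at one position; the reference T never enters the calculation.
bases_b = "AAAAAAGG"
bases_c = "AAGGGGGG"
diff = frequency_difference(bases_b, bases_c)
print(diff)
```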