SEQanswers

Old 04-03-2013, 07:10 AM   #1
Rainbird
Member
 
Location: US

Join Date: Dec 2012
Posts: 11
SNP calling between samples

There is a reference genome available for subspecies A. We resequenced subspecies B and subspecies C on a HiSeq 2000, and we'd like to know the SNP diversity (difference in allele frequency) between subspecies B and subspecies C.

We know there are substantial differences among subspecies A, B and C. So what is the best way to find the SNP diversity between B and C? Should we do SNP calling for subspecies B and C separately against the subspecies A reference genome (I used samtools) and then merge the results? It seems to me there should be a more efficient way to do the job, but I don't know much about the alternatives (Cortex?).

Thanks in advance.
Old 04-03-2013, 08:36 AM   #2
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 900

Align both samples to your best reference, then use samtools mpileup on both .bams together.
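As a rough sketch of what "both .bams together" can look like with the newer bcftools commands (an assumption; the thread itself refers to the samtools mpileup of the time), the snippet below only builds the two command lines. The file names `subspA.fa`, `subspB.bam`, `subspC.bam` and `calls.vcf.gz` are hypothetical placeholders. The `-a AD` annotation asks for per-allele depths in each sample column, which is one way to see read support for a second alternate allele.

```python
import subprocess


def joint_call_commands(reference, bams, out_vcf="calls.vcf.gz"):
    """Build (but do not run) argv lists for joint calling of several BAMs."""
    # One pileup over all BAMs, so every sample ends up in the same VCF.
    mpileup = ["bcftools", "mpileup", "-f", reference, "-a", "AD", "-Ou", *bams]
    # Multiallelic caller (-m), variant sites only (-v), bgzipped VCF output.
    call = ["bcftools", "call", "-m", "-v", "-Oz", "-o", out_vcf]
    return mpileup, call


def run_joint_call(reference, bams, out_vcf="calls.vcf.gz"):
    """Pipe 'bcftools mpileup' into 'bcftools call' (requires bcftools on PATH)."""
    mpileup, call = joint_call_commands(reference, bams, out_vcf)
    producer = subprocess.Popen(mpileup, stdout=subprocess.PIPE)
    subprocess.run(call, stdin=producer.stdout, check=True)
    producer.stdout.close()
    producer.wait()


# Hypothetical file names, for illustration only.
mpileup_cmd, call_cmd = joint_call_commands(
    "subspA.fa", ["subspB.bam", "subspC.bam"]
)
print(" ".join(mpileup_cmd), "|", " ".join(call_cmd))
```

Each BAM needs its own sample name in the read group for the samples to come out as separate columns.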
Old 04-03-2013, 06:35 PM   #3
Rainbird
Member
 
Location: US

Join Date: Dec 2012
Posts: 11

Thanks swbarnes2

Could you explain a little more why "use samtools mpileup on both .bams together" will work? In that case, we still need the reference genome from subspecies A, right?

Another thing: if more than one non-reference allele is reported, samtools only gives the depth of the first non-reference allele (as listed in DP4). Also, although 1/1 indicates homozygous alternate, I don't understand the meaning of the PL value, which is "131,59,26,91,0,85" (as shown below). How can we get the depth and other information for the second alternate allele?

chr2 213263 . A C,T 72 . DP=14;VDB=0.0355;AF1=1;AC1=2;DP4=0,0,9,4;MQ=56;FQ=-60 GT:PL:GQ 1/1:131,59,26,91,0,85:63
Old 04-06-2013, 07:55 PM   #4
Rainbird
Member
 
Location: US

Join Date: Dec 2012
Posts: 11

Can anyone help?
Old 04-07-2013, 03:55 PM   #5
Khen
Member
 
Location: Las Vegas

Join Date: Mar 2012
Posts: 11

Calling subspecies B and C against the reference together mainly saves space. And yes, you will still need the reference genome. The output is slightly different, however: after the single FORMAT column you get one genotype column per sample instead of just one:
Code:
chr2	213263	.	A	C,T	72	.	DP=14;VDB=0.0355;AF1=1;AC1=2;DP4=0,0,9,4;MQ=56;FQ=-60	GT:PL:GQ	<genotype fields for B>	<genotype fields for C>
I'm pretty sure that the PL field reports Phred-scaled likelihoods for each possible genotype, which is why you see six of them: with three alleles (A, C and T) there are six possible diploid genotypes. You will have to consult the documentation for how to get depth information for multiple samples.
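To make the six PL values concrete, here is a minimal sketch that labels each one with its genotype. It assumes the record follows the standard VCF ordering, in which genotype j/k (with j <= k) sits at index k*(k+1)/2 + j, and that lower PL means more likely. The function name `genotype_order` is made up for this illustration.

```python
def genotype_order(n_alleles):
    """Diploid genotypes in VCF order: j/k (j <= k) is at index k*(k+1)//2 + j."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


alleles = ["A", "C", "T"]        # REF followed by the ALT alleles of the record
pl = [131, 59, 26, 91, 0, 85]    # PL values from the sample column

genotypes = genotype_order(len(alleles))
for (j, k), value in zip(genotypes, pl):
    print(f"{j}/{k} ({alleles[j]}/{alleles[k]}): PL={value}")

# The genotype with the smallest PL is the most likely one under this ordering.
best = genotypes[pl.index(min(pl))]
print("lowest PL:", f"{best[0]}/{best[1]}")
```

Under that ordering the PL of 0 falls on 1/2 (C/T), not on the reported GT of 1/1 (PL=26). Why the caller reported 1/1 is not something this sketch can explain, so it is worth checking how your version of the caller handles sites with more than one alternate allele.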
Also, I find that the Broad Institute does a much better job of documentation than SourceForge or 1000genomes.org. Since samtools and GATK both use VCF as their standard output format, you might want to start with the GATK documentation, if not switch to GATK altogether.

Hope this helps.
Old 04-11-2013, 05:24 PM   #6
Rainbird
Member
 
Location: US

Join Date: Dec 2012
Posts: 11

Thanks Khen.
If I understand correctly, samtools is designed for diploid genomes. If your sample has two alleles other than the reference allele (for example, the reference genome has a T, and your sample has an A and a G), samtools might not work well.

Is there any tool specifically designed to find allele frequencies in your own samples, regardless of what is in the reference genome?
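To illustrate what "regardless of the reference" could mean in practice, here is a small sketch that computes per-allele frequencies in two samples from allele counts and takes their difference. The counts are invented for the T/A/G example above, the function names are hypothetical, and treating read counts as a proxy for allele frequency is itself an assumption.

```python
def allele_frequencies(counts):
    """Turn per-allele counts into frequencies; the reference allele gets no special role."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}


def frequency_difference(counts_b, counts_c):
    """Per-allele frequency in sample B minus frequency in sample C."""
    freq_b = allele_frequencies(counts_b)
    freq_c = allele_frequencies(counts_c)
    alleles = sorted(set(freq_b) | set(freq_c))
    return {a: freq_b.get(a, 0.0) - freq_c.get(a, 0.0) for a in alleles}


# Invented counts at one site: reference is T, the samples carry A and G.
sample_b = {"T": 0, "A": 6, "G": 2}
sample_c = {"T": 0, "A": 2, "G": 6}

diff = frequency_difference(sample_b, sample_c)
for allele, delta in diff.items():
    print(allele, delta)
```

In real data the counts would come from per-allele depths in a jointly called VCF, so that both samples are described over the same set of alleles at each site.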