SEQanswers

Old 08-26-2010, 07:50 PM   #1
sbaheti
Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 12
Variant Calling for Exome Capture Analysis

Hi,

I am trying various tools to call variants in our exome data in order to standardize our workflow. I have a few parameters to discuss, such as the minimum read depth needed to reduce false-positive variant calls and, for indel calls, what value to use for the indel-supporting-reads parameter.
I am evaluating GATK and samtools for this. If anybody has experience in this area, please let me know which parameters are acceptable to use for the comparison.
Thanks,

Saurabh
Old 08-27-2010, 06:23 AM   #2
JohnK
Senior Member
 
Location: Los Angeles, China.

Join Date: Feb 2010
Posts: 106
Default

I'm interested in this too. Keep me posted on the indel/SNP options please? I'm not finding BioScope to be satisfactory for clients' needs.
Old 08-27-2010, 10:56 AM   #3
sbaheti
Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 12
Default

I tried both samtools and GATK for indel calling on BWA alignment results. Samtools reports far more indels than GATK, but the overlap between them looks promising: samtools covers almost all of the indel calls from GATK. I have not checked the samtools-exclusive calls yet.
For this I used a 6X coverage criterion and a minimum of 5 indel-supporting reads to call an indel. Probably the best thing is to check the samtools-exclusive calls in IGV and flag them as false positives.
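
One way to make the overlap check above concrete is to reduce each caller's output to sorted chrom:pos keys and compare them with `comm`. This is a minimal sketch, assuming each call set has already been exported to a tab-separated file with chromosome in column 1 and position in column 2; the `call_keys` and `compare_calls` helpers and the file names in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare two indel call sets by chrom:pos key.
# Assumes tab-separated input, chromosome in column 1, position in column 2.

call_keys() {
    # Print unique, sorted "chrom:pos" keys, skipping header lines starting with '#'.
    awk -F'\t' '!/^#/ && NF >= 2 { print $1 ":" $2 }' "$1" | LC_ALL=C sort -u
}

compare_calls() {
    # $1 = calls from caller A, $2 = calls from caller B
    local a b
    a=$(mktemp) && b=$(mktemp) || return 1
    call_keys "$1" > "$a"
    call_keys "$2" > "$b"
    # comm needs sorted input: -23 = only in A, -13 = only in B, -12 = in both.
    printf 'only_A\t%s\n' "$(LC_ALL=C comm -23 "$a" "$b" | wc -l | tr -d ' ')"
    printf 'only_B\t%s\n' "$(LC_ALL=C comm -13 "$a" "$b" | wc -l | tr -d ' ')"
    printf 'common\t%s\n' "$(LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' ')"
    rm -f "$a" "$b"
}
```

Running `compare_calls samtools_indels.tsv gatk_indels.tsv` would print three tab-separated counts. Matching on position alone is deliberately crude: two callers can report the same indel at slightly shifted coordinates in repetitive sequence, so some "exclusive" calls may really be shared.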
Old 08-27-2010, 12:56 PM   #4
Lee Sam
Member
 
Location: Ann Arbor, MI

Join Date: Oct 2008
Posts: 57
Default

Quote:
Originally Posted by sbaheti View Post
I tried both samtools and GATK for indel calling on BWA alignment results. Samtools reports far more indels than GATK, but the overlap between them looks promising: samtools covers almost all of the indel calls from GATK. I have not checked the samtools-exclusive calls yet.
For this I used a 6X coverage criterion and a minimum of 5 indel-supporting reads to call an indel. Probably the best thing is to check the samtools-exclusive calls in IGV and flag them as false positives.
I'm running both samtools and GATK as well for our exome-captured data. I'm curious why you would consider samtools-exclusive indels as false positives.
Old 08-27-2010, 01:19 PM   #5
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Remember to set a quality threshold (about 50) on indels. Also the best indel caller so far is believed to be Dindel.
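
The threshold suggested above could be applied with a short awk filter. This is only a sketch, assuming `samtools pileup -c` style output in which indel rows carry `*` in column 3 and the quality to test sits in column 6; the column number and the `filter_indels` helper name are assumptions, so check them against your own pileup before relying on it.

```shell
# Hypothetical helper: keep indel rows whose quality is at least a threshold.
# Usage: filter_indels MIN_QUAL [QUAL_COLUMN] < in.pileup > out.pileup
filter_indels() {
    local min_qual=${1:-50}   # threshold suggested in the thread
    local col=${2:-6}         # assumed quality column; adjust to your format
    # Keep rows flagged as indels ('*' in column 3) that meet the threshold.
    awk -F'\t' -v q="$min_qual" -v c="$col" '$3 == "*" && $c + 0 >= q + 0'
}
```

The column is a parameter rather than hard-coded because different pileup variants and downstream formats place the quality score differently.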
Old 08-27-2010, 01:22 PM   #6
sbaheti
Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 12
Default

On a spot check of the samtools-exclusive indels, I noticed a case of two indels close to each other where samtools calls one as an indel but not the other, while GATK discards both. I am not sure why. I can't generalize from this, as I checked only a few of them. I am also not sure we can get a fair comparison by running samtools and GATK with defaults, as the two have different default parameters.
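
To find more cases like the neighbouring indels described above without eyeballing every site, calls can be scanned for pairs that fall within a small window. This is a rough sketch, assuming a position-sorted, tab-separated file with chromosome and position in the first two columns; the 10 bp default window and the `nearby_calls` name are arbitrary choices for illustration, not values taken from either tool.

```shell
# Hypothetical helper: print every call that has a neighbour within WINDOW bp
# on the same chromosome. Input must be sorted by chromosome, then position.
# Usage: nearby_calls [WINDOW] < calls.tsv
nearby_calls() {
    local window=${1:-10}
    awk -F'\t' -v w="$window" '
        # Same chromosome as the previous row and close enough to it.
        NR > 1 && $1 == chrom && $2 - pos <= w {
            if (!printed) print prev   # emit the earlier member of the cluster once
            print
            printed = 1
            chrom = $1; pos = $2; prev = $0
            next
        }
        # Otherwise start a new candidate cluster at this row.
        { chrom = $1; pos = $2; prev = $0; printed = 0 }
    '
}
```

The resulting list is a set of candidates for manual review in IGV, not a verdict on which caller is right.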
Old 08-27-2010, 01:25 PM   #7
sbaheti
Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 12
Default

Thanks lh3, I will definitely try a threshold of 50 for indels and will look at how Dindel works.

Last edited by sbaheti; 08-27-2010 at 01:34 PM.
Old 08-27-2010, 01:37 PM   #8
Lee Sam
Member
 
Location: Ann Arbor, MI

Join Date: Oct 2008
Posts: 57
Default

Quote:
Originally Posted by lh3 View Post
Remember to set a quality threshold (about 50) on indels. Also the best indel caller so far is believed to be Dindel.
How did the belief that dindel is the best indel caller come about? I haven't seen any comparison papers (at least, none that I am aware of), so I would honestly like to know.
Old 08-29-2010, 07:26 PM   #9
Nomijill
Member
 
Location: Southwest Florida

Join Date: Sep 2009
Posts: 24
Default

Hi all,

For the indels that you are looking for, what are your read lengths and what are the sizes of the indels you expect to find?

Thanks,
Naomi
Old 08-30-2010, 06:47 AM   #10
sbaheti
Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 12
Default

Quote:
Originally Posted by Nomijill View Post
Hi all,

For the indels that you are looking for, what are your read lengths and what are the sizes of the indels you expect to find?

Thanks,
Naomi
I am looking for accurate indel calls with few false positives. We want to use the indel caller for exome capture analysis. The read lengths we normally use are 50 and 75. I am not sure about the size of the indels, but we always target short indels.

Thanks,
Saurabh
Old 08-30-2010, 09:50 AM   #11
sbaheti
Member
 
Location: Rochester

Join Date: Jul 2010
Posts: 12
Default

I have one more question.
We were looking at MAQ-aligned reads for a dataset, calling variants with both MAQ and samtools, and the two result sets are not as similar as we expected. Could you comment on whether we are missing something here, or is this what you would expect too?

Results (these numbers are for SNPs), using the convention Aligner_VariantCaller:
MAQ_MAQ = 13525 (exclusive)
MAQ_SAMTOOLS = 21701 (exclusive)
COMMON = 69689
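
For reference, the three counts above translate into overlap percentages as follows. This is plain arithmetic on the posted numbers; the `concordance` helper is a hypothetical name, and the percentages are derived here rather than taken from the original post.

```shell
# Hypothetical helper: summarise overlap between two call sets.
# Arguments: calls exclusive to A, calls exclusive to B, calls common to both.
concordance() {
    awk -v a="$1" -v b="$2" -v c="$3" 'BEGIN {
        printf "total_A\t%d\n", a + c
        printf "total_B\t%d\n", b + c
        printf "union\t%d\n", a + b + c
        printf "pct_of_A_shared\t%.1f\n", 100 * c / (a + c)
        printf "pct_of_B_shared\t%.1f\n", 100 * c / (b + c)
        printf "pct_of_union_shared\t%.1f\n", 100 * c / (a + b + c)
    }'
}

# MAQ-exclusive, samtools-exclusive and common SNP counts from the post.
concordance 13525 21701 69689
```

With these counts, roughly 84% of the MAQ calls and 76% of the samtools calls are shared, which is about two thirds of the union of both sets.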

SCRIPTS USED:
$ samtools pileup -vcf *.sorted.bam > *.pileup

Filter the variants with the samtools.pl Perl script (varFilter), allowing at most 3 SNPs in a window (-N 3) and a maximum read depth of 100 (-D 100):
$ samtools.pl varFilter -N 3 -D 100 *.pileup > *.raw.variant

Discard variant calls with a Phred-scaled quality below 40 (the awk condition must be quoted, otherwise the shell expands $6 and treats > as a redirect):
$ cat *.raw.snp | awk '$6>=40' > *.raw.filtered.snp


$VER/maq cns2snp consensus.cns > cns.snp
$VER/maq indelpe $genome $map_file > cns.indel.pe
$VER/scripts/maq.pl SNPfilter -f cns.indel -F cns.indel.pe cns.snp > cns.filter.snp
$VER/maq cns2win consensus.cns > cns.win
awk '$5>=40' cns.filter.snp > cns.final.snp

Thanks,
Saurabh
Old 08-30-2010, 04:42 PM   #12
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

Quote:
Originally Posted by Lee Sam View Post
How did the belief that dindel is the best indel caller come about? I haven't seen any comparison papers (at least, none that I am aware of), so I would honestly like to know.
If you were part of the 1000 genomes project, you would know. The paper is coming.
Old 08-30-2010, 09:54 PM   #13
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258
Default

Quote:
Originally Posted by lh3 View Post
If you were part of the 1000 genomes project, you would know. The paper is coming.
Doh! I've just spent a weekend running GATK :-(

[ flame mode on ]
BTW, I'm no expert on variant calling and I'm probably the last person who can say anything about it; nevertheless, I don't think the 1000 Genomes team holds the Truth. I wish I were part of that project, but unfortunately I have to work somewhere else...
[ flame mode off ]

d

Last edited by dawe; 08-31-2010 at 01:23 AM.
Old 08-31-2010, 07:54 AM   #14
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by lh3 View Post
If you were part of the 1000 genomes project, you would know. The paper is coming.
I have to politely disagree, since I do not know and I am a member of the project. I have not seen a re-alignment method comparison within the 1000 genomes project.
Old 10-12-2010, 02:35 AM   #15
SeqAnswerSeeker
Junior Member
 
Location: Heidelberg, Germany

Join Date: Apr 2010
Posts: 3
Default

Just to make sure: we are talking about PINDEL here, right? Or does the 1000 Genomes Project have a new indel caller (called DINDEL) that is still unpublished?
Old 10-12-2010, 06:28 AM   #16
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

There is Dindel and there is Pindel.

Where will this naming endel? Perhaps when a developer meets Grendel.
Old 10-14-2010, 07:39 PM   #17
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

I mean dindel (not pindel). It is here: http://sites.google.com/site/keesalbers/soft/dindel. Dindel does very sophisticated realignment, more sophisticated than most (if not all) other software, and the performance of an indel caller is largely determined by the quality of its realignment. Judging from the SRMA paper, I do not think SRMA matches dindel (dindel has an HMM to evaluate many possible alignments and it explicitly models diploidy). The GATK group also agrees that dindel is better.

Note that realignment for indel calling demands more than realignment aimed at better SNP calls, and it is harder.

The major disadvantage of dindel is its inefficiency.

Last edited by lh3; 10-14-2010 at 07:41 PM.
Old 10-14-2010, 08:12 PM   #18
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

I have to chime in here.

Modeling ploidy is not always desirable, say when you are re-aligning heterogeneous cancer samples, or trying to re-align many samples in tandem (one variant in a sample can inform the entire population). Also, slow re-alignment is not desirable if you need results quickly (think whole-genome clinical). In some cases we are more interested in the rare variants that are not always diploid than perfectly calling dbSNP positions.

A two-stage approach may be to use a fast re-aligner for the whole-genome, and target difficult regions with slower but more sensitive re-aligners. Anyhow, I have not seen any re-alignment comparisons (@lh3 I would be interested to see yours since you are usually thorough), but like we saw in the alignment world, there will always be a trade-off between efficiency and sensitivity (think BWTSW versus short-read aligners).

SRMA is open source if you would like to contribute (especially to the C version).
Old 10-15-2010, 07:20 AM   #19
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

I was just saying that if one wants to get the best indel calls, (s)he should definitely try dindel. I guess the ploidy modeling can be switched off.

As to speed, dindel will be used for hundreds of samples from the 1000 genomes project. It is slow, but still affordable.
Old 10-16-2010, 12:56 AM   #20
SeqAnswerSeeker
Junior Member
 
Location: Heidelberg, Germany

Join Date: Apr 2010
Posts: 3
Default

Thank you lh3, I will definitely give Dindel a try!
Tags
dindel, exome
