**krobison** · 10-12-2010, 06:28 AM

There is Dindel and there is Pindel.

Where will this naming endel? Perhaps when a developer meets Grendel.

**lh3** · 10-14-2010, 07:39 PM

I mean dindel (not pindel). It is here: http://sites.google.com/site/keesalbers/soft/dindel. Dindel does very sophisticated realignment, more sophisticated than most (if not all) other software, while the performance of an indel caller is largely determined by the quality of realignment. From the SRMA paper, I do not think it matches dindel (dindel has an HMM to evaluate many possible alignments and it explicitly models diploid). The GATK group also agree that dindel is better.

Note that realignment for indel calling requires more than realignment for better SNP calls and is harder.

The major disadvantage of dindel is its inefficiency.

**nilshomer** · 10-14-2010, 08:12 PM

I have to chime in here.

Modeling ploidy is not always desirable, say when you are re-aligning heterogeneous cancer samples, or trying to re-align many samples in tandem (one variant in a sample can inform the entire population). Also, slow re-alignment is not desirable if you need results quickly (think whole-genome clinical). In some cases we are more interested in the rare variants that are not always diploid than perfectly calling dbSNP positions.

A two-stage approach may be to use a fast re-aligner for the whole-genome, and target difficult regions with slower but more sensitive re-aligners. Anyhow, I have not seen any re-alignment comparisons (@lh3 I would be interested to see yours since you are usually thorough), but like we saw in the alignment world, there will always be a trade-off between efficiency and sensitivity (think BWTSW versus short-read aligners).
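The two-stage idea can be sketched in a few lines of Python. This is only an illustration: `fast_realign`, `slow_realign` and `is_difficult` are hypothetical callables supplied by the caller, not functions from SRMA, GATK or dindel, and how a region gets flagged as difficult is left entirely open.

```python
from typing import Callable, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def two_stage_realign(
    regions: Iterable[Region],
    fast_realign: Callable[[Region], None],
    slow_realign: Callable[[Region], None],
    is_difficult: Callable[[Region], bool],
) -> List[Region]:
    """Run the fast realigner on every region, then the slow one on flagged regions.

    All three callables are hypothetical placeholders for real tools.
    Returns the regions that were escalated to the slow realigner.
    """
    escalated: List[Region] = []
    for region in regions:
        fast_realign(region)          # stage 1: cheap pass over everything
        if is_difficult(region):      # stage 2: only where it is worth the cost
            slow_realign(region)
            escalated.append(region)
    return escalated
```

The cost of the slow step then scales with the number of flagged regions rather than with genome size, which is the efficiency/sensitivity trade-off described above.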

SRMA is open source if you would like to contribute (especially to the C-version).

**lh3** · 10-15-2010, 07:20 AM

I was just saying that if one wants to get the best indel calls, (s)he should definitely try dindel. I guess the ploidy modeling can be switched off.

As to speed, dindel will be used for hundreds of samples from the 1000 genomes project. It is slow, but still affordable.

**SeqAnswerSeeker** · 10-16-2010, 12:56 AM

Thank you lh3, I will definitely give Dindel a try!

**Michael.James.Clark** · 10-16-2010, 11:02 AM

So what flow is being suggested here?

After alignment, do realignment with SRMA (or GATK I guess), then do SNV detection and indel detection with GATK. Then, if needed, Dindel will pick up additional indels.

I'm genuinely curious if the true indel yield with Dindel is significantly greater than SRMA to the point of adding a completely new, slower informatic step to the process for every experiment.

We could just do SRMA, then both SNV and indel detection with GATK (this is exactly what I was planning until Dindel was mentioned here). It seems like realignment and then Dindel might be redundant unless nothing interesting is detected with the more straightforward approach.

I've just finished a number of exome alignments and my plan was GATK+SRMA for the whole thing. I'm thinking if I do not find what I'm looking for (I'm doing Mendelian family exomes, so the variant should be obvious), then maybe Dindel is worth using.

Complete Genomics uses assembly over all non-reference base calls that pass a score threshold. It does seem to be very effective. Anyone tried anything like that?

Any other suggestions?

**lh3** · 10-16-2010, 08:28 PM

The GATK group is reimplementing the dindel model because they think it is clearly better both in theory and in practice. As I tried dindel, it is faster than srma. As to CG, many including me have been deeply impressed, but so far as I know no one is doing the same for Illumina/SOLiD/454.

EDIT: a third (sort of) realignment approach is the one I am selling at the samtools mailing list: base alignment quality.

EDIT2: For the first time, I read how gatk's indel caller works. The caller itself does not do realignment like samtools. It relies on gatk's realigner. If one does not run the realigner, the caller will definitely miss a lot of indels, although the remaining are of very high quality.

EDIT3: I misinterpreted the dindel output. In fact, dindel is clearly better than samtools' indel caller in terms of specificity (at the cost of sensitivity).

**nilshomer** · 10-16-2010, 10:05 PM

Originally posted by lh3

The GATK group is reimplementing the dindel model because they think it is clearly better both in theory and in practice. As I tried dindel, it is faster than srma, but its accuracy is not as high as what I would expect. As to CG, many including me have been deeply impressed, but so far as I know no one is doing the same for Illumina/SOLiD/454.

EDIT: a third (sort of) realignment approach is the one I am selling at the samtools mailing list: base alignment quality.

Is there a public document describing the dindel model? I would be interested in thinking about how to allow it to handle SOLiD/454 data and their error properties; dindel says it only handles Illumina data so far. Maybe the GATK folks are already thinking about this.

I like the base alignment quality idea. Is the BAQ just the posterior probability of a read base aligning to a ref base using the forward/backward algorithm on a Smith-Waterman HMM? Could you use the BAQ within the dindel model (I assume it is Bayesian)?
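To make the question concrete: if the BAQ is read as a Phred-scaled posterior, then turning the forward/backward probability into a quality score would look roughly like the sketch below. This is an assumption about the interpretation, not samtools code; `posterior_to_phred` and the `max_q` ceiling are made up for illustration.

```python
import math


def posterior_to_phred(p_correct: float, max_q: int = 60) -> int:
    """Phred-scale the posterior probability that a read base is aligned correctly.

    p_correct is assumed to come from a forward/backward pass over an
    alignment HMM; max_q is an arbitrary ceiling chosen for this sketch.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability")
    p_error = 1.0 - p_correct
    if p_error <= 0.0:
        return max_q  # avoid log10(0) when the posterior is exactly 1
    return min(max_q, int(round(-10.0 * math.log10(p_error))))
```

Under this reading, a base with a 99% posterior of being in the right column would get a score of 20, the same scale as ordinary base qualities.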

**lh3** · 10-17-2010, 08:31 PM

The following is what I would recommend for Illumina:

1. Do alignment with novoalign or bwa. Mosaik is also great, but unfortunately it does not write soft clippings, which will affect programs at a later step. Bowtie is not recommended because it does not do gapped alignment.

2. If you have bandwidth, do realignment with GATK. If you do not, it actually does not matter too much. The major downside of not doing realignment is you may get confusing alignment in an alignment viewer (the most frequent question is "why the indel caller is calling an indel when there is only 1 read supporting that?").

3. Cap base quality by BAQ (with samtools).

4. Call SNPs with whatever SNP caller. It does not matter too much when indels are cleaned.

5. Call indels with dindel. The Dindel group has shown convincing evidence that it is clearly better. When I evaluated it myself, I was convinced again. It is much more sensitive than gatk realigner+IndelGenotyperV2; its specificity is also better. For exome sequencing the difference is probably smaller because the hard regions are mostly related to repeats.
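For step 3 of the list above, "capping" can be pictured as taking, for each base, the smaller of the sequencer's base quality and the BAQ. The snippet below is a minimal sketch of that idea only; `cap_by_baq` is a hypothetical helper, and samtools does this internally rather than through any function of this name.

```python
from typing import List


def cap_by_baq(base_quals: List[int], baqs: List[int]) -> List[int]:
    """Return per-base qualities capped by the base alignment quality.

    Both lists are Phred-scaled and must describe the same read bases.
    Hypothetical helper for illustration; not part of samtools.
    """
    if len(base_quals) != len(baqs):
        raise ValueError("base_quals and baqs must have the same length")
    return [min(q, b) for q, b in zip(base_quals, baqs)]


# A read whose middle bases sit next to a possible indel: the sequencer is
# confident about the bases, but the alignment of those columns is doubtful.
capped = cap_by_baq([30, 30, 30, 30], [40, 12, 5, 35])
print(capped)  # [30, 12, 5, 30]
```

Bases whose alignment is uncertain end up with low effective quality, so a SNP caller that weights evidence by base quality discounts mismatches caused by nearby misaligned indels.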

**krobison** · 10-18-2010, 12:09 PM

EDIT: a third (sort of) realignment approach is the one I am selling at the samtools mailing list: base alignment quality.

Could you post a URL into the thread you are referencing?

**lh3** · 10-18-2010, 12:26 PM

Thread: [Samtools-help] New feature: Base Alignment Quality (BAQ) | SAM tools

https://sourceforge.net/mailarchive/forum.php?thread_name=EEEA3872-183F-4C2B-8FFB-7CE3EE877303%40sanger.ac.uk&forum_name=samtools-help

**Michael.James.Clark** · 10-22-2010, 12:01 PM

What tools do people use for coding consequence determination after all of this?

**svl** · 10-23-2010, 08:26 AM

Originally posted by Michael.James.Clark

for coding consequence determination

If you mean how to get the consequence of a variation (whether it's a SNV or a small INDEL) -> we use the ensembl snp effect predictor:

http://www.ensembl.org/tools.html

When using the ensembl perl API you can use this predictor by creating a variation object and get the consequence, which could be any of the ones listed here:

Variation

http://www.ensembl.org/info/docs/variation/index.html

/svl

**Michael.James.Clark** · 10-25-2010, 12:05 PM

Great, thanks.

There's also SIFT from JCVI ( http://sift.jcvi.org/ ) and PolyPhen ( http://genetics.bwh.harvard.edu/pph/ ). Both very good tools but with some limitations.

**krobison** · 10-25-2010, 04:53 PM

A new tool, GAMES, was just published.
