
  • #16
    There is Dindel and there is Pindel.

    Where will this naming endel? Perhaps when a developer meets Grendel.



    • #17
      I mean dindel (not pindel). It is here: http://sites.google.com/site/keesalbers/soft/dindel. Dindel does very sophisticated realignment, more sophisticated than most (if not all) other software, and the performance of an indel caller is largely determined by the quality of realignment. From the SRMA paper, I do not think it matches dindel (dindel has an HMM to evaluate many possible alignments and it explicitly models diploidy). The GATK group also agrees that dindel is better.

      Note that realignment for indel calling is more demanding, and harder, than realignment aimed only at better SNP calls.

      The major disadvantage of dindel is its inefficiency.
      Last edited by lh3; 10-14-2010, 07:41 PM.



      • #18
        I have to chime in here.

        Modeling ploidy is not always desirable, say when you are re-aligning heterogeneous cancer samples, or trying to re-align many samples in tandem (one variant in a sample can inform the entire population). Also, slow re-alignment is not desirable if you need results quickly (think whole-genome clinical). In some cases we are more interested in the rare variants that are not always diploid than perfectly calling dbSNP positions.
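
To make the ploidy point concrete, here is a minimal sketch of a generic diploid genotype likelihood. It is not dindel's actual model, just the textbook form in which each read comes from the alt haplotype with probability 0, 1/2 or 1. The function names and the 1% per-read error rate in the usage note are assumptions for illustration.

```python
import math

def diploid_log_likelihoods(read_liks):
    """Log-likelihood of each diploid genotype.

    read_liks: list of (P(read | ref haplotype), P(read | alt haplotype)),
    with both probabilities strictly positive.
    """
    out = {}
    for name, alt_copies in (("0/0", 0), ("0/1", 1), ("1/1", 2)):
        f = alt_copies / 2.0  # fraction of haplotypes carrying the variant
        out[name] = sum(
            math.log((1.0 - f) * p_ref + f * p_alt) for p_ref, p_alt in read_liks
        )
    return out

def best_genotype(read_liks):
    """Return the genotype label with the highest log-likelihood."""
    ll = diploid_log_likelihoods(read_liks)
    return max(ll, key=ll.get)

# Assumed toy reads: a read supporting one haplotype has probability 0.99
# under that haplotype and 0.01 under the other.
REF_READ = (0.99, 0.01)
ALT_READ = (0.01, 0.99)
```

With 10 `REF_READ` and 10 `ALT_READ` entries, `best_genotype` picks `0/1`. With 18 `REF_READ` and 2 `ALT_READ` entries (a 10% allele fraction, as one might see in a heterogeneous tumour sample), it picks `0/0`, because only allele fractions of 0, 1/2 and 1 are on offer.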

        A two-stage approach may be to use a fast re-aligner for the whole-genome, and target difficult regions with slower but more sensitive re-aligners. Anyhow, I have not seen any re-alignment comparisons (@lh3 I would be interested to see yours since you are usually thorough), but like we saw in the alignment world, there will always be a trade-off between efficiency and sensitivity (think BWTSW versus short-read aligners).
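
The two-stage idea could be planned roughly as follows. This is only a sketch: the `Region` type, the "fraction of gapped reads" criterion and the 5% threshold are all assumptions for illustration, since nothing here says how difficult regions should be chosen, and the realigners themselves are not invoked.

```python
from dataclasses import dataclass

@dataclass
class Region:
    chrom: str
    start: int
    end: int
    indel_reads: int  # reads whose alignment contains a gap (assumed metric)
    total_reads: int

def is_difficult(region, min_indel_fraction=0.05):
    """Flag a region when enough of its reads carry a gap (hypothetical rule)."""
    if region.total_reads == 0:
        return False
    return region.indel_reads / region.total_reads >= min_indel_fraction

def plan_realignment(regions, min_indel_fraction=0.05):
    """Return (fast_pass, slow_pass).

    Every region goes through the fast re-aligner; only regions flagged as
    difficult are queued for the slower, more sensitive re-aligner.
    """
    fast_pass = list(regions)
    slow_pass = [r for r in regions if is_difficult(r, min_indel_fraction)]
    return fast_pass, slow_pass
```

The point of the design is that the slow pass scales with the number of flagged regions rather than with genome size, so the threshold becomes the knob that trades efficiency against sensitivity.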

        SRMA is open source if you would like to contribute (especially to the C version).



        • #19
          I was just saying that if one wants to get the best indel calls, (s)he should definitely try dindel. I guess the ploidy modeling can be switched off.

          As to speed, dindel will be used for hundreds of samples from the 1000 genomes project. It is slow, but still affordable.



          • #20
            Thank you lh3, I will definitely give Dindel a try!



            • #21
              So what flow is being suggested here?

              After alignment, do realignment with SRMA (or GATK I guess), then do SNV detection and indel detection with GATK. Then, if needed, Dindel will pick up additional indels.

              I'm genuinely curious whether the true indel yield with Dindel is so much greater than with SRMA that it justifies adding a completely new, slower informatics step to every experiment.

              We could just do SRMA, then both SNV and indel detection with GATK (this is exactly what I was planning until Dindel was mentioned here). It seems like realignment and then Dindel might be redundant unless nothing interesting is detected with the more straight-forward approach.

              I've just finished a number of exome alignments and my plan was GATK+SRMA for the whole thing. I'm thinking that if I do not find what I'm looking for (I'm doing Mendelian family exomes, so the variant should be obvious), then maybe Dindel is worth using.

              Complete Genomics uses assembly over all non-reference base calls that pass a score threshold. It does seem to be very effective. Anyone tried anything like that?

              Any other suggestions?
              Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
              Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
              Projects: U87MG whole genome sequence [Website] [Paper]



              • #22
                The GATK group is reimplementing the dindel model because they think it is clearly better both in theory and in practice. When I tried dindel, it was faster than srma. As to CG, many people, including me, have been deeply impressed, but as far as I know no one is doing the same for Illumina/SOLiD/454.

                EDIT: a third (sort of) realignment approach is the one I am selling at the samtools mailing list: base alignment quality.

                EDIT2: For the first time, I read how gatk's indel caller works. Unlike samtools, the caller itself does not do realignment; it relies on gatk's realigner. If one does not run the realigner, the caller will definitely miss a lot of indels, although the remaining calls are of very high quality.

                EDIT3: I misinterpreted the dindel output. In fact, dindel is clearly better than samtools' indel caller in terms of specificity (at the cost of sensitivity).
                Last edited by lh3; 10-17-2010, 08:34 PM.



                • #23
                  Originally posted by lh3
                  The GATK group is reimplementing the dindel model because they think it is clearly better both in theory and in practice. As I tried dindel, it is faster than srma, but its accuracy is not as high as what I would expect. As to CG, many including me have been deeply impressed, but so far as I know no one is doing the same for Illumina/SOLiD/454.

                  EDIT: a third (sort of) realignment approach is the one I am selling at the samtools mailing list: base alignment quality.
                  Is there a public document describing the dindel model? I would be interested in thinking about how to allow it to handle SOLiD/454 data and their error properties; dindel says it only handles Illumina data so far. Maybe the GATK folks are already thinking about this.

                  I like the base alignment quality idea. Is the BAQ just the posterior probability of a read base aligning to a ref base, computed with the forward/backward algorithm on a Smith-Waterman HMM? Could you use the BAQ within the dindel model (I assume it is Bayesian)?



                  • #24
                    The following is what I would recommend for Illumina:

                    1. Do alignment with novoalign or bwa. Mosaik is also great, but unfortunately it does not write soft clippings, which will affect programs at a later step. Bowtie is not recommended because it does not do gapped alignment.

                    2. If you have bandwidth, do realignment with GATK. If you do not, it actually does not matter too much. The major downside of not doing realignment is that you may get confusing alignments in an alignment viewer (the most frequent question is "why is the indel caller calling an indel when only 1 read supports it?").

                    3. Cap base quality by BAQ (with samtools).

                    4. Call SNPs with whichever SNP caller you prefer. The choice does not matter too much once indels are cleaned up.

                    5. Call indels with dindel. The Dindel group has shown convincing evidence that it is clearly better. When I evaluated it myself, I was convinced again. It is much more sensitive than gatk realigner+IndelGenotyperV2; its specificity is also better. For exome sequencing the difference is probably smaller because the hard regions are mostly related to repeats.
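
For step 3, the capping itself is simple once per-base misalignment probabilities are available. A minimal sketch, assuming those probabilities are already given (in samtools they come from an HMM, which is not reproduced here); the function names and the quality ceiling of 60 are illustrative assumptions:

```python
import math

def phred(p_error, max_q=60):
    """Convert an error probability to a Phred-scaled integer quality."""
    if p_error <= 0:
        return max_q
    return min(max_q, int(round(-10 * math.log10(p_error))))

def cap_by_baq(base_quals, misalign_probs):
    """Cap each base quality by its base alignment quality (BAQ).

    base_quals: original Phred base qualities.
    misalign_probs: probability that each base is misaligned (assumed input).
    """
    return [min(q, phred(p)) for q, p in zip(base_quals, misalign_probs)]
```

A base with high sequencing quality but an uncertain alignment (for example next to a candidate indel) ends up with a low effective quality, so it contributes little to a downstream SNP call.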



                    • #25
                      EDIT: a third (sort of) realignment approach is the one I am selling at the samtools mailing list: base alignment quality.

                      Could you post a URL to the thread you are referencing?



                      • #27
                        What tools do people use for coding consequence determination after all of this?
                        Last edited by Michael.James.Clark; 10-22-2010, 12:24 PM.



                        • #28
                          Originally posted by Michael.James.Clark
                          for coding consequence determination
                          If you mean how to get the consequence of a variation (whether it's a SNV or a small INDEL) -> we use the ensembl snp effect predictor:



                          When using the ensembl perl API you can use this predictor by creating a variation object and get the consequence, which could be any of the ones listed here:



                          /svl
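
For readers without the Ensembl API at hand, the basic idea of a coding consequence can be sketched in a few lines. This is a toy illustration, not the Ensembl predictor: it uses the standard genetic code, ignores transcripts, strands and splice sites, and the consequence labels are my own illustrative names rather than Ensembl's term list.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def snv_consequence(ref_codon, pos_in_codon, alt_base):
    """Classify a single-base change inside one codon (0-based position)."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"

def indel_consequence(length_change):
    """Classify a coding indel by whether it preserves the reading frame."""
    return "inframe" if length_change % 3 == 0 else "frameshift"
```

For example, changing the first base of `GAA` (Glu) to `T` gives `TAA`, a stop codon, so `snv_consequence("GAA", 0, "T")` reports `stop_gained`.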



                          • #29
                            Great, thanks.

                            There's also SIFT from JCVI ( http://sift.jcvi.org/ ) and PolyPhen ( http://genetics.bwh.harvard.edu/pph/ ). Both very good tools but with some limitations.



                            • #30
                              A new tool, GAMES, has just been published.

