  • Using dindel

    Hi all,

    I'm currently trying out dindel v0.12 for finding indels. However, I've hit a snag and can find very little help on it.

    I'm running the stage two command that realigns the windows (the second command of phase 2). The example in the manual gives this command:
    dindel --analysis indels --doDiploid --bamFile sample.bam --ref ref.fa --inputVarFile sample.realign_windows.2.txt --libFile sample.dindel_output.libraries.txt --outputFile sample.dindel_stage2_output_windows.2

    However, running the above with the correct file names doesn't work. It fails with "Error parsing input options." and prints the usage message. Which option(s) need to be added or changed to make that stage work?

    I also noticed that the first command of phase 2 needs --inputVarFile rather than the --varFile given in the manual.

  • #2
    Kees (the author) has been quite generous about helping me past similar problems

    Just replace each $-prefixed item with the correct filename (this is pulled from some Perl code); I think the main problem you've hit is the --inputVarFile vs. --varFile inconsistency in the code
    Code:
    dindel --analysis indels --doDiploid --bamFile $bamFile --ref $refFasta --varFile $windowsFile --outputFile $outputFile


    • #3
      I think there are a couple of typos in the online documentation. The following shows how I run dindel.

      Code:
      ./dindel_x86-64  --ref chr20.fa --outputFile 1 --bamFile aln.bam --analysis getCIGARindels
      python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000
      ./dindel_x86-64 --analysis indels --doDiploid --bamFile aln.bam --ref chr20.fa --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err
      echo 3.glf.txt > 3.list
      python mergeOutput.py -t diploid -i 3.list -o 4.vcf -r chr20.fa
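
When makeWindows.py writes more than one window file, the stage-2 command has to be repeated per file and every resulting .glf.txt has to be listed for mergeOutput.py. Below is a minimal Python sketch of that bookkeeping. It uses only the flags shown in this thread. The helper name `stage2_commands`, the `out_prefix` numbering scheme, and the second window file `2.2.txt` are assumptions made for illustration, as is the guess that `--outputFile X` always yields `X.glf.txt` (that is the pattern in the commands above).

```python
def stage2_commands(window_files, bam, ref, lib_file, out_prefix="3", dindel="./dindel_x86-64"):
    """Build one stage-2 dindel command per window file.

    Returns (commands, glf_files). Each command is an argv-style list.
    Only flags that appear in this thread are used.
    """
    commands, glf_files = [], []
    for i, window_file in enumerate(window_files, start=1):
        # Hypothetical naming scheme: <out_prefix>.<index> for each window file.
        prefix = f"{out_prefix}.{i}"
        commands.append([
            dindel, "--analysis", "indels", "--doDiploid",
            "--bamFile", bam, "--ref", ref,
            "--varFile", window_file,
            "--libFile", lib_file,
            "--outputFile", prefix,
        ])
        # Assumption: dindel writes <prefix>.glf.txt, as in "echo 3.glf.txt > 3.list".
        glf_files.append(prefix + ".glf.txt")
    return commands, glf_files


# "2.2.txt" is a made-up second window file, used only to show the loop.
cmds, glfs = stage2_commands(["2.1.txt", "2.2.txt"], "aln.bam", "chr20.fa", "1.libraries.txt")
for cmd in cmds:
    print(" ".join(cmd))
print("\n".join(glfs))
```

Each list in `cmds` could be passed to `subprocess.run(cmd, check=True)`, and the names in `glfs` written one per line to the list file given to mergeOutput.py with -i.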


      • #4
        Originally posted by lh3
        I think there are a couple of typos in the online documentation. The following shows how I run dindel.

        Code:
        ./dindel_x86-64  --ref chr20.fa --outputFile 1 --bamFile aln.bam --analysis getCIGARindels
        python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000
        ./dindel_x86-64 --analysis indels --doDiploid --bamFile aln.bam --ref chr20.fa --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err
        echo 3.glf.txt > 3.list
        python mergeOutput.py -t diploid -i 3.list -o 4.vcf -r chr20.fa

        Thanks, this is really helpful. I'm working with dindel too, and just today I was wondering about these.


        • #5
          Question regarding the --doEM option:
          I have a family of five individuals (two parents, three children), so I assume there are four haplotypes in the data set. Is there a way to set it for this (if it would make a difference)?
          Am I better off extracting each individual from the pooled BAM file and running them individually with --doDiploid instead?
          Thanks.
          Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
          Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
          Projects: U87MG whole genome sequence [Website] [Paper]


          • #6
            Thanks for the answers, lh3 and krobison. I've got it running now.


            • #7
              I used Dindel after GATK realignment/recalibration.
              It seems like this is redundant.
              Is it just as good, or better, to run Dindel in a separate pipeline directly from the original alignments?

              Another query: Do people just generally filter out those that end up with the fr0/q20/hp10/wv flags in the FILTER field?
              Last edited by Michael.James.Clark; 10-26-2010, 06:41 PM.

              • #8
                In general I would advise against using variants with quality scores below 10 for single diploid samples. The fr0 filter in version 0.12 of Dindel does reduce the number of false positives on real data, but you will also lose some sensitivity.
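
The quality cutoff above can be applied to the VCF output with a few lines of Python. This is only a sketch: the function name `keep_record` is made up here, the cutoff of 10 is taken from the advice above, and requiring FILTER to be PASS is an optional extra that is an assumption, not a recommendation from this post. The example records are ones posted further down in this thread.

```python
def keep_record(vcf_line, min_qual=10.0, require_pass=True):
    """Return True if a VCF data line has QUAL >= min_qual (and FILTER == PASS, if required).

    Assumes a numeric QUAL column. split() is used so that both tab-separated
    files and the space-separated records pasted in this thread work.
    """
    fields = vcf_line.split()
    qual, filt = float(fields[5]), fields[6]
    if qual < min_qual:
        return False
    return filt == "PASS" or not require_pass


records = [
    "chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93",
    "chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12",
    "chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3",
]
kept = [r for r in records if keep_record(r)]
for r in kept:
    print(r)
```

When reading a whole VCF file, header lines starting with "#" would need to be passed through before calling `keep_record`.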

                It is true that running Dindel on BAMs realigned by the GATK will not result in too many new calls if you have high-depth diploid data.
                The main advantage of running Dindel currently would be for calling the genotypes: here the GATK realigned BAMs might result in undercalls as reads matching the reference are not realigned even though they may support the alternative haplotype with the indel just as well as the reference haplotype.
                Also, Dindel has a dedicated sequencing error model for homopolymer runs, which should result in more accurate calls in those contexts.
                The Broad are currently implementing the Dindel algorithm in the GATK, but I don't know exactly when it will be released (later this year I expect).

                The new version of Dindel has a script that lets you select only the indels that were seen twice or more (whatever number you prefer). If you apply this to indels extracted from the realigned BAM you will be able to significantly reduce compute time.

                Kees (Disclosure: I am the author of Dindel if it wasn't clear already).

                PS I put a new version of Dindel on the website today.


                • #9
                  Originally posted by Michael.James.Clark
                  I used Dindel after GATK realignment/recalibration.
                  It seems like this is redundant.

                  But it helps when you want to look at the alignments by eye to understand why your SNP caller made a call.
                  -drd


                  • #10
                    Thanks for the update. It is a great tool that I was using to re-run several data sets.

                    In v1.01, the --numWindowsPerFile option is not working.

                    I also see discrepancies between QUAL and the last column (GQ) in the VCF output:
                    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S3
                    chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90
                    chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289
                    chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3

                    Can you output total read counts in the VCF output? Can you generate the glf file list automatically as part of makeWindows.py?
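
For context, QUAL and GQ are separate fields in VCF: QUAL scores the call that a variant is present at the site, while GQ scores the genotype reported for the sample, so the two need not be equal. How Dindel derives each number is not stated in this thread. A small Python sketch (the function name `qual_and_gq` is made up for illustration) pulls both values out of the records above so they can be compared side by side:

```python
def qual_and_gq(vcf_line):
    """Return (QUAL, GT, GQ) for a single-sample VCF data line.

    split() is used so that both tab-separated files and the
    space-separated records pasted in this thread work.
    """
    fields = vcf_line.split()
    qual = float(fields[5])
    # FORMAT (column 9) names the keys; the sample column (column 10) holds the values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return qual, sample["GT"], int(sample["GQ"])


rows = [
    "chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90",
    "chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289",
    "chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3",
]
for row in rows:
    print(qual_and_gq(row))
```

In these three rows the heterozygous call has GQ equal to QUAL, while the two homozygous calls have a GQ well below QUAL.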


                    • #11
                      Can anyone give feedback on the output? Did I make a mistake in the run (single sample, run as diploid with default settings)?

                      How can NRS+NFS = 32 with DP=81 while the genotype is 1/1? It should be heterozygous.

                      chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93


                      Below is more from the VCF4 output

                      ##INFO=<ID=DP,Number=1,Type=Integer,Description="Total number of reads in haplotype window">
                      ##INFO=<ID=HP,Number=1,Type=Integer,Description="Reference homopolymer tract length">
                      ##INFO=<ID=NF,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on forward strand">
                      ##INFO=<ID=NR,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on reverse strand">
                      ##INFO=<ID=NFS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on forward strand">
                      ##INFO=<ID=NRS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on reverse strand">
                      ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
                      ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
                      ##ALT=<ID=DEL,Description="Deletion">
                      ##FILTER=<ID=q5,Description="Quality below 5">
                      ##FILTER=<ID=hp10,Description="Reference homopolymer length was longer than 10">
                      ##FILTER=<ID=fr0,Description="Non-ref allele is not covered by at least one read on both strands">
                      ##FILTER=<ID=wv,Description="Other indel in window had higher likelihood">
                      #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2044B
                      chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93
                      chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12

                      chr3 135275377 . C CCGCTCTTCCGAT 36 PASS DP=40;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:36
                      chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3
                      chr3 135281981 . C CGCTCTTCCGATCT 15 PASS DP=42;NF=0;NR=0;NRS=1;NFS=0;HP=3 GT:GQ 0/1:15
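
Going only by the header descriptions above, DP counts reads in the whole haplotype window, NFS and NRS count reads covering the variant site, and NF and NR count reads covering the non-ref variant. One possible reading, not confirmed anywhere in this thread, is that the ratio to look at is (NF+NR)/(NFS+NRS) rather than anything relative to DP. The Python sketch below does that arithmetic; the function names are made up for illustration.

```python
def parse_info(info):
    """Turn 'DP=81;NF=20;...' into a dict of integers (all tags here are integers)."""
    return {key: int(value) for key, value in (item.split("=") for item in info.split(";"))}


def support_fraction(info):
    """Return (variant_reads, site_reads, fraction) using NF+NR and NFS+NRS.

    The fraction is None when no read covers the site.
    """
    tags = parse_info(info)
    variant_reads = tags["NF"] + tags["NR"]
    site_reads = tags["NFS"] + tags["NRS"]
    fraction = variant_reads / site_reads if site_reads else None
    return variant_reads, site_reads, fraction


print(support_fraction("DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3"))
print(support_fraction("DP=40;NF=0;NR=0;NRS=0;NFS=0;HP=2"))
```

For the chr7 3304476 record this gives 28 variant-supporting reads out of 32 site-covering reads (0.875). For the chr3 135275377 record every count is zero, so no fraction can be computed.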


                      • #12
                        Dindel on paired-end data

                        Hi all,

                        Since we want to compare samples sequenced at Sanger with our own samples, we figured we need to use the same analysis programs. Sanger informed me that they used Dindel for indels, so I want to use that too. The only problem is that Dindel takes just one BAM file as input, and since I have paired-end reads I'm confused.
                        Do I need to merge these files with Samtools? And how does Dindel then know which reads are the pairs?

                        Kind regards
                        Jaap


                        • #13
                          What aligner are you using? Most aligners will take paired end data & use that in the alignment process as well as generate the proper pairing information.

                          Does dindel consider the pairing information? It could certainly have a potential value, but I'm not sure it relies on it.


                          • #14
                            I'm using BWA for alignment.
                            Do I understand correctly that the paired-end info is in the BWA generated BAM files? And I should merge them before I use Dindel?

                            Kind regards
                            Jaap


                            • #15
                              Originally posted by Jaap
                              I'm using BWA for alignment.
                              Do I understand correctly that the paired-end info is in the BWA generated BAM files? And I should merge them before I use Dindel?

                              If you used sampe when processing your alignments, your BAM will already contain alignments from both ends (pairs).
                              Dindel will process them accordingly, following the BAM standard.
                              -drd
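
Pairing is recorded per read in the FLAG column of SAM/BAM, so a single BAM can hold both ends. A minimal Python sketch for decoding the pairing bits is below. The bit values come from the SAM specification; the function name `describe_pairing` is made up for illustration, and nothing here says how Dindel itself uses the flags.

```python
# Pairing-related FLAG bits from the SAM specification.
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # aligner considers the pair properly aligned
FIRST_IN_PAIR = 0x40  # read 1 of the pair
SECOND_IN_PAIR = 0x80 # read 2 of the pair


def describe_pairing(flag):
    """Summarise the pairing information carried by a SAM FLAG value."""
    if not flag & PAIRED:
        return "unpaired"
    if flag & FIRST_IN_PAIR:
        mate = "read 1"
    elif flag & SECOND_IN_PAIR:
        mate = "read 2"
    else:
        mate = "unknown mate"
    proper = "proper pair" if flag & PROPER_PAIR else "not a proper pair"
    return f"{mate}, {proper}"


for flag in (99, 147, 16):
    print(flag, describe_pairing(flag))
```

The FLAG values can be seen in the second column of `samtools view` output for the BAM in question.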
