#1
Member
Location: Finland Join Date: Nov 2009
Posts: 19
Hi all,
I'm currently trying out dindel v0.12 for finding indels. However, I hit a snag and can find little help. I'm running the stage two command to realign windows (the second command of phase 2). The example in the manual gives this command:
Code:
dindel --analysis indels --doDiploid --bamFile sample.bam --ref ref.fa --inputVarFile sample.realign_windows.2.txt --libFile sample.dindel_output.libraries.txt --outputFile sample.dindel_stage2_output_windows.2
However, running the above with the correct file names doesn't work. It gives the error "Error parsing input options." and prints the usage. So what option(s) should be added to make that stage work?
I also noticed that the first command of phase 2 should have --inputVarFile instead of --varFile, contrary to what the manual says.

#2
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
Kees (the author) has been quite generous about helping me past similar problems.
Just replace each $-prefixed item with the correct filename (this is pulled from some Perl code); I think the main problem you've hit is the --inputVarFile vs. --varFile inconsistency in the code.
Code:
dindel --analysis indels --doDiploid --bamFile $bamFile --ref $refFasta --varFile $windowsFile --outputFile $outputFile

#3
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
I think there are a couple of typos in the online documentation. The following shows how I run dindel.
Code:
./dindel_x86-64 --ref chr20.fa --outputFile 1 --bamFile aln.bam --analysis getCIGARindels
python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000
./dindel_x86-64 --analysis indels --doDiploid --bamFile aln.bam --ref chr20.fa --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err
echo 3.glf.txt > 3.list
python mergeOutput.py -t diploid -i 3.list -o 4.vcf -r chr20.fa
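For anyone scripting these steps, the commands in the post above can be written down as argument lists, which makes it easier to see which step takes --inputVarFile (makeWindows.py) and which takes --varFile (the dindel realignment step). This is only a sketch: the helper name `dindel_commands` and its defaults are made up for illustration, the flags and file names are copied from the post, and the shell redirections and the `echo 3.glf.txt > 3.list` step are left out.

```python
import shlex


def dindel_commands(ref="chr20.fa", bam="aln.bam", dindel="./dindel_x86-64"):
    """Return the external commands from the post above as argument lists.

    The helper name and defaults are illustrative only; the flags are
    copied from the post and are not checked against any Dindel version.
    """
    return [
        # Stage 1: extract candidate indels from the BAM file.
        [dindel, "--ref", ref, "--outputFile", "1", "--bamFile", bam,
         "--analysis", "getCIGARindels"],
        # Stage 2: split the candidates into realignment windows.
        ["python", "makeWindows.py", "--inputVarFile", "1.variants.txt",
         "--windowFilePrefix", "2", "--numWindowsPerFile", "20000"],
        # Stage 3: realign one window file (note --varFile here).
        [dindel, "--analysis", "indels", "--doDiploid", "--bamFile", bam,
         "--ref", ref, "--varFile", "2.1.txt", "--libFile", "1.libraries.txt",
         "--outputFile", "3"],
        # Stage 4: merge the per-window output listed in 3.list into a VCF.
        ["python", "mergeOutput.py", "-t", "diploid", "-i", "3.list",
         "-o", "4.vcf", "-r", ref],
    ]


for cmd in dindel_commands():
    # Print each command as a shell-quoted line instead of running it.
    print(shlex.join(cmd))
```

The lists could be handed to `subprocess.run` one at a time; they are printed here so that nothing is executed by accident.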

#4
Member
Location: Ann Arbor, MI Join Date: Oct 2008
Posts: 57
Quote:

#5
Senior Member
Location: Palo Alto Join Date: Apr 2009
Posts: 213
Question regarding the --doEM option:
I have a family of five individuals (two parents, three children), so I assume there are four haplotypes in the data set. Is there a way to set it for this (if it would make a difference)? Am I better off extracting each individual from the pooled BAM file and running them individually with --doDiploid instead? Thanks.
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog] Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post] Projects: U87MG whole genome sequence [Website] [Paper]

#6
Member
Location: Finland Join Date: Nov 2009
Posts: 19
Thanks for the answers, lh3 and krobison. I got it running now.

#7
Senior Member
Location: Palo Alto Join Date: Apr 2009
Posts: 213
I used Dindel after GATK realignment/recalibration.
It seems like this is redundant. Is it just as good, or better, to run Dindel in a separate pipeline directly from the original alignments?
Another query: do people generally filter out the calls that end up with the fr0/q20/hp10/wv flags in the FILTER field?
Last edited by Michael.James.Clark; 10-26-2010 at 07:41 PM.

#8
Junior Member
Location: Cambridge, UK Join Date: Oct 2010
Posts: 2
In general I would advise against using variants with quality scores below 10 for single diploid samples. The fr0 filter in the 0.12 version of Dindel does reduce the number of false positives on real data, but you will also lose some sensitivity.
It is true that running Dindel on BAMs realigned by the GATK will not result in many new calls if you have high-depth diploid data. The main advantage of running Dindel currently would be for calling the genotypes: here the GATK-realigned BAMs might result in undercalls, as reads matching the reference are not realigned even though they may support the alternative haplotype with the indel just as well as the reference haplotype. Also, Dindel has a dedicated sequencing error model for homopolymer runs, which should result in more accurate calls in those contexts.
The Broad are currently implementing the Dindel algorithm in the GATK, but I don't know exactly when it will be released (later this year, I expect).
The new version of Dindel has a script that lets you select only the indels that were seen twice or more (or whatever number you prefer). If you apply this to indels extracted from the realigned BAM, you will be able to significantly reduce compute time.
Kees
(Disclosure: I am the author of Dindel, if it wasn't clear already.)
PS: I put a new version of Dindel on the website today. http://sites.google.com/site/keesalbers/soft/dindel
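The quality advice above can be applied with a few lines of Python. This is a minimal sketch, not part of Dindel: the function name `passes_quality` is made up, the threshold of 10 is the one suggested above for single diploid samples, and the records are assumed to have the standard VCF column order with QUAL in column 6 and FILTER in column 7. The two example records are taken from output posted elsewhere in this thread.

```python
def passes_quality(vcf_line, min_qual=10.0, require_pass=True):
    """Return True if a VCF data line meets the QUAL and FILTER criteria.

    Illustrative helper, not part of Dindel. Assumes standard VCF column
    order and that no field contains whitespace, so split() is safe.
    """
    fields = vcf_line.split()
    qual = float(fields[5])     # QUAL column
    filter_field = fields[6]    # FILTER column, e.g. PASS, q5, fr0, hp10, wv
    if qual < min_qual:
        return False
    if require_pass and filter_field != "PASS":
        return False
    return True


records = [
    "chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12",
    "chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3",
]

# Keep only the records that pass both criteria.
kept = [rec for rec in records if passes_quality(rec)]
print(len(kept), "of", len(records), "records kept")
```

Setting `require_pass=False` keeps records flagged fr0, hp10 or wv as long as QUAL is high enough, which may be preferable if the sensitivity loss mentioned above matters more than the false positives.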

#9
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
But it helps when you want to inspect the alignments by eye to understand why your SNP caller made a call.
__________________
-drd

#10
Member
Location: Toronto Join Date: Jan 2008
Posts: 30
Thanks for the update. It is a great tool, and I have been using it to re-run several data sets.
For v 1.01: the --numWindowsPerFile option is not working. I see discrepancies between QUAL and the last column in the VCF output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S3
chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90
chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289
chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3
Can you output total read counts in the VCF output? Can you generate the glf file list automatically as part of makeWindows.py?
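To put the two numbers side by side, the QUAL column and the GQ value from the sample column can be pulled out as below. This is just a sketch with a made-up helper name, assuming the column layout shown in the post (FORMAT in column 9, one sample in column 10). In the VCF format QUAL is a site-level variant quality and GQ is a per-sample genotype quality, so the two need not be equal; whether these particular values are what Dindel intends is not something this snippet can settle.

```python
def qual_and_gq(vcf_line):
    """Return (chrom, pos, QUAL, GQ) for a single-sample VCF data line.

    Illustrative helper; assumes whitespace-separated columns with FORMAT
    in column 9 and exactly one sample in column 10.
    """
    fields = vcf_line.split()
    qual = float(fields[5])
    # Pair each FORMAT key (e.g. GT, GQ) with the sample's value.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    return fields[0], int(fields[1]), qual, int(sample["GQ"])


records = [
    "chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90",
    "chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289",
    "chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3",
]

for rec in records:
    chrom, pos, qual, gq = qual_and_gq(rec)
    print(chrom, pos, "QUAL =", qual, "GQ =", gq)
```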

#11
Member
Location: Toronto Join Date: Jan 2008
Posts: 30
Can anyone give feedback on the output? Did I make a mistake in the run (single sample as diploid, with default settings)?
How can NRS+NFS = 32 with DP=81 and the genotype be 1/1? It should be heterozygous.
chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93
Below is more from the VCF4 output:
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total number of reads in haplotype window">
##INFO=<ID=HP,Number=1,Type=Integer,Description="Reference homopolymer tract length">
##INFO=<ID=NF,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on forward strand">
##INFO=<ID=NR,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on reverse strand">
##INFO=<ID=NFS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on forward strand">
##INFO=<ID=NRS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on reverse strand">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##ALT=<ID=DEL,Description="Deletion">
##FILTER=<ID=q5,Description="Quality below 5">
##FILTER=<ID=hp10,Description="Reference homopolymer length was longer than 10">
##FILTER=<ID=fr0,Description="Non-ref allele is not covered by at least one read on both strands">
##FILTER=<ID=wv,Description="Other indel in window had higher likelihood">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2044B
chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93
chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12
chr3 135275377 . C CCGCTCTTCCGAT 36 PASS DP=40;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:36
chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3
chr3 135281981 . C CGCTCTTCCGATCT 15 PASS DP=42;NF=0;NR=0;NRS=1;NFS=0;HP=3 GT:GQ 0/1:15
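One way to look at the chr7 record is to split the INFO field and add up the strand counts. The sketch below assumes the header descriptions mean what they appear to say: NF and NR count reads covering the non-ref variant, NFS and NRS count reads covering the variant site, and DP counts all reads in the whole haplotype window. That reading is an assumption, and the helper names are made up for illustration.

```python
def parse_info(info):
    """Turn an INFO string such as 'DP=81;NF=20' into a dict.

    Integer values are converted to int; anything else is kept as text.
    """
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = int(value) if value.isdigit() else value
    return parsed


def variant_support(info):
    """Return (reads covering the variant, reads covering the site).

    Assumes NF/NR and NFS/NRS have the meanings given in the VCF header
    descriptions quoted in the post above.
    """
    fields = parse_info(info)
    variant_reads = fields["NF"] + fields["NR"]
    site_reads = fields["NFS"] + fields["NRS"]
    return variant_reads, site_reads


info = "DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3"
variant_reads, site_reads = variant_support(info)
print(variant_reads, "of", site_reads, "reads at the site cover the variant")
# prints: 28 of 32 reads at the site cover the variant
```

Under that reading, DP=81 is not the denominator for the site itself, and 28 of the 32 reads covering the site also cover the non-ref variant, which would be compatible with a 1/1 call rather than a heterozygous one.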

#12
Junior Member
Location: Leiden Join Date: Oct 2010
Posts: 3
Hi all,
Since we want to compare samples sequenced at Sanger with our own samples, we figured we needed the same analysis programs. Sanger informed me they used Dindel for indels, so I wanted to use that too. The only problem is that Dindel takes only one BAM file as input. Since I have paired-end reads, I'm confused. Do I need to merge these files with Samtools? And how does Dindel then know which reads are the pairs?
Kind regards,
Jaap

#13
Senior Member
Location: Boston area Join Date: Nov 2007
Posts: 747
What aligner are you using? Most aligners will take paired-end data and use it in the alignment process, as well as generate the proper pairing information.
Does dindel consider the pairing information? It could certainly be of value, but I'm not sure dindel relies on it.

#14
Junior Member
Location: Leiden Join Date: Oct 2010
Posts: 3
I'm using BWA for alignment.
Do I understand correctly that the paired-end info is in the BWA-generated BAM files, and that I should merge them before I use Dindel?
Kind regards,
Jaap

#15
Senior Member
Location: 41°17'49"N / 2°4'42"E Join Date: Oct 2008
Posts: 323
The BAM will already contain alignments from both ends (pairs). Dindel will process them accordingly, following the BAM standard.
__________________
-drd

#16
Junior Member
Location: Leiden Join Date: Oct 2010
Posts: 3
Ah, of course. I was confused by the separate alignment files.
Thanks.

#17
Member
Location: canada Join Date: Jan 2011
Posts: 27
In my case, when I was running
$ python makeWindows.py --inputVarFile 1.txt --windowFileFrefix sample.realign_windows--numWindowsPerFile 1000
it says I should specify --windowFilePrefix. Why? How can I solve this problem?

#18
Member
Location: canada Join Date: Jan 2011
Posts: 27
Please ignore my message, problem solved!

#19
Junior Member
Location: Berlin, Germany Join Date: Mar 2011
Posts: 1
Can somebody help?
Executing make produces this list of warnings and errors (I had to edit the messages to remove the smilies...)
###########################################
make
g++ -o dindel -I/home/aoschmitt/Desktop/samtools-0.1.14/ -Iseqan_library/ -I./ -Wno-deprecated -O3 DInDel.o HapBlock.o HaplotypeDistribution.o ObservationModelFB.o GetCandidates.o Faster.o -L/home/aoschmitt/Desktop/samtools-0.1.14/ -lbam -lz -lboost_program_options -static
/usr/lib/gcc/i486-linux-gnu/4.4.3/../../../../lib/libbam.a(knetfile.o): In function `socket_connect':
(.text+0xa8d): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
DInDel.o: In function `T.12415':
DInDel.cpp:(.text+0x9c2b): undefined reference to `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::basic_string char
[......]
DInDel.o:DInDel.cpp:(.text+0x26f05): more undefined references to `boost::program_options::options_description::options_description(std::basic_string char, std::char_traits<char>, std::allocator char const&, unsigned int, unsigned int)' follow
collect2: ld returned 1 exit status
Last edited by aos; 03-25-2011 at 06:28 AM.

#20
Junior Member
Location: Milano Join Date: Apr 2011
Posts: 5
Hello everyone,
I have problems with the very first step of Dindel. I should give it my "raw" BAM files, right? Not merged, not realigned, just sorted. So what I actually run is:
dindel --ref ref.fa --outputFile sample.dindel_output --analysis getCIGARindels --region region.interval_list --bamFile sample.bam
and what I get is
Error parsing input options. Usage:
plus the complete list of options and commands. Where's my mistake?
Thanks in advance,
Francesco
__________________
wherever you go, whatever you do, always bring a bioinformatic with you