SEQanswers

Old 10-25-2010, 04:28 AM   #1
Hena
Member
 
Location: Finland

Join Date: Nov 2009
Posts: 19
Default Using dindel

Hi all,

I'm currently trying out dindel v0.12 for finding indels. However, I've hit a snag and can find little help.

I'm running the stage two command to realign windows (the second command of phase 2). The example in the manual gives this command:
dindel --analysis indels --doDiploid --bamFile sample.bam --ref ref.fa --inputVarFile sample.realign_windows.2.txt --libFile sample.dindel_output.libraries.txt --outputFile sample.dindel_stage2_output_windows.2

However, running the above with the correct file names doesn't work. It gives the error "Error parsing input options." and prints the usage. So what option(s) should be added to make that stage work?

I also noticed that the first command of phase 2 needs --inputVarFile rather than the --varFile given in the manual.
Old 10-25-2010, 05:55 AM   #2
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Kees (the author) has been quite generous about helping me past similar problems.

Just replace each $-prefixed item with the correct filename (this is pulled from some Perl code). I think the main problem you've hit is the --inputVarFile vs. --varFile inconsistency in the code.
Code:
dindel --analysis indels --doDiploid --bamFile $bamFile --ref $refFasta --varFile $windowsFile --outputFile $outputFile
Old 10-25-2010, 06:36 AM   #3
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

I think there are a couple of typos in the online documentation. The following shows how I run dindel.

Code:
./dindel_x86-64  --ref chr20.fa --outputFile 1 --bamFile aln.bam --analysis getCIGARindels
python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000
./dindel_x86-64 --analysis indels --doDiploid --bamFile aln.bam --ref chr20.fa --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err
echo 3.glf.txt > 3.list
python mergeOutput.py -t diploid -i 3.list -o 4.vcf -r chr20.fa
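
For anyone scripting these steps, here is a rough Python sketch that only builds the argument lists for the commands above; it does not run anything. The function name, the file-naming scheme (`run.stage1`, `run.windows`, `run.stage2`) and the restriction to the first window file are assumptions made for illustration. The flags themselves are copied from the commands in this post.

```python
import shlex

def dindel_commands(bam, ref, prefix="run", windows_per_file=20000):
    """Build the pipeline steps as argv lists (nothing is executed)."""
    stage1 = f"{prefix}.stage1"    # hypothetical output prefix for step 1
    windows = f"{prefix}.windows"  # hypothetical prefix for window files
    stage2 = f"{prefix}.stage2"    # hypothetical output prefix for step 3
    return [
        # Step 1: extract candidate indels from the BAM.
        ["./dindel_x86-64", "--analysis", "getCIGARindels",
         "--bamFile", bam, "--ref", ref, "--outputFile", stage1],
        # Step 2: makeWindows.py takes --inputVarFile.
        ["python", "makeWindows.py",
         "--inputVarFile", f"{stage1}.variants.txt",
         "--windowFilePrefix", windows,
         "--numWindowsPerFile", str(windows_per_file)],
        # Step 3: the realignment step takes --varFile, not --inputVarFile.
        # Only the first window file is handled in this sketch.
        ["./dindel_x86-64", "--analysis", "indels", "--doDiploid",
         "--bamFile", bam, "--ref", ref,
         "--varFile", f"{windows}.1.txt",
         "--libFile", f"{stage1}.libraries.txt",
         "--outputFile", stage2],
        # Step 4: merge; the list file must contain the name of the
        # .glf.txt output from step 3 (the echo step above).
        ["python", "mergeOutput.py", "-t", "diploid",
         "-i", f"{stage2}.list", "-o", f"{prefix}.vcf", "-r", ref],
    ]

cmds = dindel_commands("aln.bam", "chr20.fa")
for argv in cmds:
    print(shlex.join(argv))
```

Each list could be passed to `subprocess.run(argv, check=True)` once the printed commands look right. Writing the list file before the merge step is left out on purpose.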
Old 10-25-2010, 12:51 PM   #4
Lee Sam
Member
 
Location: Ann Arbor, MI

Join Date: Oct 2008
Posts: 57
Default

Quote:
Originally Posted by lh3 View Post
I think there are a couple of typos in the online documentation. The following shows how I run dindel.

Code:
./dindel_x86-64  --ref chr20.fa --outputFile 1 --bamFile aln.bam --analysis getCIGARindels
python makeWindows.py --inputVarFile 1.variants.txt --windowFilePrefix 2 --numWindowsPerFile 20000
./dindel_x86-64 --analysis indels --doDiploid --bamFile aln.bam --ref chr20.fa --varFile 2.1.txt --libFile 1.libraries.txt --outputFile 3 > 3.out 2> 3.err
echo 3.glf.txt > 3.list
python mergeOutput.py -t diploid -i 3.list -o 4.vcf -r chr20.fa
Thanks, this is really helpful. I'm working with dindel too and I was just today wondering about these.
Old 10-25-2010, 02:04 PM   #5
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

Question regarding the --doEM option:
I have a family of five individuals (two parents, three children), so I assume there are four haplotypes in the data set. Is there a way to set it for this (if it would make a difference)?
Am I better off extracting each individual from the pooled BAM file and running them individually with --doDiploid instead?
Thanks.
__________________
Mendelian Disorder: A blogshare of random useful information for general public consumption. [Blog]
Breakway: A Program to Identify Structural Variations in Genomic Data [Website] [Forum Post]
Projects: U87MG whole genome sequence [Website] [Paper]
Old 10-25-2010, 10:51 PM   #6
Hena
Member
 
Location: Finland

Join Date: Nov 2009
Posts: 19
Default

Thanks for the answers, lh3 and krobison. I got it running now.
Old 10-26-2010, 12:19 PM   #7
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

I used Dindel after GATK realignment/recalibration.
It seems like this is redundant.
Is it just as good or better to run Dindel in a separate pipeline directly from the original alignments?

Another query: Do people just generally filter out those that end up with the fr0/q20/hp10/wv flags in the FILTER field?

Last edited by Michael.James.Clark; 10-26-2010 at 07:41 PM.
Old 10-28-2010, 04:43 AM   #8
keesa
Junior Member
 
Location: Cambridge, UK

Join Date: Oct 2010
Posts: 2
Default

In general I would advise against using variants with quality scores below 10 for single diploid samples. The fr0 filter in the 0.12 version of Dindel does reduce the number of false positives on real data, but you will also lose some sensitivity.
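
A minimal sketch of such a cutoff, assuming "quality score" means the QUAL column and that only records marked PASS are wanted. The function name and the threshold parameter are illustrative and not part of Dindel; the sample lines are Dindel VCF records posted in this thread.

```python
def keep_record(line, min_qual=10.0):
    """True if a VCF data line has FILTER == PASS and QUAL >= min_qual."""
    fields = line.split()           # works for tab- or space-separated lines
    qual = float(fields[5])         # column 6: QUAL
    filt = fields[6]                # column 7: FILTER
    return filt == "PASS" and qual >= min_qual

records = [
    "chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93",
    "chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12",
    "chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3",
]

kept = [r for r in records if keep_record(r)]
for r in kept:
    print(r)
```

Raising `min_qual` trades sensitivity for fewer false positives, in the same way as the filters discussed here.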

It is true that running Dindel on BAMs realigned by the GATK will not result in too many new calls if you have high-depth diploid data.
The main advantage of running Dindel currently would be for calling the genotypes: here the GATK realigned BAMs might result in undercalls as reads matching the reference are not realigned even though they may support the alternative haplotype with the indel just as well as the reference haplotype.
Also, Dindel has a dedicated sequencing error model for homopolymer runs, which should result in more accurate calls in those contexts.
The Broad are currently implementing the Dindel algorithm in the GATK, but I don't know exactly when it will be released (later this year I expect).

The new version of Dindel has a script that lets you select only the indels that were seen twice or more (whatever number you prefer). If you apply this to indels extracted from the realigned BAM you will be able to significantly reduce compute time.

Kees (Disclosure: I am the author of Dindel if it wasn't clear already).

PS I put a new version of Dindel on the website today.
http://sites.google.com/site/keesalbers/soft/dindel
Old 10-28-2010, 05:33 AM   #9
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Quote:
Originally Posted by Michael.James.Clark View Post
I used Dindel after GATK realignment/recalibration.
It seems like this is redundant.
But it helps when you want to inspect the alignments by eye to understand why your SNP caller made a call.
__________________
-drd
Old 10-28-2010, 09:25 AM   #10
lshen
Member
 
Location: Toronto

Join Date: Jan 2008
Posts: 30
Default

Thanks for the update. It is a great tool that I was using to re-run several data sets.

In v1.01, the --numWindowsPerFile option is not working.

I see discrepancies between QUAL and the last column in the VCF output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S3
chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90
chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289
chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3
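
To make the comparison explicit, here is a small sketch that pulls QUAL and the GQ value out of each record. In the VCF format QUAL is a site-level quality and GQ is a per-sample genotype quality, so the two need not be equal; whether the gap in the third record is expected from Dindel is not something this sketch can answer. The function name is illustrative.

```python
def qual_and_gq(line):
    """Return (QUAL, GQ, GT) from a single-sample VCF data line."""
    fields = line.split()
    qual = float(fields[5])                  # column 6: QUAL
    keys = fields[8].split(":")              # FORMAT column, e.g. GT:GQ
    values = fields[9].split(":")            # sample column, e.g. 1/1:90
    sample = dict(zip(keys, values))
    return qual, int(sample["GQ"]), sample["GT"]

records = [
    "chr13 8769 . C CA 897 PASS DP=150;NF=14;NR=13;NRS=16;NFS=13;HP=1 GT:GQ 1/1:90",
    "chr13 8910 . AT A 289 PASS DP=127;NF=6;NR=6;NRS=11;NFS=10;HP=2 GT:GQ 0/1:289",
    "chr13 8985 . ACT A 272 PASS DP=109;NF=13;NR=0;NRS=26;NFS=0;HP=1 GT:GQ 1/1:3",
]

parsed = [qual_and_gq(r) for r in records]
for qual, gq, gt in parsed:
    print(f"QUAL={qual:g} GQ={gq} GT={gt}")
```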

Can you output total read counts in the VCF output? Can you generate the glf file list automatically as part of makeWindows.py?
Old 11-01-2010, 10:17 AM   #11
lshen
Member
 
Location: Toronto

Join Date: Jan 2008
Posts: 30
Default

Can anyone give feedback on the output? Did I make a mistake in the run (single sample run as diploid with default settings)?

How can NRS+NFS = 32 with DP=81 while the genotype is 1/1? It should be heterozygous.

chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93


Below is more from the VCF4 output

##INFO=<ID=DP,Number=1,Type=Integer,Description="Total number of reads in haplotype window">
##INFO=<ID=HP,Number=1,Type=Integer,Description="Reference homopolymer tract length">
##INFO=<ID=NF,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on forward strand">
##INFO=<ID=NR,Number=1,Type=Integer,Description="Number of reads covering non-ref variant on reverse strand">
##INFO=<ID=NFS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on forward strand">
##INFO=<ID=NRS,Number=1,Type=Integer,Description="Number of reads covering non-ref variant site on reverse strand">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">
##ALT=<ID=DEL,Description="Deletion">
##FILTER=<ID=q5,Description="Quality below 5">
##FILTER=<ID=hp10,Description="Reference homopolymer length was longer than 10">
##FILTER=<ID=fr0,Description="Non-ref allele is not covered by at least one read on both strands">
##FILTER=<ID=wv,Description="Other indel in window had higher likelihood">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 2044B
chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93
chr7 3311292 . G GAGA 12 PASS DP=113;NF=0;NR=0;NRS=11;NFS=36;HP=2 GT:GQ 0/1:12

chr3 135275377 . C CCGCTCTTCCGAT 36 PASS DP=40;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:36
chr3 135278476 . T TAGATCGGAAGA 3 q5 DP=130;NF=0;NR=0;NRS=0;NFS=0;HP=2 GT:GQ 0/1:3
chr3 135281981 . C CGCTCTTCCGATCT 15 PASS DP=42;NF=0;NR=0;NRS=1;NFS=0;HP=3 GT:GQ 0/1:15
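
The arithmetic for the chr7 3304476 record can be spelled out with a short sketch. It reads the INFO tags according to the header descriptions above: NF/NR count reads covering the non-ref variant, NFS/NRS count reads covering the variant site, and DP counts reads in the whole haplotype window. If that reading is right, 28 of the 32 site-covering reads carry the variant, which would make a 1/1 call less surprising, but this is an interpretation of the header text and not a confirmed statement of how Dindel counts reads.

```python
def parse_info(info):
    """Turn 'DP=81;NF=20' into {'DP': 81, 'NF': 20}."""
    return {key: int(value)
            for key, value in (item.split("=") for item in info.split(";"))}

line = "chr7 3304476 . AC A 1272 PASS DP=81;NF=20;NR=8;NRS=21;NFS=11;HP=3 GT:GQ 1/1:93"
info = parse_info(line.split()[7])          # column 8: INFO

variant_reads = info["NF"] + info["NR"]     # reads covering the non-ref variant
site_reads = info["NFS"] + info["NRS"]      # reads covering the variant site
fraction = variant_reads / site_reads

print(variant_reads, site_reads, info["DP"])
print(round(fraction, 3))
```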
Old 11-05-2010, 08:15 AM   #12
Jaap
Junior Member
 
Location: Leiden

Join Date: Oct 2010
Posts: 3
Default Dindel on paired-end data

Hi all,

Since we want to compare samples sequenced at Sanger with our own samples, we figured we should use the same analysis programs. Sanger informed me that they used Dindel for indels, so I want to use it too. The only problem is that Dindel takes a single BAM file as input, and since I have paired-end reads I'm confused.
Do I need to merge these files with Samtools? And how does Dindel then know which reads are pairs?

Kind regards
Jaap
Old 11-05-2010, 08:35 AM   #13
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

What aligner are you using? Most aligners will take paired end data & use that in the alignment process as well as generate the proper pairing information.

Does dindel consider the pairing information? It could certainly have a potential value, but I'm not sure it relies on it.
Old 11-05-2010, 09:00 AM   #14
Jaap
Junior Member
 
Location: Leiden

Join Date: Oct 2010
Posts: 3
Default

I'm using BWA for alignment.
Do I understand correctly that the paired-end info is in the BWA generated BAM files? And I should merge them before I use Dindel?

Kind regards
Jaap
Old 11-05-2010, 09:24 AM   #15
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Quote:
Originally Posted by Jaap View Post
I'm using BWA for alignment.
Do I understand correctly that the paired-end info is in the BWA generated BAM files? And I should merge them before I use Dindel?
If you used bwa sampe when processing your alignments, your BAM will already contain alignments from both ends (pairs). Dindel will process them accordingly, following the BAM standard.
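
The pairing information lives in the FLAG field of each alignment record. As a rough illustration, the sketch below decodes the pairing bits defined by the SAM format (0x1 paired, 0x2 proper pair, 0x40 first in pair, 0x80 second in pair). The function name is illustrative, and how Dindel itself uses these bits is not covered here.

```python
# Pairing-related FLAG bits from the SAM format.
PAIRED = 0x1
PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80

def describe_flag(flag):
    """Decode the pairing-related bits of a SAM/BAM FLAG value."""
    return {
        "paired": bool(flag & PAIRED),
        "proper_pair": bool(flag & PROPER_PAIR),
        "first_in_pair": bool(flag & FIRST_IN_PAIR),
        "second_in_pair": bool(flag & SECOND_IN_PAIR),
    }

for flag in (99, 147, 0):
    print(flag, describe_flag(flag))
```

The FLAG values of a BAM can be listed with `samtools view aln.bam | cut -f2`. If the paired bit is set on the reads, both ends are already in the one file and no merging is needed.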
__________________
-drd
Old 11-05-2010, 09:54 AM   #16
Jaap
Junior Member
 
Location: Leiden

Join Date: Oct 2010
Posts: 3
Default

Ah, of course. I was confused by the separate alignment files.
Thanks.
Old 01-28-2011, 11:39 AM   #17
libiyagirl
Member
 
Location: canada

Join Date: Jan 2011
Posts: 27
Default

In my case, when I was running

$ python makeWindows.py --inputVarFile 1.txt --windowFileFrefix sample.realign_windows--numWindowsPerFile 1000

it says I should specify --windowFilePrefix.
Why? How do I solve this problem?
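
The command as posted spells the option --windowFileFrefix and has no space before --numWindowsPerFile. A small pre-flight check like the sketch below would flag both. It is a hypothetical helper and not part of Dindel or makeWindows.py; the list of known options is taken only from the makeWindows.py commands quoted in this thread.

```python
import difflib

# Option names seen in the makeWindows.py commands in this thread.
KNOWN_FLAGS = ["--inputVarFile", "--windowFilePrefix", "--numWindowsPerFile"]

def check_flags(argv, known=KNOWN_FLAGS):
    """Return a list of human-readable problems found in argv."""
    problems = []
    for token in argv:
        if token.startswith("--"):
            if token not in known:
                # Suggest the closest known option, if any is similar enough.
                guess = difflib.get_close_matches(token, known, n=1)
                hint = f" (did you mean {guess[0]}?)" if guess else ""
                problems.append(f"unknown option {token}{hint}")
        else:
            # A value with an option glued onto it suggests a missing space.
            for flag in known:
                if flag in token:
                    problems.append(f"missing space before {flag} in {token!r}")
    return problems

argv = ("--inputVarFile 1.txt --windowFileFrefix "
        "sample.realign_windows--numWindowsPerFile 1000").split()
problems = check_flags(argv)
for problem in problems:
    print(problem)
```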
Old 01-28-2011, 11:45 AM   #18
libiyagirl
Member
 
Location: canada

Join Date: Jan 2011
Posts: 27
Default

Please ignore my message, problem solved!
Old 03-25-2011, 06:25 AM   #19
aos
Junior Member
 
Location: Berlin, Germany

Join Date: Mar 2011
Posts: 1
Question Problems installing dindel

Can somebody help?
Executing make produces this list of warnings and errors
(I had to edit the messages to remove the smilies...)

###########################################
make
g++ -o dindel -I/home/aoschmitt/Desktop/samtools-0.1.14/ -Iseqan_library/ -I./ -Wno-deprecated -O3 DInDel.o HapBlock.o HaplotypeDistribution.o ObservationModelFB.o GetCandidates.o Faster.o -L/home/aoschmitt/Desktop/samtools-0.1.14/ -lbam -lz -lboost_program_options -static
/usr/lib/gcc/i486-linux-gnu/4.4.3/../../../../lib/libbam.a(knetfile.o): In function `socket_connect':
(.text+0xa8d): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
DInDel.o: In function `T.12415':
DInDel.cpp:(.text+0x9c2b): undefined reference to `boost::program_options::validation_error::validation_error(boost::program_options::validation_error::kind_t, std::basic_string<char

[......]

DInDel.o:DInDel.cpp:(.text+0x26f05): more undefined references to `boost::program_options::options_description::options_description(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, unsigned int)' follow
collect2: ld returned 1 exit status

Last edited by aos; 03-25-2011 at 06:28 AM.
Old 05-02-2011, 07:13 AM   #20
francesconea
Junior Member
 
Location: Milano

Join Date: Apr 2011
Posts: 5
Default

Hello everyone,
I have problems with the very first step of Dindel. I should give it my "raw" BAM files, right? Not merged, not realigned, just sorted.
So what I actually run is:
dindel --ref ref.fa --outputFile sample.dindel_output --analysis getCIGARindels --region region.interval_list --bamFile sample.bam

What I get is "Error parsing input options." followed by the usage text with the complete list of options and commands. Where's my mistake?

Thanks in advance,

Francesco
__________________
wherever you go, whatever you do, always bring a bioinformatic with you