  • Dindel giving error for every candidate indel

    I'm running Dindel 1.01 on BAM files produced by BWA and Stampy, looking for small indels. All the steps run to completion, but every candidate indel hits an error, so the final VCF file is completely empty. The errors are the following (output from cat *.glf.txt | grep "chr" | cut -d " " -f 1 | sort | uniq -c):

    Code:
         14 error_above_read_count_threshold
          3 error_Cannot_find_reference_sequence.
        598 error_too_few_reads
    These proportions are typical of all my runs: "error_too_few_reads" dominates, though I highly doubt that there are actually too few reads. My coverage is ~25x, and from samtools pileup and manual inspection I know that plenty of indels are well covered.
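For anyone who wants the same tally without the shell pipeline, here is a minimal Python sketch of it. The function names are made up for illustration, and it assumes nothing about the .glf.txt layout beyond what the pipeline above already relies on: lines containing "chr", with the status as the first space-separated field.

```python
from collections import Counter
import glob


def tally_first_fields(lines, needle="chr"):
    """Count the first space-separated field of every line containing `needle`.

    Mirrors: grep "chr" | cut -d " " -f 1 | sort | uniq -c
    """
    counts = Counter()
    for line in lines:
        if needle in line:
            # cut -d " " -f 1 keeps the whole line when no space is present;
            # split(" ")[0] behaves the same way.
            counts[line.rstrip("\n").split(" ")[0]] += 1
    return counts


def tally_glf_files(pattern="*.glf.txt"):
    """Apply the tally to every file matching `pattern` (hypothetical helper)."""
    counts = Counter()
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            counts.update(tally_first_fields(handle))
    return counts


if __name__ == "__main__":
    for label, n in sorted(tally_glf_files().items()):
        print(f"{n:7d} {label}")
```

Having the counts in a Counter makes it easy to compare runs, for example before and after changing a Dindel option.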

    When I run the "getCIGARindels" step, I get some alarming messages printed to stdout:
    Code:
    Parsing indels from CIGAR strings...
    Wrote indels in CIGARS for target chr1 to file candidate_dindels
    Wrote indels in CIGARS for target chr2 to file candidate_dindels
    error: faidx error, len==0
    start: -24 end: 
    error: faidx error, len==0
    start: -11 end:
    I get about 20 of these "faidx error" messages in a typical run (though not for every chromosome). I can't find anything wrong with my reference genome file or its index file (indexed using samtools faidx; appended at the end of this post). I don't know whether these errors are related to the ones above.

    My commands (running on 64bit Ubuntu):
    Code:
    dindel-1.01-linux-64bit --analysis getCIGARindels --bamFile input.bam --outputFile candidate_dindels --ref ref_genome.fasta
    
    python makeWindows.py --inputVarFile candidate_dindels.variants.txt --windowFilePrefix realign_windows --numWindowsPerFile 2000
    
    dindel-1.01-linux-64bit --analysis indels --bamFile input.bam --ref ref_genome.fasta --libFile candidate_dindels.libraries.txt --varFile $infile --outputFile $outfile
    My fasta.fai file:
    Code:
    chr1	230208	6	100	101
    chr2	813178	232523	100	101
    chr3	316617	1053839	100	101
    chr4	1531919	1373629	100	101
    chr5	576869	2920874	100	101
    chr6	270148	3503518	100	101
    chr7	1090947	3776374	100	101
    chr8	562643	4878237	100	101
    chr9	439885	5446513	100	101
    chr10	745741	5890804	100	101
    chr11	666454	6644010	100	101
    chr12	1078175	7317136	100	101
    chr13	924429	8406100	100	101
    chr14	784334	9339781	100	101
    chr15	1091289	10131966	100	101
    chr16	948062	11234175	100	101
    chr17	85779	12191725	100	101
    chr18	6318	12278369	100	101
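One way to sanity-check an index like this is to confirm that each offset follows from the previous record. The sketch below is only a rough check, not part of samtools or Dindel: the function names are hypothetical, and it assumes every FASTA header is just ">name" followed by one newline byte and that every sequence line, including the last, ends with a newline.

```python
import math


def parse_fai(text):
    """Parse .fai text into (name, length, offset, linebases, linewidth) tuples."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        name, length, offset, linebases, linewidth = line.split()[:5]
        records.append((name, int(length), int(offset), int(linebases), int(linewidth)))
    return records


def check_fai(records):
    """Return a list of problems; an empty list means the index is self-consistent."""
    problems = []
    for i, (name, length, offset, linebases, linewidth) in enumerate(records):
        if length <= 0 or linebases <= 0 or linewidth <= linebases:
            problems.append(f"{name}: implausible length/line fields")
            continue
        if i + 1 == len(records):
            break
        next_name, _, next_offset, _, _ = records[i + 1]
        # Bytes used by the sequence: bases plus line terminators.
        n_lines = math.ceil(length / linebases)
        seq_bytes = length + n_lines * (linewidth - linebases)
        # Assumption: the next header is exactly ">name\n".
        expected = offset + seq_bytes + len(">" + next_name + "\n")
        if expected != next_offset:
            problems.append(
                f"{next_name}: offset {next_offset}, expected {expected} after {name}"
            )
    return problems
```

Under those assumptions the index posted above passes this check: every offset matches what the preceding record predicts.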

  • #2
    Hi,
    I have exactly the same problem.
    Did you ever figure out what the problem was?



    • #3
      Yes, in fact: Kees Albers has confirmed that this is caused by omitting the --doDiploid (or --doPooled) flag. I left out the --doDiploid flag because I was running Dindel on haploid samples, but that does not work: you need one of the two "--do*" flags. Could it be that you are not specifying this flag either?
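A small guard against forgetting the flag again, sketched in Python. It only uses options that appear in this thread; the function name, the mode labels and the default binary name are assumptions for illustration, not part of Dindel.

```python
def build_indels_command(bam, ref, lib_file, var_file, output, mode,
                         binary="dindel-1.01-linux-64bit"):
    """Build the argument list for the 'indels' step, refusing to omit the --do* flag.

    `mode` must be "diploid" or "pooled" (labels chosen for this sketch).
    """
    mode_flags = {"diploid": "--doDiploid", "pooled": "--doPooled"}
    if mode not in mode_flags:
        raise ValueError("mode must be 'diploid' or 'pooled'; one --do* flag is required")
    return [
        binary,
        "--analysis", "indels",
        "--bamFile", bam,
        "--ref", ref,
        "--libFile", lib_file,
        "--varFile", var_file,
        "--outputFile", output,
        mode_flags[mode],
    ]


# Example use (not executed here):
#   import subprocess
#   cmd = build_indels_command("input.bam", "ref_genome.fasta",
#                              "candidate_dindels.libraries.txt",
#                              infile, outfile, mode="diploid")
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names.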

      Also the faidx errors are apparently nothing to worry about.



      • #4
        Interesting. Thanks for that. Now why is it Dindel doesn't seem to have a mode for haploid samples?



        • #5
          Originally posted by colindaven
          Interesting. Thanks for that. Now why is it Dindel doesn't seem to have a mode for haploid samples?
          Well, it is common for software in the NGS field to not support haploid samples. In earlier versions of Dindel there was a "force homozygous" option, but not anymore.

          However, what I do is run Dindel in pooled mode and then use the "makeGenotypeLikelihoodFilePooled.py" script that comes with the program. This script prints a file with the likelihoods for homozygous ref/ref, heterozygous ref/alt and homozygous alt/alt. Grabbing only the two homozygous likelihoods from this file (ignoring the heterozygous likelihood) allows me to force homozygosity, though I then have to compute some measure of confidence myself, e.g. a likelihood ratio or a posterior probability.
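As a rough illustration of the confidence measure mentioned above, the sketch below turns the two homozygous likelihoods into a likelihood ratio and a posterior probability. It does not parse the script's output: the function name is made up, the inputs are assumed to be log10 likelihoods (adjust if the file uses another scale), and the default prior of 0.5 is only a placeholder.

```python
import math


def haploid_call(log10_lik_ref, log10_lik_alt, prior_alt=0.5):
    """Compare ref/ref against alt/alt only, ignoring the heterozygous likelihood.

    Returns (log10 likelihood ratio alt vs ref, posterior probability of alt).
    Inputs are assumed to be log10 likelihoods; prior_alt is a placeholder prior.
    """
    if not 0.0 < prior_alt < 1.0:
        raise ValueError("prior_alt must be strictly between 0 and 1")
    log10_lr = log10_lik_alt - log10_lik_ref
    # Posterior odds = likelihood ratio * prior odds, kept in log10 space.
    log10_odds = log10_lr + math.log10(prior_alt / (1.0 - prior_alt))
    # Convert odds to a probability without overflowing for extreme values.
    if log10_odds >= 0:
        posterior_alt = 1.0 / (1.0 + 10.0 ** (-log10_odds))
    else:
        odds = 10.0 ** log10_odds
        posterior_alt = odds / (1.0 + odds)
    return log10_lr, posterior_alt
```

With log10 likelihoods of -10 for ref/ref and -8 for alt/alt and the flat prior, the ratio is 100:1 in favour of the indel, so the posterior is 100/101, roughly 0.99.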
