  • Dindel stage4

    Hi all,

    I'm trying to detect indels in Illumina paired-end exome-capture reads.
    I aligned my data with BWA and successfully ran the first three stages of Dindel (version 1.01). However, I'm somewhat confused by stage 4: do you first have to merge all the files generated in stage 3 into one single file? And is this simply done by concatenating them?

    When I tried without concatenating, I got several errors:
    ./mergeOutputDiploid.py --inputFiles sample.dindel_stage2_output_windows.txt --outputFile variantCalls.VCF --ref hg19.fa
    An error occurred!
    Traceback (most recent call last):
    File "./mergeOutputDiploid.py", line 351, in <module>
    main(sys.argv[1:])
    File "./mergeOutputDiploid.py", line 346, in main
    mergeOutput(glfFilesFile = options.inputFiles, sampleID = options.sampleID, maxHPLen = options.maxHPLen, refFile = options.refFile, vcfFile = options.outputFile, filterQual = int(options.filterQual))
    File "./mergeOutputDiploid.py", line 254, in mergeOutput
    fg = open(glfFilesFile,'r')
    IOError: [Errno 2] No such file or directory: 'LNCaP_gelukt_BWA.dindel_stage2_output_windows.txt'

    When I concatenated a few files into one single file, I still received an error:
    ./mergeOutputDiploid.py --inputFiles sample.dindel_stage2_outputfiles2.txt --outputFile variantCalls.VCF --ref hg19.fa
    WARNING: additional columns in line 1 of file sample.dindel_stage2_outputfiles2.txt were ignored
    File msg does not exist
    . Aborting.
    An error occurred!


    Does anyone know what I'm doing wrong?
    Thanks a lot!
    Lien

  • #2
    I wrote the absolute paths of all the window files from the last stage into a text file, one per line, and passed that text file as --inputFiles:

    /your/path/xxxxwindow1.txt
    /your/path/xxxxwindow2.txt
    /your/path/xxxxwindow3.txt
    ....

    That will work.



    • #3
      Oh, I misinterpreted it: I thought they meant the content of those files.

      It works fine now.
      Thanks!



      • #4
        Dindel Stage 4 problems

        Hi,

        I'm having similar problems to Lien; however, mine are unfortunately not solved yet. I'm not sure I understand your solution, skblazer, or the original instructions (I'm very new to Unix/Linux).

        Currently I am typing in all of the x.dindel_stage2_output files after --inputFiles. So for example if I have three output files I would type

        python mergeOutputDiploid.py --inputFiles x.dindel_stage2_output.1.glf.txt x.dindel_stage2_output.2.glf.txt x.dindel_stage2_output.3.glf.txt --outputFile indel.VCF --ref hg18.fa

        I get the same message as Lien did:
        WARNING: additional columns in line 1 of file x.dindel_stage2_output.1.glf.txt were ignored
        File msg does not exist
        . Aborting.
        An error occurred!

        I tried putting the whole path name before each sample file, thinking that was what you were suggesting, skblazer, but it came back with the same message. Have I missed something? Am I supposed to combine all 3 files together beforehand? Is this what you are saying, skblazer?

        thanks!



        • #5
          You need to create a file, for example "files.txt".
          In this file, write the following lines:
          /your/path/x.dindel_stage2_output.1.glf.txt
          /your/path/x.dindel_stage2_output.2.glf.txt
          /your/path/x.dindel_stage2_output.3.glf.txt

          Then you type the command:
          python mergeOutputDiploid.py --inputFiles files.txt --outputFile indel.VCF --ref hg18.fa

          That'll work.
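
The file list described above can also be generated with a short script. This is only a minimal sketch and not part of Dindel: `write_file_list` and its parameters are hypothetical names, and it assumes the stage 3 output files end in `.glf.txt`, as in the example above.

```python
from pathlib import Path


def write_file_list(output_dir, list_path, pattern="*.glf.txt"):
    """Write the absolute path of each file matching `pattern` to `list_path`.

    One path per line, sorted by name. Returns the list of paths written.
    """
    list_path = Path(list_path).resolve()
    paths = sorted(
        p.resolve()
        for p in Path(output_dir).glob(pattern)
        # Never list the list file itself, even if its name matches the pattern.
        if p.is_file() and p.resolve() != list_path
    )
    list_path.write_text("".join(f"{p}\n" for p in paths))
    return paths
```

Calling `write_file_list("/your/path", "files.txt")` would then produce a files.txt that can be passed to --inputFiles.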



          • #6
            fitzgeraldlm,

            The argument to --inputFiles should be the name of a single text file, and this text file in turn contains the names of all the output files. So in your example, you would create a new file with the following content:

            Code:
            x.dindel_stage2_output.1.glf.txt
            x.dindel_stage2_output.2.glf.txt
            x.dindel_stage2_output.3.glf.txt

            That is, the literal names of the output files (you don't need absolute paths). Presumably the rationale is that some runs generate a very large number of output files, which would be difficult to specify on the command line. So instead you write all the file names to a text file, and the Dindel script reads the names from that file.

            You can create the text file manually if you have a small number of output files, or with a command like ls | grep ".glf.txt" > list_of_output_files.txt or similar (and then you'd specify --inputFiles list_of_output_files.txt).
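
Before running stage 4, it can help to check that every entry in the list file really points to an existing file. The sketch below is not part of Dindel, and `check_file_list` is a hypothetical helper. It assumes, judging from the "additional columns ... were ignored" warning quoted earlier in this thread, that only the first whitespace-separated column of each line is used as the file name.

```python
from pathlib import Path


def check_file_list(list_path):
    """Return the entries of the list file that do not name an existing file."""
    missing = []
    for line in Path(list_path).read_text().splitlines():
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        # Assumption: only the first column is treated as the file name.
        entry = columns[0]
        # Relative names are looked up from the current working directory.
        if not Path(entry).is_file():
            missing.append(entry)
    return missing
```

An empty result means every listed name was found. An entry such as "msg" in the result would suggest that an output file itself, rather than a list of file names, was given to the check.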

            EDIT: beaten by skblazer ;]
            Last edited by gaffa; 01-07-2011, 05:36 PM.



            • #7
              Solved

              A big thank you to gaffa and skblazer. I now have stage 4 running and have a VCF file! Thanks for the tip on how to create the text file, gaffa. I did have to add the whole path name, as you suggested, skblazer. This may have something to do with the way I installed (or didn't correctly install) Dindel.

              Thanks!



              • #8
                Hi, I am still facing a similar problem.
                I've successfully gone through the first three stages of Dindel variant calling, but I get the following error message when using mergeOutputPooled.py to generate the final VCF file. Note that cases_A.gene.ABCA1.glf.txt contains the names of my 10 glf.txt files.
                Thank you for helping me fix this.

                [ndiayea@topaz] /shares/data/illumina_datastore/MI_20100215/analyses/Indels_calling/BAM_files/vcf_cases $ python /shares/home/ndiayea/programs/dindel-1.01-python/mergeOutputPooled.py --inputFiles ABCA1_cases_outputfiles.txt --outputFile ABCA1_cases_variantCalls.VCF --ref /shares/data/genome_datastore/homo_sapiens/Homo_sapiens_assembly18.fasta --numSamples 500 --numBamFiles 10
                Reading cases_A.gene.ABCA1.glf.txt
                An error occurred!
                Traceback (most recent call last):
                File "/shares/home/ndiayea/programs/dindel-1.01-python/mergeOutputPooled.py", line 620, in ?
                main(sys.argv[1:])
                File "/shares/home/ndiayea/programs/dindel-1.01-python/mergeOutputPooled.py", line 613, in main
                processPooledGLFFiles(glfFilesFile = options.inputFiles, maxHPLen = options.maxHPLen, refFile = options.refFile, outputVCFFile = options.outputFile, doNotFilterOnFR = (not options.filterFR), filterQual = int(options.filterQual), numSamples = int(options.numSamples), numBamFiles = int(options.numBAMFiles))
                File "/shares/home/ndiayea/programs/dindel-1.01-python/mergeOutputPooled.py", line 336, in processPooledGLFFiles
                raise NameError('Inconsistent glf files! Is the number of BAM files correctly specified?')
                NameError: Inconsistent glf files! Is the number of BAM files correctly specified?



                • #9
                  I am having the same problem, but I am using the option for pooled samples. I am not sure what to put in numSamples: is it the number of samples/individuals in one of my BAM files?
                  My outputFiles.txt contains the list of my 78 files.

                  My command line is:
                  ./dindel-1.01-linux-64bit mergeOutputPooled.py --inputFiles outputFiles.txt --outputFile variantCalls_hy7.VCF --ref chick.fa --numSamples 50 --numBamFiles 1

                  The only error message was:
                  Error parsing input options

                  I think my options are correct, so what could be the problem?

                  Thanks



                  • #10
                    I solved my problem with running stage 4: the input file (the single text file) should have the same name as the other files.

                    I am using a numSamples of 10, which is the number of individuals in my pool, but there is no explanation of this option in the manual.

                    But now I get an empty VCF file, and of course I don't know what the problem is. In any case, Dindel is difficult to run; there are so many steps!
