
  • #91
    BAM file not indexed

    When I realigned reads around indels, an error occurred:
    HTML Code:
    ERROR MESSAGE: Invalid command line: Cannot process the provided BAM file(s) because they were not indexed.
    The previous 2 steps and the realignment command lines are:
    Code:
    java -Xmx4g -Djava.io.tmpdir=./ \
      -jar picard-tools-1.57/SortSam.jar \
      SORT_ORDER=coordinate \
      INPUT=MK-5.sam \
      OUTPUT=MK-5.bam \
      VALIDATION_STRINGENCY=LENIENT \
      CREATE_INDEX=true
    java -Xmx4g -Djava.io.tmpdir=./ \
      -jar picard-tools-1.57/MarkDuplicates.jar \
      INPUT=MK-5.bam \
      OUTPUT=MK-5.marked.bam \
      METRICS_FILE=metrics \
      VALIDATION_STRINGENCY=LENIENT
    java -Xmx4g -jar GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar \
      -T RealignerTargetCreator \
      -R hg19.fa \
      -o MK-5.bam.list \
      -I MK-5.marked.bam
    
    java -Xmx4g -Djava.io.tmpdir=./ \
      -jar GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar \
      -I MK-5.marked.bam \
      -R hg19.fa \
      -T IndelRealigner \
      -targetIntervals MK-5.bam.list \
      -o MK-5.marked.realigned.bam
    Could anyone tell me what's wrong with it? Thanks in advance!
    Last edited by frewise; 08-17-2012, 12:29 AM.

    • #92
      Hi,
      Try adding CREATE_INDEX=true to your MarkDuplicates command as well; I think it will work.

      Ø
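
A minimal shell sketch of the fix suggested above, assuming the same jar path and file names as in the original post. The helper names `has_bam_index` and `markdup_cmd` are hypothetical. The first checks whether an index sits next to a BAM under either of two commonly seen naming styles (`sample.bai` or `sample.bam.bai`); the second only prints the MarkDuplicates command with CREATE_INDEX=true added, it does not run it.

```shell
#!/bin/sh
# Return 0 if an index file sits next to the given BAM.
# Both naming styles are checked: sample.bai and sample.bam.bai.
has_bam_index() {
    bam=$1
    [ -f "${bam%.bam}.bai" ] || [ -f "${bam}.bai" ]
}

# Print the MarkDuplicates command from the thread with CREATE_INDEX=true added.
# Jar path and file names are copied from the original post.
markdup_cmd() {
    printf '%s\n' "java -Xmx4g -Djava.io.tmpdir=./ -jar picard-tools-1.57/MarkDuplicates.jar INPUT=MK-5.bam OUTPUT=MK-5.marked.bam METRICS_FILE=metrics VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true"
}
```

Something like `has_bam_index MK-5.marked.bam || echo "index missing"` before the RealignerTargetCreator step would catch the problem early.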

      • #93
        Thank you very much, oyvindbusk. It works.

        • #94
          Hello everybody,

          I am using the pipeline of ulz_peter (thanks!) to perform realignment around indels and recalibration.

          1) I am stuck at the third step of realignment:

          java -Djava.io.tmpdir=/tmp/flx-auswerter -jar picard/FixMateInformation.jar INPUT=input.marked.realigned.bam OUTPUT=input_bam.marked.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
          java -Djava.io.tmpdir=/tmp -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=$path/$i/$i.realigned.bam OUTPUT=$path/$i/$i.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
          I got this error message:
          Code:
          ##### ERROR ------------------------------------------------------------------------------------------
          ##### ERROR A USER ERROR has occurred (version 2.0-39-gd091f72): 
          ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
          ##### ERROR Please do not post this error to the GATK forum
          ##### ERROR
          ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
          ##### ERROR Visit our website and forum for extensive documentation and answers to 
          ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
          ##### ERROR
          ##### ERROR MESSAGE: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0
          Before that, I removed the PCR duplicates with samtools and performed the first 2 steps of local realignment near indels:

          Code:
          /share/apps/samtools-0.1.18/samtools rmdup ${bam}.bak $bam
          java -jar $GA/GenomeAnalysisTK.jar -T RealignerTargetCreator -R $db -o ${bam}.list -I $bam
          java -Djava.io.tmpdir=/tmp -jar $GA/GenomeAnalysisTK.jar -I $bam -R $db -T IndelRealigner -targetIntervals ${bam}.list -o $path/$i/$i.realigned.bam
          I'm using version 1.76. I really don't know why I got such an error... Does anyone see what I am doing wrong?

          2) I haven't run the recalibration yet, but I saw in the Picard output:
          Code:
          ##### ERROR MESSAGE: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0
          
          ##### ERROR MESSAGE: Walker TableRecalibration is no longer available in the GATK; it has been deprecated since version 2.0
          Since I'm using v2.0-39-gd091f72, I am wondering how to perform the recalibration under these conditions. Do you know why these analysis types have been removed? Should I change my GATK version?

          3) Finally, did I understand correctly that recalibration of the variant quality scores, after variant calling, does not work well when using a single BAM? If so, I won't perform this recalibration.

          Thank you in advance,
          Jane
          Last edited by Jane M; 09-04-2012, 05:22 AM.

          • #95
            Dear Jane.

            The error message appears because the CountCovariates tool of GATK is no longer supported. You have to use the BaseRecalibrator and PrintReads tools instead, as described on the GATK page:


            Ø

            • #96
              Thank you for your help oyvindbusk,
              I replaced -T CountCovariates with BaseRecalibrator and -T TableRecalibration with PrintReads.

              Nevertheless, I am stuck before this step, with Picard, when running:

              java -Djava.io.tmpdir=/tmp -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=$path/$i/$i.realigned.bam OUTPUT=$path/$i/$i.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
              I got:
              [Wed Sep 05 09:59:27 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
              [Wed Sep 05 09:59:27 CEST 2012] Executing as [..]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
              INFO 2012-09-05 09:59:27 FixMateInformation Sorting input into queryname order.
              [Wed Sep 05 10:15:30 CEST 2012] net.sf.picard.sam.FixMateInformation done. Elapsed time: 16.06 minutes.
              Runtime.totalMemory()=954466304
              FAQ: http://sourceforge.net/apps/mediawik...itle=Main_Page
              Exception in thread "main" net.sf.samtools.util.RuntimeIOException: java.io.IOException: No space left on device
              at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:228)
              at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
              at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:147)
              at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
              at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
              at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:75)
              Caused by: java.io.IOException: No space left on device
              at java.io.FileOutputStream.write(Native Method)
              at org.xerial.snappy.SnappyOutputStream.writeInt(SnappyOutputStream.java:105)
              at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:126)
              at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:100)
              at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:137)
              at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:219)
              ... 5 more
              I googled it, and it seems to be a problem of disk space... I don't know if it's because the /tmp folder is too small or if I should add -Xmx4g... On the cluster that I'm using, -Xmx4g is not working.
              Any idea?
              Last edited by Jane M; 09-05-2012, 12:57 AM.

              • #97
                I think I had a similar problem, which was resolved by increasing the size of the temp folder. I would try this first. If it were a memory problem, it would probably say something like "not sufficient memory".

                Ø

                • #98
                  IOException: No space left on device

                  I got the same errors, and it seems I had run out of space on the hard drive. If you are processing a very large exome file (e.g. 100x exome), you may need a lot of space for your temporary files. Try to increase your hard drive space and see what happens.
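
Before rerunning, the free space of a candidate temporary folder can be checked with a small shell sketch like the one below. The helper names `free_kb` and `has_room` are hypothetical, and how much room a given BAM needs for its spill files is an assumption you have to supply yourself; the thread gives no figure.

```shell
#!/bin/sh
# Print the free space, in kilobytes, of the filesystem holding a directory.
# df -P selects the POSIX output format; -k reports 1024-byte blocks.
# Column 4 of the second line is the "Available" figure.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return 0 if the directory has at least the requested number of free kilobytes.
has_room() {
    dir=$1
    need_kb=$2
    [ "$(free_kb "$dir")" -ge "$need_kb" ]
}
```

For example, `has_room /tmp 50000000 || echo "pick another tmpdir"` (the 50 GB threshold is only an illustration), after which the chosen folder is passed to Java through -Djava.io.tmpdir as in the commands in this thread.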

                  • #99
                    Thank you both.
                    I managed to run it by using a different temporary folder:
                    java -Djava.io.tmpdir=/mnt/seq3/seq3/LMMC/GAR/GAR_sain -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
                    My output file is finally being generated. Thanks!
                    But I get this output:

                    [Wed Sep 05 14:53:57 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
                    [Wed Sep 05 14:53:57 CEST 2012] Executing as [...]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
                    INFO 2012-09-05 14:53:57 FixMateInformation Sorting input into queryname order.
                    Ignoring SAM validation error: ERROR: Record 104416902, Read name HWI-ST584_0081:4:2206:2021:2237#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416903, Read name HWI-ST584_0081:4:1102:17248:85426#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416904, Read name HWI-ST584_0081:4:1102:2000:84379#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416905, Read name HWI-ST584_0081:4:1103:2169:69666#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416906, Read name HWI-ST584_0081:4:1103:8346:98868#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416907, Read name HWI-ST584_0081:4:1104:12598:60315#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416908, Read name HWI-ST584_0081:4:1105:1489:49925#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416909, Read name HWI-ST584_0081:4:1105:16587:151639#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416910, Read name HWI-ST584_0081:4:1105:8354:181686#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416911, Read name HWI-ST584_0081:4:1107:17717:4065#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416912, Read name HWI-ST584_0081:4:1108:4813:156146#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416913, Read name HWI-ST584_0081:4:1108:9333:173330#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416915, Read name HWI-ST584_0081:4:1206:4951:59169#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416916, Read name HWI-ST584_0081:4:2104:7277:38433#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416917, Read name HWI-ST584_0081:4:2106:5867:60527#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416918, Read name HWI-ST584_0081:4:2202:1737:149838#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416919, Read name HWI-ST584_0081:4:2205:16036:86626#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416920, Read name HWI-ST584_0081:4:2207:1759:176116#AGTCAA, MAPQ should be 0 for unmapped read.
                    INFO 2012-09-05 15:13:24 FixMateInformation Sorting by queryname complete.
                    INFO 2012-09-05 15:13:24 FixMateInformation Output will be sorted by coordinate
                    INFO 2012-09-05 15:13:24 FixMateInformation Traversing query name sorted records and fixing up mate pair information.
                    INFO 2012-09-05 15:13:34 FixMateInformation Processed 1,000,000 records. Elapsed time: 00:00:10s. Time for last 1,000,000: 10s. Last read position: chr14:74,489,555
                    INFO 2012-09-05 15:13:52 FixMateInformation Processed 2,000,000 records. Elapsed time: 00:00:27s. Time for last 1,000,000: 17s. Last read position: chr4:104,072,163
                    INFO 2012-09-05 15:14:05 FixMateInformation Processed 3,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 13s. Last read position: chr21:31,654,746
                    Did you also get this message?
                    Code:
                    Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
                    Jane

                    • One additional question regarding the quality score recalibration: since v2.0 of GATK, this seems to be performed in one step only.

                      Previously, it was:
                      java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa --DBSNP dbsnp132.txt -I input.marked.realigned.fixed.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv

                      java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -I input.marked.realigned.fixed.bam -T TableRecalibration --out input.marked.realigned.fixed.recal.bam -recalFile input.recal_data.csv
                      but now, it's rather:
                      java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -knownSites dbsnp132.txt -I input.marked.realigned.fixed.bam -T BaseRecalibrator -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate --out input.recal_data.csv

                      From GATK doc:
                      java -Xmx4g -jar GenomeAnalysisTK.jar \
                      -T BaseRecalibrator \
                      -I my_reads.bam \
                      -R resources/Homo_sapiens_assembly18.fasta \
                      -knownSites bundle/hg18/dbsnp_132.hg18.vcf \
                      -knownSites another/optional/setOfSitesToMask.vcf \
                      -o recal_data.grp
                      with this additional step only when using several BAM files (if I understand the documentation correctly: "PrintReads can dynamically merge the contents of multiple input BAM files, resulting in merged output sorted in coordinate order"):
                      java -Xmx2g -jar GenomeAnalysisTK.jar \
                      -R ref.fasta \
                      -T PrintReads \
                      -o output.bam \
                      -I input1.bam \
                      -I input2.bam \
                      --read_filter MappingQualityZero
                      What is the option -l INFO? Is it still in use in the new version?
                      I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is the purpose of this file? I don't see where it is used...
                      Finally, where is the recalibrated BAM file? There is only a .grp output file at the BaseRecalibrator step.

                      I am a bit confused by these changes between versions...
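
For reference, a hedged sketch of how the two steps fit together in GATK 2.x. It assumes the `-BQSR` argument of PrintReads from the GATK 2.x documentation, and it treats `-l INFO` as the logging-level option, which is therefore left out here as optional. The function names `step1` and `step2`, and all file names, are placeholders rather than files from this thread. The functions only print the commands; they do not run GATK.

```shell
#!/bin/sh
# Dry run: print the two recalibration commands instead of executing them.
# All file names below are placeholders, not files from the thread.
ref=hg19.fa
dbsnp=dbsnp.vcf
in_bam=input.marked.realigned.fixed.bam
out_bam=input.marked.realigned.fixed.recal.bam
table=recal_data.grp

# Step 1: BaseRecalibrator writes only the recalibration table (the .grp file).
step1() {
    echo "java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R $ref -I $in_bam -knownSites $dbsnp -o $table"
}

# Step 2: PrintReads reads that table through -BQSR and writes the recalibrated BAM.
step2() {
    echo "java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R $ref -I $in_bam -BQSR $table -o $out_bam"
}
```

Calling `step1` and then `step2` prints both command lines. Under these assumptions the .grp file plays the role of the old recal_data.csv, and the recalibrated BAM is the output of the PrintReads step, even with a single input BAM.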

                      • Originally posted by frewise View Post
                        Hi, raonyguimaraes, why did you remove genes with multiple variants in your last 2 steps?
                        I think it's the opposite: he's taking genes with multiple variants into his regions of interest and treating the others with less importance.

                        • Originally posted by Jane M View Post
                          Thank both of you.
                          I managed to run by using a different temporary folder :


                          My output file is generating finally Thanks !
                          But I have this output:



                          Did you get also this message?
                          Code:
                          Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
                          Jane
                          This error message means these sequences could not be aligned against the reference genome. That is why you use the VALIDATION_STRINGENCY=LENIENT option, so that the program points out these sequences but doesn't stop running.
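
The validation message concerns records that are flagged as unmapped but still carry a nonzero MAPQ. A small shell sketch to count such records in SAM text follows; the function name `count_unmapped_nonzero_mapq` is hypothetical, and the only assumptions are the SAM column layout (FLAG in column 2, MAPQ in column 5) and the 0x4 "unmapped" flag bit.

```shell
#!/bin/sh
# Count SAM records on stdin whose FLAG has the 0x4 (unmapped) bit set
# while MAPQ (column 5) is not 0. Header lines starting with @ are skipped.
count_unmapped_nonzero_mapq() {
    awk -F '\t' '
        /^@/ { next }
        # int(FLAG / 4) % 2 extracts the 0x4 bit without bitwise operators
        int($2 / 4) % 2 == 1 && $5 != 0 { n++ }
        END { print n + 0 }
    '
}
```

It could be fed with something like `samtools view -h GAR_sain.realigned.bam | count_unmapped_nonzero_mapq` to see how many reads trigger the warning.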

                          • Thank you AJERYC!

                            Because of some troubles with my version of dbSNP, I haven't managed to run:
                            java -Xmx4g -jar GenomeAnalysisTK.jar \
                            -T BaseRecalibrator \
                            -I my_reads.bam \
                            -R resources/Homo_sapiens_assembly18.fasta \
                            -knownSites bundle/hg18/dbsnp_132.hg18.vcf \
                            -knownSites another/optional/setOfSitesToMask.vcf \
                            -o recal_data.grp
                            but I am still wondering whether I should run the PrintReads step, since I only have one BAM file, and whether my recalibrated BAM file will be the recal_data.grp file. Any idea?

                            • What is the option -l INFO? Is it still in use in the new version?
                              I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is the purpose of this file? I don't see where it is used...
                              Finally, where is the recalibrated BAM file? There is only a .grp output file at the BaseRecalibrator step.

                              I am a bit confused by these changes between versions...
                              Did you ever get an answer to these questions? I am running into the same issues with the newer version of GATK.

                              Thanks,

                              Fong

                              • Originally posted by fongchun View Post
                                Did you ever get an answer to these questions? I am running into the same issues with the newer version of GATK.

                                Thanks,

                                Fong
                                Not yet...
