  • #16
    Splitting the data up is exactly what I did to simplify things. You can give srma a set of ranges using the "RANGES=ranges.txt" option, with a file such as:

    chr1 1 3224608765
    chr2 1 6342342344
    chr3 1 6453554555
    Adding "-Xmx4G" raises the memory available to Java to 4 GB. I'm still not sure how much is enough for a given coverage, so I keep it reasonably high.

    So just make a "ranges.txt" for each chromosome and launch your separate threads.
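
A minimal Python sketch of that per-chromosome split. The chromosome lengths, the output file names, and the `srma.jar` path are placeholders; only the `RANGES=` option and the `-Xmx4G` flag come from this thread, and the remaining SRMA options are deliberately left out. It writes one ranges file per chromosome and builds, without running, one java command per file.

```python
import tempfile
from pathlib import Path

# Hypothetical chromosome lengths; in practice take them from the
# reference's index or sequence dictionary rather than typing them in.
CHROM_LENGTHS = {"chr1": 1000, "chr2": 800, "chr3": 600}


def write_range_files(chrom_lengths, out_dir):
    """Write one '<chrom>.ranges.txt' per chromosome and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for chrom, length in chrom_lengths.items():
        path = out_dir / f"{chrom}.ranges.txt"
        # Same "chromosome start end" layout as the example in the post.
        path.write_text(f"{chrom} 1 {length}\n")
        paths[chrom] = path
    return paths


def build_commands(range_paths, heap="4G", jar="srma.jar"):
    """Build one java command per ranges file; nothing is executed here."""
    commands = []
    for path in range_paths.values():
        # Input, output and reference options are intentionally omitted;
        # add them according to the SRMA usage message.
        commands.append(["java", f"-Xmx{heap}", "-jar", jar, f"RANGES={path}"])
    return commands


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for cmd in build_commands(write_range_files(CHROM_LENGTHS, tmp)):
            print(" ".join(cmd))
```

Each printed command can then be launched as its own job, so a design change only means regenerating the small ranges files.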



    • #17
      Ahh! Nice! Thanks for the tip. This makes life easier, since I constantly have different target enrichment designs, so the ranges will constantly change. Just having it at the chromosome level makes it more flexible.



      • #18
        Just one last note on that: ensure that ranges.txt is sorted by chromosome and position.

        Originally posted by NGSfan View Post
        Ahh ! Nice! Thanks for the tip. This makes life easier since I constantly have different target enrichment designs, so ranges will constantly change. Just having it at a chromosome level makes it more flexible.
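
One way to get the ranges into chromosome-then-position order, sketched in Python. The thread does not say which chromosome order SRMA expects; this sketch assumes it should follow the order of the reference, given here as a hypothetical list, and the regions are made up. A plain string sort would place chr10 before chr2, which is why an explicit rank is used.

```python
def sort_ranges(ranges, chrom_order):
    """Sort (chrom, start, end) tuples by reference order, then start, then end."""
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    return sorted(ranges, key=lambda r: (rank[r[0]], r[1], r[2]))


# Hypothetical reference order and target regions.
chrom_order = ["chr1", "chr2", "chr10"]
ranges = [
    ("chr10", 500, 900),
    ("chr2", 100, 200),
    ("chr1", 300, 400),
    ("chr1", 50, 120),
]

for chrom, start, end in sort_ranges(ranges, chrom_order):
    print(chrom, start, end)
```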



        • #19
          Hi Zee,
          I tried almost the same as what you described above. The first step, getting the intervals, is OK, but I get the error below during the second step:

          Code:
          $ sh gatk.sh -T RealignerTargetCreator -I KLM.bam -R  /Genome/hg18/hg18.fa -o output.intervals
          $ sh gatk.sh -T IndelRealigner -I KLM.bam -R  /Genome/hg18/hg18.fa -targetIntervals output.intervals --output realigned_b.bam
          FATAL 13:53:37,074 CommandLineProgram - Exception caught by base Command Line Program.  Stack trace is as follows: 
          java.lang.RuntimeException: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
                  at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:253)
                  at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:97)
          Caused by: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
                  at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner$ReadBin.add(IndelRealigner.java:1286)
                  at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:323)
                  at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:55)
                  at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:98)
                  at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:48)
                  at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:73)
                  at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:175)
                  at org.broadinstitute.sting.gatk.CommandLineExecutable.executeGATK(CommandLineExecutable.java:93)
                  at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:75)
                  at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:238)
                  ... 1 more
          ------------------------------------------------------------------------------------------
          The following error has occurred:
          
          org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step:
          
          Please check your command line arguments for any typos or inconsistencies.
          Please review our general documentation at http://www.broadinstitute.org/gsa/wiki or contact us via our
          support site at http://getsatisfaction.com/gsa to report bugs or get help resolving undocumented issues
          A Picard dict is available for hg18.fa. Is there anything that I am missing? Thanks.



          • #20
            I have not seen this error before:

            org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step:
            GATK is complaining that this read does not overlap the interval, which is weird.

            Perhaps you could pull out the location of that read and compare it to the start and end of the interval in "output.intervals".
            It looks as though GATK does not like realigning areas where reads do not overlap, but perhaps that is a question of algorithm design.

            I am using version 1.0.3471.
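
The comparison suggested above could be sketched like this in Python. It assumes the intervals file holds one `chrom:start-end` (or `chrom:position`) entry per line, and that the read's chromosome, start and end have already been looked up, for example from the BAM record; the coordinates below are invented for illustration.

```python
import re

# Matches "chr1:100-200" as well as a single position such as "chr1:150".
INTERVAL_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)(?:-(?P<end>\d+))?$")


def parse_interval(text):
    """Parse one interval line into a (chrom, start, end) tuple."""
    match = INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {text!r}")
    start = int(match["start"])
    end = int(match["end"]) if match["end"] else start
    return match["chrom"], start, end


def overlaps(read, interval):
    """True if two (chrom, start, end) spans share at least one base."""
    r_chrom, r_start, r_end = read
    i_chrom, i_start, i_end = interval
    return r_chrom == i_chrom and r_start <= i_end and i_start <= r_end


# Hypothetical read placement and interval.
read = ("chr1", 190, 239)
interval = parse_interval("chr1:100-200")
print(overlaps(read, interval))
```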



            • #21
              I really don't understand why it is giving such an error. I have also downloaded the recent version, GenomeAnalysisTK-1.0.3471. Maybe I will post the question on the GATK help site. Thanks, Zee, for your help.



              • #22
                I had an older version of GATK; I did a checkout of 3742 yesterday and it fixed this error.



                • #23
                  Originally posted by NGSfan View Post
                  Yes, exactly! I initially started from the samtools protocols, which imply that you do:

                  1) recalibration with GATK
                  2) local (MSA) realignment with GATK


                  But if you look at the GATK PowerPoint from Mark DePristo (great set of slides, by the way; GATK ppt), the flow chart
                  on slide #20 shows that MSA realignment comes before recalibration. So I am going back to correct my previous runs.

                  The realignment before recalibration actually makes more sense - since you will fix the reads that had problem alignments, and then recalibrate based on these better alignments!

                  I don't know how big or small an effect the order of these two steps has, but I think now this is the correct way.

                  I will try the new version of GATK and get back to you.
                  If you look at the same set of slides, on slide #2 Mark has a flow chart of the pipeline in which quality score recalibration precedes realignment, which is the opposite of what he states on slide #20. This suggests to me that the order doesn't matter.
                  Last edited by vgrubor; 07-28-2010, 10:35 AM. Reason: missing info



                  • #24
                    Hi,

                    I have had similar troubles. I tried splitting my range file into one for each chromosome. I get the following error:

                    Allele coverage cutoffs:
                    coverage: 1 minimum allele coverage: 0
                    coverage: 2 minimum allele coverage: 0
                    coverage: 3 minimum allele coverage: 0
                    coverage: 4 minimum allele coverage: 1
                    coverage: 5 minimum allele coverage: 1
                    coverage: 6 minimum allele coverage: 1
                    coverage: 7 minimum allele coverage: 2
                    coverage: 8 minimum allele coverage: 2
                    coverage: 9 minimum allele coverage: 3
                    coverage: >9 minimum allele coverage: 3
                    java.lang.Exception: SAM/BAM file is not co-ordinate sorted.
                    at srma.SRMA.doWork(SRMA.java:253)
                    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
                    at srma.SRMA.main(SRMA.java:94)

                    Has anyone had this error before? My BAM file is sorted and indexed. I have previously realigned this exact same BAM using GSA. Any help would be appreciated!

                    Thanks.



                    • #25
                      What sort order? What program did you use to sort the BAM file? If it is still not working, feel free to send me the data and I can debug. I love bugs.
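
A quick way to answer the sort-order question is to look at the `SO:` field on the `@HD` line of the BAM header. The Python sketch below assumes the header text has already been dumped to a string (for instance with `samtools view -H`); the example headers are made up. Whether SRMA's check relies on this header field or on the records themselves is not stated in this thread, so treat a mismatch here as a hint rather than a diagnosis.

```python
def header_sort_order(header_text):
    """Return the SO value from the @HD header line, or None if it is absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
            return None
    return None


# Hypothetical header text.
example = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000"
print(header_sort_order(example))
```

A file can be physically sorted by coordinate while its header still declares another order, or no order at all, so it is worth checking both.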



                      • #26
                        Another Java heap space problem with SRMA

                        I give SRMA 10 GB RAM with -Xmx10g but it keeps crashing after half an hour, giving the error message:
                        [Tue Sep 07 13:40:23 CEST 2010] srma.SRMA REFERENCE=hg19.fasta CORRECT_BASES=true QUIET_STDERR=false OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_ALLELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 USE_SEQUENCE_QUALITIES=true MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 NUM_THREADS=1 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000
                        Allele coverage cutoffs:
                        coverage: 1 minimum allele coverage: 0
                        coverage: 2 minimum allele coverage: 0
                        coverage: 3 minimum allele coverage: 0
                        coverage: 4 minimum allele coverage: 1
                        coverage: 5 minimum allele coverage: 1
                        coverage: 6 minimum allele coverage: 1
                        coverage: 7 minimum allele coverage: 2
                        coverage: 8 minimum allele coverage: 2
                        coverage: 9 minimum allele coverage: 3
                        coverage: >9 minimum allele coverage: 3

                        Records processsed: 65535 (last chr1:30689710-30689759)Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
                        [Tue Sep 07 14:08:09 CEST 2010] srma.SRMA done.
                        Runtime.totalMemory()=9544400896
                        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
                        at java.util.LinkedList.addBefore(LinkedList.java:778)
                        at java.util.LinkedList.add(LinkedList.java:198)
                        at srma.ThreadPoolLinkedList.add(ThreadPoolLinkedList.java:19)
                        at srma.SRMA.processToAddToGraphList(SRMA.java:389)
                        at srma.SRMA.doWork(SRMA.java:222)
                        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
                        at srma.SRMA.main(SRMA.java:92)

                        65535, the "Records processsed" count at which it crashed, is one short of the default MAX_QUEUE_SIZE (65536), and I don't understand why there should be a problem fitting that into 10 GB of RAM. The BAM file itself (converted from BFAST SAM, sorted and indexed with samtools) is 1.7 GB with 9,230,289 mapped reads.
                        Is it a problem of SRMA (version 0.1.7) or of Picard (version 1.29)?
                        I hope someone can provide help since I'd really like to use SRMA for local realignment of my SOLiD data.

                        Best,
                        Barbara
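
To put the numbers from that log in perspective, here is a back-of-the-envelope calculation in Python. It uses only the `Runtime.totalMemory()` and `MAX_QUEUE_SIZE` values printed above and makes the crude assumption that the whole heap was taken up by queued records, so the result is an upper bound and not a measurement.

```python
total_memory_bytes = 9544400896  # Runtime.totalMemory() from the log
max_queue_size = 65536           # MAX_QUEUE_SIZE from the log

heap_gib = total_memory_bytes / 2**30
bytes_per_record = total_memory_bytes / max_queue_size

print(f"heap allocated: {heap_gib:.2f} GiB")
print(f"upper bound per queued record: {bytes_per_record / 1024:.1f} KiB")
```

Under that assumption each queued record would account for roughly 142 KiB, far more than a 50 bp read should need, which supports the suspicion that something other than the raw reads is filling the heap.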



                        • #27
                          Java and Picard are unwieldy beasts and like to use a lot of memory. You can try adjusting the Java parameters, or you can use the C version found in the source code (in testing right now). I wrote the C version because memory usage was getting out of hand for no good reason (I echo your frustration). You can find it in the Git repository.

                          Also, what is your average read depth on this sample?



                          • #28
                            Thank you Nils, I'll ask our sysadmin to make the C-version accessible. In the meantime, I'll just try with 60GB.

                            Average read depth - I don't have exact data for my test file, but it should be a mean of about 10 reads (50 bp SOLiD) per 1 kb window, far from being able to create overflow.
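
Converting that estimate into an approximate mean depth, using only the figures quoted in the post (about 10 reads of 50 bp per 1 kb window) and ignoring any unevenness in coverage:

```python
reads_per_window = 10   # approximate reads per window, from the post
read_length_bp = 50     # SOLiD read length, from the post
window_bp = 1000        # window size, from the post

# Mean depth = total sequenced bases in the window / window length.
mean_depth = reads_per_window * read_length_bp / window_bp
print(f"approximate mean depth: {mean_depth}x")
```

That works out to about 0.5x on average, although local pile-ups could still be much deeper than the mean.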



                            • #29
                              Hi,

                              I was just wondering if you got to the bottom of the error below. I have the exact same problem. I am working on some very low-depth whole-genome data, and I wonder whether the low depth could be the cause. I used the same command on my other candidate-region data and it worked fine.

                              Thanks

                              Originally posted by seq_GA View Post
                              Hi Zee,
                              I tried almost the same as what you described above. The first step, getting the intervals, is OK, but I get the error below during the second step:

                              Code:
                              $ sh gatk.sh -T RealignerTargetCreator -I KLM.bam -R  /Genome/hg18/hg18.fa -o output.intervals
                              $ sh gatk.sh -T IndelRealigner -I KLM.bam -R  /Genome/hg18/hg18.fa -targetIntervals output.intervals --output realigned_b.bam
                              FATAL 13:53:37,074 CommandLineProgram - Exception caught by base Command Line Program.  Stack trace is as follows: 
                              java.lang.RuntimeException: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
                                      at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:253)
                                      at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:97)
                              Caused by: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
                                      at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner$ReadBin.add(IndelRealigner.java:1286)
                                      at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:323)
                                      at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:55)
                                      at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:98)
                                      at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:48)
                                      at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:73)
                                      at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:175)
                                      at org.broadinstitute.sting.gatk.CommandLineExecutable.executeGATK(CommandLineExecutable.java:93)
                                      at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:75)
                                      at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:238)
                                      ... 1 more
                              ------------------------------------------------------------------------------------------
                              The following error has occurred:
                              
                              org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step:
                              
                              Please check your command line arguments for any typos or inconsistencies.
                              Please review our general documentation at http://www.broadinstitute.org/gsa/wiki or contact us via our
                              support site at http://getsatisfaction.com/gsa to report bugs or get help resolving undocumented issues
                              A Picard dict is available for hg18.fa. Is there anything that I am missing? Thanks.
