**zee** · 06-29-2010, 01:12 AM

Splitting the data up is exactly what I did to simplify things. You can give SRMA a set of ranges using the "RANGES=ranges.txt" option, where ranges.txt lists one range per line,

e.g.

chr1 1 3224608765
chr2 1 6342342344
chr3 1 6453554555

Adding "-Xmx4G" raises the maximum Java heap to 4 GB. I'm still not sure how much is enough for a given coverage, so I keep it reasonably high.

So just make a "ranges.txt" for each chromosome and launch a separate run for each.
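A minimal sketch of the per-chromosome split described above, assuming the chromosome lengths are read from a FASTA index (`.fai`, tab-separated name and length in the first two columns). The file naming (`ranges.<chrom>.txt`) and the jar name `srma.jar` are hypothetical; only `-Xmx` and `RANGES=` come from this thread, and the other SRMA arguments (input, output, reference) are deliberately left out.

```python
from pathlib import Path


def ranges_from_fai(fai_text):
    """Map each chromosome in a .fai index to a 'chrom 1 length' range line."""
    ranges = {}
    for line in fai_text.splitlines():
        if not line.strip():
            continue
        # The first two tab-separated .fai columns are sequence name and length.
        name, length = line.split("\t")[:2]
        ranges[name] = f"{name} 1 {int(length)}"
    return ranges


def write_range_files(fai_text, out_dir):
    """Write one single-line ranges file per chromosome and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, line in ranges_from_fai(fai_text).items():
        path = out_dir / f"ranges.{name}.txt"  # hypothetical naming scheme
        path.write_text(line + "\n")
        paths.append(path)
    return paths


def srma_command(ranges_path, heap="4G", jar="srma.jar"):
    """Build the part of the command line that is documented in this thread."""
    # Assumption: the jar is called srma.jar. Input, output and reference
    # arguments are omitted and must be added for a real run.
    return ["java", f"-Xmx{heap}", "-jar", jar, f"RANGES={ranges_path}"]
```

Each returned command list could then be completed and handed to a job scheduler or `subprocess.run`, one per chromosome.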

**NGSfan** · 06-29-2010, 01:21 AM

Ahh ! Nice! Thanks for the tip. This makes life easier since I constantly have different target enrichment designs, so ranges will constantly change. Just having it at a chromosome level makes it more flexible.

**zee** · 06-29-2010, 01:25 AM

Just one last note on that: ensure that ranges.txt is sorted by chromosome and position.
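A small sketch of that sort, under the assumption that "sorted by chromosome" means the chromosome order of the reference (the order in the BAM header or `.fai`) rather than alphabetical order. The function name and arguments are illustrative only.

```python
def sort_ranges(lines, chrom_order):
    """Sort 'chrom start end' lines by reference chromosome order, then position."""
    # Rank chromosomes by their position in the reference, not alphabetically.
    rank = {name: i for i, name in enumerate(chrom_order)}

    def key(line):
        chrom, start, end = line.split()
        return (rank[chrom], int(start), int(end))

    return sorted((line for line in lines if line.strip()), key=key)


ranges = ["chr2 1 100", "chr10 1 50", "chr1 500 900", "chr1 1 400"]
print(sort_ranges(ranges, ["chr1", "chr2", "chr10"]))
# ['chr1 1 400', 'chr1 500 900', 'chr2 1 100', 'chr10 1 50']
```

Sorting by an explicit rank avoids the usual pitfall of a plain text sort, which would place chr10 before chr2.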

**seq_GA** · 06-29-2010, 10:09 PM

Hi Zee,
I tried almost exactly what you described above. The first step, creating the intervals, works fine, but the second step fails with the error below:

Code:

$ sh gatk.sh -T RealignerTargetCreator -I KLM.bam -R /Genome/hg18/hg18.fa -o output.intervals
$ sh gatk.sh -T IndelRealigner -I KLM.bam -R /Genome/hg18/hg18.fa -targetIntervals output.intervals --output realigned_b.bam
FATAL 13:53:37,074 CommandLineProgram - Exception caught by base Command Line Program.  Stack trace is as follows: 
java.lang.RuntimeException: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:253)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:97)
Caused by: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
        at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner$ReadBin.add(IndelRealigner.java:1286)
        at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:323)
        at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:55)
        at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:98)
        at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:48)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:73)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:175)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.executeGATK(CommandLineExecutable.java:93)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:75)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:238)
        ... 1 more
------------------------------------------------------------------------------------------
The following error has occurred:

org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step:

Please check your command line arguments for any typos or inconsistencies.
Please review our general documentation at http://www.broadinstitute.org/gsa/wiki or contact us via our
support site at http://getsatisfaction.com/gsa to report bugs or get help resolving undocumented issues

A Picard sequence dictionary is available for hg18.fa. Is there anything that I am missing? Thanks.

**zee** · 06-29-2010, 10:20 PM

I have not seen this error before:

org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step:

GATK is complaining that this read does not overlap the previous read in the interval, which is weird.

Perhaps you could pull out the location of that read and compare it to the start and end of the interval in "output.intervals".
It looks as though GATK does not like realigning areas where reads do not overlap, but perhaps that is a question of algorithm design.

I am using version 1.0.3471.
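The comparison suggested above could be sketched as follows. This assumes that "output.intervals" holds one interval per line in the `chr:start-end` form (or `chr:pos` for a single position), and that the read's start and end coordinates have already been looked up, for example from the POS column of `samtools view` output. The function names are illustrative.

```python
def parse_interval(text):
    """Parse 'chr:start-end' or 'chr:pos' into (chrom, start, end)."""
    chrom, _, span = text.strip().rpartition(":")
    start, _, end = span.partition("-")
    # A single-position interval has no end, so reuse the start.
    return chrom, int(start), int(end or start)


def intervals_overlapping(interval_lines, chrom, read_start, read_end):
    """Return the intervals on `chrom` that overlap [read_start, read_end]."""
    hits = []
    for line in interval_lines:
        if not line.strip():
            continue
        c, s, e = parse_interval(line)
        # Two closed ranges overlap when each starts before the other ends.
        if c == chrom and s <= read_end and read_start <= e:
            hits.append((c, s, e))
    return hits
```

An empty result for the read named in the exception would suggest that the intervals file and the BAM do not match, which is what the error message hints at.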

**seq_GA** · 06-29-2010, 10:26 PM

I really don't understand why it is giving this error. I have also downloaded the most recent version, GenomeAnalysisTK-1.0.3471. Maybe I will post the question on the GATK help site. Thanks, Zee, for your help.

**vgrubor** · 07-09-2010, 07:48 AM

I had an older version of GATK; I did a checkout of 3742 yesterday and that fixed this error.

**vgrubor** · 07-28-2010, 10:35 AM

Originally posted by NGSfan

Yes, exactly! I initially started from the samtools protocols, which imply that you do:

1) recalibration with GATK
2) local (MSA) realignment with GATK

But if you look at the GATK PowerPoint from Mark DePristo (great set of slides, btw!), you will see that the flow chart
on slide #20 shows MSA realignment coming before recalibration. So I am going back to correct my previous runs.

The realignment before recalibration actually makes more sense - since you will fix the reads that had problem alignments, and then recalibrate based on these better alignments!

I don't know how big or small an effect the order of these two steps has, but I think now this is the correct way.

I will try the new version of GATK and get back to you.

If you look at the same set of slides, slide #2 has a flow chart of the pipeline in which quality score recalibration precedes realignment, the opposite of what slide #20 shows. This suggests to me that the order doesn't matter.

**adaptivegenome** · 08-07-2010, 07:09 AM

Hi,

I have had similar troubles. I tried splitting my range file into one for each chromosome. I get the following error:

Allele coverage cutoffs:
coverage: 1 minimum allele coverage: 0
coverage: 2 minimum allele coverage: 0
coverage: 3 minimum allele coverage: 0
coverage: 4 minimum allele coverage: 1
coverage: 5 minimum allele coverage: 1
coverage: 6 minimum allele coverage: 1
coverage: 7 minimum allele coverage: 2
coverage: 8 minimum allele coverage: 2
coverage: 9 minimum allele coverage: 3
coverage: >9 minimum allele coverage: 3
java.lang.Exception: SAM/BAM file is not co-ordinate sorted.
at srma.SRMA.doWork(SRMA.java:253)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
at srma.SRMA.main(SRMA.java:94)

Has anyone had this error before? My BAM file is sorted and indexed. I have previously realigned this exact same BAM using GSA. Any help would be appreciated!

Thanks.

**nilshomer** · 08-07-2010, 04:31 PM

What sort order? What program did you use to sort the BAM file? If it is still not working, feel free to send me the data and I can debug. I love bugs.
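One quick way to answer the sort-order question is to look at the `SO` tag on the `@HD` line of the BAM header (for example, the text printed by `samtools view -H`). The sketch below only parses that header text. Whether SRMA's check relies on this tag or on the actual record order is not stated in this thread, so treat it as a first diagnostic rather than a fix.

```python
def header_sort_order(header_text):
    """Return the SO value from the @HD line of a SAM header, or None if absent."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return None


header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
print(header_sort_order(header))  # coordinate
```

A result of `None`, `unsorted` or `queryname` for a file that was in fact position-sorted would point to a stale header rather than mis-sorted reads.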

**epigen** · 09-07-2010, 05:21 AM

Another Java heap space problem with SRMA

I give SRMA 10 GB RAM with -Xmx10g but it keeps crashing after half an hour, giving the error message:
[Tue Sep 07 13:40:23 CEST 2010] srma.SRMA REFERENCE=hg19.fasta CORRECT_BASES=true QUIET_STDERR=false OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_ALLELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 USE_SEQUENCE_QUALITIES=true MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 NUM_THREADS=1 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000
Allele coverage cutoffs:
coverage: 1 minimum allele coverage: 0
coverage: 2 minimum allele coverage: 0
coverage: 3 minimum allele coverage: 0
coverage: 4 minimum allele coverage: 1
coverage: 5 minimum allele coverage: 1
coverage: 6 minimum allele coverage: 1
coverage: 7 minimum allele coverage: 2
coverage: 8 minimum allele coverage: 2
coverage: 9 minimum allele coverage: 3
coverage: >9 minimum allele coverage: 3

Records processsed: 65535 (last chr1:30689710-30689759)Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
[Tue Sep 07 14:08:09 CEST 2010] srma.SRMA done.
Runtime.totalMemory()=9544400896
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.LinkedList.addBefore(LinkedList.java:778)
at java.util.LinkedList.add(LinkedList.java:198)
at srma.ThreadPoolLinkedList.add(ThreadPoolLinkedList.java:19)
at srma.SRMA.processToAddToGraphList(SRMA.java:389)
at srma.SRMA.doWork(SRMA.java:222)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
at srma.SRMA.main(SRMA.java:92)

65535, the "Records processsed" count at which it crashed, is one short of the MAX_QUEUE_SIZE default (65536), and I don't understand why that should not fit into 10 GB of RAM. The BAM file itself (converted from BFAST SAM, sorted and indexed with samtools) is 1.7 GB with 9230289 mapped reads.
Is it a problem of SRMA (version 0.1.7) or of Picard (version 1.29)?
I hope someone can provide help since I'd really like to use SRMA for local realignment of my SOLiD data.

Best,
Barbara

**nilshomer** · 09-07-2010, 07:40 AM

Java and Picard are unwieldy beasts, and like to use a lot of memory. You can try to adjust the Java parameters, or you can use the C version found in the source code (in testing right now). I wrote the C version since memory usage was getting out of hand for no good reason (I echo your frustration). You can find the C version in the Git repository.

Also, what is your average read depth on this sample?

**epigen** · 09-07-2010, 09:24 AM

Thank you Nils, I'll ask our sysadmin to make the C-version accessible. In the meantime, I'll just try with 60GB.

Average read depth - I don't have exact data for my test file, but it should be a mean of about 10 reads (50 bp SOLiD) per 1 kb window, far from being able to create overflow.
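As a back-of-the-envelope check on those numbers, mean depth can be estimated as reads times read length divided by region length. The genome size used below (about 3.1e9 bp for hg19) is an assumption for illustration, not a figure from this thread.

```python
def mean_depth(n_reads, read_length, region_length):
    """Average per-base coverage: total sequenced bases over region size."""
    return n_reads * read_length / region_length


# 10 reads of 50 bp in a 1 kb window
print(mean_depth(10, 50, 1000))  # 0.5

# 9230289 mapped 50 bp reads spread over an assumed ~3.1e9 bp genome
print(round(mean_depth(9230289, 50, 3.1e9), 2))  # 0.15
```

Either way the average is well below 1x, although a genome-wide average says nothing about local pileups in repeats or enriched regions.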

**mimi_lupton** · 01-17-2011, 09:10 AM

Hi,

I was just wondering if you got to the bottom of the IndelRealigner error that seq_GA posted above ("Read ... does not overlap the previous read in this interval"). I have the exact same problem. I am working on some very low-depth whole-genome data, and I'm not sure whether the low depth could be the cause. I used the same command on my other candidate-region data and it worked fine.

Thanks
