SEQanswers

Go Back   SEQanswers > Bioinformatics > Bioinformatics



Similar Threads
Thread Thread Starter Forum Replies Last Post
Local realignment around indels (454 technology) desmo Bioinformatics 2 05-17-2012 06:31 AM
GATK: VCF file for Local realignment around indels jorge Bioinformatics 2 10-11-2011 12:15 AM
Samtools mpileup creates extra large file after local realignment Hkins552 Bioinformatics 7 07-21-2011 02:20 PM
Interpreting Local Realignment results from GATK alma Bioinformatics 0 07-07-2011 03:46 AM
GATK recalibration score and local realignment: what is the first step? m_elena_bioinfo Bioinformatics 0 01-24-2011 03:59 AM

Reply
 
Thread Tools
Old 06-24-2010, 01:09 AM   #1
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124
Default Local realignment using GATK and smra

Hi,
I am trying to use GATK to do base quality recalibration followed by local realignment to reduce the flase positive indels.

I am finding problem in getting GATK to work as have few doubts and smra seems very easy in a sigle step to move ahead. Anyone has done the comparison before.

I am struck with GATK where in the first step of realignment step,
Code:
java -jar /path/to/GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -I /path/to/input.bam \
  -R /path/to/reference.fasta \
  [-L intervals] \
  -o /path/to/output.intervals \
  [-B snps,VCF,/path/to/SNP_calls.vcf] \
  [-B indels,VCF,/path/to/indel_calls.vcf] \
  [-B dbsnp,dbsnp,/path/to/dbsnp.rod]
for the option L, what is the format of the file. My region of interest would be around 19000 regions from hg18 (all chromosomes). Regardsing -B option, I have only hg18 rod file downloaded from GATK site. How to proceed these steps.

I would appreciate very much if someone could through light on these queries. Thanks.
seq_GA is offline   Reply With Quote
Old 06-24-2010, 01:30 AM   #2
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

I have used GATK successfully, but it looks like they've updated it since the last version I've been using (they separated mismatches from indels and merged them - now it looks like they do it all in one). I will try this new version you've shown.

SRMA I just tried once, but it crashed. But I really need to give it another fair chance since I really didn't look into the problem.

I was doing recalibration followed by local realignment, but I think perhaps it is better to do local realignment first? Seems to be the order recommended by GATK.

Anyone else have any thoughts on this?
NGSfan is offline   Reply With Quote
Old 06-24-2010, 01:36 AM   #3
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124
Default

Thx for your reply.

Are you referring to these steps under samtools protocol.

1.Collect regions around potential indels and clusters of mismatches:
2.Merge intervals
3.Realign reads overlapping specified regions:
4.Write the cleaned alignment file:

GATK manual suggests you to to basequality recalibration followed by local realignment. Its bit confusing in the steps involved.
seq_GA is offline   Reply With Quote
Old 06-24-2010, 01:38 AM   #4
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124
Default

whole testing smra also crashed
seq_GA is offline   Reply With Quote
Old 06-24-2010, 02:20 AM   #5
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Quote:
Originally Posted by seq_GA View Post
Thx for your reply.

Are you referring to these steps under samtools protocol.

1.Collect regions around potential indels and clusters of mismatches:
2.Merge intervals
3.Realign reads overlapping specified regions:
4.Write the cleaned alignment file:

GATK manual suggests you to to basequality recalibration followed by local realignment. Its bit confusing in the steps involved.
Yes, exactly! I initially started from samtools protocols , which implies that you do:

1) recalibration with GATK
2) local (MSA) realignment with GATK


But if you look at the GATK powerpoint from Mark DePristo (great set of slides btw!) GATK ppt then you see in the flow chart
on slide #20 show that MSA realignment comes before recalibration. So I am going back to correct my previous runs.

The realignment before recalibration actually makes more sense - since you will fix the reads that had problem alignments, and then recalibrate based on these better alignments!

I don't know how big or small an effect the order of these two steps has, but I think now this is the correct way.

I will try the new version of GATK and get back to you.
NGSfan is offline   Reply With Quote
Old 06-24-2010, 02:30 AM   #6
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I have run gatk successfully, using version 1.0.3471

This is how my system is configured:

I have a script simply named "gatk" which contains

Quote:
#!/bin/sh

SRC="/somepath/GenomeAnalysisTK-1.0.3471"
MEMORY="4096M"

java -Xmx$MEMORY -jar $SRC/GenomeAnalysisTK.jar $*
Then I just run it in the following way, assuming you got "genome.fasta" and "input.bam". Note that gatk assumes that the sequence dictionary for the genome exists as "genome.dict". I therefore add the Picard step to do the dictionary construction.

Quote:

#create sequence dictionary with Picard

java -jar /opt/picard/dist/CreateSequenceDictionary.jar R=genome.fasta O=genome.dict

#find realigner targets
gatk -T RealignerTargetCreator -I input.bam -R genome.fasta -o output.intervals

# run realigner
gatk -T IndelRealigner -I input.bam -R genome.fasta -targetIntervals output.intervals --output realigned.bam
And it worked fine for me.
zee is offline   Reply With Quote
Old 06-24-2010, 09:36 AM   #7
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by seq_GA View Post
whole testing smra also crashed
What version and could I get an example to debug? I would love this type of feedback.

Quote:
Originally Posted by NGSfan View Post
SRMA I just tried once, but it crashed. But I really need to give it another fair chance since I really didn't look into the problem.
Agian, what version and could I get an example to debug? Without your feedback I cannot support and improve the tool. I am very responsive solving these types of problems
nilshomer is offline   Reply With Quote
Old 06-24-2010, 06:29 PM   #8
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124
Default

Hi,

I am using srma-0.1.6 version from http://sourceforge.net/projects/srma/files/
I tried giving around 2G for MAX_HEAP_SIZE but still it went out of memory.

2. As explained in the manual, as my region of interest is around 19000 regions from the whole exome, I constructed a tab delimited file and then sorted it as below (sample)
RANGES=range_sorted.txt
Code:
chr10   100000807       100000927
chr10   100001260       100001500
chr10   100002097       100002217
chr10   100003241       100003601
chr10   100005284       100005519
chr10   100006490       100006719
chr10   100007353       100007593
chr10   100007699       100007939
chr10   100008745       100008985
chr10   100009096       100009216
I get the error as below.

Code:
Allele coverage cutoffs:
coverage:  1    minimum allele coverage: 0
coverage:  2    minimum allele coverage: 0
coverage:  3    minimum allele coverage: 0
coverage:  4    minimum allele coverage: 1
coverage:  5    minimum allele coverage: 1
coverage:  6    minimum allele coverage: 1
coverage:  7    minimum allele coverage: 2
coverage:  8    minimum allele coverage: 2
coverage:  9    minimum allele coverage: 3
coverage: >9    minimum allele coverage: 3
java.lang.Exception: Ranges must be in sorted order in RANGES (line numbers 271-272
        at srma.Ranges.addRange(Ranges.java:188)
        at srma.Ranges.<init>(Ranges.java:36)
        at srma.SRMA.doWork(SRMA.java:178)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
        at srma.SRMA.main(SRMA.java:94)
Let me know whether I am missing something here. Thanks.
seq_GA is offline   Reply With Quote
Old 06-24-2010, 06:40 PM   #9
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by seq_GA View Post
Hi,

I am using srma-0.1.6 version from http://sourceforge.net/projects/srma/files/
I tried giving around 2G for MAX_HEAP_SIZE but still it went out of memory.
Did you use the "-Xmx2g" option to Java ("java -Xmx2g -jar srma-0.1.6.jar [options]")? The "MAX_HEAP_SIZE" is a # used internally by SRMA, and is not passed into java. It shouldn't crash if the internal heap size is too small, only some regions will be ignored.

Quote:
Originally Posted by seq_GA View Post
2. As explained in the manual, as my region of interest is around 19000 regions from the whole exome, I constructed a tab delimited file and then sorted it as below (sample)
RANGES=range_sorted.txt
Code:
chr10   100000807       100000927
chr10   100001260       100001500
chr10   100002097       100002217
chr10   100003241       100003601
chr10   100005284       100005519
chr10   100006490       100006719
chr10   100007353       100007593
chr10   100007699       100007939
chr10   100008745       100008985
chr10   100009096       100009216
I get the error as below.

Code:
Allele coverage cutoffs:
coverage:  1    minimum allele coverage: 0
coverage:  2    minimum allele coverage: 0
coverage:  3    minimum allele coverage: 0
coverage:  4    minimum allele coverage: 1
coverage:  5    minimum allele coverage: 1
coverage:  6    minimum allele coverage: 1
coverage:  7    minimum allele coverage: 2
coverage:  8    minimum allele coverage: 2
coverage:  9    minimum allele coverage: 3
coverage: >9    minimum allele coverage: 3
java.lang.Exception: Ranges must be in sorted order in RANGES (line numbers 271-272
        at srma.Ranges.addRange(Ranges.java:188)
        at srma.Ranges.<init>(Ranges.java:36)
        at srma.SRMA.doWork(SRMA.java:178)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:150)
        at srma.SRMA.main(SRMA.java:94)
Let me know whether I am missing something here. Thanks.
The ranges in your file should be in sorted order. It is complaining that the out-of-order line is 271-272. Can you check that line?
nilshomer is offline   Reply With Quote
Old 06-25-2010, 06:34 AM   #10
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Hmm.. the new version of GATK realigner now gives me errors, complaining about read IDs whereas the old version (from April 2010) still works.

Also, GATK complains the "rod' file that I got from GATK is not in the right format for the -B option.

Last edited by NGSfan; 06-25-2010 at 07:19 AM.
NGSfan is offline   Reply With Quote
Old 06-25-2010, 07:30 AM   #11
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Hi Nils,

I am testing SRMA with a sorted bam file from a BFAST pair-end alignment of a whole exome data set from Brainbridge et al (http://genomebiology.com/2010/11/6/R62 , and SRA012615).

Just curious, is there a requirement for the number of threads (must be a power of 2?) I tried 8, 16, and got "unsynchronized addition":

java -Xmx128g -jar /net/chipdata/NGS.analysis/bin/srma/srma-0.1.6.jar I=s_1_paired_sequence.bfast.sorted.bam O=s_1_paired_sequence.bfast.srma.bam R=/net/chipdata/HumanGenome/hg18/hg18.fa NUM_THREADS=16 >&srma_log
[Fri Jun 25 16:24:59 CEST 2010] srma.SRMA INPUT=s_1_paired_sequence.bfast.sorted.bam OUTPUT=s_1_paired_sequence.bfast.srma.bam REFERENCE=/net/chipdata/HumanGenome/hg18/hg18.fa NUM_THREADS=22 OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_A
LELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 CORRECT_BASES=false USE_SEQUENCE_QUALITIES=true QUIET_STDERR=false MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 TMP_DIR=/tmp/leparc VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECOR
S_IN_RAM=500000
Allele coverage cutoffs:
coverage: 1 minimum allele coverage: 0
coverage: 2 minimum allele coverage: 0
coverage: 3 minimum allele coverage: 0
coverage: 4 minimum allele coverage: 1
coverage: 5 minimum allele coverage: 1
coverage: 6 minimum allele coverage: 1
coverage: 7 minimum allele coverage: 2
coverage: 8 minimum allele coverage: 2
coverage: 9 minimum allele coverage: 3
coverage: >9 minimum allele coverage: 3
** Warning: option NUM_THREADS currently may not increase performance significantly. **
** Try running multiple processes with RANGES if the speed does not increase. **
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.ArrayIndexOutOfBoundsException: -12
at java.util.ArrayList.elementData(ArrayList.java:338)
at java.util.ArrayList.get(ArrayList.java:351)
at srma.Graph.addNode(Graph.java:154)
at srma.Graph.addSAMRecord(Graph.java:109)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net

Last edited by NGSfan; 06-25-2010 at 07:33 AM.
NGSfan is offline   Reply With Quote
Old 06-25-2010, 09:10 AM   #12
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by NGSfan View Post
Hi Nils,

I am testing SRMA with a sorted bam file from a BFAST pair-end alignment of a whole exome data set from Brainbridge et al (http://genomebiology.com/2010/11/6/R62 , and SRA012615).

Just curious, is there a requirement for the number of threads (must be a power of 2?) I tried 8, 16, and got "unsynchronized addition":

java -Xmx128g -jar /net/chipdata/NGS.analysis/bin/srma/srma-0.1.6.jar I=s_1_paired_sequence.bfast.sorted.bam O=s_1_paired_sequence.bfast.srma.bam R=/net/chipdata/HumanGenome/hg18/hg18.fa NUM_THREADS=16 >&srma_log
[Fri Jun 25 16:24:59 CEST 2010] srma.SRMA INPUT=s_1_paired_sequence.bfast.sorted.bam OUTPUT=s_1_paired_sequence.bfast.srma.bam REFERENCE=/net/chipdata/HumanGenome/hg18/hg18.fa NUM_THREADS=22 OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_A
LELE_COVERAGE=3 MAXIMUM_TOTAL_COVERAGE=100 CORRECT_BASES=false USE_SEQUENCE_QUALITIES=true QUIET_STDERR=false MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 TMP_DIR=/tmp/leparc VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECOR
S_IN_RAM=500000
Allele coverage cutoffs:
coverage: 1 minimum allele coverage: 0
coverage: 2 minimum allele coverage: 0
coverage: 3 minimum allele coverage: 0
coverage: 4 minimum allele coverage: 1
coverage: 5 minimum allele coverage: 1
coverage: 6 minimum allele coverage: 1
coverage: 7 minimum allele coverage: 2
coverage: 8 minimum allele coverage: 2
coverage: 9 minimum allele coverage: 3
coverage: >9 minimum allele coverage: 3
** Warning: option NUM_THREADS currently may not increase performance significantly. **
** Try running multiple processes with RANGES if the speed does not increase. **
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.ArrayIndexOutOfBoundsException: -12
at java.util.ArrayList.elementData(ArrayList.java:338)
at java.util.ArrayList.get(ArrayList.java:351)
at srma.Graph.addNode(Graph.java:154)
at srma.Graph.addSAMRecord(Graph.java:109)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
java.lang.Exception: Unsynchronized addition
at srma.Graph.addSAMRecord(Graph.java:61)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
Don't use threading, it isn't helping running time right now on most systems (not sure why, not a Java expert). I do wan the BAM file so I can debug the synchronization issue (crashing srma will make it stronger). Can you post the BAM file somewhere so I can take a look? For now, try without threads.
nilshomer is offline   Reply With Quote
Old 06-28-2010, 02:05 AM   #13
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

I will try to find a place to host the BAM file for you. It's about 2.6GB after sorting.
NGSfan is offline   Reply With Quote
Old 06-28-2010, 06:22 AM   #14
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default access to Bainbridge whole exome bam file

Hi Nils,

Due to some local issues (proxies/security) I had to chop the bam file up and place it on a file sharing service:

There are five files:

http://www.filedropper.com/bainbridgeexomeaa
http://www.filedropper.com/bainbridgeexomeab
http://www.filedropper.com/bainbridgeexomeac
http://www.filedropper.com/bainbridgeexomead
http://www.filedropper.com/bainbridgeexomeae

You can cat the files back together in the same order (aa,ab,ac,ad,ae) to get the complete bam.

The BAM file is already sorted and ready to go. If you want to have the index (.bai) already made, you can also download it here:

http://www.filedropper.com/s1paireds...bfastsortedbam


Here is the output from samtools "flagstat":

38641522 in total
0 QC failure
0 duplicates
37596789 mapped (97.30%)
38641522 paired in sequencing
19320761 read1
19320761 read2
38641522 properly paired (100.00%)
37188108 with itself and mate mapped
408681 singletons (1.06%)
115342 with mate mapped to a different chr
106661 with mate mapped to a different chr (mapQ>=5)


Please let me know if you have any problems! And thanks for looking into the SRMA issue.
NGSfan is offline   Reply With Quote
Old 06-29-2010, 01:39 AM   #15
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Just some extra notes for you. I just completed the realignment of the reads from that bam file I posted, using GATK's local realignment tool (38 hours on 1 cpu! they do not allow multiple threads with this module). So no corruption or other issues with the bam file.

In parallel , I ran SRMA with a single thread, and looks like it was progressing really well until I got an error. Looks like I should increase the MAX_HEAP_SIZE ? I'll try that and report back

Would it be possible to split the processing by chromosome to simplify things?

Like running chromosomes 1,2,3, etc, to 9 individually, then process in one thread each on the sets: chr 10+11, 12+13, 14+15, 16+17+18, 19+20+21+22 , X+Y . Would thread out nicely on a 16-cpu machine...

[Fri Jun 25 16:33:11 CEST 2010] srma.SRMA INPUT=s_1_paired_sequence.bfast.sorted.bam OUTPUT=s_1_paired_sequence.bfast.srma2.bam REFERENCE=/net/chipdata/HumanGenome/hg18/hg18.fa OFFSET=20 MIN_MAPQ=0 MINIMUM_ALLELE_PROBABILITY=0.1 MINIMUM_ALLELE_COVERAGE
3 MAXIMUM_TOTAL_COVERAGE=100 CORRECT_BASES=false USE_SEQUENCE_QUALITIES=true QUIET_STDERR=false MAX_HEAP_SIZE=8192 MAX_QUEUE_SIZE=65536 NUM_THREADS=1 TMP_DIR=/tmp/leparc VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECOR
S_IN_RAM=500000
Allele coverage cutoffs:
coverage: 1 minimum allele coverage: 0
coverage: 2 minimum allele coverage: 0
coverage: 3 minimum allele coverage: 0
coverage: 4 minimum allele coverage: 1
coverage: 5 minimum allele coverage: 1
coverage: 6 minimum allele coverage: 1
coverage: 7 minimum allele coverage: 2
coverage: 8 minimum allele coverage: 2
coverage: 9 minimum allele coverage: 3
coverage: >9 minimum allele coverage: 3
Records processsed: 196556 (last chr1:4875295-4875369)
Records processsed: 327629 (last chr1:11802305-11802379)
Records processsed: 458668 (last chr1:16329237-16329311)
Records processsed: 589729 (last chr1:19505170-19505244)
........... output trimmed .............
Records processsed: 8580872 (last chr3:129300240-129300314)
Records processsed: 8717321 (last chr3:146329681-136329753)
Records processsed: 8848435 (last chr3:158581531-158581605)
Exception in thread "Thread-274" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2772)
at java.util.Arrays.copyOf(Arrays.java:2746)
at java.util.ArrayList.ensureCapacity(ArrayList.java:187)
at java.util.ArrayList.add(ArrayList.java:395)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
at java.util.ArrayList$SubList.add(ArrayList.java:928)
java.util.ConcurrentModificationException
at java.util.ArrayList$SubList.checkForComodification(ArrayList.java:1091)
at java.util.ArrayList$SubList.size(ArrayList.java:921)
at srma.Graph.contains(Graph.java:195)
at srma.Graph.addNode(Graph.java:143)
at srma.Graph.addSAMRecord(Graph.java:109)
at srma.SRMA$GraphThread.run(SRMA.java:691)
Please report bugs to srma-help@lists.sourceforge.net
NGSfan is offline   Reply With Quote
Old 06-29-2010, 02:12 AM   #16
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Splitting the data up is exactly what I did to simplify things. You can give srma a set of ranges using the "RANGES=ranges.txt" with

e.g.

Quote:
chr1 1 3224608765
chr2 1 6342342344
chr3 1 6453554555
And adding "-Xmx4G" increases the memory for java to use to 4Gb. Im still not sure how much is enough based on coverage so I keep it reasonably high.

So just make a "ranges.txt" for each chromosome and launch your separate threads
zee is offline   Reply With Quote
Old 06-29-2010, 02:21 AM   #17
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Quote:
Originally Posted by zee View Post
Splitting the data up is exactly what I did to simplify things. You can give srma a set of ranges using the "RANGES=ranges.txt" with

e.g.



And adding "-Xmx4G" increases the memory for java to use to 4Gb. Im still not sure how much is enough based on coverage so I keep it reasonably high.

So just make a "ranges.txt" for each chromosome and launch your separate threads

Ahh ! Nice! Thanks for the tip. This makes life easier since I constantly have different target enrichment designs, so ranges will constantly change. Just having it at a chromosome level makes it more flexible.
NGSfan is offline   Reply With Quote
Old 06-29-2010, 02:25 AM   #18
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Just a one last note on that. Ensure that the ranges.txt is sorted by chromosomes and position.

Quote:
Originally Posted by NGSfan View Post
Ahh ! Nice! Thanks for the tip. This makes life easier since I constantly have different target enrichment designs, so ranges will constantly change. Just having it at a chromosome level makes it more flexible.
zee is offline   Reply With Quote
Old 06-29-2010, 11:09 PM   #19
seq_GA
Senior Member
 
Location: Asiana

Join Date: Feb 2009
Posts: 124
Default

Hi Zee,
I alomst tried teh same what u had said above. The first step in getting the intervals is ok. But I get the error during the second step.I am getting an error as below:

Code:
$sh gatk.sh -T RealignerTargetCreator -I KLM.bam -R  /Genome/hg18/hg18.fa -o output.intervals
$ sh gatk.sh -T IndelRealigner -I KLM.bam -R  /Genome/hg18/hg18.fa -targetIntervals output.intervals --output realigned_b.bam
FATAL 13:53:37,074 CommandLineProgram - Exception caught by base Command Line Program.  Stack trace is as follows: 
java.lang.RuntimeException: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:253)
        at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:97)
Caused by: org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step
        at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner$ReadBin.add(IndelRealigner.java:1286)
        at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:323)
        at org.broadinstitute.sting.gatk.walkers.indels.IndelRealigner.map(IndelRealigner.java:55)
        at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:98)
        at org.broadinstitute.sting.gatk.traversals.TraverseReads.traverse(TraverseReads.java:48)
        at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:73)
        at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:175)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.executeGATK(CommandLineExecutable.java:93)
        at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:75)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:238)
        ... 1 more
------------------------------------------------------------------------------------------
The following error has occurred:

org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step:

Please check your command line arguments for any typos or inconsistencies.
Please review our general documentation at http://www.broadinstitute.org/gsa/wiki or contact us via our
support site at http://getsatisfaction.com/gsa to report bugs or get help resolving undocumented issues
picard dict is available for hg18.fa. Is there anything that I am missing? Thanks.
seq_GA is offline   Reply With Quote
Old 06-29-2010, 11:20 PM   #20
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I have not seen this error before

Quote:
org.broadinstitute.sting.utils.StingException: Read G004_1:5:9:3288:12810 does not overlap the previous read in this interval; please ensure that you are using the same input bam that was used in the RealignerTargetCreator step:
But gatk is complaining that this read does not overlap the interval .. weird.

Perhaps you could pull out the location of that read and compare it to the start and end of the interval in "output.intervals".
It looks as though gatk does not like aligning areas where reads do not overlap but perhaps that is a question of algorithm design.

I am using version 1.0.3471,.
zee is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:12 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO