Dear all,

I am still on the learning curve with the GATK tools, but I encountered an error at the duplicate-marking step with Picard. The procedure I followed was this:
I generated a BAM file for each sample using TopHat 1.33 and reordered each BAM file against the hg19 reference genome using Picard's ReorderSam.jar. After that I added read-group information using Picard's AddOrReplaceReadGroups.jar. Then I tried to mark pair duplicates using Picard's MarkDuplicates.jar, but I ran into an error at this step and failed to generate the duplicate-marked files.
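The commands I ran were of roughly this form (a sketch, not my exact invocations; paths are shortened and the read-group values shown are placeholders):

```shell
# Reorder each TopHat BAM so its contigs match the hg19 reference dictionary
java -jar ReorderSam.jar \
    INPUT=accepted_hits.bam \
    OUTPUT=sorted_GP.bam \
    REFERENCE=hg19.fasta

# Attach read-group information (RG values below are placeholders)
java -jar AddOrReplaceReadGroups.jar \
    INPUT=sorted_GP.bam \
    OUTPUT=sorted_GP.rg.bam \
    RGID=1 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=sample1

# Mark pair duplicates -- this is the step that fails
java -jar MarkDuplicates.jar \
    INPUT=sorted_GP.rg.bam \
    OUTPUT=marked.bam \
    METRICS_FILE=metrics
```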
The error I received is the following:
[Thu Feb 16 15:06:56 EST 2012] net.sf.picard.sam.MarkDuplicates
INPUT=[/media/FreeAgent GoFlex
Drive/RNAseq-coloncancer/LID46437/tophat_out/sorted_GP.bam]
OUTPUT=/media/FreeAgent GoFlex
Drive/RNAseq-coloncancer/LID46437/tophat_out/marked.bam
METRICS_FILE=/media/FreeAgent GoFlex
Drive/RNAseq-coloncancer/LID46437/tophat_out/metrics
REMOVE_DUPLICATES=false ASSUME_SORTED=false
MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000
MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000
SORTING_COLLECTION_SIZE_RATIO=0.25
READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).*
OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false
VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5
MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Feb 16 15:06:56 EST 2012] Executing as
slowsmile@slowsmile-HP-xw8600-Workstation on Linux 3.0.0-15-generic
amd64; OpenJDK 64-Bit Server VM 1.7.0_147-icedtea-b147; Picard
version: 1.60(1086)
INFO 2012-02-16 15:06:56 MarkDuplicates Start of doWork freeMemory:
124147272; totalMemory: 125698048; maxMemory: 1866006528
INFO 2012-02-16 15:06:56 MarkDuplicates Reading input file and
constructing read end information.
INFO 2012-02-16 15:06:56 MarkDuplicates Will retain up to 7404787 data
points before spilling to disk.
INFO 2012-02-16 15:07:14 MarkDuplicates Read 1000000 records. Tracking
129157 as yet unmatched pairs. 6550 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:19 MarkDuplicates Read 2000000 records. Tracking
136196 as yet unmatched pairs. 9506 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:24 MarkDuplicates Read 3000000 records. Tracking
190648 as yet unmatched pairs. 61032 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:29 MarkDuplicates Read 4000000 records. Tracking
144992 as yet unmatched pairs. 9135 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:34 MarkDuplicates Read 5000000 records. Tracking
180193 as yet unmatched pairs. 36398 records in RAM. Last sequence
index: 0
INFO 2012-02-16 15:07:39 MarkDuplicates Read 6000000 records. Tracking
186193 as yet unmatched pairs. 35242 records in RAM. Last sequence
index: 0
[Thu Feb 16 15:07:42 EST 2012] net.sf.picard.sam.MarkDuplicates done.
Elapsed time: 0.78 minutes.
Runtime.totalMemory()=1352466432
Exception in thread "main" net.sf.picard.PicardException: Value was
put into PairInfoMap more than once. 1:
HT29.LANE1:HWI-ST978:1370AHMACXX:5:1207:8810:84360
at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:343)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:122)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:106)
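From what I can tell, MarkDuplicates keys paired reads by read name plus the first/second-of-pair flag, so this error would suggest the same key occurs twice in my input, for example if TopHat reported multiple alignments for the same read end. As a rough check (my own stdlib sketch, not part of Picard), I counted repeated (name, end) keys over SAM-format lines:

```python
from collections import Counter

def count_duplicate_name_ends(sam_lines):
    """Return (read_name, end) keys that occur more than once.

    MarkDuplicates keys paired reads by read name plus the
    first/second-of-pair flag, so a key appearing twice would
    trigger the "Value was put into PairInfoMap more than once" error.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):           # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & 0x1:                 # unpaired read: not relevant here
            continue
        end = 1 if flag & 0x40 else 2      # first vs. second of pair
        counts[(name, end)] += 1
    return {key: n for key, n in counts.items() if n > 1}

# Minimal example: "read1" has two first-of-pair records (flag 99),
# as a multi-mapping aligner might emit.
sam = [
    "@HD\tVN:1.0",
    "read1\t99\tchr1\t100\t60\t50M\t=\t200\t150\tACGT\tFFFF",
    "read1\t147\tchr1\t200\t60\t50M\t=\t100\t-150\tACGT\tFFFF",
    "read1\t99\tchr1\t500\t0\t50M\t=\t600\t150\tACGT\tFFFF",
]
print(count_duplicate_name_ends(sam))  # {('read1', 1): 2}
```

In practice I would feed it the real alignments (e.g. the text output of samtools view on sorted_GP.bam); a nonzero result would confirm that the TopHat output contains repeated read ends.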
I read the log carefully but cannot figure out the source of the error. What does "Value was put into PairInfoMap more than once" mean here? Can you help me resolve this problem?

Thanks a lot.