I mapped two runs of SOLiD paired-end reads with BFAST, converted the output to BAM, and merged the files. Now I want to remove duplicates using Picard, but it gives the following error:
[Fri Dec 10 11:32:00 CET 2010] net.sf.picard.sam.MarkDuplicates done.
Runtime.totalMemory()=18422956032
Exception in thread "main" net.sf.picard.PicardException: Value was put into PairInfoMap more than once. 1: null:656_1000_1619
at net.sf.picard.sam.CoordinateSortedPairInfoMap.ensureSequenceLoaded(CoordinateSortedPairInfoMap.java:124)
at net.sf.picard.sam.CoordinateSortedPairInfoMap.remove(CoordinateSortedPairInfoMap.java:78)
at net.sf.picard.sam.DiskReadEndsMap.remove(DiskReadEndsMap.java:61)
at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:273)
at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:113)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)
I guess the problem is that the same read ID (656_1000_1619) was present in both runs. How can I tell Picard that these reads come from different runs? Would it help to add read group information before merging? Or would I need to make the IDs unique, and if so, how do I do that in a BAM file? (I'll also be given BAM files from the BioScope pipeline, so there is no way for me to change anything before that stage.)
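To make the unique-ID idea concrete, here is a rough sketch of what I have in mind. It works on SAM text rather than on the BAM itself, so I assume each BAM would first be dumped to text (for example with `samtools view -h`) and converted back afterwards. The names `prefix_read_names` and `run_label` are made up for illustration, and I have not checked whether this alone is enough to satisfy MarkDuplicates.

```python
def prefix_read_names(sam_lines, run_label):
    """Yield SAM lines with run_label prepended to every read name (QNAME).

    Hypothetical helper: sam_lines is any iterable of SAM text lines,
    run_label is a string that differs between runs (e.g. "run1").
    """
    for line in sam_lines:
        # Header lines start with '@' and carry no read name; blank lines
        # are passed through untouched as well.
        if line.startswith("@") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Field 1 of a SAM alignment line is the read name (QNAME). Both
        # mates of a pair share it, so they receive the same new name.
        fields[0] = f"{run_label}_{fields[0]}"
        yield "\t".join(fields) + "\n"
```

Each run would be passed through this with its own label before merging, so that 656_1000_1619 from the first run and from the second run no longer collide.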
Thanks in advance for any useful suggestions.
Barbara