Hi folks,
Here comes my first question for you. I'm trying to remove duplicates from a large sorted, merged BAM file (~270 GB) using Picard's MarkDuplicates tool, but I keep running into OutOfMemoryErrors. I'm fairly new to real-world sequencing and would appreciate any help you can give me.
This is the command I'm using:
Code:
/usr/lib/jvm/java-1.6.0-ibm-1.6.0.8.x86_64/jre/bin/java -Xmx40g -jar /illumina/tools/picard-tools-1.45/MarkDuplicates.jar INPUT=BL14_sorted_merged.bam OUTPUT=BL14_sorted_merged_deduped.bam METRICS_FILE=metrics.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT TMP_DIR=/illumina/runs/temp/
The error message usually looks like this after the job has been running for around 8 hours:
Code:
Exception in thread "main" java.lang.OutOfMemoryError
	at net.sf.samtools.util.SortingLongCollection.<init>(SortingLongCollection.java:101)
	at net.sf.picard.sam.MarkDuplicates.generateDuplicateIndexes(MarkDuplicates.java:443)
	at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:115)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:158)
	at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)
The machine I'm running it on has 48275 MB of RAM and 2000 MB of swap.
Please let me know if you need more info, if I'm doing something completely wrong, or if the amount of memory simply isn't enough to get a result. Thanks in advance.
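One thing I sanity-checked myself (rough back-of-envelope arithmetic, not something from Picard's documentation): a 40 GB heap on a box with 48275 MB of RAM leaves only a few GB for the JVM's own native overhead (thread stacks, GC bookkeeping) plus the OS and its page cache, so the heap setting alone may be squeezing the machine:

```python
# Illustrative headroom arithmetic for the machine described above.
total_ram_mb = 48275       # physical RAM reported in the post
heap_mb = 40 * 1024        # -Xmx40g expressed in MB
swap_mb = 2000             # swap reported in the post

# Everything outside the Java heap (JVM native memory, OS, page cache)
# has to fit in the remainder.
headroom_mb = total_ram_mb - heap_mb
print(f"Headroom outside the heap: {headroom_mb} MB")  # 7315 MB
```

So there is roughly 7 GB left over, which may or may not be enough once the OS and the sorting spill files in TMP_DIR come into play.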