Similar Threads:
Thread | Thread Starter | Forum | Replies | Last Post
Error with MarkDuplicates in Picard | slowsmile | Bioinformatics | 13 | 11-01-2015 04:16 AM
How to use Picard's MarkDuplicates | cliff | Bioinformatics | 12 | 01-26-2015 11:56 PM
MarkDuplicates in picard | bair | Bioinformatics | 3 | 12-23-2010 12:00 PM
picard markduplicates on huge files | rcorbett | Bioinformatics | 2 | 09-17-2010 05:39 AM
Picard MarkDuplicates | wangzkai | Bioinformatics | 2 | 05-18-2010 10:14 PM
#1
Junior Member
Location: Heidelberg | Join Date: May 2011
Posts: 8
Hi folks,
here comes my first question for you. I'm trying to remove duplicates from a big sorted, merged BAM file (~270 GB) using Picard's MarkDuplicates tool, but I keep running into OutOfMemoryErrors. I'm fairly new to real-world sequencing and would appreciate any help you can give me. This is the command I'm using:

Code:
/usr/lib/jvm/java-1.6.0-ibm-1.6.0.8.x86_64/jre/bin/java -Xmx40g \
    -jar /illumina/tools/picard-tools-1.45/MarkDuplicates.jar \
    INPUT=BL14_sorted_merged.bam \
    OUTPUT=BL14_sorted_merged_deduped.bam \
    METRICS_FILE=metrics.txt \
    REMOVE_DUPLICATES=true \
    ASSUME_SORTED=true \
    VALIDATION_STRINGENCY=LENIENT \
    TMP_DIR=/illumina/runs/temp/

And this is the error it dies with:

Code:
Exception in thread "main" java.lang.OutOfMemoryError
        at net.sf.samtools.util.SortingLongCollection.<init>(SortingLongCollection.java:101)
        at net.sf.picard.sam.MarkDuplicates.generateDuplicateIndexes(MarkDuplicates.java:443)
        at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:115)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:158)
        at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)

Please tell me if you need more info, if I'm doing something completely wrong, or if the amount of memory just isn't enough to get a result. Thanks in advance.
#2
Junior Member
Location: Heidelberg | Join Date: May 2011
Posts: 8
It seems I've finally found a working set of arguments! After more than 14 hours it's still running. Fingers crossed that it keeps going and eventually finishes successfully.
#3
Senior Member
Location: USA | Join Date: Jan 2011
Posts: 105
Do you mind posting your working set of arguments? I'm in a very similar situation with this error.
#4
Junior Member
Location: Heidelberg | Join Date: May 2011
Posts: 8
Quote: "Do you mind posting your working set of arguments?"

I found the answer on the samtools-help mailing list, in the thread "[Samtools-help] Picard MarkDuplicates memory error on very large file".

In short: less heap makes Picard more stable. -Xmx4g seems optimal.
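For anyone who finds this later: the change is essentially my original command with the heap cut from 40 GB down to 4 GB, presumably because MarkDuplicates sizes its in-memory duplicate-index buffer from the heap ceiling, so a huge -Xmx requests one enormous allocation up front. A sketch (same files as my first post, everything else unchanged):

Code:
# identical to the original command except -Xmx40g -> -Xmx4g
/usr/lib/jvm/java-1.6.0-ibm-1.6.0.8.x86_64/jre/bin/java -Xmx4g \
    -jar /illumina/tools/picard-tools-1.45/MarkDuplicates.jar \
    INPUT=BL14_sorted_merged.bam \
    OUTPUT=BL14_sorted_merged_deduped.bam \
    METRICS_FILE=metrics.txt \
    REMOVE_DUPLICATES=true \
    ASSUME_SORTED=true \
    VALIDATION_STRINGENCY=LENIENT \
    TMP_DIR=/illumina/runs/temp/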
#5
Member
Location: Rochester, NY | Join Date: Jan 2013
Posts: 43
Hi, I am having a lot of trouble with MarkDuplicates on some of my BAM files. It was throwing the same java.lang.OutOfMemoryError shown earlier in this thread. Things I have tried:

1. -Xmx2g (this is the most my cluster is allowing me, for some reason): this let the program run longer, but it still throws the same error.
2. MAX_RECORDS_IN_RAM=5000000: this gave me a different error (command sketch below).
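To be concrete, option 2 just means adding the parameter to the Picard command line; mine looked roughly like this (paths are placeholders, not my real ones):

Code:
# MAX_RECORDS_IN_RAM caps how many records Picard holds in memory
# before spilling to temp files on disk
java -Xmx2g -jar /path/to/picard-tools/MarkDuplicates.jar \
    INPUT=sample.bam \
    OUTPUT=sample_deduped.bam \
    METRICS_FILE=metrics.txt \
    MAX_RECORDS_IN_RAM=5000000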
#7
Senior Member
Location: East Coast USA | Join Date: Feb 2008
Posts: 7,080
Are you sure you are running 64-bit Java? (I wonder if that is the reason it is only allowing you to allocate 2 GB to the heap.) Both 32-bit and 64-bit Java may be installed on your cluster.

Can you post the output of

Code:
$ java -version
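For comparison, a 64-bit JVM announces itself in the last line of that output, something like:

Code:
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)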
#8
Member
Location: Rochester, NY | Join Date: Jan 2013
Posts: 43
Thank you so much Geno!

I am using Java 7:

Code:
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) Server VM (build 23.6-b04, mixed mode)
#9
Senior Member
Location: East Coast USA | Join Date: Feb 2008
Posts: 7,080
That output says "Server VM" rather than "64-Bit Server VM", so you may be on 32-bit Java. Try the check below and post what you get.
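The -d64 flag requests a 64-bit JVM and prints an error if only a 32-bit one is installed:

Code:
$ java -d64 -version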
#10
Member
Location: Rochester, NY | Join Date: Jan 2013
Posts: 43
Code:
Error: This Java instance does not support a 64-bit JVM.

is what I get. So I guess I am running 32-bit. Could this be my problem? I am a little confused because I didn't have trouble running MarkDuplicates on my old BAM files, just our most recent ones.
#11
Senior Member
Location: East Coast USA | Join Date: Feb 2008
Posts: 7,080
Can you look around to see if there is a 64-bit version of Java available on your cluster? Are these BAM files larger than the previous ones?
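If your cluster uses environment modules (just a guess about your setup), the first command below would list the available Java builds; otherwise look in the usual install locations:

Code:
$ module avail java
$ ls /usr/lib/jvm/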
#12
Member
Location: Rochester, NY | Join Date: Jan 2013
Posts: 43
Yes, these BAM files are slightly larger. I will see if I can use 64-bit Java on our cluster... thank you so much Geno for your suggestion!
#13
Member
Location: Rochester, NY | Join Date: Jan 2013
Posts: 43
So, I tried using 64-bit Java with the -Xmx4g option. This allowed MarkDuplicates to run longer (72 min), but then it ran out of memory again. Any thoughts?
#14
Senior Member
Location: East Coast USA | Join Date: Feb 2008
Posts: 7,080
Are you sure the process ran out of RAM or did it run out of temp space on disk? How big is the BAM file?
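To rule out the disk side, check the free space on whatever directory you pass as TMP_DIR (Picard defaults to the system temp directory, usually /tmp):

Code:
$ df -h /tmp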
#15
Member
Location: Rochester, NY | Join Date: Jan 2013
Posts: 43
In my case the solution was: I used -Xmx8g and that ran fine... so I guess -Xmx4g is not always optimal. Thank you Geno for all your help; 64-bit Java was definitely the way to go.
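For anyone searching later, the working setup boils down to 64-bit Java plus a bigger heap; schematically (jar and file paths are placeholders, and make sure "java" resolves to the 64-bit install):

Code:
java -Xmx8g -jar /path/to/picard-tools/MarkDuplicates.jar \
    INPUT=sample.bam \
    OUTPUT=sample_deduped.bam \
    METRICS_FILE=metrics.txt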
#16
Junior Member
Location: Cambridge, UK | Join Date: Jul 2013
Posts: 9
The biobambam package contains a tool called bammarkduplicates. It produces results that should be quite similar to those of Picard's MarkDuplicates while avoiding the sometimes high memory requirements of the Java implementation. The source code is available on GitHub at https://github.com/gt1/biobambam; binaries for some versions of Linux are at ftp://ftp.sanger.ac.uk/pub/users/gt1/biobambam and on Launchpad at https://launchpad.net/biobambam . It was developed at the Sanger Institute because the Java tool failed with out-of-memory errors for a number of (at least locally) high-depth BAM files, which required manual intervention: rerunning compute jobs with higher memory settings, which blocks otherwise free CPU cores because a single job is using all the RAM. If someone is interested in the algorithmic background, there is a preprint at http://arxiv.org/abs/1306.0836 .