  • Picard tools out of memory: PermGen

    Hi all, first post. Great site!

    Thought I'd share a new problem... I'm just starting with Picard tools (version 1.56) to estimate redundancy, and have predictably been wrestling with memory issues...

    But not with the heap, as I'd expected (which is why I didn't notice at first). Instead, I'm running out of PermGen space. One of my .bam's is really large, but it happens even on much smaller .bam's containing single ends of mate pairs.

    I increased PermGen to 1g (-XX:PermSize=1g -XX:MaxPermSize=1g), and it still died, though after 2 hrs of CPU time rather than 10 minutes as before. I've now increased it to 4g and we'll see how it goes.

    Does this point to a memory leak within Picard tools, if the permanent generation gets this full? It seems way beyond where the JVM expects things to be, and I've rarely seen PermGen space problems mentioned, and never for Picard tools.

    Cheers,

    Doug


    [Mon Nov 21 19:11:40 CET 2011] net.sf.picard.sam.MarkDuplicates INPUT=map.CLCh001.lib300.bam_sorted.bam OUTPUT=map.CLCh001.lib300.bam_sorted.bam.PicardDups.bam METRICS_FILE=map.CLCh001.lib300.bam_sorted.bam.MarkDuplicates REMOVE_DUPLICATES=true ASSUME_SORTED=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=80000 TMP_DIR=[tmp] MAX_RECORDS_IN_RAM=10000000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Mon Nov 21 19:11:40 CET 2011] Executing as douglas.scofield@xxxxxxx on Linux 2.6.32-131.17.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_20-b20
    INFO 2011-11-21 19:11:40 MarkDuplicates Start of doWork freeMemory: 132124215176; totalMemory: 132857659392; maxMemory: 132857659392
    INFO 2011-11-21 19:11:40 MarkDuplicates Reading input file and constructing read end information.
    INFO 2011-11-21 19:11:40 MarkDuplicates Will retain up to 527212934 data points before spilling to disk.
    [Mon Nov 21 21:44:56 CET 2011] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 153.26 minutes.
    Runtime.totalMemory()=132857659392
    Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
    at java.lang.String.intern(Native Method)
    at net.sf.samtools.SAMSequenceRecord.<init>(SAMSequenceRecord.java:83)
    at net.sf.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:205)
    at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:96)
    at net.sf.samtools.BAMFileReader.readHeader(BAMFileReader.java:391)
    at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:144)
    at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:114)
    at net.sf.samtools.SAMFileReader.init(SAMFileReader.java:514)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:167)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:122)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:267)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)
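
    In case it trips anyone else up: the JVM flags have to go on the java command itself, before -jar. Anything after the jar is parsed by Picard, not the JVM. A minimal sketch of what I'm running (the jar path and file names are placeholders for your own, and the command is only echoed here rather than executed):

    ```shell
    # Hypothetical paths -- adjust to your install. JVM memory flags (heap and
    # PermGen) must precede -jar; everything after the jar goes to Picard itself.
    PICARD_JAR=/opt/picard-1.56/MarkDuplicates.jar
    CMD="java -Xmx8g -XX:PermSize=1g -XX:MaxPermSize=4g -jar $PICARD_JAR \
    INPUT=in.bam OUTPUT=dedup.bam METRICS_FILE=dup_metrics.txt \
    REMOVE_DUPLICATES=true ASSUME_SORTED=true TMP_DIR=tmp"
    # Echo rather than execute, since the paths above are placeholders.
    echo "$CMD"
    ```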

  • #2
    Hi Doug,

    I just had the same problem and solved it with -XX:MaxPermSize=512m

    As you'd already tried 1g, it looks like you just need to increase it further... was 4g enough?



    • #3
      Hi, yep, 4GB was enough. If I recall, it died with 2GB. The main challenge was getting enough heap space; I had to request 256GB, and if I believe the htop stats it was using 221GB at one point :-)
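
      For anyone with less RAM to throw at it: the heap pressure comes mostly from the in-memory read-end buffers, and lowering MAX_RECORDS_IN_RAM makes MarkDuplicates spill to disk sooner instead of holding everything on the heap. A sketch (jar path, file names, and values are hypothetical; the command is echoed, not executed):

      ```shell
      # Trade heap for disk: a smaller MAX_RECORDS_IN_RAM means earlier spills to
      # TMP_DIR, so less memory is needed at the cost of more I/O. Paths are
      # placeholders -- substitute your own jar and files.
      CMD="java -Xmx32g -jar /opt/picard-1.56/MarkDuplicates.jar \
      INPUT=big.bam OUTPUT=dedup.bam METRICS_FILE=dup_metrics.txt \
      MAX_RECORDS_IN_RAM=2000000 TMP_DIR=/scratch/tmp"
      echo "$CMD"
      ```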

      /Doug



      • #4
        The following is the output of Picard's MarkDuplicates. I changed some of the options to bigger numbers but it still gives me an error.
        My file is a BAM of about 10GB, and the program was run with 24G of RAM using versions 1.49 and 1.50. Please help me fix the problem. Thank you so much.

        net.sf.picard.sam.MarkDuplicates INPUT=accepted_hits_sorted.bam OUTPUT=accepted_hits_sorted.pk.mk.out METRICS_FILE=accepted_hits_sorted.pk.mk.metrics ASSUME_SORTED=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=500000000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 MAX_RECORDS_IN_RAM=500000000 REMOVE_DUPLICATES=false SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 TMP_DIR=/tmp/tangwei VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
        [Fri Feb 03 16:06:12 EST 2012] Executing as tangwei@p809 on Linux 2.6.18-128.el5 i386; Java HotSpot(TM) Server VM 1.7.0_02-b13
        INFO 2012-02-03 16:06:12 MarkDuplicates Start of doWork freeMemory: 63278136; totalMemory: 64356352; maxMemory: 1908932608
        INFO 2012-02-03 16:06:12 MarkDuplicates Reading input file and constructing read end information.
        INFO 2012-02-03 16:06:12 MarkDuplicates Will retain up to 7575129 data points before spilling to disk.
        INFO 2012-02-03 16:06:18 MarkDuplicates Read 1000000 records. Tracking 8778 as yet unmatched pairs. 8778 records in RAM. Last sequence index: 0
        ......
        ......
        INFO 2012-02-03 16:41:35 MarkDuplicates Read 151000000 records. Tracking 5300425 as yet unmatched pairs. 5300425 records in RAM. Last sequence index: 51
        [Fri Feb 03 16:52:03 EST 2012] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 45.84 minutes.
        Runtime.totalMemory()=1980170240
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.regex.Matcher.<init>(Matcher.java:224)
        at java.util.regex.Pattern.matcher(Pattern.java:1088)
        at net.sf.picard.sam.AbstractDuplicateFindingAlgorithm.addLocationInformation(AbstractDuplicateFindingAlgorithm.java:61)
        at net.sf.picard.sam.MarkDuplicates.buildReadEnds(MarkDuplicates.java:364)
        at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:298)
        at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
        at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)


        Originally posted by dgscofield View Post
        [cut]



        • #5
          Perhaps you need to tell Java to use more of your memory (Java heap space); if I remember correctly, Java allocates only about 1GB if you don't instruct it differently.
          You should use the -Xmx option.

          Have a look, for example, at http://www.ehow.com/how_5347474_set-...eap-space.html
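
          For a ~10GB BAM, something like the following might be a starting point (the heap size is a guess and the jar path is a placeholder; the command is echoed rather than run):

          ```shell
          # -Xmx raises the heap ceiling; without it the JVM default can be far
          # below what MarkDuplicates needs. 16g is a guess for a ~10GB BAM --
          # tune it to your host. The jar path is a placeholder.
          CMD="java -Xmx16g -jar /path/to/MarkDuplicates.jar \
          INPUT=accepted_hits_sorted.bam OUTPUT=accepted_hits_sorted.pk.mk.out \
          METRICS_FILE=accepted_hits_sorted.pk.mk.metrics"
          echo "$CMD"
          ```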

          Originally posted by townway View Post
          The following are the output of picards_markduplicates, I changed some of the options to bigger number but it still give me error.
          my file is about 10GB of bam file, and program was running with 24G RAM using version 1.49 and 1.50. Please help me to fix the problem. Thank you so much

          [cut]

