  • Picard tools out of memory: PermGen

    Hi all, first post. Great site!

    Thought I'd share a new problem... I'm just starting with Picard tools (version 1.56) to estimate redundancy, and have predictably been wrestling with memory issues...

    But not with the heap, as I'd expected (and not initially noticed). Instead, I'm running out of PermGen space. One of my .bam's is really large, but it happens even on much smaller .bam's containing single ends of mate pairs.

    I increased PermGen to 1g (-XX:PermSize=1g -XX:MaxPermSize=1g), and it still died, though after 2 hrs CPU time rather than 10 minutes as before. I've increased it now to 4g and we'll see how it goes.

    Does it point to a memory leak within Picard tools that the permanent generation gets this full? It seems to be way beyond what the JVM expects, and I've rarely seen PermGen space problems mentioned, never for Picard tools.

    Cheers,

    Doug


    [Mon Nov 21 19:11:40 CET 2011] net.sf.picard.sam.MarkDuplicates INPUT=map.CLCh001.lib300.bam_sorted.bam OUTPUT=map.CLCh001.lib300.bam_sorted.bam.PicardDups.bam METRICS_FILE=map.CLCh001.lib300.bam_sorted.bam.MarkDuplicates REMOVE_DUPLICATES=true ASSUME_SORTED=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=80000 TMP_DIR=[tmp] MAX_RECORDS_IN_RAM=10000000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
    [Mon Nov 21 19:11:40 CET 2011] Executing as douglas.scofield@xxxxxxx on Linux 2.6.32-131.17.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.6.0_20-b20
    INFO 2011-11-21 19:11:40 MarkDuplicates Start of doWork freeMemory: 132124215176; totalMemory: 132857659392; maxMemory: 132857659392
    INFO 2011-11-21 19:11:40 MarkDuplicates Reading input file and constructing read end information.
    INFO 2011-11-21 19:11:40 MarkDuplicates Will retain up to 527212934 data points before spilling to disk.
    [Mon Nov 21 21:44:56 CET 2011] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 153.26 minutes.
    Runtime.totalMemory()=132857659392
    Exception in thread "main" java.lang.OutOfMemoryError: PermGen space
    at java.lang.String.intern(Native Method)
    at net.sf.samtools.SAMSequenceRecord.<init>(SAMSequenceRecord.java:83)
    at net.sf.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:205)
    at net.sf.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:96)
    at net.sf.samtools.BAMFileReader.readHeader(BAMFileReader.java:391)
    at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:144)
    at net.sf.samtools.BAMFileReader.<init>(BAMFileReader.java:114)
    at net.sf.samtools.SAMFileReader.init(SAMFileReader.java:514)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:167)
    at net.sf.samtools.SAMFileReader.<init>(SAMFileReader.java:122)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:267)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:175)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)
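
    The trace ends in String.intern() while the BAM header's @SQ lines are being parsed; on Java 6 interned strings live in PermGen, so a reference with very many sequences can plausibly fill it. The flags discussed go straight on the java command line. A sketch of the kind of invocation involved, with the jar path and memory sizes as placeholders (not taken from the actual run):

    ```shell
    # Placeholder jar path and sizes; INPUT/OUTPUT names are from the log above.
    # -Xmx sets the maximum heap; -XX:PermSize/-XX:MaxPermSize size the
    # permanent generation (pre-Java-8 JVMs only).
    java -Xmx200g -XX:PermSize=4g -XX:MaxPermSize=4g \
        -jar /path/to/picard-1.56/MarkDuplicates.jar \
        INPUT=map.CLCh001.lib300.bam_sorted.bam \
        OUTPUT=map.CLCh001.lib300.bam_sorted.bam.PicardDups.bam \
        METRICS_FILE=map.CLCh001.lib300.bam_sorted.bam.MarkDuplicates \
        REMOVE_DUPLICATES=true ASSUME_SORTED=true
    ```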

  • #2
    Hi Doug,

    I just had the same problem and solved it with -XX:MaxPermSize=512m

    As you already tried with 1g, it looks like you just need to increase it further... was the 4g enough?



    • #3
      Hi, yep, 4GB was enough; if I recall, it died with 2GB. The main challenge was getting enough heap space: I had to request 256GB, and if I believe the htop stats it was using 221GB at one point :-)

      /Doug



      • #4
        The following is the output of Picard's MarkDuplicates. I changed some of the options to bigger numbers, but it still gives me an error.
        My BAM file is about 10GB, and the program was run with 24GB RAM using versions 1.49 and 1.50. Please help me fix the problem. Thank you so much.

        net.sf.picard.sam.MarkDuplicates INPUT=accepted_hits_sorted.bam OUTPUT=accepted_hits_sorted.pk.mk.out METRICS_FILE=accepted_hits_sorted.pk.mk.metrics ASSUME_SORTED=true MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=500000000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=1000 MAX_RECORDS_IN_RAM=500000000 REMOVE_DUPLICATES=false SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 TMP_DIR=/tmp/tangwei VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
        [Fri Feb 03 16:06:12 EST 2012] Executing as tangwei@p809 on Linux 2.6.18-128.el5 i386; Java HotSpot(TM) Server VM 1.7.0_02-b13
        INFO 2012-02-03 16:06:12 MarkDuplicates Start of doWork freeMemory: 63278136; totalMemory: 64356352; maxMemory: 1908932608
        INFO 2012-02-03 16:06:12 MarkDuplicates Reading input file and constructing read end information.
        INFO 2012-02-03 16:06:12 MarkDuplicates Will retain up to 7575129 data points before spilling to disk.
        INFO 2012-02-03 16:06:18 MarkDuplicates Read 1000000 records. Tracking 8778 as yet unmatched pairs. 8778 records in RAM. Last sequence index: 0
        ......
        ......
        INFO 2012-02-03 16:41:35 MarkDuplicates Read 151000000 records. Tracking 5300425 as yet unmatched pairs. 5300425 records in RAM. Last sequence index: 51
        [Fri Feb 03 16:52:03 EST 2012] net.sf.picard.sam.MarkDuplicates done. Elapsed time: 45.84 minutes.
        Runtime.totalMemory()=1980170240
        Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.regex.Matcher.<init>(Matcher.java:224)
        at java.util.regex.Pattern.matcher(Pattern.java:1088)
        at net.sf.picard.sam.AbstractDuplicateFindingAlgorithm.addLocationInformation(AbstractDuplicateFindingAlgorithm.java:61)
        at net.sf.picard.sam.MarkDuplicates.buildReadEnds(MarkDuplicates.java:364)
        at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:298)
        at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:117)
        at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:169)
        at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:101)


        Originally posted by dgscofield View Post
        [cut]



        • #5
          Perhaps you need to tell Java to use your memory (Java heap space); if I remember correctly, Java allocates only 1GB of memory if you don't instruct it differently.
          You should use the option -Xmx.

          Have a look, for example, at http://www.ehow.com/how_5347474_set-...eap-space.html
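
          For example, a sketch with a hypothetical 20g heap and a placeholder jar path (adjust both for your system; the file names are from the log above):

          ```shell
          # -Xmx20g raises the maximum heap well above the ~1.9GB visible in
          # the log's maxMemory. Jar path and heap size are placeholders.
          java -Xmx20g -jar /path/to/picard/MarkDuplicates.jar \
              INPUT=accepted_hits_sorted.bam \
              OUTPUT=accepted_hits_sorted.pk.mk.out \
              METRICS_FILE=accepted_hits_sorted.pk.mk.metrics \
              ASSUME_SORTED=true
          ```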

          Originally posted by townway View Post
          [cut]

