  • sdm
    Junior Member
    • Oct 2009
    • 9

    SAM to FASTQ converter - Picard

    Hi,
    I am having problems converting a large SAM file to FASTQ using Picard; the error messages relate to memory, heap space, or garbage collection.

    the command I am currently using is:
    java -Xmx2g -XX:-UseGCOverheadLimit -jar picard-tools-1.26/SamToFastq.jar INPUT=Apollo102b.sam FASTQ=102b_1.fq SECOND_END_FASTQ=102b_2.fq INCLUDE_NON_PF_READS=True VALIDATION_STRINGENCY=SILENT &

    What other flag could I set to increase memory for JAVA? Or maybe there is even a better idea how to translate SAM to FASTQ ...
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Originally posted by sdm
    Or maybe there is even a better idea how to translate SAM to FASTQ ...
    Several suggestions in this thread, which was originally about BAM to FASTQ, also apply to SAM to FASTQ: http://seqanswers.com/forums/showthread.php?t=6164


    • joachim.jacob
      Junior Member
      • Jan 2011
      • 9

      #3
      samtofastq out of memory problem

      Hi all,

      I have the same issue. On small SAM files (730MB), Picard's SamToFastq does a wonderful job:

      Code:
      [user]$ java -Xmx40g -jar /opt/picardtools/SamToFastq.jar I=RECORDS_IN_RAM=5000000
      [Thu Aug 04 08:55:46 CEST 2011] net.sf.picard.sam.SamToFastq INPUT=erx000019.sam FASTQ=/default_1.fastq SECOND_END_FASTQ=default_2.fastq MAX_RECORDS_IN_RAM=5000000    OUTPUT_PER_RG=false RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 TMP_DIR=/tmp VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
      [Thu Aug 04 08:57:30 CEST 2011] net.sf.picard.sam.SamToFastq done.
      Runtime.totalMemory()=1179189248
      [user]$
      -rw-rw-r-- 1 user users 322M Aug  4 08:57 default_1.fastq
      -rw-rw-r-- 1 user users 322M Aug  4 08:57 default_2.fastq
      But on large SAM files (33G), samtofastq does not seem to work:
      Code:
      Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
      	at java.lang.String.substring(String.java:1951)
      	at net.sf.samtools.util.StringUtil.split(StringUtil.java:74)
      	at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:307)
      	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:272)
      	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:244)
      	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:629)
      	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:607)
      	at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:121)
      	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:157)
      	at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:112)
      Does anyone have the same problem? Better still: has anyone fixed it?

      Thanks,
      Joachim
      www.bits.vib.be


      • joachim.jacob
        Junior Member
        • Jan 2011
        • 9

        #4
        samtofastq out of memory problem

        Hi all,

        I found some settings that work better. I adjusted the Java Virtual Machine settings as follows to run on our machine (24 CPUs, 96GB RAM):

        java -Xmx40g -XX:-UseGCOverheadLimit -XX:-UseParallelGC -jar /opt/picardtools/SamToFastq.jar I=erx000016.sam F=default_1.fastq F2=default_2.fastq MAX_RECORDS_IN_RAM=5000000

        Slowly but surely the FASTQ file is filling up (300MB now)... Let's hope it completes...

        Joachim
        Last edited by joachim.jacob; 08-04-2011, 12:34 AM. Reason: reporting possible solution
        www.bits.vib.be


        • joachim.jacob
          Junior Member
          • Jan 2011
          • 9

          #5
          samtofastq out of memory problem persists

          Hi all,

          No joy...

          My FASTQ file has now reached 304MB, but I get the following error:

          Code:
          Runtime.totalMemory()=41518039040
          Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
          	at java.io.BufferedReader.readLine(BufferedReader.java:348)
          	at java.io.BufferedReader.readLine(BufferedReader.java:379)
          	at net.sf.samtools.util.BufferedLineReader.readLine(BufferedLineReader.java:65)
          	at net.sf.samtools.util.AsciiLineReader.readLine(AsciiLineReader.java:75)
          	at net.sf.samtools.SAMTextReader.advanceLine(SAMTextReader.java:203)
          	at net.sf.samtools.SAMTextReader.access$300(SAMTextReader.java:40)
          	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:274)
          	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:244)
          	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:629)
          	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:607)
          	at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:121)
          	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:157)
          	at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:112)
          Last edited by joachim.jacob; 08-04-2011, 01:38 AM.
          www.bits.vib.be


          • cedance
            Senior Member
            • Feb 2011
            • 108

            #6
            Hi sdm, Joachim,

            Most Picard tools are designed to run with a 2GB JVM heap, so using -Xmx40g (IMO) wouldn't make a difference. IMHO what you should check is the "TMP_DIR=file" parameter. On the cluster I work on, the default temp directory it chose sometimes ran out of space. It's worth a try.

            best.
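The TMP_DIR suggestion can be checked before starting a long run. Below is a minimal sketch, assuming Python 3 is available on the machine; the helper names `free_gib` and `has_room` are made up for illustration, and the directory Python reports as its default temp location may differ from the one Picard uses (the log earlier in this thread shows TMP_DIR=/tmp).

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_room(path, needed_gib):
    """True if the filesystem holding `path` has at least `needed_gib` GiB free."""
    return free_gib(path) >= needed_gib


if __name__ == "__main__":
    # Python's default temp directory; pass Picard's TMP_DIR here instead if it differs.
    tmp = tempfile.gettempdir()
    print(f"{tmp}: {free_gib(tmp):.1f} GiB free")
```

If the reported free space is small compared with the input SAM file, pointing TMP_DIR at a larger filesystem is cheap to try.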


            • sdm
              Junior Member
              • Oct 2009
              • 9

              #7
              Hi,

              These flags worked for me after some trial and error:
              java -Xmx3g -XX:-UseGCOverheadLimit -jar SamToFastq.jar

              Not sure whether they work in every context.


              • joachim.jacob
                Junior Member
                • Jan 2011
                • 9

                #8
                Changing TMP_DIR does not work

                Thanks all for your suggestions. Unfortunately, changing TMP_DIR to a larger location does not work.

                The FASTQ file stalls at 304MB and I get a Java heap space error.

                @sdm: thanks for your reply. My Xmx is already set to 55g (it used to be 2g).

                I seem to have had the most success by changing MAX_RECORDS_IN_RAM to 5000000. I will try a little further and keep you posted!
                www.bits.vib.be
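For anyone repeating these trials, the options mentioned so far (heap size, -XX:-UseGCOverheadLimit, MAX_RECORDS_IN_RAM, TMP_DIR) can be collected in one place. This is only a sketch: the function name `samtofastq_cmd` and its defaults are made up for illustration, it uses only the flags already quoted in this thread, and nothing here implies that any particular combination fixes the heap space error.

```python
import shlex


def samtofastq_cmd(jar, sam, fq1, fq2, heap="2g", max_records=None, tmp_dir=None):
    """Build the java argument list for SamToFastq; nothing is executed here."""
    cmd = [
        "java", f"-Xmx{heap}", "-XX:-UseGCOverheadLimit",
        "-jar", jar,
        f"I={sam}", f"F={fq1}", f"F2={fq2}",
    ]
    # Optional Picard settings discussed in the thread.
    if max_records is not None:
        cmd.append(f"MAX_RECORDS_IN_RAM={max_records}")
    if tmp_dir is not None:
        cmd.append(f"TMP_DIR={tmp_dir}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    cmd = samtofastq_cmd("SamToFastq.jar", "in.sam", "reads_1.fastq",
                         "reads_2.fastq", heap="3g", max_records=5000000)
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the settings look right.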


                • cedance
                  Senior Member
                  • Feb 2011
                  • 108

                  #9
                  Joachim,
                  Since it works on small files for you (and the ones I worked on are around 8-12GB...), it seems to me that it has more to do with the code. See the accepted answer at this link for an explanation.
                  I get this error message as I execute my JUnit tests: java.lang.OutOfMemoryError: GC overhead limit exceeded I know what an OutOfMemoryError is, but what does GC overhead limit mean? How can I solve


                  Best.


                  • dadada4ever
                    Member
                    • Mar 2010
                    • 18

                    #10
                    Originally posted by joachim.jacob
                    Thanks all for your suggestions. Unfortunately, changing TMP_DIR to a larger location does not work.

                    The FASTQ file stalls at 304MB and I get a Java heap space error.

                    @sdm: thanks for your reply. My Xmx is already set to 55g (it used to be 2g).

                    I seem to have had the most success by changing MAX_RECORDS_IN_RAM to 5000000. I will try a little further and keep you posted!
                    Hi Joachim, did you solve this problem? I added MAX_RECORDS_IN_RAM=5000000; the FASTQ files get larger, but I still get the Java heap space error at the end. Do you have any other suggestions? Thank you.


                    • Fusionseeker
                      Junior Member
                      • Sep 2010
                      • 1

                      #11
                      Have you had any luck solving this issue? I am having the same problem with various BAM files of ~7-10GB in size.

                      I have had success using BAM files of similar size generated in-house and from collaborators. So I was surprised when these same parameters no longer seem to work. I am starting to wonder if there is something unique to how these most recent BAM files were processed. I have gone to the picard commands page and didn't see any particular processing steps that were required. Any suggestions?


                      • Richard Finney
                        Senior Member
                        • Feb 2009
                        • 701

                        #12
                        The Picard bam to fastq is slow and takes a lot of memory.
                        If you have lots of time and memory it is not a problem.

                        If you don't have a lot of time or memory, try the solution I presented in this thread:
                        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                        Warning: you'll have to download the AVL library and compile it yourself using a C compiler (gcc or other). A modicum of experience in compiling and editing source files is required.

                        It was developed to run on low memory beowulf nodes and not take all day.


                        • arkal
                          advancing one byte at a time!
                          • Jun 2011
                          • 56

                          #13
                          kudos to you, my friend! works like a charm!


                          • Simon Anders
                            Senior Member
                            • Feb 2010
                            • 995

                            #14
                            Time to advertise our HTSeq library, which lets you do such tasks in two lines. And it certainly won't use any noticeable amount of memory.

                            Try this:

                            Code:
                            import sys, HTSeq
                            
                            for a in HTSeq.SAM_Reader( "myfile.sam" ):
                               a.read.write_to_fastq_file( sys.stdout )
                            The following "more advanced" version makes sure that each read is written only once even if the SAM file contains multiple alignments for it, provided the SAM file has been sorted by read name (with 'samtools sort -n') so that multiple alignments are on adjacent lines.

                            Code:
                            import sys, HTSeq
                            
                            for a in HTSeq.bundle_multiple_alignments( HTSeq.SAM_Reader( "myfile.sam" ) ):
                               a[0].read.write_to_fastq_file( sys.stdout )
                            (The code is untested, so sorry in advance for any typos.)
                            Last edited by Simon Anders; 03-19-2013, 10:01 AM.
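For readers who cannot install HTSeq, the same line-by-line streaming idea can be sketched with the standard library alone. This is not HTSeq's implementation, just an illustration under stated assumptions: the function names are made up, the field positions and FLAG bits (0x10 reverse strand, 0x100 secondary, 0x800 supplementary) follow the SAM specification, and records without stored sequence or qualities ('*') are simply skipped. Both mates of a pair go to the same output without /1 or /2 suffixes.

```python
import sys

# Complement table for reverse-complementing; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Convert one SAM line to a FASTQ record string, or return None to skip it."""
    if line.startswith("@"):  # header line
        return None
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900 or seq == "*" or qual == "*":
        return None  # secondary/supplementary alignment, or nothing to write
    if flag & 0x10:
        # SAM stores reverse-strand reads reverse-complemented; restore the original read.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def sam_to_fastq(sam_in, fastq_out):
    """Stream SAM lines from `sam_in` to `fastq_out`, holding one line in memory at a time."""
    for line in sam_in:
        record = sam_line_to_fastq(line)
        if record is not None:
            fastq_out.write(record)


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as sam_file:
        sam_to_fastq(sam_file, sys.stdout)
```

Saved as, say, sam2fq.py (a hypothetical name), it would be run as `python sam2fq.py myfile.sam > reads.fastq`.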


                            • lh3
                              Senior Member
                              • Feb 2008
                              • 686

                              #15
                              A better approach to group two ends together is "htscmd bamshuf" from htslib. It is much faster than name sorting.
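Once the two ends of each pair sit on adjacent lines (after name sorting, or after grouping by name with a tool like the one above), splitting them into first and second mates needs only one pair in memory at a time. A rough sketch, assuming text SAM input; `split_pairs` is a made-up name, and the FLAG bits (0x40 first mate, 0x80 second mate, 0x100 secondary, 0x800 supplementary) are taken from the SAM specification.

```python
from itertools import groupby


def split_pairs(sam_lines):
    """Yield (name, first_mate_fields, second_mate_fields) from name-grouped SAM lines.

    Assumes all records of a read pair are adjacent in the input.
    Reads for which only one mate is present are skipped.
    """
    records = (line.rstrip("\n").split("\t")
               for line in sam_lines if not line.startswith("@"))
    # Keep primary alignments only, so each mate appears once.
    primary = (r for r in records if not int(r[1]) & 0x900)
    for name, group in groupby(primary, key=lambda r: r[0]):
        first = second = None
        for r in group:
            flag = int(r[1])
            if flag & 0x40:
                first = r
            elif flag & 0x80:
                second = r
        if first is not None and second is not None:
            yield name, first, second
```

Each yielded pair can then be written to the two FASTQ files; fields 9 and 10 of each record hold the sequence and qualities, which still need reverse-complementing when FLAG bit 0x10 is set.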
