SEQanswers

Old 07-30-2010, 06:14 AM   #1
sdm
Junior Member
 
Location: Cambridge, UK

Join Date: Oct 2009
Posts: 9
Default SAM to FASTQ converter - Picard

Hi,
I am having problems translating a large SAM file into FASTQ using Picard; the error messages are related to memory, heap space, or garbage collection.

the command I am currently using is:
java -Xmx2g -XX:-UseGCOverheadLimit -jar picard-tools-1.26/SamToFastq.jar INPUT=Apollo102b.sam FASTQ=102b_1.fq SECOND_END_FASTQ=102b_2.fq INCLUDE_NON_PF_READS=True VALIDATION_STRINGENCY=SILENT &

What other flags could I set to increase the memory for Java? Or maybe there is an even better way to translate SAM to FASTQ ...
Old 07-30-2010, 06:55 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

Quote:
Originally Posted by sdm View Post
Or maybe there is an even better way to translate SAM to FASTQ ...
Several suggestions in this thread (originally about BAM to FASTQ) will also apply to SAM to FASTQ: http://seqanswers.com/forums/showthread.php?t=6164
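For illustration, here is a rough streaming sketch of the same idea using pysam (untested, and it assumes pysam is available; it is not taken from the linked thread). It keeps memory flat because each record is written out as soon as it is read, skipping secondary/supplementary alignments and re-reversing reverse-strand reads the way SamToFastq does (RE_REVERSE=true):
Code:
import pysam

# Stream the SAM once; write first/second mates to separate FASTQ files.
def fastq_record(rec):
    seq = rec.get_forward_sequence()      # read in its original orientation
    quals = rec.get_forward_qualities()
    if seq is None or quals is None:
        return None                       # no sequence/qualities stored
    qual = "".join(chr(q + 33) for q in quals)
    return "@%s\n%s\n+\n%s\n" % (rec.query_name, seq, qual)

with pysam.AlignmentFile("Apollo102b.sam", "r", check_sq=False) as sam, \
     open("102b_1.fq", "w") as fq1, open("102b_2.fq", "w") as fq2:
    for rec in sam:
        if rec.is_secondary or rec.is_supplementary:
            continue                      # keep one record per read
        out = fastq_record(rec)
        if out is not None:
            (fq2 if rec.is_read2 else fq1).write(out)
If the SAM is name-sorted first (samtools sort -n), the two output files stay paired line for line; otherwise downstream tools have to match the mates by read name.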
Old 08-04-2011, 01:00 AM   #3
joachim.jacob
Junior Member
 
Location: Belgium

Join Date: Jan 2011
Posts: 9
Default samtofastq out of memory problem

Hi all,

I have the same issue: on small SAM files (730 MB), samtofastq from the Picard tools does a wonderful job:

Code:
[user]$ java -Xmx40g -jar /opt/picardtools/SamToFastq.jar INPUT=erx000019.sam FASTQ=default_1.fastq SECOND_END_FASTQ=default_2.fastq MAX_RECORDS_IN_RAM=5000000
[Thu Aug 04 08:55:46 CEST 2011] net.sf.picard.sam.SamToFastq INPUT=erx000019.sam FASTQ=/default_1.fastq SECOND_END_FASTQ=default_2.fastq MAX_RECORDS_IN_RAM=5000000    OUTPUT_PER_RG=false RE_REVERSE=true INCLUDE_NON_PF_READS=false READ1_TRIM=0 READ2_TRIM=0 TMP_DIR=/tmp VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false
[Thu Aug 04 08:57:30 CEST 2011] net.sf.picard.sam.SamToFastq done.
Runtime.totalMemory()=1179189248
[user]$
-rw-rw-r-- 1 user users 322M Aug  4 08:57 default_1.fastq
-rw-rw-r-- 1 user users 322M Aug  4 08:57 default_2.fastq
But on large SAM files (33G), samtofastq does not seem to work:
Code:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.lang.String.substring(String.java:1951)
	at net.sf.samtools.util.StringUtil.split(StringUtil.java:74)
	at net.sf.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:307)
	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:272)
	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:244)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:629)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:607)
	at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:121)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:157)
	at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:112)
Does anyone have the same problem? And better: has anyone fixed it?

Thanks,
Joachim
Old 08-04-2011, 01:29 AM   #4
joachim.jacob
Junior Member
 
Location: Belgium

Join Date: Jan 2011
Posts: 9
Default samtofastq out of memory problem

Hi all,

Found some settings with more success. I have adjusted the Java Virtual Machine settings as follows to run on our machine (24 CPUs, 96 GB RAM):

java -Xmx40g -XX:-UseGCOverheadLimit -XX:-UseParallelGC -jar /opt/picardtools/SamToFastq.jar I=erx000016.sam F=default_1.fastq F2=default_2.fastq MAX_RECORDS_IN_RAM=5000000

Slowly but surely the FASTQ file is being filled (300 MB now)... Let's hope it completes...

Joachim

Last edited by joachim.jacob; 08-04-2011 at 01:34 AM. Reason: reporting possible solution
Old 08-04-2011, 02:36 AM   #5
joachim.jacob
Junior Member
 
Location: Belgium

Join Date: Jan 2011
Posts: 9
Default samtofastq out of memory problem persists

Hi all,

No joy...

But my FASTQ file now contains 304 MB. Somehow I now get the following error:

Code:
Runtime.totalMemory()=41518039040
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
	at java.io.BufferedReader.readLine(BufferedReader.java:348)
	at java.io.BufferedReader.readLine(BufferedReader.java:379)
	at net.sf.samtools.util.BufferedLineReader.readLine(BufferedLineReader.java:65)
	at net.sf.samtools.util.AsciiLineReader.readLine(AsciiLineReader.java:75)
	at net.sf.samtools.SAMTextReader.advanceLine(SAMTextReader.java:203)
	at net.sf.samtools.SAMTextReader.access$300(SAMTextReader.java:40)
	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:274)
	at net.sf.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:244)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:629)
	at net.sf.samtools.SAMFileReader$AssertableIterator.next(SAMFileReader.java:607)
	at net.sf.picard.sam.SamToFastq.doWork(SamToFastq.java:121)
	at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:157)
	at net.sf.picard.sam.SamToFastq.main(SamToFastq.java:112)

Last edited by joachim.jacob; 08-04-2011 at 02:38 AM.
Old 08-04-2011, 02:42 AM   #6
cedance
Senior Member
 
Location: Germany

Join Date: Feb 2011
Posts: 108
Default

Hi sdm, Joachim,

Most of the Picard tools are designed to run with a 2 GB JVM, so using -Xmx40g (IMO) wouldn't make a difference. IMHO, what you have to check is the use of the TMP_DIR parameter. Sometimes the default temp directory it chooses runs out of space on the cluster I work on. It's worth a try.

best.
Old 08-04-2011, 03:24 AM   #7
sdm
Junior Member
 
Location: Cambridge, UK

Join Date: Oct 2009
Posts: 9
Default

Hi,

these flags have worked for me after some trial and error:
java -Xmx3g -XX:-UseGCOverheadLimit -jar SamToFastq.jar

Not sure if it works in every context.
Old 08-04-2011, 04:21 AM   #8
joachim.jacob
Junior Member
 
Location: Belgium

Join Date: Jan 2011
Posts: 9
Default Changing TMP_DIR does not work

Thanks all for your suggestions: unfortunately, changing TMP_DIR to a bigger location does not work.

The FASTQ file hangs at 304 MB and I get a Java heap space error.

@sdm: thanks for your reply. My -Xmx is already set to 55g (it used to be 2g).

It seems I had the most success by changing MAX_RECORDS_IN_RAM to 5000000. Will try a little further and keep you posted!
Old 08-04-2011, 04:35 AM   #9
cedance
Senior Member
 
Location: Germany

Join Date: Feb 2011
Posts: 108
Default

Joachim,
Since it seems to work on small files for you (and the ones I worked on were around 8-12 GB), it seems to me that it has more to do with the code. Check this link for the explanation in the accepted answer:
http://stackoverflow.com/questions/1...d-limit-exceed

Best.
Old 05-28-2012, 03:53 AM   #10
dadada4ever
Member
 
Location: Beijing

Join Date: Mar 2010
Posts: 18
Default

Quote:
Originally Posted by joachim.jacob View Post
Thanks all for your suggestions: unfortunately, changing TMP_DIR to a bigger location does not work.

The FASTQ file hangs at 304 MB and I get a Java heap space error.

@sdm: thanks for your reply. My -Xmx is already set to 55g (it used to be 2g).

It seems I had the most success by changing MAX_RECORDS_IN_RAM to 5000000. Will try a little further and keep you posted!
Hi Joachim, did you solve this problem? I added MAX_RECORDS_IN_RAM=5000000; the FASTQ files get larger, but I still got the Java heap space error at the end. Do you have any other suggestions? Thank you.
Old 11-10-2012, 09:32 AM   #11
Fusionseeker
Junior Member
 
Location: Midwest

Join Date: Sep 2010
Posts: 1
Default

Have you had any luck solving this issue? I am having the same problem using various BAM files of ~7-10 GB in size.

I have had success using BAM files of similar size generated in-house and from collaborators, so I was surprised when these same parameters no longer seemed to work. I am starting to wonder if there is something unique to how these most recent BAM files were processed. I went to the Picard commands page and didn't see any particular processing steps that were required. Any suggestions?
Old 11-10-2012, 11:27 AM   #12
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 699
Default

The Picard bam to fastq is slow and takes a lot of memory.
If you have lots of time and memory it is not a problem.

If you don't have a lot of time or memory ... try my solution presented in this thread:
http://seqanswers.com/forums/showthread.php?t=16395

Warning: you'll have to download the AVL library and compile it yourself using a C compiler (gcc or other). A modicum of experience in compiling and editing source files is required.

It was developed to run on low-memory Beowulf nodes and not take all day.
Old 03-19-2013, 06:25 AM   #13
arkal
advancing one byte at a time!
 
Location: Bangalore, India

Join Date: Jun 2011
Posts: 56
Default

kudos to you, my friend! works like a charm!
Old 03-19-2013, 10:54 AM   #14
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 993
Default

Time to advertise our HTSeq library, which lets you do such tasks in two lines. And it certainly won't use any noticeable amount of memory.

Try this:

Code:
import sys, HTSeq

for a in HTSeq.SAM_Reader( "myfile.sam" ):
   a.read.write_to_fastq_file( sys.stdout )
The following, "more advanced" version, makes sure that each read is written only once even if multiple alignments are in the SAM file (provided the SAM file had been sorted by read name (with 'samtools sort -n')) so that multiple alignments are in adjacent lines.

Code:
import sys, HTSeq

for a in HTSeq.bundle_multiple_alignments( HTSeq.SAM_Reader( "myfile.sam" ) ):
   a[0].read.write_to_fastq_file( sys.stdout )
(The code is untested, so sorry in advance for any typos.)

Last edited by Simon Anders; 03-19-2013 at 11:01 AM.
Old 03-19-2013, 11:19 AM   #15
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

A better approach to grouping the two ends together is "htscmd bamshuf" from htslib. It is much faster than name sorting.
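To illustrate what the grouping buys you, here is a rough, untested sketch with pysam and placeholder file names; it assumes the input has already been collated (e.g. with htscmd bamshuf) so that the two primary records of a pair sit on adjacent lines:
Code:
import pysam

# With a collated file the two ends of a pair are adjacent, so pairing
# only needs to remember the previous record instead of a full name sort.
def fastq_record(rec):
    qual = "".join(chr(q + 33) for q in rec.get_forward_qualities())
    return "@%s\n%s\n+\n%s\n" % (rec.query_name, rec.get_forward_sequence(), qual)

prev = None
with pysam.AlignmentFile("collated.bam", "rb") as bam, \
     open("reads_1.fq", "w") as fq1, open("reads_2.fq", "w") as fq2:
    for rec in bam:
        if (rec.is_secondary or rec.is_supplementary
                or rec.query_sequence is None or rec.query_qualities is None):
            continue
        if prev is not None and prev.query_name == rec.query_name:
            first, second = (prev, rec) if prev.is_read1 else (rec, prev)
            fq1.write(fastq_record(first))
            fq2.write(fastq_record(second))
            prev = None
        else:
            prev = rec  # start of a new pair; any leftover singleton is dropped
Memory stays constant because at most one record is held at a time.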