05-26-2015, 01:09 PM   #1
Location: New York NY

Join Date: May 2015
Posts: 24
bedtools bamtofastq taking too long

I am running bedtools bamtofastq to convert a BAM file into split FASTQ files (read 1 and read 2) for paired-end reads, with an expected wall time of 24 hours and 4000 MB of memory.

The input BAM file is about 140 GB, whereas the output has reached merely 0.3 GB, so the tool seems to be running too slowly for some reason. The following command was used. The input file was sorted by read name with samtools sort.

bedtools bamtofastq -i X_Sorted.bam -fq X.R1.fq -fq2 X.R2.fq

Any suggestions appreciated.
ty23991
05-26-2015, 03:30 PM   #2
Senior Member
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,825

Perhaps you have hit the memory or wall time limit for this job?

An alternative may come from the BBMap suite. Brian's code tends to be very efficient, so you may want to give it a try. Make sure samtools is in your path for something like the following to work.

$ in=file.bam out=file.fq.gz
Are you sure that both reads of each pair have been kept in the bam file? If not, you will have singletons, and you will need to separate them out before de-interleaving the fastq file. Start with this:

$ in=file.fq.gz verifypairing
$ in=file.fq.gz out1=R1.fq out2=R2.fq
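
For a quick independent look at whether singletons are present before de-interleaving, something like the following awk sketch may help. It is not part of BBMap: `check_pairs` is a hypothetical helper name, and the sketch assumes an uncompressed, interleaved fastq with exactly four lines per record and read names that match once a trailing /1 or /2 is removed.

```shell
# Hypothetical helper: report records in an interleaved fastq whose
# mate name does not match. Reads the file given as $1, or stdin if omitted.
check_pairs() {
    awk 'NR % 4 == 1 {
             name = substr($1, 2)        # drop leading "@", keep text before first space
             sub(/\/[12]$/, "", name)    # drop a trailing /1 or /2
             n++
             if (n % 2 == 1) { first = name }      # first read of a candidate pair
             else if (name != first) { bad++ }     # second read has a different name
         }
         END {
             if (n % 2 == 1) bad++       # odd record count: last read has no mate
             printf "%d records, %d mismatched pairs\n", n, bad + 0
         }' "${1:--}"
}
```

For a gzipped file, pipe it in with `gzip -dc file.fq.gz | check_pairs`. Because pairing is judged by position, one singleton can throw off every pair after it, so treat any nonzero count as a sign that something is wrong rather than as an exact number of singletons.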
GenoMax
05-26-2015, 04:21 PM   #3
Location: New York NY

Join Date: May 2015
Posts: 24

Thanks so much. It turned out that the output directory was on a different file system in the cluster, which somehow caused the bedtools output to be written at an extremely slow rate. The problem goes away if I use an output directory on the same file system. I will post more details on whether this happens only with bedtools or is a general I/O issue.

Regardless, I tried it and, based on the output rate, it turns out to be relatively efficient. Thanks so much for the clue.
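
For anyone who wants to compare candidate output directories before launching a long job, a rough write test along these lines may be enough. This is only a sketch, assuming GNU dd on Linux (for `bs=1M` and `conv=fsync`); `write_test` is a hypothetical helper name. Zeros compress well, so a file system with transparent compression may look faster than it really is.

```shell
# Hypothetical helper: write a temporary file of zeros into a directory,
# force it to disk, and report how many seconds that took.
# Usage: write_test DIR [SIZE_MB]   (SIZE_MB defaults to 256)
write_test() {
    local dir="$1" size_mb="${2:-256}"
    local tmp start end
    tmp="$(mktemp "$dir/write_test.XXXXXX")" || return 1
    start=$(date +%s)
    # conv=fsync makes dd flush the data to storage before it exits
    if ! dd if=/dev/zero of="$tmp" bs=1M count="$size_mb" conv=fsync 2>/dev/null; then
        rm -f "$tmp"
        return 1
    fi
    end=$(date +%s)
    rm -f "$tmp"                         # clean up the test file
    echo "$dir: ${size_mb} MB in $((end - start)) s"
}
```

Running it once per location, for example `write_test /path/on/fs_a` and then `write_test /path/on/fs_b` (both paths are placeholders), gives a crude side-by-side comparison. Timing is in whole seconds, so use a large enough size for the difference to show.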
ty23991
04-12-2018, 08:13 PM   #4
Junior Member
Location: uk

Join Date: Apr 2011
Posts: 3

Use the "bamtofastq" in Biobambam instead. It runs way faster and the code is highly optimized. Bedtools is written by amateurs. That's why it's slow.
muxingu


All times are GMT -8.
