SEQanswers





05-26-2015, 12:09 PM   #1
ty23991
Member
 
Location: New York NY

Join Date: May 2015
Posts: 24
bedtools bamtofastq taking too long

Hi
I am running bedtools bamtofastq to convert a BAM file of paired-end reads into split FASTQ files (read 1 and read 2), with an expected wall time of 24 hours and 4000 MB of memory.

The input BAM file is about 140 GB, whereas the output has reached only 0.3 GB, so the tool seems to be running too slowly for some reason. The following command was used. The input file was sorted by read name with samtools sort.

bedtools bamtofastq -i X_Sorted.bam -fq X.R1.fq -fq2 X.R2.fq

Any suggestions appreciated.
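
Since bedtools bamtofastq expects mates to be adjacent, it can be worth confirming the sort order recorded in the BAM header before a long run. The following is only a sketch: `sort_order` is a hypothetical helper name, and it assumes samtools is on the PATH and that the header carries an `@HD` line with an `SO:` tag (older samtools versions may not update this tag after sorting).

```shell
# Hypothetical helper: print the SO: (sort order) value from the @HD line
# of a SAM header supplied on stdin. Prints nothing if no SO: tag exists.
sort_order() {
    awk -F'\t' '$1 == "@HD" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SO:/) { print substr($i, 4); exit }
    }'
}

# Usage (assumes samtools is installed; file name taken from the post):
#   samtools view -H X_Sorted.bam | sort_order
# A name-sorted file would normally report "queryname".
```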
05-26-2015, 02:30 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,783

Perhaps you have hit memory/wall time limit for this job?

An alternative may be reformat.sh from the BBMap suite. Brian's code tends to be very efficient, so you may want to give it a try. Make sure samtools is in your path for something like the following to work.

Code:
$ reformat.sh in=file.bam out=file.fq.gz
Are you sure that both reads of each pair have been kept in the BAM file? If not, you will have singletons, and you will need to separate them before de-interleaving the FASTQ file. Start with this:

Code:
$ reformat.sh in=file.fq.gz verifypairing
$ reformat.sh in=file.fq.gz out1=R1.fq out2=R2.fq
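
As a quick sanity check after splitting, the record counts of the two output files can be compared. This is a minimal sketch, not part of BBMap: `count_reads` is a hypothetical helper, and it assumes uncompressed FASTQ with the usual four lines per record.

```shell
# Hypothetical helper: count FASTQ records on stdin, assuming exactly
# four lines per record (header, sequence, "+", qualities).
count_reads() {
    awk 'END { print NR / 4 }'
}

# Usage (file names taken from the command above):
#   count_reads < R1.fq
#   count_reads < R2.fq
# The two numbers should match for properly paired output; a
# non-integer result suggests a truncated or multi-line file.
```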
05-26-2015, 03:21 PM   #3
ty23991
Member
 
Location: New York NY

Join Date: May 2015
Posts: 24

Thanks so much. It turned out that the output directory was on a different file system in the cluster, which somehow caused the bedtools output to be written at an extremely slow rate. The problem goes away if I use an output directory on the same file system. I will post more details once I know whether this happens only with bedtools or is a general I/O issue.

Regardless, I tried reformat.sh and, based on its output rate, it turns out to be relatively efficient. Thanks so much for the clue.
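
One rough way to tell a general I/O problem from a tool-specific one is to time a plain write into each candidate output directory. The sketch below is illustrative only: `write_test` is a hypothetical helper, the directory paths in the usage lines are placeholders, and the one-second timer resolution means a reasonably large test size is needed for a meaningful comparison.

```shell
# Hypothetical helper: write N MiB of zeros into a directory, flush to
# disk, and report the elapsed wall-clock time in whole seconds.
write_test() {
    local dir=$1 mib=${2:-256}
    local tmp start end
    tmp=$(mktemp "$dir/write_test.XXXXXX") || return 1
    start=$(date +%s)
    dd if=/dev/zero of="$tmp" bs=1048576 count="$mib" 2>/dev/null
    sync                      # flush buffered data before stopping the clock
    end=$(date +%s)
    rm -f "$tmp"              # clean up the temporary test file
    echo "$dir: ${mib} MiB written in $((end - start)) s"
}

# Usage (placeholder paths, substitute the real directories):
#   write_test /path/to/local_scratch 1024
#   write_test /path/to/other_filesystem 1024
```

If both directories show similar timings, the slowdown is more likely specific to how the tool writes its output than to the file system itself.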
04-12-2018, 07:13 PM   #4
muxingu
Junior Member
 
Location: uk

Join Date: Apr 2011
Posts: 3

Use "bamtofastq" from biobambam instead. It runs much faster, and its code is highly optimized. Bedtools is written by amateurs. That's why it's slow.