SEQanswers

Old 09-28-2010, 01:49 PM   #1
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 117
Convert BAM file to FASTQ

After a quick search I found these:

Hydra
Picard (SAMToFastq)
HudsonAlpha
Possibly EMBOSS

Any comments on these? Any other options for BAM-to-FASTQ conversion?

Basically I want to recover all paired-end reads (both R1 and R2) that were fed into the alignment that produced the BAM file, whether they mapped successfully or not.

Last edited by malachig; 09-28-2010 at 02:04 PM.
Old 09-28-2010, 02:42 PM   #2
shurjo
Senior Member
 
Location: Rockville, MD

Join Date: Jan 2009
Posts: 126

I've used Picard and it works fine for me.
Old 09-29-2010, 02:55 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

You may want to filter the BAM file to remove any non-primary mappings (otherwise you could get duplicate entries in the FASTQ file). The tools may do that for you.

You may also want to append /1 and /2 to the forward and reverse read names (this information isn't currently stored in SAM/BAM format but there is a proposed tag for the read name suffix in the draft standard update).

Also double check that any reads mapped to the reverse strand get reverse complemented (and their quality strings reversed) when writing the FASTQ file, since you want to recover the input sequences.

There are also DIY approaches, for example BAM to SAM and then a Perl/Python script. I have some experimental code for Biopython to do this too.

There was a thread on this on the samtools-help mailing list in August 2010, "BAM to fastq how?"
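
As a rough illustration of the three points above (primary-only filtering, /1 and /2 suffixes, reverse complementing), here is a minimal Python sketch. It is not the Biopython code mentioned above. It assumes tab-separated SAM text such as the output of samtools view, the function name sam_line_to_fastq is made up for this example, and hard-clipped reads are not handled.

```python
# Sketch only: convert one SAM text line into a FASTQ record.
# Flag values are the standard SAM FLAG bits.
SECONDARY = 0x100      # not the primary alignment
SUPPLEMENTARY = 0x800  # supplementary (split/chimeric) alignment
REVERSE = 0x10         # SEQ is stored reverse complemented
READ1 = 0x40           # first read of the pair
READ2 = 0x80           # second read of the pair

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return a 4-line FASTQ record, or None if the line should be skipped."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]

    # Skip non-primary lines so each read is written only once.
    if flag & (SECONDARY | SUPPLEMENTARY):
        return None

    # Restore the orientation the sequencer reported.
    if flag & REVERSE:
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]

    # Re-create the read name suffix from the flag.
    if flag & READ1:
        name += "/1"
    elif flag & READ2:
        name += "/2"

    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Something like this could be fed line by line from a SAM file with the header lines (those starting with @) skipped first.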
Old 09-29-2010, 01:58 PM   #4
ekg
Member
 
Location: Boston, MA

Join Date: Apr 2010
Posts: 36

Bamtools (http://github.com/pezmaster31/bamtools) can convert BAM to FASTQ.

bamtools convert -in file1.bam -in file2.bam ... -format fastq >reads.fq
Old 11-15-2010, 10:38 AM   #5
ElMichael
Member
 
Location: UK

Join Date: Jun 2009
Posts: 31

Hi,

For BAM-to-FASTQ conversion I use Bamtools.
But when I try to convert one of my BAM files to FASTQ, I get the following error message:
"BGZF ERROR: read block failed - could not read data from block"
The problem is that bamtools exits after this step. Is it possible to avoid that, for example by telling bamtools to skip such a block and continue? Or, as in Picard, is there a VALIDATION_STRINGENCY option that could be set to lenient or silent?
Just to mention, these BAM files contain unmapped PE reads.
Thanks

Last edited by ElMichael; 11-15-2010 at 10:44 AM.
Old 11-15-2011, 11:57 PM   #6
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 197

On Picard, my service provider mentioned this:

"Using picard tools directly has one significant drawback. Picard tools will read in sequence from the BAM line by line and cache it until it has both reads. Once it has both reads it will print them out and free the memory. Unfortunately this means that every read which doesn't have the pairs near each other will take memory. In the example above it took 2.5GB of memory for 120GB of sequence but this is not guaranteed and will get worse on larger builds."

Sounds terrible to me.

Fortunately, there's method 2:

"You can specify samtools memory usage (it'll use temporary files) so if you sort the BAM by name prior to running picard tools on it you guarantee the reads are next to each other and picard tools will barely use any memory."
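
To see why name sorting helps, here is a toy Python model of the caching behaviour described in the quote. It is only a sketch of the idea, not Picard's actual implementation. The function name pair_mates and the example read names are invented for illustration.

```python
def pair_mates(names):
    """Pair reads by name as they stream past.

    Returns (number of pairs emitted, peak number of reads held in memory).
    """
    waiting = set()  # reads seen whose mate has not turned up yet
    pairs = 0
    peak = 0
    for name in names:
        if name in waiting:
            # Mate found: emit the pair and free the cached read.
            waiting.remove(name)
            pairs += 1
        else:
            # First read of the pair: it has to be cached.
            waiting.add(name)
            peak = max(peak, len(waiting))
    return pairs, peak


# Coordinate order: mates can be far apart in the file.
coordinate_order = ["a", "b", "c", "d", "a", "b", "c", "d"]
# Name order: mates are always adjacent.
name_order = sorted(coordinate_order)

print(pair_mates(coordinate_order))  # peak cache grows with the number of pairs
print(pair_mates(name_order))        # peak cache stays at one read
```

In this toy case both orders yield four pairs, but the coordinate-ordered input holds four reads at its peak while the name-ordered input never holds more than one. With samtools, sorting by name is done with samtools sort -n.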



Side question: is there anything in the original FASTQ that one might want to keep but cannot recover from the sorted BAMs? I am inclined to retrieve the original FASTQ files, but data storage might be a problem for me.
Old 11-16-2011, 09:39 AM   #7
swbarnes2
Senior Member
 
Location: San Diego

Join Date: May 2008
Posts: 912

I've used Picard on .bams generated by bwa/samtools, and it definitely keeps the unmapped reads. But that's because the .bam has them. If you used an aligner that tossed them, or put them in another .bam (didn't bowtie used to do that by default?), then there's nothing any software can do about that.

I've never tried to get them back out as paired reads. I assume that it uses the flag to know which is read 1 and which is read 2, but it might not know to order them properly. If your .bam has all the reads sorted by name, and you haven't filtered out any single reads, I bet the fastqs would be in the right order.

Last edited by swbarnes2; 11-16-2011 at 10:41 AM.
Old 02-13-2012, 12:46 PM   #8
tsucheta
Member
 
Location: Arlington, TX

Join Date: Nov 2009
Posts: 17

Try using bam2fastq from HudsonAlpha at http://www.hudsonalpha.org/gsl/software/bam2fastq.php. It is very quick: it processed my 8 BAM files, ranging in size from 0.5 to 4 GB, in less than 10 minutes on a standard 2-core Linux machine.
Old 02-12-2013, 07:04 AM   #9
Johnnyalive
Junior Member
 
Location: Oxford

Join Date: Feb 2013
Posts: 1
Help using bamtools

I'm new to this and looking for help too. When I use bamtools to convert my .bam file to FASTQ, I only get one output file. Is it possible to split paired-end reads into two output files? Can someone suggest a method?
Many thanks,
Johnny.
Old 02-12-2013, 08:02 AM   #10
vivek_
PhD Student
 
Location: Denmark

Join Date: Jul 2012
Posts: 163

With Picard, you just specify two different output files, like:

java -jar picard-tools/SamToFastq.jar I=Input.bam F=seq1_1.fastq F2=seq1_2.fastq

You can also split these by read groups using additional command line arguments.
Old 03-05-2013, 12:42 AM   #11
abhinay
Junior Member
 
Location: Saudi Arabia

Join Date: Mar 2013
Posts: 2
TopHat

The bam2fastx utility that ships with TopHat can convert BAM to FASTQ (with basic settings):

bam2fastx -q -Q -A -o output.fastq input.bam

For more control, the full usage is:

bam2fastx [--fasta|-a|--fastq|-q] [--color] [-Q] [--sam|-s|-t]
[-M|--mapped-only|-A|--all] [-o <outfile>] [-P|--paired] [-N] <in.bam>

Note: By default, reads flagged as not passing quality controls are
discarded; the -Q option can be used to ignore the QC flag.

Use the -N option if the /1 and /2 suffixes should be appended to
read names according to the SAM flags
Old 03-05-2013, 08:01 AM   #12
amarth
Member
 
Location: Mexico City

Join Date: Dec 2012
Posts: 14

Quote:
Originally Posted by abhinay View Post
The following command in Tophat can convert bam to fastq (with basic settings)

bam2fastx -q -Q -A -o output.fastq input.bam

for more manipulation

bam2fastx [--fasta|-a|--fastq|-q] [--color] [-Q] [--sam|-s|-t]
[-M|--mapped-only|-A|--all] [-o <outfile>] [-P|--paired] [-N] <in.bam>

Note: By default, reads flagged as not passing quality controls are
discarded; the -Q option can be used to ignore the QC flag.

Use the -N option if the /1 and /2 suffixes should be appended to
read names according to the SAM flags

I second that.
Old 11-26-2014, 09:19 AM   #13
nahalm63
Junior Member
 
Location: New York

Join Date: Nov 2014
Posts: 1

Hi, I am new here. Can anyone tell me what command you use to convert BAM files to FASTQ with Picard? Thanks.



Quote:
Originally Posted by malachig View Post
After a quick search I found these:

Hydra
Picard (SAMToFastq)
HudsonAlpha
Possibly EMBOSS

Any comments on these? Any other options for BAM-to-FASTQ conversion?

Basically I want to recover all paired-end reads (both R1 and R2) that were fed into the alignment that produced the BAM file, whether they mapped successfully or not.
Old 11-26-2014, 09:28 AM   #14
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

Code:
java -jar /usr/local/tools/picard-tools-1.114/SamToFastq.jar \
VALIDATION_STRINGENCY=SILENT \
INPUT=HI.1965.007.Index_1.FL_K562-110k-A.bam \
FASTQ=HI.1965.007.Index_1.FL_K562-110k-A_R1.fastq \
SECOND_END_FASTQ=HI.1965.007.Index_1.FL_K562-110k-A_R2.fastq \
&> bamtofastq.sh.log
Old 03-26-2015, 09:47 AM   #15
Thorondor
Member
 
Location: Heidelberg

Join Date: Feb 2011
Posts: 69

Found this thread and decided to revive it.
Has anyone tried to get back the several FASTQ pairs (R1 and R2) that were merged into one BAM file? Alignment was done with bwa mem, merging with biobambam.
Three separately sequenced lanes were the input.
Right now I use Picard for the BAM-to-FASTQ conversion. Are there any other feasible options?
And do I really get back FASTQ files that are 100% identical to the original input?
Old 03-26-2015, 10:15 AM   #16
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800

Have you tried reformat.sh, which is part of the BBMap suite (http://seqanswers.com/forums/showthr...=reformat.sh)?
You should be able to get back the lane-specific files (you will need to parse them out of the combined output) as long as the FASTQ identifiers were not changed.
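
One possible way to do the parsing, sketched in Python. It assumes Illumina-style read names with seven colon-separated fields (instrument:run:flowcell:lane:tile:x:y). Older or renamed identifiers will not match. The helper name lane_key and the choice of grouping by flowcell plus lane are assumptions made for this example, not part of reformat.sh.

```python
from collections import defaultdict


def lane_key(read_name):
    """Return (flowcell, lane) for an Illumina-style read name, else None."""
    # Drop a leading '@' and anything after the first whitespace.
    name = read_name[1:] if read_name.startswith("@") else read_name
    parts = name.split()
    if not parts:
        return None
    fields = parts[0].split(":")
    if len(fields) != 7:
        return None  # not the assumed 7-field naming scheme
    return fields[2], fields[3]


def group_by_lane(records):
    """Group (name, seq, qual) tuples by the lane they were sequenced on."""
    by_lane = defaultdict(list)
    for record in records:
        by_lane[lane_key(record[0])].append(record)
    return dict(by_lane)
```

Each group could then be written to its own pair of FASTQ files. Records whose names do not fit the pattern end up under the key None, so they can be inspected rather than silently lost.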
Old 03-26-2015, 03:37 PM   #17
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Sam/bam to fastq is tricky for a couple reasons:

1) Sam format requires read 1 and read 2 have identical names, which will necessarily strip off some information when they don't - for example, Illumina reads typically have " /1" and " /2" at the end or something similar to indicate read 1 and read 2. This cannot be recovered; typically, everything after the first whitespace is removed.
2) Sam format can have an arbitrary number of lines per fastq read, and they are not necessarily in any particular order.
3) If the sam file is hard-clipped, bases will be lost.

Therefore, no, in the general case it is impossible to get the 100% identical original fastq from some arbitrary bam file. Reformat will do a fairly good job, though, if you run it with the "primaryonly" flag. If the bam file was sorted, then you can use the accompanying repair.sh script to reorder the resulting fastq file so that pairs are together.
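
Points 1 and 3 can be checked per record before trusting a round trip. The following Python sketch is illustrative only and is not how reformat.sh works internally. The function names hard_clipped_bases and sam_query_name are made up for this example. The CIGAR operators are the standard SAM ones.

```python
import re

# One CIGAR element: a length followed by a standard SAM operator.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def hard_clipped_bases(cigar):
    """Number of bases removed from SEQ by hard clipping (H operations)."""
    if cigar == "*":
        return 0  # no alignment, so nothing was clipped
    return sum(int(length) for length, op in CIGAR_ELEMENT.findall(cigar) if op == "H")


def sam_query_name(fastq_header):
    """Read name as it typically ends up in SAM: text before the first whitespace."""
    header = fastq_header[1:] if fastq_header.startswith("@") else fastq_header
    return header.split()[0]
```

Any record with a non-zero hard_clipped_bases value has lost bases that cannot be restored from that line alone, whereas soft clipping (S) keeps them in SEQ.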
Old 03-27-2015, 12:50 AM   #18
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541

There is also
Code:
samtools bam2fq
although there is a bug where it gives FASTA records instead of FASTQ records if your SAM/BAM file is missing qualities:
https://github.com/samtools/samtools/issues/313
Old 03-27-2015, 06:47 AM   #19
maxsalm
Member
 
Location: London

Join Date: Feb 2015
Posts: 18

Hi everyone,

the ever impressive bedtools also has a BAM-to-FASTQ utility ( http://bedtools.readthedocs.org/en/l...amtofastq.html )
Old 05-19-2015, 03:25 AM   #20
kaps
Member
 
Location: Uganda

Join Date: Jan 2015
Posts: 71

Quote:
Originally Posted by maxsalm View Post
Hi everyone,

the ever impressive bedtools also has a BAM-to-FASTQ utility ( http://bedtools.readthedocs.org/en/l...amtofastq.html )

Hello, I have tried bedtools but did not succeed.

Here is the error:

bedtools bamtofastq -i lib4seq.align.qsort.bam -fq lib4_align.end1.fq -fq2 lib4_align.end2.fq
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5288:17174 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5769:20128 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5815:17504 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5824:20930 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5991:18768 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:6432:20279 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.


Where is the problem?