Thread | Thread Starter | Forum | Replies | Last Post |
To convert BAM file to BED file | Anjali | Bioinformatics | 10 | 04-28-2014 05:43 AM |
For MAQ: Is there a Tool to convert sanger-format fastq file to illumina-fotmat fastq | byb121 | Bioinformatics | 6 | 12-20-2013 02:26 AM |
Convert merged BAM back to per lane BAM or FASTQ file | danielsbrewer | Bioinformatics | 6 | 10-03-2013 08:29 AM |
Are there any good ways to use SAMtools java API to convert .bam file into .txt file? | alextree | Bioinformatics | 8 | 01-24-2012 10:20 AM |
How to convert a bam file to sam file | badhikari | Bioinformatics | 2 | 04-01-2011 09:56 AM |
#1 |
Senior Member
Location: WashU Join Date: Aug 2010
Posts: 117
After a quick search I found these:
- Hydra
- Picard (SamToFastq)
- HudsonAlpha
- Possibly EMBOSS

Any comments on these? Any other options for BAM-to-FASTQ conversion?

Basically I want to recover all paired-end reads (both R1 and R2) that were fed into the alignment that produced the BAM file, whether they mapped successfully or not.

Last edited by malachig; 09-28-2010 at 02:04 PM.
#2 |
Senior Member
Location: Rockville, MD Join Date: Jan 2009
Posts: 126
I've used Picard and it works fine for me.
#3 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
You may want to filter the BAM file to remove any non-primary mappings (otherwise you could get duplicate entries in the FASTQ file). The tools may do that for you.

You may also want to append /1 and /2 to the forward and reverse read names (this information isn't currently stored in SAM/BAM format, but there is a proposed tag for the read name suffix in the draft standard update). Also double check that any reads mapped to the reverse strand get reverse complemented when writing the FASTQ file, since you want to recover the input sequences.

There are also DIY approaches, for example BAM to SAM and then a Perl/Python script. I have some experimental code for Biopython to do this too.

There was a thread on this on the samtools-help mailing list in August 2010, "BAM to fastq how?"
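As a rough illustration of the DIY route (BAM to SAM text, then a script), here is a minimal Python sketch of the three checks mentioned above: skip non-primary records, reverse complement reads mapped to the reverse strand, and append /1 or /2. The flag values come from the SAM specification; the function name `sam_line_to_fastq` is made up for this example, and it is not the Biopython code mentioned in the post.

```python
# Minimal sketch: turn one SAM alignment line into a FASTQ record.
# Flag bits follow the SAM specification; the function name is illustrative.
FLAG_PAIRED = 0x1
FLAG_REVERSE = 0x10
FLAG_READ1 = 0x40
FLAG_READ2 = 0x80
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_line_to_fastq(line):
    """Return a 4-line FASTQ string, or None for non-primary records."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]

    # Non-primary lines would duplicate a read in the FASTQ output.
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return None

    # SAM stores reverse-strand reads reverse complemented; undo that
    # so the output matches what the sequencer produced.
    if flag & FLAG_REVERSE:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]

    # Restore the read-number suffix from the flag.
    if flag & FLAG_PAIRED:
        if flag & FLAG_READ1:
            name += "/1"
        elif flag & FLAG_READ2:
            name += "/2"

    return "@{}\n{}\n+\n{}\n".format(name, seq, qual)
```

Header lines (those starting with @ in a SAM file) would need to be skipped by the caller before passing lines to this function, and hard-clipped records would still come out shortened.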
#4 |
Member
Location: Boston, MA Join Date: Apr 2010
Posts: 36
Bamtools (http://github.com/pezmaster31/bamtools) can convert BAM to FASTQ:
bamtools convert -in file1.bam -in file2.bam ... -format fastq > reads.fq
#5 |
Member
Location: UK Join Date: Jun 2009
Posts: 31
Hi,

For BAM-to-FASTQ conversion I use Bamtools. But when I try to convert one of my BAM files to FASTQ, I get the following error message:
"BGZF ERROR: read block failed - could not read data from block"

The problem is that bamtools exits after this step. Is it possible to avoid that? Is there some way to tell bamtools to skip such a block and continue? Or, as in Picard, is there a VALIDATION_STRINGENCY option that could be set to lenient or silent? Just to mention, these BAM files contain unmapped PE reads.

Thanks.

Last edited by ElMichael; 11-15-2010 at 10:44 AM.
#6 |
Senior Member
Location: SEA Join Date: Nov 2009
Posts: 203
On Picard, my service provider mentioned this:

"Using picard tools directly has one significant drawback. Picard tools will read in sequence from the BAM line by line and cache it until it has both reads. Once it has both reads it will print them out and free the memory. Unfortunately this means that every read which doesn't have the pairs near each other will take memory. In the example above it took 2.5GB of memory for 120GB of sequence but this is not guaranteed and will get worse on larger builds."

Sounds terrible to me. Fortunately there's method 2:

"You can specify samtools memory usage (it'll use temporary files) so if you sort the BAM by name prior to running picard tools on it you guarantee the reads are next to each other and picard tools will barely use any memory."

Side question: was there anything in the original FASTQ files that one might want to keep but can't recover from the sorted BAMs? I am inclined to retrieve the original FASTQ files, but data storage might be a problem for me.
__________________
http://kevin-gattaca.blogspot.com/ |
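The memory behaviour described in the quoted text can be illustrated with a toy pairing loop. This is a sketch of the general "hold a read until its mate shows up" idea, not Picard's actual code; the function name `pair_reads` and the tuple layout are assumptions made for the example.

```python
def pair_reads(records):
    """Pair up reads from an iterable of (name, read_number, sequence).

    Reads wait in `pending` until their mate arrives, so the buffer grows
    with the number of reads whose mate has not been seen yet.
    Returns the list of pairs and the peak size of the buffer.
    """
    pending = {}
    pairs = []
    peak = 0
    for name, number, seq in records:
        mate = pending.pop(name, None)
        if mate is None:
            # First read of the pair: keep it until the mate turns up.
            pending[name] = (number, seq)
            peak = max(peak, len(pending))
        else:
            # Mate found: emit read 1 before read 2 and free the slot.
            first, second = sorted([mate, (number, seq)])
            pairs.append((name, first[1], second[1]))
    return pairs, peak


# Mates far apart, as in a coordinate-sorted BAM.
scattered = [("a", 1, "AC"), ("b", 1, "GG"), ("c", 2, "TT"),
             ("a", 2, "CA"), ("c", 1, "AA"), ("b", 2, "CC")]
print(pair_reads(scattered)[1])          # peak buffer size, scattered order
print(pair_reads(sorted(scattered))[1])  # peak buffer size, name order
```

With name-sorted input each read is followed directly by its mate, so the buffer never holds more than one read, which is the point of sorting by name before conversion.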
#7 |
Senior Member
Location: San Diego Join Date: May 2008
Posts: 912
I've used Picard on .bams generated by bwa/samtools, and it definitely keeps the unmapped reads. But that's because the .bam has them. If you used an aligner that tossed them, or put them in another .bam (didn't bowtie use to do that by default?), then there's nothing any software can do about that.

I've never tried to get them back out as paired reads. I assume that it uses the flag to know which is read 1 and which is read 2, but it might not know to order them properly. If your .bam has all the reads sorted by name, and you haven't filtered out any single reads, I bet the fastqs would be in the right order.

Last edited by swbarnes2; 11-16-2011 at 10:41 AM.
#8 |
Member
Location: Arlington, TX Join Date: Nov 2009
Posts: 17
Try using bam2fastq from HudsonAlpha at http://www.hudsonalpha.org/gsl/software/bam2fastq.php. It is very quick: it processed my 8 BAM files, ranging in size from 0.5 to 4 GB, in less than 10 minutes on a standard 2-core Linux machine.
#9 |
Junior Member
Location: Oxford Join Date: Feb 2013
Posts: 1
I'm new to this and looking for help too. When I use bamtools to convert my .bam file to FASTQ, I only get one output file. Is it possible to split paired-end reads into two output files? Can someone suggest a method?
Many thanks, Johnny.
#10 |
PhD Student
Location: Denmark Join Date: Jul 2012
Posts: 164
With Picard you just specify two different output files, like:
java -jar picard-tools/SamToFastq.jar I=Input.bam F=seq1_1.fastq F2=seq1_2.fastq
You can also split these by read groups using additional command line arguments.
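If you are left with a single FASTQ file containing both mates, as in the question above, one DIY option is to split it afterwards. The following is a minimal sketch that assumes plain 4-line FASTQ records and read names ending in /1 or /2; whether your converter writes those suffixes is something to check first. The function name `split_interleaved` is made up for the example.

```python
def split_interleaved(lines):
    """Split FASTQ lines into read 1, read 2 and unsuffixed records.

    Assumes plain 4-line records whose names end in /1 or /2.
    Each record is returned as a list of its four lines.
    """
    read1, read2, other = [], [], []
    stripped = [line.rstrip("\n") for line in lines]
    # Step through complete 4-line records; a trailing partial record is ignored.
    for start in range(0, len(stripped) - 3, 4):
        record = stripped[start:start + 4]
        name = record[0].split()[0]  # ignore any comment after whitespace
        if name.endswith("/1"):
            read1.append(record)
        elif name.endswith("/2"):
            read2.append(record)
        else:
            other.append(record)
    return read1, read2, other
```

Records that carry no suffix end up in the third list rather than being guessed at, so you can see straight away if the assumption about the names does not hold.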
#11 |
Junior Member
Location: Saudi Arabia Join Date: Mar 2013
Posts: 2
The following command from TopHat can convert BAM to FASTQ (with basic settings):
bam2fastx -q -Q -A -o output.fastq input.bam

For more manipulation:
bam2fastx [--fasta|-a|--fastq|-q] [--color] [-Q] [--sam|-s|-t] [-M|--mapped-only|-A|--all] [-o <outfile>] [-P|--paired] [-N] <in.bam>

Note: By default, reads flagged as not passing quality controls are discarded; the -Q option can be used to ignore the QC flag. Use the -N option if the /1 and /2 suffixes should be appended to read names according to the SAM flags.
#12 | |
Member
Location: Mexico City Join Date: Dec 2012
Posts: 14
|
![]() Quote:
#13 | |
Junior Member
Location: New York Join Date: Nov 2014
Posts: 1
Hi, I am new here. Can anyone tell me what script you use to convert BAM files to FASTQ in Picard? Thanks.
#14 |
Senior Member
Location: Montreal Join Date: May 2013
Posts: 367
|
Code:
java -jar /usr/local/tools/picard-tools-1.114/SamToFastq.jar \
    VALIDATION_STRINGENCY=SILENT \
    INPUT=HI.1965.007.Index_1.FL_K562-110k-A.bam \
    FASTQ=HI.1965.007.Index_1.FL_K562-110k-A_R1.fastq \
    SECOND_END_FASTQ=HI.1965.007.Index_1.FL_K562-110k-A_R2.fastq \
    &> bamtofastq.sh.log
#15 |
Member
Location: Heidelberg Join Date: Feb 2011
Posts: 69
Found this thread and decided to revive it.

Has anyone tried to get back the original FASTQ pairs (R1 and R2) from several lanes that were merged into one BAM file? Alignment was done with bwa mem, merging with biobambam. Three separately sequenced lanes were the input. Right now I use Picard for the BAM-to-FASTQ step; are there any other feasible options? And do I really get back FASTQ files that are 100% identical to the original input?
#16 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,083
Have you tried reformat.sh, which is part of the BBMap suite (http://seqanswers.com/forums/showthr...=reformat.sh)?
You should be able to get back the lane-specific files (you will need to parse them out) as long as the FASTQ identifiers were not changed.
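Parsing out the lanes could look something like the sketch below. It assumes Illumina-style read names of the form instrument:run:flowcell:lane:tile:x:y, so the flowcell and lane are the third and fourth colon-separated fields; the function names are invented for this example, and names in another format would need a different rule.

```python
from collections import defaultdict


def lane_key(read_name):
    """Return (flowcell, lane) from an Illumina-style read name.

    Assumes the layout instrument:run:flowcell:lane:tile:x:y.
    """
    fields = read_name.lstrip("@").split()[0].split(":")
    if len(fields) < 7:
        raise ValueError("unexpected read name layout: " + read_name)
    return fields[2], fields[3]


def group_by_lane(read_names):
    """Group read names by the (flowcell, lane) they came from."""
    groups = defaultdict(list)
    for name in read_names:
        groups[lane_key(name)].append(name)
    return dict(groups)
```

In a real script the same key would decide which per-lane output file each FASTQ record is written to.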
#17 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
Sam/bam to fastq is tricky for a couple of reasons:
1) Sam format requires read 1 and read 2 to have identical names, which will necessarily strip off some information when they don't. For example, Illumina reads typically have " /1" and " /2" at the end, or something similar, to indicate read 1 and read 2. This cannot be recovered; typically, everything after the first whitespace is removed.
2) Sam format can have an arbitrary number of lines per fastq read, and they are not necessarily in any particular order.
3) If the sam file is hard-clipped, bases will be lost.

Therefore, no, in the general case it is impossible to get the 100% identical original fastq from some arbitrary bam file. Reformat will do a fairly good job, though, if you run it with the "primaryonly" flag. If the bam file was sorted, then you can use the accompanying repair.sh script to reorder the resulting fastq file so that pairs are together.
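To get a feel for point 3, you can count how many bases a record has lost to hard clipping by reading the H operations in its CIGAR string. A minimal sketch follows; the function name `hard_clipped_bases` is made up here, and the CIGAR operation letters are those defined in the SAM specification.

```python
import re

# One CIGAR element: a length followed by an operation letter.
_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def hard_clipped_bases(cigar):
    """Return how many read bases are absent from SEQ due to hard clipping."""
    if cigar == "*":  # CIGAR unavailable, e.g. for an unmapped read
        return 0
    return sum(int(length) for length, op in _CIGAR_OP.findall(cigar) if op == "H")
```

Soft-clipped bases (S) are still present in the SEQ field, so only H operations count as lost.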
#18 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
There is also
Code:
samtools bam2fq
https://github.com/samtools/samtools/issues/313
#19 |
Member
Location: London Join Date: Feb 2015
Posts: 18
Hi everyone,
the ever impressive bedtools also has a BAM-to-FASTQ utility ( http://bedtools.readthedocs.org/en/l...amtofastq.html ) |
#20 | |
Member
Location: Uganda Join Date: Jan 2015
Posts: 71
|
![]() Quote:
Here is the error:
bedtools bamtofastq -i lib4seq.align.qsort.bam -fq lib4_align.end1.fq -fq2 lib4_align.end2.fq
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5288:17174 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5769:20128 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5815:17504 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5824:20930 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:5991:18768 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.
*****WARNING: Query M01601:32:000000000-A7VGV:1:1101:6432:20279 is marked as paired, but it's mate does not occur next to it in your BAM file. Skipping.

Where is the problem?
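The warnings say that a read flagged as paired was not followed directly by a record with the same name. Going by what was said earlier in the thread, that can happen when the BAM is not sorted by name, when single reads have been filtered out, or when extra (non-primary) lines for the same read sit between the mates. A toy version of such an adjacency check is sketched below so you can see which situations trigger it; it is an illustration under those assumptions, not bedtools' actual implementation, and the function name is invented.

```python
FLAG_PAIRED = 0x1


def reads_without_adjacent_mate(records):
    """Return names of paired records not followed directly by their mate.

    `records` is a list of (read_name, flag) tuples in file order.
    """
    problems = []
    i = 0
    while i < len(records):
        name, flag = records[i]
        if not flag & FLAG_PAIRED:
            i += 1  # single-end read, nothing to check
        elif i + 1 < len(records) and records[i + 1][0] == name:
            i += 2  # mate is the next record, consume both
        else:
            problems.append(name)
            i += 1
    return problems
```

In this toy check, a read whose mate was filtered out is reported, and so is a read with a secondary alignment line: the extra line gets consumed as if it were the mate, and the real mate is then left without a partner.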