SEQanswers

Old 07-29-2010, 05:02 AM   #1
dcfargo
Member
 
Location: Chapel Hill

Join Date: Aug 2008
Posts: 22
Default Stand Alone Bam to FASTQ

Does anyone have a suggested best practice utility for this?
Old 07-29-2010, 05:14 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

What are you trying to do?
Do you want to pull out the reads as FASTQ records?
Do you care about the strand used for reads which mapped to the reverse strand?
Do you care about how paired end reads are named?

You could try seqret from EMBOSS 6.3.0,
http://lists.open-bio.org/pipermail/...ly/003947.html
Old 07-29-2010, 06:23 AM   #3
dcfargo
Member
 
Location: Chapel Hill

Join Date: Aug 2008
Posts: 22
Default

I do care about recovery of all of the information.

I'd like to essentially recover all the initial text information that went into making the BAM file.
Old 07-29-2010, 06:37 AM   #4
Martin R
Junior Member
 
Location: Germany

Join Date: May 2010
Posts: 7
Default

The problem you can run into is that the quality values might change after alignment.
Old 07-29-2010, 06:45 AM   #5
dcfargo
Member
 
Location: Chapel Hill

Join Date: Aug 2008
Posts: 22
Default

Sorry for my ignorance - why might the quality values change?
Old 07-29-2010, 06:50 AM   #6
Martin R
Junior Member
 
Location: Germany

Join Date: May 2010
Posts: 7
Default

That is no problem. It's a point that confused me too, and btw it still confuses me.

My experience is that the quality values (qv) in the SAM files from e.g. bowtie differ from the ones in the original file. I think the values you get after the alignment are the qv from the alignment and not the ones from the original file.
Old 07-29-2010, 06:53 AM   #7
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by dcfargo View Post
I do care about recovery of all of the information.

I'd like to essentially recover all the initial text information that went into making the BAM file.
Assuming I have understood your aim, that is not entirely possible.

e.g. Suppose you had some paired FASTQ reads like this:

Code:
@SRR001666.1/1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
@SRR001666.1/2 071112_SLXA-EAS1_s_7:5:1:817:345 length=36
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
...
All that will be stored in SAM/BAM is the pair name without the suffix, here SRR001666.1, the sequence and quality. You lose any description from the FASTQ lines after the ID. Potentially the alignment tool may hard clip the reads so you don't even get the full sequence and quality.

If on converting SAM/BAM back to FASTQ you specify suffixes of /1 and /2, the best you can hope to recover is:

Code:
@SRR001666.1/1
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC
@SRR001666.1/2
AAGTTACCCTTAACAACTTAAGGGTTTTCAAATAGA
+
IIIIIIIIIIIIIIIIIIIIDIIIIIII>IIIIII/
...
This may or may not suffice for your needs.
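To make the loss concrete, here is a minimal Java sketch of the naming round trip described above. It assumes the aligner keeps only the first whitespace-delimited token of the FASTQ header and drops a trailing /1 or /2, which is typical behaviour but not guaranteed for every tool. The class and method names are made up for illustration.

```java
class FastqHeaderRoundTrip {
    /** Name an aligner would typically store: first token of the header, minus any /1 or /2 suffix. */
    static String toSamName(String fastqHeader) {
        // Drop the leading '@' and everything after the first whitespace (the description).
        String id = fastqHeader.substring(1).split("\\s+")[0];
        if (id.endsWith("/1") || id.endsWith("/2")) {
            id = id.substring(0, id.length() - 2);
        }
        return id;
    }

    /** Best header that can be rebuilt from the stored name and the mate number (1 or 2). */
    static String toFastqHeader(String samName, int mate) {
        return "@" + samName + "/" + mate;
    }

    public static void main(String[] args) {
        String original = "@SRR001666.1/1 071112_SLXA-EAS1_s_7:5:1:817:345 length=36";
        String stored = toSamName(original);
        System.out.println(stored);                   // SRR001666.1
        System.out.println(toFastqHeader(stored, 1)); // @SRR001666.1/1
    }
}
```

The description (the run and tile information, and length=36) is absent from the rebuilt header because it was never stored.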
Old 07-29-2010, 07:01 AM   #8
dcfargo
Member
 
Location: Chapel Hill

Join Date: Aug 2008
Posts: 22
Default

Thanks so much.

Given that some information may be lost, and we'll just have to accept that, would the best model for conversion be two steps such as:

1) SAMtools for BAM -> SAM

2) followed by a home made script for SAM -> FASTQ
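
For reference, step 2 could be as small as the following Java sketch, which turns one line of SAM text (for example, as printed by samtools view) into a FASTQ record. It relies only on the fixed SAM column order (QNAME first, SEQ tenth, QUAL eleventh). It is a rough sketch, not a tested tool: it deliberately ignores strand, pairing, and clipping, and the class name and the example alignment line are invented for illustration.

```java
class SamLineToFastq {
    /** Converts one SAM alignment line to a four-line FASTQ record; returns null for header lines. */
    static String convert(String samLine) {
        if (samLine.startsWith("@")) {
            return null; // header line such as @HD or @SQ, no read to emit
        }
        String[] fields = samLine.split("\t");
        if (fields.length < 11) {
            throw new IllegalArgumentException("expected at least 11 tab-separated fields");
        }
        String name = fields[0];  // QNAME
        String seq = fields[9];   // SEQ
        String qual = fields[10]; // QUAL, already Phred+33 ASCII in SAM
        return "@" + name + "\n" + seq + "\n+\n" + qual + "\n";
    }

    public static void main(String[] args) {
        // Hypothetical alignment line; the reference name and position are made up.
        String line = String.join("\t", "SRR001666.1", "0", "chr1", "100", "255", "36M", "*", "0", "0",
                "GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC",
                "IIIIIIIIIIIIIIIIIIIIIIIIIIIIII9IG9IC");
        System.out.print(convert(line));
    }
}
```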
Old 07-29-2010, 07:12 AM   #9
Martin R
Junior Member
 
Location: Germany

Join Date: May 2010
Posts: 7
Default

Well, you don't have to make it that complicated. There are two libraries you can use, and then you have your converter. I, for example, prefer Java and use BioJava to read/write FASTQ (http://www.biojava.org/wiki/BioJava:Download_1.7.1) and the samtools Java library (http://sourceforge.net/projects/picard/files/) to read BAM/SAM files (the API is the same for both).
Then you only have to transform a SAMRecord into a FastqBuilder:

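import net.sf.samtools.SAMRecord;                  // Java samtools library shipped with Picard
import org.biojava.bio.program.fastq.FastqBuilder; // BioJava 1.7.1 FASTQ support

// Copies read name, base qualities and bases from a SAM/BAM record into a FASTQ builder.
public FastqBuilder convert(SAMRecord element2) {
    FastqBuilder builder = new FastqBuilder();
    builder.withDescription(element2.getReadName());
    builder.withQuality(element2.getBaseQualityString());
    builder.withSequence(element2.getReadString());
    return builder;
}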

that's the easiest way.

good luck
Old 07-29-2010, 07:14 AM   #10
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by dcfargo View Post
Thanks so much.

Given that some information may be lost, and we'll just have to accept that, would the best model for conversion be two steps such as:

1) SAMtools for BAM -> SAM

2) followed by a home made script for SAM -> FASTQ
Not necessarily.

As mentioned above, EMBOSS 6.3.x can do SAM/BAM direct to FASTQ, although it may not do exactly what you want it to do.

You could also write a script to go from BAM to FASTQ, for example using pysam to access the samtools C API from Python.

Personally I've been doing SAM/BAM to FASTQ conversion in Biopython (to recover reads to redo a mapping), but this is with an experimental branch and is not ready for general use.
Old 07-29-2010, 07:16 AM   #11
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by Martin R View Post
Well, you don't have to make it that complicated. There are two libraries you can use, and then you have your converter. I, for example, prefer Java and use BioJava to read/write FASTQ (http://www.biojava.org/wiki/BioJava:Download_1.7.1) and the samtools Java library (http://sourceforge.net/projects/picard/files/) to read BAM/SAM files (the API is the same for both).
Then you only have to transform a SAMRecord into a FastqBuilder:

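import net.sf.samtools.SAMRecord;                  // Java samtools library shipped with Picard
import org.biojava.bio.program.fastq.FastqBuilder; // BioJava 1.7.1 FASTQ support

// Copies read name, base qualities and bases from a SAM/BAM record into a FASTQ builder.
public FastqBuilder convert(SAMRecord element2) {
    FastqBuilder builder = new FastqBuilder();
    builder.withDescription(element2.getReadName());
    builder.withQuality(element2.getBaseQualityString());
    builder.withSequence(element2.getReadString());
    return builder;
}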

that's the easiest way.

good luck
Plus potentially add code to append /1 and /2 if dealing with paired end data.

Also I would reverse complement any reads mapped to the reverse strand to recover them in their original orientation pre-mapping.
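
As a rough illustration of those two additions, here is a library-free Java sketch working on the raw SAM fields. The flag bits are the standard SAM ones (0x10 reverse strand, 0x40 first in pair, 0x80 second in pair). The class and method names are hypothetical, and collapsing every non-ACGT base to N is a simplification of this sketch. Note that the quality string is reversed but not complemented.

```java
class RestoreOriginalRead {
    static final int FLAG_REVERSE = 0x10; // read mapped to the reverse strand
    static final int FLAG_FIRST = 0x40;   // first read of a pair
    static final int FLAG_SECOND = 0x80;  // second read of a pair

    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            switch (seq.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                case 'T': sb.append('A'); break;
                default:  sb.append('N'); break; // N, IUPAC codes and lower case all become N here
            }
        }
        return sb.toString();
    }

    /** Builds a FASTQ record in the read's original orientation, with /1 or /2 appended for pairs. */
    static String toFastq(String name, int flag, String seq, String qual) {
        if ((flag & FLAG_REVERSE) != 0) {
            seq = reverseComplement(seq);
            qual = new StringBuilder(qual).reverse().toString();
        }
        if ((flag & FLAG_FIRST) != 0) {
            name += "/1";
        } else if ((flag & FLAG_SECOND) != 0) {
            name += "/2";
        }
        return "@" + name + "\n" + seq + "\n+\n" + qual + "\n";
    }

    public static void main(String[] args) {
        // Flag 83 = paired (1) + proper pair (2) + reverse strand (16) + first in pair (64).
        System.out.print(toFastq("read1", 83, "AACG", "IIH9"));
    }
}
```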
Old 08-02-2010, 05:56 AM   #12
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543
Default

Quote:
Originally Posted by maubp View Post
As mentioned above, EMBOSS 6.3.x can do SAM/BAM direct to FASTQ, although it may not do exactly what you want it to do.
Well, EMBOSS 6.3.1 isn't doing what I want it to do
http://lists.open-bio.org/pipermail/...st/000667.html
http://lists.open-bio.org/pipermail/...st/000668.html
This should be resolved in the next patch or point release though
http://lists.open-bio.org/pipermail/...st/000669.html

Peter

Last edited by maubp; 08-03-2010 at 01:38 AM. Reason: Adding link
Old 07-25-2021, 04:26 PM   #13
divon
Junior Member
 
Location: Australia

Join Date: Jul 2021
Posts: 8
Default

For the sake of completeness, I will just mention that you can also achieve this with my Genozip program:

genozip file.bam <---- compresses the BAM file
genocat file.bam.genozip --output file.fq.gz <---- converts it to FASTQ


See documentation here: https://genozip.com/sam2fq.html

Paper here: https://www.researchgate.net/publica...ata_Compressor
Old 08-20-2021, 12:12 AM   #14
divon
Junior Member
 
Location: Australia

Join Date: Jul 2021
Posts: 8
Default

Hi Andrey, which file can't you open?
Old 08-20-2021, 12:47 AM   #15
divon
Junior Member
 
Location: Australia

Join Date: Jul 2021
Posts: 8
Default

Here's an alternative link: https://genozip.readthedocs.io/sam2fq.html

Does this work?