SEQanswers

06-18-2013, 12:33 AM   #1
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149
bam to paired AND unpaired fastq reads

Hi,

Some of the posts I found here are a bit old, so in case there are new tools...

Does anyone know of a tool that extracts the reads from a .bam file into both a paired-reads fastq AND an unpaired (single) reads fastq?
Picard SamToFastq does not do this, and a tool from bamUtils does it but is quite slow.

I don't care about mapping information, I just want the paired and orphaned reads.

Thank you
06-18-2013, 02:31 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

If you're familiar with programming, the samtools C API is pretty convenient and could be used to do this quickly. There's also pysam in Python, which I assume is a bit more approachable than C (I've never used it, and I don't know what your background is).
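For the API route suggested above, here is a minimal Python sketch of the splitting logic. To stay self-contained it parses SAM text (for example the output of `samtools view input.bam`) with the standard library rather than pysam. It also holds everything in memory, so it illustrates the logic and is not a replacement for a streaming tool on a multi-GB BAM. The names `split_pairs` and `to_fastq` are hypothetical. Assumptions: both mates share the same read name, base qualities are stored (QUAL is not `*`), and secondary (0x100) and supplementary (0x800) records are skipped so that each read end is written only once. Reads on the reverse strand (0x10) are reverse-complemented back, because SAM stores them in reference orientation.

```python
# SAM FLAG bits used below (from the SAM specification)
PAIRED = 0x1
REVERSE = 0x10
READ1 = 0x40
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def to_fastq(fields):
    """Turn one SAM record (list of tab-separated fields) into a FASTQ entry.

    Assumes QUAL (field 11) is present, i.e. not '*'.
    """
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & REVERSE:
        # SAM stores reverse-strand reads reverse-complemented; undo that
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return f"@{name}\n{seq}\n+\n{qual}\n"


def split_pairs(sam_lines):
    """Return (read1, read2, singles) lists of FASTQ strings.

    Reads whose mate is also present go to read1/read2 (same order in both);
    everything else (unpaired reads, or reads whose mate is missing from the
    input) goes to singles.
    """
    mates = {}  # QNAME -> {end: fastq}; dicts keep insertion order
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # keep only the primary record of each read end
        if flag & PAIRED and flag & READ1:
            end = 1
        elif flag & PAIRED and flag & READ2:
            end = 2
        else:
            end = 0  # read was not paired in sequencing
        mates.setdefault(fields[0], {})[end] = to_fastq(fields)

    read1, read2, singles = [], [], []
    for ends in mates.values():
        if 1 in ends and 2 in ends:
            read1.append(ends[1])
            read2.append(ends[2])
        else:
            singles.extend(ends.values())  # orphans and true single-end reads
    return read1, read2, singles
```

For example, `read1, read2, singles = split_pairs(open("reads.sam"))` (hypothetical file name) returns three lists that can be written to three FASTQ files with `"".join(...)`.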
06-18-2013, 04:35 AM   #3
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178

Try piping samtools and Picard together. Using the flag filters of samtools view (-f/-F), you can output only the paired or only the single reads and stream those to Picard SamToFastq.

Paired end:
Code:
samtools view -uf 1 input.bam | java -jar SamToFastq.jar INPUT=/dev/stdin [rest of options]
For single end, change the 'f' to 'F' in the samtools command.

*This solution is untested
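The flag filters above can be spelled out in a few lines of Python. Per the samtools documentation, `-f INT` keeps a record only if all bits in INT are set in its FLAG, and `-F INT` drops it if any of those bits are set. The helper name `passes_view_filter` is hypothetical. Note that bit 0x1 only says the read came from a paired-end template; it says nothing about whether the mate's record is actually in the file, so a read whose mate is missing still passes `-f 1`.

```python
def passes_view_filter(flag, require=0, exclude=0):
    """Mimic `samtools view -f REQUIRE -F EXCLUDE` on a single FLAG value."""
    # -f: every bit in `require` must be set in the FLAG
    if (flag & require) != require:
        return False
    # -F: no bit in `exclude` may be set in the FLAG
    return (flag & exclude) == 0


# FLAG 99 = paired, proper pair, mate reverse, first mate
# FLAG 73 = paired, mate unmapped, first mate
# FLAG 16 = unpaired read on the reverse strand
print(passes_view_filter(99, require=0x1))  # True  (kept by -f 1)
print(passes_view_filter(73, require=0x1))  # True  (kept by -f 1 even if its mate is absent)
print(passes_view_filter(16, exclude=0x1))  # True  (kept by -F 1)
print(passes_view_filter(73, exclude=0x1))  # False (dropped by -F 1)
```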
06-18-2013, 04:59 AM   #4
sBeier
Member
 
Location: Germany

Join Date: Jan 2013
Posts: 42

If you filter your BAM file into a) only the alignments where both reads of a pair map and b) everything else, you could then use bedtools bamtofastq on each: paired fastq files from the first file and single-end ones from the second. It's not fast, though.
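The two-way split above can be expressed on the FLAG field: group (a) is a paired read where neither this read (0x4) nor its mate (0x8) is unmapped, which is what `samtools view -f 1 -F 12` selects (12 = 0x4 | 0x8); group (b) is everything else. Below is a minimal Python sketch over SAM text lines; `both_mates_mapped` and `split_by_mapping` are hypothetical names. The mate-unmapped bit is read from the read's own record, so, just as with the samtools filter, a read whose mate record was removed from the file can still land in group (a).

```python
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8


def both_mates_mapped(flag):
    """True for group (a): a paired read where the read and its mate are mapped.

    Same selection as `samtools view -f 1 -F 12` (12 = 0x4 | 0x8).
    """
    return bool(flag & PAIRED) and (flag & (UNMAPPED | MATE_UNMAPPED)) == 0


def split_by_mapping(sam_lines):
    """Split SAM body lines into (both_mapped, others) for separate conversion."""
    both, others = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        flag = int(line.split("\t", 2)[1])  # FLAG is the second field
        (both if both_mates_mapped(flag) else others).append(line)
    return both, others
```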
06-18-2013, 11:55 PM   #5
Kennels
Senior Member
 
Location: Sydney

Join Date: Feb 2011
Posts: 149

Thanks everyone.
I tried kmcarr's solution, but ran into errors due to some mates not being present in the .bam.

I found this solution: http://seqanswers.com/forums/showthread.php?t=16395

and eventually I was able to compile the code. It works very nicely and quickly (under 5 min for a 2.5 GB BAM). It outputs both the paired reads and the orphaned reads, but I found that it also duplicates a read if there are multiple alignments for it.

Since my goal was just to get the reads that could align, I regenerated the .bam (accepted_hits.bam from the TopHat2 output) to report only one alignment per read.
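To check whether a BAM has this duplication issue before converting it, one option is to count alignment records per read end. Below is a minimal Python sketch over SAM text (e.g. from `samtools view`). The function name `multi_aligned_reads` is hypothetical; it assumes both mates share a read name and uses the first-mate (0x40) and second-mate (0x80) bits to tell the ends apart.

```python
from collections import Counter

READ1, READ2 = 0x40, 0x80


def multi_aligned_reads(sam_lines):
    """Return {(read_name, end): n_records} for read ends with more than one record.

    end is 1 for the first mate, 2 for the second mate, 0 for unpaired reads.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header lines and blank lines
        name, flag = line.split("\t", 2)[:2]  # QNAME and FLAG
        flag = int(flag)
        end = 1 if flag & READ1 else 2 if flag & READ2 else 0
        counts[(name, end)] += 1
    # an empty result means every read end has exactly one record
    return {key: n for key, n in counts.items() if n > 1}
```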
All times are GMT -8.

