SEQanswers

Old 07-01-2010, 07:33 AM   #1
genelab
Member
 
Location: honston

Join Date: Nov 2009
Posts: 27
Find unmapped read from sam/bam file

Hi guys

Could anyone tell me how to extract the unmapped reads from a SAM/BAM file and convert the result to fastq format using samtools?

Thx!
Old 07-01-2010, 07:51 AM   #2
Adrian_H
Member
 
Location: Cambridge, MA

Join Date: Feb 2010
Posts: 10

samtools view -f 4 yourbamfile.bam will give you unmapped reads

Then pull out the first column of read names (cut -f1; SAM is tab-delimited, which is cut's default delimiter) and extract those reads from your original fastq files, or write an awk script to reformat the read ID, sequence, and quality scores into fastq.

Note that, depending on the alignment program you are using, unmapped reads may or may not be reported in the results. Also, some programs trim the /1 or /2 off the read ID if you are working with paired ends. Finally, keep in mind that if you use this to extract reads with other flags, the sequence in the BAM file is only what aligned, and could be the reverse complement of the input. (This shouldn't be an issue for unmapped reads.)
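
The awk reformatting step mentioned above could look roughly like the sketch below. The function name sam_unmapped_to_fastq is made up for illustration, and the sketch assumes standard SAM columns (read name in column 1, FLAG in column 2, sequence in column 10, qualities in column 11). It does not re-add a /1 or /2 suffix for paired reads.

```shell
# Hypothetical helper: read SAM records on stdin, write FASTQ to stdout.
# Only records with FLAG bit 0x4 (read unmapped) are emitted.
sam_unmapped_to_fastq() {
    awk -F'\t' '
        /^@/ { next }                  # skip SAM header lines, if any
        int($2 / 4) % 2 == 1 {         # FLAG bit 0x4 is set
            print "@" $1               # read name
            print $10                  # sequence
            print "+"
            print $11                  # quality string
        }'
}
```

Assuming samtools is installed, it would be used as: samtools view -f 4 yourbamfile.bam | sam_unmapped_to_fastq > unmapped.fastq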

Last edited by Adrian_H; 07-01-2010 at 07:52 AM. Reason: changed filename
Old 07-02-2010, 12:30 AM   #3
Adamo
Member
 
Location: Paris

Join Date: Jun 2010
Posts: 28

If you used bwasw, the command line Adrian suggested won't work.
You'll have to write your own Perl or awk script that compares your BAM output with your fastq file and rewrites the fastq, omitting the aligned reads.
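
A minimal sketch of such a comparison script, under some assumptions: the function name fastq_minus_aligned and the file names are hypothetical, the fastq has exactly four lines per record, and the list of aligned read names has already been written to a file, one name per line. Matching is by read name with any trailing /1 or /2 removed, so for paired data both mates are dropped if the shared name is in the list.

```shell
# Hypothetical helper: print FASTQ records whose read names are NOT listed.
#   $1 = file with one aligned read name per line
#   $2 = original FASTQ file (assumed: 4 lines per record)
fastq_minus_aligned() {
    awk '
        FILENAME == ARGV[1] { aligned[$1] = 1; next }   # load aligned names
        FNR % 4 == 1 {                                  # FASTQ header line
            id = substr($1, 2)                          # drop the leading "@"
            sub(/\/[12]$/, "", id)                      # drop trailing /1 or /2
            keep = !(id in aligned)
        }
        keep' "$1" "$2"
}
```

The name list could come from something like: samtools view alignments.bam | cut -f1 | sort -u > aligned_names.txt, followed by fastq_minus_aligned aligned_names.txt reads.fastq > unaligned.fastq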
Old 12-09-2010, 09:10 AM   #4
bhootnaath
Junior Member
 
Location: NM

Join Date: Jul 2009
Posts: 5

You can use bam2fastq:

http://www.hudsonalpha.org/gsl/software/bam2fastq.php
Old 12-09-2010, 10:27 AM   #5
csquared
Member
 
Location: Huntsville, AL

Join Date: May 2008
Posts: 67

+1 on the BAM2FASTQ. Great tool...of course I'm biased as it came from my group but it is well documented and fast. Let us know if you have any questions or problems.
__________________
HudsonAlpha Institute for Biotechnology
http://www.hudsonalpha.org/gsl
Old 04-27-2011, 01:20 AM   #6
byb121
Member
 
Location: Newcastle upon Tyne

Join Date: Aug 2009
Posts: 18

Hi,

I used the bam2fastq tool to extract unmapped reads; it's really fast and well documented. But I had difficulty working out the cause of this warning message:

Code:
$ ./bam2fastq -o s_%#_extracted_reads.txt -f --no-aligned --unaligned --no-filter alignments.bam 
[bam_header_read] EOF marker is absent.
This looks like paired data from lane 1.
Output will be in s_1_1_extracted_reads.txt and s_1_2_extracted_reads.txt
55130926 sequences in the BAM file
8238703 sequences exported
WARNING: 5947209 reads could not be matched to a mate and were not exported
The fastq files contain 1145747 reads each, which means those 5947209 unmapped reads were discarded. I would really like to have them included in the result. Could you help me out here?

PS: The reads are paired-end, ranging in length from 25 to 78 after quality trimming.


Quote:
Originally Posted by csquared View Post
+1 on the BAM2FASTQ. Great tool...of course I'm biased as it came from my group but it is well documented and fast. Let us know if you have any questions or problems.
Old 08-15-2013, 03:05 AM   #7
vishal.rossi
Member
 
Location: Bonn Germany

Join Date: Apr 2013
Posts: 25

samtools view -bh -f 0x4 -o output.file input
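
The value passed to -f is a bit in the SAM FLAG field; 0x4 (hexadecimal for 4) is the "read unmapped" bit, the same value as the -f 4 used earlier in the thread. As a rough illustration of how such a bit is tested, here is a small sketch; flag_has_bit is a made-up helper name, not part of samtools.

```shell
# Hypothetical helper: exit status 0 if FLAG ($1) has the given bit ($2) set.
flag_has_bit() {
    [ $(( $1 & $2 )) -ne 0 ]
}

flag_has_bit 77 0x4 && echo "flag 77 has the unmapped bit"     # 77 = 1 + 4 + 8 + 64
flag_has_bit 99 0x4 || echo "flag 99 lacks the unmapped bit"   # 99 = 1 + 2 + 32 + 64
```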
Old 03-18-2014, 03:53 AM   #8
JonB
Member
 
Location: Norway

Join Date: Jan 2010
Posts: 83

What about reads mapping to the reverse strand? Should they be reverse complemented before converting to fastq?
Old 03-18-2014, 08:48 AM   #9
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Quote:
Originally Posted by JonB View Post
What about reads mapping to the reverse strand? Should they be reverse complemented before converting to fastq?
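Not by hand, if you use a converter. In the SAM/BAM record itself, a read mapped to the reverse strand (flag 0x10) is stored reverse-complemented with its qualities reversed, but tools such as samtools fastq restore the original orientation, so the output matches the source fastq regardless of mapped strand.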
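
For anyone building the fastq by hand from SAM columns rather than with a converter, the strand flag is worth checking: the SAM specification describes FLAG bit 0x10 as "SEQ being reverse complemented". Below is a minimal sketch of undoing that. The function name is hypothetical, and it assumes one primary record per read with no hard clipping; bases other than A, C, G, T are passed through unchanged.

```shell
# Hypothetical helper: SAM on stdin, FASTQ on stdout, in original read orientation.
sam_to_fastq_original_strand() {
    awk -F'\t' '
        function reverse(s,    i, out) {
            out = ""
            for (i = length(s); i >= 1; i--) out = out substr(s, i, 1)
            return out
        }
        function complement(s,    i, c, p, out) {
            out = ""
            for (i = 1; i <= length(s); i++) {
                c = substr(s, i, 1)
                p = index("ACGTacgt", c)
                out = out (p ? substr("TGCAtgca", p, 1) : c)
            }
            return out
        }
        /^@/ { next }                          # skip SAM header lines
        {
            bases = $10; quals = $11
            if (int($2 / 16) % 2 == 1) {       # FLAG bit 0x10: stored reverse-complemented
                bases = reverse(complement(bases))
                quals = reverse(quals)
            }
            print "@" $1; print bases; print "+"; print quals
        }'
}
```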
Old 03-18-2014, 01:35 PM   #10
JonB
Member
 
Location: Norway

Join Date: Jan 2010
Posts: 83

Good to know, thanks!