**Adrian_H** · 07-01-2010, 07:51 AM

samtools view -f 4 yourbamfile.bam will give you unmapped reads

Then pull out the first column of read names (cut -f1; SAM fields are tab-separated, which is cut's default delimiter) and extract those reads from your original fastq files, or write an awk script to reformat the read ID, sequence, and quality scores into fastq.

Note that, depending on the alignment program you are using, unmapped reads may or may not be reported in the results. Also, some programs trim the /1 or /2 off the read ID (if you are working with paired ends). Finally, keep in mind that if you use this to extract reads with other flags, the sequence in the BAM file may be only what aligned, and could be the reverse complement of the input. (This shouldn't be an issue for unmapped reads.)
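
The awk reformatting step could look something like the sketch below. It assumes the input is SAM text (for example, the output of samtools view), that the sequence and qualities sit in columns 10 and 11, and that the unmapped bit (4) is what you want to select on. `sam_unmapped_to_fastq` is a made-up helper name, not a samtools command, and it does not restore any /1 or /2 suffix the aligner may have trimmed.

```shell
# Sketch: turn the unmapped records of a SAM stream into FASTQ.
# sam_unmapped_to_fastq is a hypothetical helper name, not part of samtools.
sam_unmapped_to_fastq() {
    awk -F'\t' '
        /^@/ { next }                  # skip SAM header lines, if any
        int($2 / 4) % 2 == 1 {         # flag bit 4 set: read is unmapped
            # column 1 = read name, 10 = sequence, 11 = quality string
            printf "@%s\n%s\n+\n%s\n", $1, $10, $11
        }
    '
}

# Assumed usage (needs samtools; file names are placeholders):
#   samtools view -f 4 yourbamfile.bam | sam_unmapped_to_fastq > unmapped.fastq
```

The flag test uses integer division rather than a bitwise function so that it runs in any POSIX awk.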

**Adamo** · 07-02-2010, 12:30 AM

If you used bwasw, the command line suggested by Adrian won't work.
You'll have to write your own Perl or awk script that compares your BAM output with your fastq file and rewrites the fastq, omitting the aligned reads.
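
A rough sketch of such a script in awk, assuming you already have a text file with one aligned read name per line and a fastq file with exactly four lines per record. `fastq_exclude_ids` and the file names are made up for illustration. It strips a trailing /1 or /2 from the fastq IDs before comparing, so for paired-end data both mates are dropped whenever the shared name is in the list.

```shell
# fastq_exclude_ids NAMES FASTQ
# Print the FASTQ records whose read ID does not appear in NAMES.
# Hypothetical helper; assumes 4-line FASTQ records.
fastq_exclude_ids() {
    awk '
        FILENAME == ARGV[1] { aligned[$1] = 1; next }  # load aligned read names
        FNR % 4 == 1 {                                 # FASTQ header line
            id = substr($1, 2)                         # drop the leading "@"
            sub(/\/[12]$/, "", id)                     # drop a trailing /1 or /2
            keep = !(id in aligned)
        }
        keep { print }                                 # emit all 4 lines of kept records
    ' "$1" "$2"
}

# Assumed usage (needs samtools; file names are placeholders):
#   samtools view -F 4 alignments.bam | cut -f1 | sort -u > aligned_names.txt
#   fastq_exclude_ids aligned_names.txt reads.fastq > unaligned.fastq
```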

**bhootnaath** · 12-09-2010, 10:10 AM

You can use bam2fastq:

http://www.hudsonalpha.org/gsl/software/bam2fastq.php

**csquared** · 12-09-2010, 11:27 AM

+1 on the BAM2FASTQ. Great tool...of course I'm biased as it came from my group but it is well documented and fast. Let us know if you have any questions or problems.

**byb121** · 04-27-2011, 01:20 AM

Hi,

I used the bam2fastq tool to extract unmapped reads. It's really fast and better documented, but I had difficulty working out the cause of this warning message:

Code:

$ ./bam2fastq -o s_%#_extracted_reads.txt -f --no-aligned --unaligned --no-filter alignments.bam 
[bam_header_read] EOF marker is absent.
This looks like paired data from lane 1.
Output will be in s_1_1_extracted_reads.txt and s_1_2_extracted_reads.txt
55130926 sequences in the BAM file
8238703 sequences exported
WARNING: 5947209 reads could not be matched to a mate and were not exported

The fastq files contain 1145747 reads each, which means those 5947209 unmapped reads were discarded. I would really like to have them included in the result. Could you help me out here?

PS: The reads are paired-end, ranging from 25 to 78 in length after quality trimming.

**vishal.rossi** · 08-15-2013, 03:05 AM

samtools view -bh -f 0x4 -o output.file input
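
For anyone unsure what the flag filter does: `-f` keeps a record only when all bits of the given value are set in its FLAG field, and the unmapped bit is 0x4 (decimal 4). The sketch below mimics that test in plain shell arithmetic; `has_flag` is a made-up helper for illustration, not something samtools provides.

```shell
# has_flag FLAG MASK
# Succeed when every bit of MASK is set in FLAG (the test -f applies).
# Hypothetical helper for illustration only.
has_flag() {
    [ $(( $1 & $2 )) -eq $(( $2 )) ]
}

# 77 = 1 + 4 + 8 + 64: paired, read unmapped, mate unmapped, first in pair
has_flag 77 0x4 && echo "flag 77: unmapped"
# 99 = 1 + 2 + 32 + 64: paired, properly paired, mate reverse, first in pair
has_flag 99 0x4 || echo "flag 99: mapped"
```

The same test with a mask of 12 (4 + 8) would select only pairs in which both the read and its mate are unmapped.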

**JonB** · 03-18-2014, 03:53 AM

What about reads mapping to the reverse strand? Should they be reverse complemented before converting to fastq?

**Brian Bushnell** · 03-18-2014, 08:48 AM

No, sequences and qualities are always the same as the source fastq, regardless of mapped strand.

**JonB** · 03-18-2014, 01:35 PM

Good to know, thanks!

Find unmapped read from sam/bam file

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News