
**maubp** · 05-31-2012, 07:00 AM

Normally unaligned reads are stored in FASTQ, although you can also store them in SAM/BAM, which some alignment tools (including BWA) accept as input.
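To illustrate what an unaligned read looks like in SAM: it is an ordinary record with FLAG 4 (segment unmapped) and placeholder values in the alignment fields. The following is only a minimal Perl sketch. The function name `fastq_to_unaligned_sam` is hypothetical, and it assumes single-end reads with Phred+33 qualities (older Illumina FASTQ files may need their qualities converted first).

```perl
use warnings;
use strict;

# Build one unaligned SAM record from one single-end FASTQ record.
# Assumption: qualities are already Phred+33, as SAM expects.
sub fastq_to_unaligned_sam {
  my ($header, $seq, $qual) = @_;
  chomp($header, $seq, $qual);

  # The read name is the first word after the leading '@'
  my ($name) = $header =~ /^\@(\S+)/
    or die "Not a FASTQ header line: '$header'\n";

  # SAM columns: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
  # FLAG 4 marks the read as unmapped; '*' and 0 are the "not available" values.
  return join("\t", $name, 4, '*', 0, 0, '*', '*', 0, 0, $seq, $qual) . "\n";
}

# Example: prints one tab-separated SAM line for a 4 bp read
print fastq_to_unaligned_sam("\@read1 some description\n", "ACGT\n", "IIII\n");
```

Paired-end reads would need different flag values and mate fields, which this sketch does not handle.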

See also my blog post which might be of interest:

FASTQ must die! Long live SAM/BAM!

http://blastedbio.blogspot.co.uk/2011/10/fastq-must-die-long-live-sambam.html


Are you asking in general how to go from aligned reads in SAM/BAM to unaligned reads?

**rakesh12** · 05-31-2012, 07:11 AM

Hi Peter,

Thanks for your detailed response and very interesting blog.

Specifically, I want to align my RNA-seq data (in FASTQ format) with BWA against the RefSeq reference database and, at the same time, store the unaligned reads (in FASTQ or SAM format) in a separate file. BWA does not seem to have an option for storing unaligned reads separately. Can you please guide me?

Thanks again for your kind help,

Rakesh

**chadn737** · 05-31-2012, 08:02 AM

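This is a Perl script that takes your alignment file and your original FASTQ file and generates a FASTQ file containing only your unaligned reads. It reads the alignments as plain text, so a BAM file has to be converted to SAM first (for example with `samtools view`).

One caveat: the alignment file must not contain any unaligned reads, because the script treats every read name it finds there as aligned. If your aligner writes unaligned reads to its output, filter them out first (for example with `samtools view -F 4`).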

I should also note that I did not write this script. If I remember correctly, it was written by Simon Anders, the author of DESeq, DEXSeq, and other useful things. I think he posted it on here some time ago, but I have no idea which thread anymore.

Code:

#!/usr/bin/perl
use warnings;
use strict;

my ($fastq,$sam,$outfile) = @ARGV;

unless ($outfile) {
  die "Usage is filter_unmapped_reads.pl [FastQ file] [SAM File] [File for unmapped reads]\n";
}

if (-e $outfile) {
  die "Won't overwrite an existing file, delete it first!";
}

open (FASTQ,'<',$fastq) or die "Can't open fastq file: $!";
open (SAM,'<',$sam) or die "Can't open SAM file: $!";
open (OUT,'>',$outfile) or die "Can't write to $outfile: $!";

my $ids = read_ids();

filter_fastq($ids);

close OUT or die "Can't write to $outfile: $!";


sub filter_fastq {

  warn "Filtering FastQ file\n";

  my ($ids) = @_;

  while (<FASTQ>) {

    if (/^@(\S+)/) {
      my $id = $1;

      # Remove the end designator from paired end reads
      $id =~ s/\/\d+$//;

      my $seq = <FASTQ>;
      my $id2 = <FASTQ>;
      my $qual = <FASTQ>;


      unless (exists $ids->{$id}) {
        print OUT $_,$seq,$id2,$qual;
      }
    }
    else {
      chomp;
      warn "Line '$_' should have been an id line, but wasn't\n";
    }

  }

}


sub read_ids {

  warn "Collecting mapped ids\n";

  my $ids = {};

  while (<SAM>) {

    next if (/^@/);
    my ($id) = split(/\t/);
    $ids->{$id} = 1;
  }

  return $ids;
}

To use:

Code:

$ perl filter_unmapped_reads.pl <in.fastq> <in.sam> <out.fastq>
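On the caveat about unaligned reads in the alignment file: one way around it is to look at the FLAG column while collecting IDs and skip any record with bit 0x4 (segment unmapped) set. The following is a hedged sketch of such a replacement for `read_ids`, not part of the original script. The name `read_mapped_ids` and the choice to pass a filehandle are my own assumptions.

```perl
use warnings;
use strict;

# Collect the names of reads that have at least one mapped record.
# Takes an open filehandle on SAM text and returns a hash reference.
sub read_mapped_ids {
  my ($sam_fh) = @_;
  my %ids;

  while (my $line = <$sam_fh>) {
    next if $line =~ /^\@/;                  # skip SAM header lines
    my ($id, $flag) = split /\t/, $line;     # columns 1 and 2: QNAME, FLAG
    next unless defined $flag && $flag =~ /^\d+$/;
    next if int($flag) & 0x4;                # bit 0x4 set: read is unmapped
    $ids{$id} = 1;
  }

  return \%ids;
}

# Small demonstration on an in-memory SAM snippet
my $sam = join "",
  "\@SQ\tSN:chr1\tLN:1000\n",
  "read1\t0\tchr1\t100\t37\t4M\t*\t0\t0\tACGT\tIIII\n",
  "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII\n";
open my $fh, '<', \$sam or die "Can't open in-memory SAM: $!";
my $ids = read_mapped_ids($fh);
print join(",", sort keys %$ids), "\n";      # prints: read1
```

With paired-end data, both mates share a read name, so a pair with one mapped mate is recorded as mapped and the unmapped mate would also be left out of the FASTQ output. Whether that is what you want depends on your analysis.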

**chadn737** · 05-31-2012, 08:06 AM

Ah, here is the thread where Simon originally posted it:

Extract unaligned reads (Tophat) from FastQ - SEQanswers

http://seqanswers.com/forums/archive/index.php/t-6847.html


**rakesh12** · 06-28-2012, 05:54 AM

Hi Chadn,

Thank you so much for your detailed response.

Best wishes,

Rakesh

Un-align reads in BWA
