Go Back   SEQanswers > Bioinformatics > Bioinformatics

Similar Threads
Thread Thread Starter Forum Replies Last Post
pysam: a python wrapper of samtools TGBelgard Bioinformatics 36 01-28-2016 03:13 AM
pybedtools - a Python wrapper for BEDTools daler Bioinformatics 7 09-30-2014 12:10 PM
bowtie index problem (bowtie-build and then bowtie-inspect) tgenahmet Bioinformatics 4 09-10-2013 12:51 PM
Script Help DrD2009 Bioinformatics 4 07-15-2011 09:17 AM
just perl script semna Bioinformatics 3 07-02-2011 09:42 AM

Thread Tools
Old 11-10-2009, 01:08 PM   #1
Senior Member
Location: Boston

Join Date: Feb 2008
Posts: 693
Default bowtie wrapper script

I was using the following script to launch bowtie. I think it may be useful to others and so post it here. This wrapper script uses bowtie internally, but does a little bit more things which are at least convenient to me:

1) Take gzip'ed fastq files as input. (Most fastq files are compressed)

2) For paired-end reads, align singletons in bowtie's single-end mode. (This saves a bowtie command line typed by yourself)

3) Fix a minor problem in SAM out. (Names of two ends in a pair should be identical; otherwise duplicate removal may have trouble)

4) More accurate (albeit slower) command line option: [--best --strata] -m 1. (Recommended by a users who uses bowtie a lot)

Please reply to this thread if you think the command line option is not appropriate, or the script is buggy/can be improved. Thanks.

#!/usr/bin/perl -w

# Contact: lh3
# Last modified: 05NOV2009

use strict;
use warnings;
use Cwd qw/getcwd abs_path/;
use File::Temp qw/tempfile/;

die("Usage: <bowtieIndex> <read1.fq> [read2.fq] [options] | gzip > aln.sam.gz\n") if (@ARGV < 2 || $ARGV[0] =~ /^-/ || $ARGV[1] =~ /^-/);

my $idx = shift(@ARGV);
my $rd1 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : '-';
my $rd2 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : '';

# set bowtie cmd options
my $opt0 = join(" ", @ARGV);
$opt0 =~ s/--maxins/-X/;
$opt0 =~ s/--minins/-I/;
$opt0 =~ s/--un \S+\b//;

# locate bowtie
my $bowtie = &gwhich('bowtie');
die("[run_bowtie] fail to find the bowtie executable.\n") unless ($bowtie);

my ($fh1, $fh2);
open($fh1, ($rd1 =~ /\.gz$/)? "gzip -dc $rd1 |" : $rd1) || die;
if ($rd2) {
  open($fh2, ($rd2 =~ /\.gz$/)? "gzip -dc $rd2 |" : $rd2) || die;

if ($fh2) { # paired-end mode
  my ($fh, $fn_ump, @col);
  if ($opt0 !~ /-X/) {
	warn("[run_bowtie] the maximum insert size is set as the default value (250).\n");
  (undef, $fn_ump) = tempfile("./rb-$$-XXXXXX");
  open($fh, qq/| $bowtie $opt0 -S -m 1 $idx --12 -/
	   . q/ | awk '{if($3!="*"||$12!="XM:i:0")print;else print $1"\t"$10"\t"$11>/ . qq/"$fn_ump"}'/ . q{ | perl -pe 's/^(\S+)\/[12]\t/$1\t/'}) || die;
  while (<$fh1>) {
	$col[0] = $1 if (/^@(\S+)/);
	$col[1] = <$fh1>; chomp($col[1]); <$fh1>;
	$col[2] = <$fh1>; chomp($col[2]);
	$col[3] = <$fh2>; chomp($col[3]); <$fh2>;
	$col[4] = <$fh2>; chomp($col[4]);
	print $fh join("\t", @col), "\n";
  close($fh); close($fh1); close($fh2);
  # map singletons
  my $opt_se = $opt0;
  $opt_se =~ s/-X\s*\d+//;
  $opt_se =~ s/-I\s*\d+//;
  system(qq/$bowtie $opt_se -S -m 1 --best --strata --sam-nohead $idx --12 $fn_ump | awk '\$3!="*"||\$12!="XM:i:0"'/);
} else { # single-end mode
  my ($fh, @col);
  open($fh, qq/| $bowtie $opt0 -S -m 1 --best --strata $idx --12 - | awk '\$3!="*"||\$12!="XM:i:0"'/) || die;
  while (<$fh1>) {
	$col[0] = $1 if (/^@(\S+)/);
	$col[1] = <$fh1>; chomp($col[1]); <$fh1>;
	$col[2] = <$fh1>; chomp($col[2]);
	print $fh join("\t", @col), "\n";
  close($fh); close($fh1);

# routines to locate an executable

sub dirname {
  my $prog = shift;
  my $cwd = getcwd;
  return $cwd if ($prog !~ /\//);
  $prog =~ s/\/[^\s\/]+$//g;
  return $prog;

sub which {
  my $file = shift;
  my $path = (@_)? shift : $ENV{PATH};
  return if (!defined($path));
  foreach my $x (split(":", $path)) {
	$x =~ s/\/$//;
	return "$x/$file" if (-x "$x/$file" && -f "$x/$file");

sub gwhich {
  my $progname = shift;
  my $dirname = &dirname($0);
  my $tmp;
  if (-x $progname && -f $progname) {
	return abs_path($progname);
  } elsif (defined($dirname) && (-x "$dirname/$progname" && -f "$dirname/$progname")) {
	return abs_path("$dirname/$progname");
  } elsif (($tmp = &which($progname))) { # on the $PATH
	return $tmp;
  } else {
	warn("[gwhich] fail to find executable $progname.\n");
lh3 is offline   Reply With Quote
Old 09-22-2010, 07:54 AM   #2
Location: Institute, WV

Join Date: May 2010
Posts: 24

The wrapper is indeed a good one to use compressed input files.
I tried to use it with SOLiD reads. Thanks for posting.
However, I found that, it can be useful only if the input is in fastq format. If we have separate compressed (cs)fasta and qual files, then, there is no way I can supply them simultaneously on command line input.

I hope I did not miss a possible way of using compressed (cs)fasta and qual files simultaneously with the wrapper script. Or are there any other methods by which you can deal with this?
sridharacharya is offline   Reply With Quote
Old 09-22-2010, 01:38 PM   #3
Lee Sam
Location: Ann Arbor, MI

Join Date: Oct 2008
Posts: 57

Thanks for posting.
Lee Sam is offline   Reply With Quote
Old 01-20-2012, 08:28 AM   #4
Location: United States of America

Join Date: Mar 2011
Posts: 52

or a patch:
earonesty is offline   Reply With Quote
Old 04-11-2012, 06:04 AM   #5
Location: UK

Join Date: Sep 2009
Posts: 20

Originally Posted by earonesty View Post
Hi earonesty,
thanks for the link. I couldn't get the patch to apply (though admittedly I'm not experienced with doing so). Is the command much different to "patch bowtie bowtie-gzip.patch" ?
blackgore is offline   Reply With Quote
Old 06-28-2012, 12:27 PM   #6
Erin Wagner
Junior Member
Location: Indianapolis, IN

Join Date: Oct 2011
Posts: 1

I also attempted to use the patch, but it failed to apply properly. Is there a chance, since the patch was last updated in Jaunary 2012, the patch is incompatible with the newest version 0.12.8 (which was released in May 2012)?

If anyone has any ideas on how to apply the patch mention above to version 0.12.8, it would be greatly appreciated.


Erin Wagner is offline   Reply With Quote


Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

All times are GMT -8. The time now is 05:29 PM.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO