![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
pysam: a python wrapper of samtools | TGBelgard | Bioinformatics | 36 | 01-28-2016 03:13 AM |
pybedtools - a Python wrapper for BEDTools | daler | Bioinformatics | 7 | 09-30-2014 12:10 PM |
bowtie index problem (bowtie-build and then bowtie-inspect) | tgenahmet | Bioinformatics | 4 | 09-10-2013 12:51 PM |
Script Help | DrD2009 | Bioinformatics | 4 | 07-15-2011 09:17 AM |
just perl script | semna | Bioinformatics | 3 | 07-02-2011 09:42 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
|
![]()
I was using the following script to launch bowtie. I think it may be useful to others and so post it here. This wrapper script uses bowtie internally, but does a little bit more things which are at least convenient to me:
1) Take gzip'ed fastq files as input. (Most fastq files are compressed) 2) For paired-end reads, align singletons in bowtie's single-end mode. (This saves a bowtie command line typed by yourself) 3) Fix a minor problem in SAM out. (Names of two ends in a pair should be identical; otherwise duplicate removal may have trouble) 4) More accurate (albeit slower) command line option: [--best --strata] -m 1. (Recommended by a users who uses bowtie a lot) Please reply to this thread if you think the command line option is not appropriate, or the script is buggy/can be improved. Thanks. Code:
#!/usr/bin/perl -w # Contact: lh3 # Last modified: 05NOV2009 use strict; use warnings; use Cwd qw/getcwd abs_path/; use File::Temp qw/tempfile/; die("Usage: run_bowtie.pl <bowtieIndex> <read1.fq> [read2.fq] [options] | gzip > aln.sam.gz\n") if (@ARGV < 2 || $ARGV[0] =~ /^-/ || $ARGV[1] =~ /^-/); my $idx = shift(@ARGV); my $rd1 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : '-'; my $rd2 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : ''; # set bowtie cmd options my $opt0 = join(" ", @ARGV); $opt0 =~ s/--maxins/-X/; $opt0 =~ s/--minins/-I/; $opt0 =~ s/--un \S+\b//; # locate bowtie my $bowtie = &gwhich('bowtie'); die("[run_bowtie] fail to find the bowtie executable.\n") unless ($bowtie); my ($fh1, $fh2); open($fh1, ($rd1 =~ /\.gz$/)? "gzip -dc $rd1 |" : $rd1) || die; if ($rd2) { open($fh2, ($rd2 =~ /\.gz$/)? "gzip -dc $rd2 |" : $rd2) || die; } if ($fh2) { # paired-end mode my ($fh, $fn_ump, @col); if ($opt0 !~ /-X/) { warn("[run_bowtie] the maximum insert size is set as the default value (250).\n"); } (undef, $fn_ump) = tempfile("./rb-$$-XXXXXX"); open($fh, qq/| $bowtie $opt0 -S -m 1 $idx --12 -/ . q/ | awk '{if($3!="*"||$12!="XM:i:0")print;else print $1"\t"$10"\t"$11>/ . qq/"$fn_ump"}'/ . q{ | perl -pe 's/^(\S+)\/[12]\t/$1\t/'}) || die; while (<$fh1>) { $col[0] = $1 if (/^@(\S+)/); $col[1] = <$fh1>; chomp($col[1]); <$fh1>; $col[2] = <$fh1>; chomp($col[2]); <$fh2>; $col[3] = <$fh2>; chomp($col[3]); <$fh2>; $col[4] = <$fh2>; chomp($col[4]); print $fh join("\t", @col), "\n"; } close($fh); close($fh1); close($fh2); # map singletons my $opt_se = $opt0; $opt_se =~ s/-X\s*\d+//; $opt_se =~ s/-I\s*\d+//; system(qq/$bowtie $opt_se -S -m 1 --best --strata --sam-nohead $idx --12 $fn_ump | awk '\$3!="*"||\$12!="XM:i:0"'/); unlink($fn_ump); } else { # single-end mode my ($fh, @col); open($fh, qq/| $bowtie $opt0 -S -m 1 --best --strata $idx --12 - | awk '\$3!="*"||\$12!="XM:i:0"'/) || die; while (<$fh1>) { $col[0] = $1 if (/^@(\S+)/); $col[1] = <$fh1>; chomp($col[1]); <$fh1>; $col[2] = <$fh1>; chomp($col[2]); print $fh join("\t", @col), "\n"; } close($fh); close($fh1); } # routines to locate an executable sub dirname { my $prog = shift; my $cwd = getcwd; return $cwd if ($prog !~ /\//); $prog =~ s/\/[^\s\/]+$//g; return $prog; } sub which { my $file = shift; my $path = (@_)? shift : $ENV{PATH}; return if (!defined($path)); foreach my $x (split(":", $path)) { $x =~ s/\/$//; return "$x/$file" if (-x "$x/$file" && -f "$x/$file"); } return; } sub gwhich { my $progname = shift; my $dirname = &dirname($0); my $tmp; chomp($dirname); if (-x $progname && -f $progname) { return abs_path($progname); } elsif (defined($dirname) && (-x "$dirname/$progname" && -f "$dirname/$progname")) { return abs_path("$dirname/$progname"); } elsif (($tmp = &which($progname))) { # on the $PATH return $tmp; } else { warn("[gwhich] fail to find executable $progname.\n"); return; } } |
![]() |
![]() |
![]() |
#2 |
Member
Location: Institute, WV Join Date: May 2010
Posts: 24
|
![]()
The wrapper is indeed a good one to use compressed input files.
I tried to use it with SOLiD reads. Thanks for posting. However, I found that, it can be useful only if the input is in fastq format. If we have separate compressed (cs)fasta and qual files, then, there is no way I can supply them simultaneously on command line input. I hope I did not miss a possible way of using compressed (cs)fasta and qual files simultaneously with the wrapper script. Or are there any other methods by which you can deal with this? |
![]() |
![]() |
![]() |
#3 |
Member
Location: Ann Arbor, MI Join Date: Oct 2008
Posts: 57
|
![]()
Thanks for posting.
|
![]() |
![]() |
![]() |
#4 |
Member
Location: United States of America Join Date: Mar 2011
Posts: 52
|
![]() |
![]() |
![]() |
![]() |
#5 | |
Member
Location: UK Join Date: Sep 2009
Posts: 20
|
![]() Quote:
thanks for the link. I couldn't get the patch to apply (though admittedly I'm not experienced with doing so). Is the command much different to "patch bowtie bowtie-gzip.patch" ? |
|
![]() |
![]() |
![]() |
#6 |
Junior Member
Location: Indianapolis, IN Join Date: Oct 2011
Posts: 1
|
![]()
I also attempted to use the patch, but it failed to apply properly. Is there a chance, since the patch was last updated in Jaunary 2012, the patch is incompatible with the newest version 0.12.8 (which was released in May 2012)?
If anyone has any ideas on how to apply the patch mention above to version 0.12.8, it would be greatly appreciated. Thanks. Erin |
![]() |
![]() |
![]() |
Tags |
bowtie |
Thread Tools | |
|
|