Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • bowtie wrapper script

    I was using the following script to launch bowtie. I think it may be useful to others and so post it here. This wrapper script uses bowtie internally, but does a little bit more things which are at least convenient to me:

    1) Take gzip'ed fastq files as input. (Most fastq files are compressed)

    2) For paired-end reads, align singletons in bowtie's single-end mode. (This saves a bowtie command line typed by yourself)

    3) Fix a minor problem in SAM out. (Names of two ends in a pair should be identical; otherwise duplicate removal may have trouble)

    4) More accurate (albeit slower) command line option: [--best --strata] -m 1. (Recommended by a users who uses bowtie a lot)

    Please reply to this thread if you think the command line option is not appropriate, or the script is buggy/can be improved. Thanks.

    Code:
    #!/usr/bin/perl -w
    
    # Contact: lh3
    # Last modified: 05NOV2009
    
    use strict;
    use warnings;
    use Cwd qw/getcwd abs_path/;
    use File::Temp qw/tempfile/;
    
    die("Usage: run_bowtie.pl <bowtieIndex> <read1.fq> [read2.fq] [options] | gzip > aln.sam.gz\n") if (@ARGV < 2 || $ARGV[0] =~ /^-/ || $ARGV[1] =~ /^-/);
    
    my $idx = shift(@ARGV);
    my $rd1 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : '-';
    my $rd2 = (@ARGV && $ARGV[0] !~ /^-/)? shift(@ARGV) : '';
    
    # set bowtie cmd options
    my $opt0 = join(" ", @ARGV);
    $opt0 =~ s/--maxins/-X/;
    $opt0 =~ s/--minins/-I/;
    $opt0 =~ s/--un \S+\b//;
    
    # locate bowtie
    my $bowtie = &gwhich('bowtie');
    die("[run_bowtie] fail to find the bowtie executable.\n") unless ($bowtie);
    
    my ($fh1, $fh2);
    open($fh1, ($rd1 =~ /\.gz$/)? "gzip -dc $rd1 |" : $rd1) || die;
    if ($rd2) {
      open($fh2, ($rd2 =~ /\.gz$/)? "gzip -dc $rd2 |" : $rd2) || die;
    }
    
    if ($fh2) { # paired-end mode
      my ($fh, $fn_ump, @col);
      if ($opt0 !~ /-X/) {
    	warn("[run_bowtie] the maximum insert size is set as the default value (250).\n");
      }
      (undef, $fn_ump) = tempfile("./rb-$$-XXXXXX");
      open($fh, qq/| $bowtie $opt0 -S -m 1 $idx --12 -/
    	   . q/ | awk '{if($3!="*"||$12!="XM:i:0")print;else print $1"\t"$10"\t"$11>/ . qq/"$fn_ump"}'/ . q{ | perl -pe 's/^(\S+)\/[12]\t/$1\t/'}) || die;
      while (<$fh1>) {
    	$col[0] = $1 if (/^@(\S+)/);
    	$col[1] = <$fh1>; chomp($col[1]); <$fh1>;
    	$col[2] = <$fh1>; chomp($col[2]);
    	<$fh2>;
    	$col[3] = <$fh2>; chomp($col[3]); <$fh2>;
    	$col[4] = <$fh2>; chomp($col[4]);
    	print $fh join("\t", @col), "\n";
      }
      close($fh); close($fh1); close($fh2);
      # map singletons
      my $opt_se = $opt0;
      $opt_se =~ s/-X\s*\d+//;
      $opt_se =~ s/-I\s*\d+//;
      system(qq/$bowtie $opt_se -S -m 1 --best --strata --sam-nohead $idx --12 $fn_ump | awk '\$3!="*"||\$12!="XM:i:0"'/);
      unlink($fn_ump);
    } else { # single-end mode
      my ($fh, @col);
      open($fh, qq/| $bowtie $opt0 -S -m 1 --best --strata $idx --12 - | awk '\$3!="*"||\$12!="XM:i:0"'/) || die;
      while (<$fh1>) {
    	$col[0] = $1 if (/^@(\S+)/);
    	$col[1] = <$fh1>; chomp($col[1]); <$fh1>;
    	$col[2] = <$fh1>; chomp($col[2]);
    	print $fh join("\t", @col), "\n";
      }
      close($fh); close($fh1);
    }
    
    # routines to locate an executable
    
    sub dirname {
      my $prog = shift;
      my $cwd = getcwd;
      return $cwd if ($prog !~ /\//);
      $prog =~ s/\/[^\s\/]+$//g;
      return $prog;
    }
    
    sub which {
      my $file = shift;
      my $path = (@_)? shift : $ENV{PATH};
      return if (!defined($path));
      foreach my $x (split(":", $path)) {
    	$x =~ s/\/$//;
    	return "$x/$file" if (-x "$x/$file" && -f "$x/$file");
      }
      return;
    }
    
    sub gwhich {
      my $progname = shift;
      my $dirname = &dirname($0);
      my $tmp;
      chomp($dirname);
      if (-x $progname && -f $progname) {
    	return abs_path($progname);
      } elsif (defined($dirname) && (-x "$dirname/$progname" && -f "$dirname/$progname")) {
    	return abs_path("$dirname/$progname");
      } elsif (($tmp = &which($progname))) { # on the $PATH
    	return $tmp;
      } else {
    	warn("[gwhich] fail to find executable $progname.\n");
    	return;
      }
    }

  • #2
    The wrapper is indeed a good one to use compressed input files.
    I tried to use it with SOLiD reads. Thanks for posting.
    However, I found that, it can be useful only if the input is in fastq format. If we have separate compressed (cs)fasta and qual files, then, there is no way I can supply them simultaneously on command line input.

    I hope I did not miss a possible way of using compressed (cs)fasta and qual files simultaneously with the wrapper script. Or are there any other methods by which you can deal with this?

    Comment


    • #3
      Thanks for posting.

      Comment


      • #5
        Originally posted by earonesty View Post
        Hi earonesty,
        thanks for the link. I couldn't get the patch to apply (though admittedly I'm not experienced with doing so). Is the command much different to "patch bowtie bowtie-gzip.patch" ?

        Comment


        • #6
          I also attempted to use the patch, but it failed to apply properly. Is there a chance, since the patch was last updated in Jaunary 2012, the patch is incompatible with the newest version 0.12.8 (which was released in May 2012)?

          If anyone has any ideas on how to apply the patch mention above to version 0.12.8, it would be greatly appreciated.

          Thanks.

          Erin

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Advancing Precision Medicine for Rare Diseases in Children
            by seqadmin




            Many organizations study rare diseases, but few have a mission as impactful as Rady Children’s Institute for Genomic Medicine (RCIGM). “We are all about changing outcomes for children,” explained Dr. Stephen Kingsmore, President and CEO of the group. The institute’s initial goal was to provide rapid diagnoses for critically ill children and shorten their diagnostic odyssey, a term used to describe the long and arduous process it takes patients to obtain an accurate...
            12-16-2024, 07:57 AM
          • seqadmin
            Recent Advances in Sequencing Technologies
            by seqadmin



            Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

            Long-Read Sequencing
            Long-read sequencing has seen remarkable advancements,...
            12-02-2024, 01:49 PM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 12-17-2024, 10:28 AM
          0 responses
          26 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 12-13-2024, 08:24 AM
          0 responses
          42 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 12-12-2024, 07:41 AM
          0 responses
          28 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 12-11-2024, 07:45 AM
          0 responses
          42 views
          0 likes
          Last Post seqadmin  
          Working...
          X