First, does anybody have a script that will trim reads that map off the ends of chromosomes in SAM files?
Second, in case anyone needs a perl script that will 'take care' of reads that map off the ends of chromosomes in BED files I wrote one. Seems like it works. Maybe trivial to some but not bad for a salty old bench scientist. I think it works but I am no scripting expert. If anyone has reason to believe otherwise I'd like to know.
Second, in case anyone needs a perl script that will 'take care' of reads that map off the ends of chromosomes in BED files I wrote one. Seems like it works. Maybe trivial to some but not bad for a salty old bench scientist. I think it works but I am no scripting expert. If anyone has reason to believe otherwise I'd like to know.
Code:
#/usr/bin/perl
use strict; use warnings;
#bedEndRepair.pl by Ethan Ford. Use at own risk!!!!!
#bedEndRepair.pl: Takes reads that map off the end of chromosomes in a bed file and repositions them as a two nucleotide read at
# end of the chromosome.
#Usage: 1) make a fasta index file with samtools with the following command: 'samtools faidx <ref.fasta>' or
# make a 2 column tab deliminted text file. The first column contains the chromosome name written as it is in column 1 of your bed file, e.g. chr1,
# and the second column has the length of the corresponding chromosome (and integer)
# 2) type: "perl bedEndRepair.pl path/to/your/indexfile.fai path/to/your/bedfile.bed" into the command line.
# 3) Output is saved in your current working dirctory with the name yourbedfile.repaired.bed
my $outfile = $ARGV[1];
$outfile =~ s/\.bed$/\.repaired.bed/;
open(BEDOUT, ">$outfile") or die("Failed to open output file");
open(INDEXFILE, $ARGV[0]) or die "Failed to open bed file";
open(BEDFILE, $ARGV[1]) or die "Failed to open index file";
# turns indexfile.fai into a hash
my %indexhash;
while (<INDEXFILE>) {
chomp;
my ($indexchr, $indexlength) = split("\t", $_);
$indexhash{$indexchr} = $indexlength;
};
while (<BEDFILE>) {
chomp;
(my ($chr), my ($start), my ($stop), my ($c4), my ($c5), my ($strand)) = split("\t");
if ($start < 1)
{print BEDOUT $chr, "\t", '1', "\t", '2', "\t", $c4, "\t", $c5, "\t", $strand, "\n";}
elsif ($stop > $indexhash{"$chr"})
{print BEDOUT $chr, "\t", $indexhash{"$chr"}-1, "\t", $indexhash{"$chr"}, "\t", $c4, "\t", $c5, "\t", $strand, "\n";}
else
{print BEDOUT $chr, "\t", $start, "\t", $stop, "\t", $c4, "\t", $c5, "\t", $strand, "\n";}
};
close (BEDFILE); close (INDEXFILE); close (BEDOUT);
exit;