First, does anybody have a script that will trim reads that map off the ends of chromosomes in SAM files?
Second, in case anyone needs a perl script that will 'take care' of reads that map off the ends of chromosomes in BED files I wrote one. Seems like it works. Maybe trivial to some but not bad for a salty old bench scientist. I think it works but I am no scripting expert. If anyone has reason to believe otherwise I'd like to know.
Second, in case anyone needs a perl script that will 'take care' of reads that map off the ends of chromosomes in BED files I wrote one. Seems like it works. Maybe trivial to some but not bad for a salty old bench scientist. I think it works but I am no scripting expert. If anyone has reason to believe otherwise I'd like to know.
Code:
#/usr/bin/perl use strict; use warnings; #bedEndRepair.pl by Ethan Ford. Use at own risk!!!!! #bedEndRepair.pl: Takes reads that map off the end of chromosomes in a bed file and repositions them as a two nucleotide read at # end of the chromosome. #Usage: 1) make a fasta index file with samtools with the following command: 'samtools faidx <ref.fasta>' or # make a 2 column tab deliminted text file. The first column contains the chromosome name written as it is in column 1 of your bed file, e.g. chr1, # and the second column has the length of the corresponding chromosome (and integer) # 2) type: "perl bedEndRepair.pl path/to/your/indexfile.fai path/to/your/bedfile.bed" into the command line. # 3) Output is saved in your current working dirctory with the name yourbedfile.repaired.bed my $outfile = $ARGV[1]; $outfile =~ s/\.bed$/\.repaired.bed/; open(BEDOUT, ">$outfile") or die("Failed to open output file"); open(INDEXFILE, $ARGV[0]) or die "Failed to open bed file"; open(BEDFILE, $ARGV[1]) or die "Failed to open index file"; # turns indexfile.fai into a hash my %indexhash; while (<INDEXFILE>) { chomp; my ($indexchr, $indexlength) = split("\t", $_); $indexhash{$indexchr} = $indexlength; }; while (<BEDFILE>) { chomp; (my ($chr), my ($start), my ($stop), my ($c4), my ($c5), my ($strand)) = split("\t"); if ($start < 1) {print BEDOUT $chr, "\t", '1', "\t", '2', "\t", $c4, "\t", $c5, "\t", $strand, "\n";} elsif ($stop > $indexhash{"$chr"}) {print BEDOUT $chr, "\t", $indexhash{"$chr"}-1, "\t", $indexhash{"$chr"}, "\t", $c4, "\t", $c5, "\t", $strand, "\n";} else {print BEDOUT $chr, "\t", $start, "\t", $stop, "\t", $c4, "\t", $c5, "\t", $strand, "\n";} }; close (BEDFILE); close (INDEXFILE); close (BEDOUT); exit;