![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Is there a BED file format validator? Does a BED file have to be sorted position? | LauraSmith | Bioinformatics | 3 | 05-21-2013 12:54 PM |
Eland to Bed | __sequence | Illumina/Solexa | 0 | 06-08-2011 05:07 AM |
How to convert eland file to BED format | huo | Bioinformatics | 1 | 01-19-2011 11:43 AM |
Are Solexa GAPipeline v.1.0 ELAND results realignable with new v.1.4 ELAND module? | marlei | Bioinformatics | 1 | 10-15-2009 06:51 AM |
BED to ELAND/bowtie/MAQ ? | ewilbanks | Bioinformatics | 1 | 08-21-2009 01:25 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: ca Join Date: Feb 2008
Posts: 39
|
![]()
Can anybody share an algorithm that takes eland files and make bed files out of them?
Thanks Joseph |
![]() |
![]() |
![]() |
#2 |
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
|
![]()
If you don't get any help...post a few lines of the eland file and i can try to slap something together.
|
![]() |
![]() |
![]() |
#3 |
Member
Location: FL Join Date: Jun 2008
Posts: 19
|
![]()
ElandtoBed.jar at FindPeak package (http://www.bcgsc.ca/platform/bioinfo/software/findpeaks)
|
![]() |
![]() |
![]() |
#4 |
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
|
![]()
java...*shudders*
![]() |
![]() |
![]() |
![]() |
#5 |
Senior Member
Location: Oakland, California Join Date: Feb 2008
Posts: 236
|
![]()
Hey - Java's not THAT bad. I like that I can get 600%+ CPU usage with it, without any explicit multi-threading.
Anyhow, I have much better translators, now, in the Vancouver Short Read Analysis Package.... but they're still java. :P The manuals are a work in progress. If anyone would like to give them a try, I'll update that part of the manual.
__________________
The more you know, the more you know you don't know. —Aristotle |
![]() |
![]() |
![]() |
#6 |
Member
Location: EU Join Date: Jun 2011
Posts: 13
|
![]()
apfejes, could you please tell, what is the last downloadable version of your program, and what is an example of a command line to use it? I need to convert a single file (whole genome, not divided into chromosomes) as provided by the Eland export, convert it to .Bed
|
![]() |
![]() |
![]() |
#7 |
Senior Member
Location: Kansas City Join Date: Mar 2008
Posts: 197
|
![]()
Here's a script from a colleague that I've used before.
Code:
#!/usr/bin/perl # Program to convert eland export format to BED format # Chris Seidel, June 2009 # # Requires tab delim file of chromosome or contig names # (eland fa match files) in the format: # UCSC_chr_name chr_length eland_name # corrects for alignments that go off the ends of the chrs # negative bases are trimmed to 1, # bases > chr_length are set to chr_length # (I know the former exist, I don't know if the latter exist) # results are not sorted, but can be sorted in linux by: # sort -o infile.bed -k 1,1 -k 2,2n infile.bed # (sort in place, first column, then by second column numeric) die("usage: $0 chrmap.txt eland_export.txt") unless(scalar(@ARGV) == 2); # create output filename $outfile = $ARGV[1]; $outfile =~ s/\.txt$/\.bed/; open(FOUT, ">$outfile") || die("can't open output file: $outfile"); # get info on chromosomes open(cmap, $ARGV[0]) || die("no chromosome name mapping file!"); %chrmap = {}; while($line = <cmap>){ chomp($line); ($newval, $size, $oldval) = split(/\t/, $line); $chrmap{$oldval} = $newval; $chrsize{$oldval} = $size; } # open input file open(fp, $ARGV[1]) || die("can't open eland file"); $lines = 0; while(<fp>){ chop; ++$lines; @bits = split(/\t/); # skip reads that didn't pass filtering next if($bits[21] eq "N"); # get match name $seqname = $bits[10]; # skip No Matches or QC failures # next if($seqname =~ /NM|QC/); # skip repeat matches # next if($seqname =~ /\d+:\d+:\d+/); # we're only interested in sequences that match our chrs next unless(exists($chrmap{$seqname})); $seqlen = length($bits[8]); $start = $bits[12]; $end = $start + $seqlen - 1; $strand = $bits[13]; # parse match descriptor $n = ($bits[14] =~ tr/[ACGTN]/[ACGTN]/); # skip reads beyond a certain threshold next if($n > 2); $read_code = "U".$n; # correct for alignments off the chromosome ends if( $start <= 0 ){ print STDERR "start less than or equal to 0: ", $start, "\n"; print STDERR join("\t", @bits), "\n"; $start = 1; } if($end > $chrsize{$seqname}){ print STDERR "end greater than chr end $chrsize{$seqname}: $end, diff: ", $end - $chrsize{$seqname}, "\n"; print STDERR join("\t", @bits), "\n"; $end = $chrsize{$seqname}; } if($strand eq "F"){ $strand = "+"; $color = "0,0,255"; } else{ $strand = "-"; $color = "255,0,0"; } $score = 0; print FOUT join("\t", $chrmap{$seqname}, $start, $end, $read_code, $score, $strand, $start, $end, $color), "\n"; # give some feedback print STDERR "$lines processed\n" if(!($lines % 100000)); } close(FOUT); print STDERR "output file: $outfile\n"; |
![]() |
![]() |
![]() |
#8 | |
Member
Location: EU Join Date: Jun 2011
Posts: 13
|
![]()
mgogol, Thank you for posting. I have tried this script, and it returns an empty file. May be the problem is that it requires an additional file as input?
Quote:
|
|
![]() |
![]() |
![]() |
#9 |
Senior Member
Location: Oakland, California Join Date: Feb 2008
Posts: 236
|
![]()
Sorry for the slow reply - I'm currently away at a conference.
The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/ There should be several work flows here, depending on the starting format. http://sourceforge.net/apps/mediawik...e=ConvertToBed If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office. Cheers
__________________
The more you know, the more you know you don't know. —Aristotle |
![]() |
![]() |
![]() |
#10 | |
Member
Location: EU Join Date: Jun 2011
Posts: 13
|
![]() Quote:
Thank you for your reply. I tried the following format: java -jar conversion_util/ConvertToBed.jar -aligner eland -input "input_dir/name" -output "output_dir" -name "name" -noprepend As a result I got the following error: Version: Initializing class ElandIterator $Revision: 2933 $ Error: Line 1 has an invalid read: Error: Mismatches is less than 0 Last edited by __sequence; 06-09-2011 at 04:22 AM. |
|
![]() |
![]() |
![]() |
#11 |
Senior Member
Location: Oakland, California Join Date: Feb 2008
Posts: 236
|
![]()
You need to put the log file in a directory that exists and for which you have write permissions. If the directory you've given above does not exits, then it will not be able to create the log file.
__________________
The more you know, the more you know you don't know. —Aristotle |
![]() |
![]() |
![]() |
#12 |
Member
Location: EU Join Date: Jun 2011
Posts: 13
|
![]()
Oups, I just edited my post above. I figured out about the directory, but there is still an error:
Error: Line 1 has an invalid read: Error: Mismatches is less than 0 |
![]() |
![]() |
![]() |
#13 |
Senior Member
Location: Oakland, California Join Date: Feb 2008
Posts: 236
|
![]()
That is some serious spam above - and worse, copied from my own blog!
Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, eg, you haven't followed the work flow to remove unmapped reads. http://sourceforge.net/apps/mediawik...itle=WorkFlows
__________________
The more you know, the more you know you don't know. —Aristotle |
![]() |
![]() |
![]() |
#14 | |||
Member
Location: EU Join Date: Jun 2011
Posts: 13
|
![]() Quote:
Quote:
Here is how the first line looks like: Quote:
Last edited by __sequence; 06-09-2011 at 07:19 AM. |
|||
![]() |
![]() |
![]() |
#15 |
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
|
![]()
Sorry about the spam...the guy used an interesting strategy of spamming with Anthony's content to get past the filters. Grrrrrr.
|
![]() |
![]() |
![]() |
#16 |
Senior Member
Location: Oakland, California Join Date: Feb 2008
Posts: 236
|
![]()
Thanks ECO.. it was original. (=
In any case, I'll have a look at this on monday. I don't have time to look at it until I'm back in Canada. If you haven't heard from me by then, please send me a reminder.
__________________
The more you know, the more you know you don't know. —Aristotle |
![]() |
![]() |
![]() |
#17 |
Member
Location: Toronto Join Date: Aug 2008
Posts: 42
|
![]()
Talk about coincidence, I was working on a piece of code that would take Illumina TruSeq export files and merge them into a single bed file just this morning. It outputs only those reads that align uniquely and pass filtering and collapses potential PCR biases (via start point).
Here's some sample usage: rdeborja@hn1:~/tmp$ export2bed.pl --bed s_8_1_export.bed s_8_1_export.txt Processing file s_8_1_export.txt... Total collapsed unique aligned reads 656 Completed processing file s_8_1_export.txt Hope it helps! Code:
#!/usr/bin/perl # # Created by Richard de Borja on 2011-06-10. # Copyright (C) 2011 The Ontario Institute for Cancer Research # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 3 of the License, or (at # your option) any later version. # # This program is distributed in the hope that it will be useful, but # WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, # MA 02110-1301, USA. # # $Id export2bed.pl rdeborja$ use strict; use Carp; use Getopt::Long; use Pod::Usage; use Data::Dumper; # command line default arguments our %opts = ( # list of arguments and default values "bed" => undef, "length" => 101, ); our %_exportfield = ( # list of columns for export file # See CASAVA 1.7 User Guide page 61 "machine" => 0, "run_number" => 1, "lane" => 2, "tile" => 3, "x_coord_cluster" => 4, "y_coord_cluster" => 5, "index_seq" => 6, "read_number" => 7, "sequence" => 8, "quality" => 9, "match_chr" => 10, "match_contig" => 11, "match_position" => 12, "match_strand" => 13, "match_desc" => 14, "single_read_score" => 15, "paired_read_score" => 16, "partner_chr" => 17, "partner_contig" => 18, "partner_offset" => 19, "partner_strand" => 20, "filtering" => 21, ); my $result = main(); exit $result; ############################## # Main ############################## sub main { # get command line arguments GetOptions( \%opts, "help|?", "man", "bed=s", "length:i", ) or pod2usage(64); # 64=>EX_USAGE pod2usage(1) if $opts{'help'}; pod2usage(-exitstatus => 0, -verbose => 2) if $opts{'man'}; while(my ($arg, $value) = each(%opts)) { if(!defined $value) { print "ERROR: Missing argument $arg\n"; pod2usage(128); } } my $read_length = $opts{'length'}; my %unique_read_hash; my $output_file = $opts{'bed'}; foreach my $file (@ARGV) { print "Processing file $file...\n"; my $processed_export = process_export_file($file, $read_length, \%unique_read_hash, $output_file); print "Completed processing file $file\n"; } my $create_bed_file = print_bed_file($output_file, \%unique_read_hash); return 0; } ############################## # Functions ############################## sub process_export_file { my $export_file = shift; my $read_length = shift; my $unique_read_hash = shift; my $output_file; my $output_file = $export_file; $output_file =~ s/txt/bed/; open(my $ifh, '<', $export_file) or croak "Cannot open file $export_file: $!"; while(my $line = <$ifh>) { $line =~ s/^\s+//; $line =~ s/\s+$//; my @input_line = split(/\t/, $line); my $chr = $input_line[$_exportfield{'match_chr'}]; my $start = $input_line[$_exportfield{'match_position'}]; my $end = $start + $read_length - 1; my $pf = $input_line[$_exportfield{'filtering'}]; # Skip any reads that do not align, do not uniquely align, or pass # the Illumina filtering if (($chr =~ /chr/) && ($pf eq "Y")) { $chr =~ s/\.fa//g; my $bed_line = join("_", $chr, $start, $end); $unique_read_hash->{$bed_line} = 1; } else { next; } } close($ifh); #my $create_bed_file = print_bed_file($output_file, \$unique_read_hash); my $count = scalar(keys(%$unique_read_hash)); print "Total collapsed unique aligned reads $count\n"; return 0; } sub print_bed_file { my $bed_file = shift; my $read_hash = shift; open(my $ofh, '>', $bed_file) or croak "Cannot open file $bed_file"; foreach my $bed_entry (sort(keys(%$read_hash))) { my ($chr, $start, $end) = split("_", $bed_entry); print {$ofh} "$chr\t$start\t$end\n"; } close($ofh); } __END__ =head1 NAME export2bed.pl =head1 SYNOPSIS B<export2bed.pl> [options] [file ...] Options: --help brief help message --man full documentation --bed name of output BED file --length read length (optional: default 101bp) =head1 OPTIONS =over 8 =item B<--help> Print a brief help message and exit. =item B<--man> Print the manual page. =item B<--bed> Name of output BED file =item B<--length> Read length (optional, default is 101bp) =back =head1 DESCRIPTION B<export2bed.pl> convert an Illumina export.txt file to a BED file =head1 EXAMPLES export2bed.pl --bed out.bed --length 101 s_1_1_export.txt s_1_2_export.txt =head1 AUTHORS The Ontario Institute for Cancer Research =head1 SEE ALSO =cut |
![]() |
![]() |
![]() |
#18 |
Senior Member
Location: Kansas City Join Date: Mar 2008
Posts: 197
|
![]()
Okay, I rewrote it a little. It now takes a fasta index file (produced by samtools faidx genome.fa)
Of course, now you're getting a script hacked away at by two people.... ![]() perl export2bed.pl mm9_full.fa.fai s_1_1_export.txt > s_1_1_export.bed Code:
#!/usr/bin/perl # Program to convert eland export format to BED format # for running MACS # Chris Seidel, June 2009 # # Requires tab delim file of chromosome names in the format: # chr_name chr_length # corrects for alignments that go off the ends of the chrs # negative bases are trimmed to 1, # bases > chr_length are set to chr_length # (I know the former exist, I don't know if the latter exist) # results are not sorted, but can be sorted in linux by: # sort -o infile.bed -k1,1 -k2,2n infile.bed # (sort in place, first column, then by second column numeric) use File::Basename; die("usage: $0 chrmap.txt eland_export.txt") unless(scalar(@ARGV) == 2); # get info on chromosomes open(cmap, $ARGV[0]) || die("no chromosome length file!"); %chrmap = {}; while($line = <cmap>){ chomp($line); ($chrom, $size) = split(/\t/, $line); $chrmap{$chrom} = $chrom; $chrsize{$chrom} = $size; } # open input file open(fp, $ARGV[1]) || die("can't open eland file"); $lines = 0; while(<fp>){ chop; ++$lines; @bits = split(/\t/); # skip reads that didn't pass filtering next if($bits[21] eq "N"); # get match name $seqname = $bits[10]; # skip No Matches or QC failures # next if($seqname =~ /NM|QC/); # skip repeat matches # next if($seqname =~ /\d+:\d+:\d+/); # we're only interested in sequences that match our chrs $chrom = $bits[10]; $seqlen = length($bits[8]); next unless(exists($chrmap{$chrom})); $start = $bits[12]; $end = $start + $seqlen - 1; $strand = $bits[13]; #print "start:$start end:$end strand:$strand\n"; # parse match descriptor $n = ($bits[14] =~ tr/[ACGTN]/[ACGTN]/); # skip reads beyond a certain threshold next if($n > 2); $read_code = "U".$n; # correct for alignments off the chromosome ends if( $start <= 0 ){ print STDERR "start less than or equal to 0: ", $start, "\n"; print STDERR join("\t", @bits), "\n"; $start = 1; } if($end > $chrsize{$seqname}){ print STDERR "end greater than chr end $chrsize{$seqname}: $end, diff: ", $end - $chrsize{$seqname}, "\n"; print STDERR join("\t", @bits), "\n"; $end = $chrsize{$seqname}; } if($strand eq "F"){ $strand = "+"; $color = "0,0,255"; } else{ $strand = "-"; $color = "255,0,0"; } $score = 1; $name = join('',("@",$bits[0],"_",$bits[1],":",$bits[2],":",$bits[3],":",$bits[4],":",$bits[5],"#",$bits[6],"/",$bits[7])); my $chr = $chrmap{$seqname}; $chr =~ s/\.fa$//g; print join("\t", $chr, $start, $end, $name, $score, $strand), "\n"; # give some feedback print STDERR "$lines processed\n" if(!($lines % 100000)); } |
![]() |
![]() |
![]() |
#19 | |
Junior Member
Location: China Join Date: Apr 2012
Posts: 5
|
![]() Quote:
|
|
![]() |
![]() |
![]() |
#20 |
Junior Member
Location: China Join Date: Apr 2012
Posts: 5
|
![]() |
![]() |
![]() |
![]() |
Thread Tools | |
|
|