**ECO** · 10-05-2008, 07:26 PM

If you don't get any help, post a few lines of the Eland file and I can try to slap something together.

**jlli** · 10-05-2008, 07:55 PM

ElandtoBed.jar in the FindPeaks package (http://www.bcgsc.ca/platform/bioinfo/software/findpeaks)

**ECO** · 10-05-2008, 07:58 PM

java...*shudders*

**apfejes** · 10-05-2008, 08:23 PM

Hey - Java's not THAT bad. I like that I can get 600%+ CPU usage with it, without any explicit multi-threading.

Anyhow, I now have much better translators in the Vancouver Short Read Analysis Package... but they're still Java. :P

The manuals are a work in progress. If anyone would like to give them a try, I'll update that part of the manual.

**__sequence** · 06-08-2011, 04:14 AM

apfejes, could you please tell me what the latest downloadable version of your program is, and give an example command line for it? I need to convert a single Eland export file (whole genome, not divided into chromosomes) to BED format.

**mgogol** · 06-08-2011, 06:46 AM

perl script export2bed.pl

Here's a script from a colleague that I've used before.

Code:

#!/usr/bin/perl
# Program to convert eland export format to BED format
# Chris Seidel, June 2009
#
# Requires tab delim file of chromosome or contig names 
# (eland fa match files) in the format:
# UCSC_chr_name chr_length eland_name
# corrects for alignments that go off the ends of the chrs
# negative bases are trimmed to 1, 
# bases > chr_length are set to chr_length
# (I know the former exist, I don't know if the latter exist)
# results are not sorted, but can be sorted in linux by:
# sort -o infile.bed -k 1,1 -k 2,2n infile.bed
# (sort in place, first column, then by second column numeric)

die("usage: $0 chrmap.txt eland_export.txt\n") unless(scalar(@ARGV) == 2);

# create output filename
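$outfile = $ARGV[1];
# append .bed when the name has no .txt suffix, so the input file is never overwritten
$outfile .= ".bed" unless($outfile =~ s/\.txt$/.bed/);
open(FOUT, ">$outfile") || die("can't open output file: $outfile");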

# get info on chromosomes
open(cmap, $ARGV[0]) || die("no chromosome name mapping file!");
# start with empty hashes ({} would store a hash reference as a stray key)
%chrmap = ();
%chrsize = ();
while($line = <cmap>){
    chomp($line);
    ($newval, $size, $oldval) = split(/\t/, $line);
    $chrmap{$oldval} = $newval;
    $chrsize{$oldval} = $size;
}

# open input file
open(fp, $ARGV[1]) || die("can't open eland file");

$lines = 0;
while(<fp>){
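    # chomp strips only a trailing newline; chop would drop a data character
    # if the last line has no newline
    chomp;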
    ++$lines;
    @bits = split(/\t/);
    # skip reads that didn't pass filtering
    next if($bits[21] eq "N");
    # get match name
    $seqname = $bits[10];
    # skip No Matches or QC failures
    # next if($seqname =~ /NM|QC/);
    # skip repeat matches
    # next if($seqname =~ /\d+:\d+:\d+/);
    # we're only interested in sequences that match our chrs
    next unless(exists($chrmap{$seqname}));

    $seqlen = length($bits[8]);
    # ELAND positions are 1-based and are written out unchanged;
    # BED itself counts chromStart from 0
    $start = $bits[12];
    $end = $start + $seqlen - 1;
    $strand = $bits[13];

    # count mismatched bases (letters) in the match descriptor, e.g. A35 -> 1
    $n = ($bits[14] =~ tr/ACGTN//);
    # skip reads beyond a certain threshold
    next if($n > 2);
    $read_code = "U".$n;

    # correct for alignments off the chromosome ends
    if( $start <= 0 ){
        print STDERR "start less than or equal to 0:   ", $start, "\n";
        print STDERR join("\t", @bits), "\n";
        $start = 1;
    }

    if($end > $chrsize{$seqname}){
        print STDERR "end greater than chr end $chrsize{$seqname}:   $end, diff: ", $end - $chrsize{$seqname}, "\n";
        print STDERR join("\t", @bits), "\n";
        $end = $chrsize{$seqname};
    }

    if($strand eq "F"){
        $strand = "+";
        $color = "0,0,255";
    }
    else{
        $strand = "-";
        $color = "255,0,0";
    }

    $score = 0;
    print FOUT join("\t", $chrmap{$seqname}, $start, $end, $read_code, $score, $strand, $start, $end, $color), "\n";

    # give some feedback
    print STDERR "$lines processed\n" if(!($lines % 100000));
}

close(FOUT);
print STDERR "output file: $outfile\n";
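
The header comment of the script asks for a tab-delimited chromosome map (`UCSC_chr_name`, `chr_length`, `eland_name`), and without it every read is skipped. Below is a minimal Python sketch of one way to build that file from the export file itself. It assumes the ELAND name is the UCSC name plus `.fa` (as in `chr7.fa`) and that chromosome lengths are supplied separately. `build_chrmap`, `chrom_sizes`, and the demo values are hypothetical and not part of the original script.

```python
def build_chrmap(export_lines, chrom_sizes):
    """Return 'ucsc_name<TAB>length<TAB>eland_name' rows for export2bed.pl.

    export_lines: iterable of tab-delimited ELAND export lines
    chrom_sizes:  dict mapping UCSC chromosome name -> length
                  (hypothetical input; the lengths must come from the
                  reference genome, they are not in the export file)
    """
    seen = set()
    rows = []
    for line in export_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= 10:
            continue  # too short to carry a match name in column index 10
        eland_name = fields[10]
        if eland_name in seen:
            continue
        seen.add(eland_name)
        # assumption: the ELAND name is the UCSC name followed by ".fa"
        if eland_name.endswith(".fa"):
            ucsc_name = eland_name[:-3]
        else:
            ucsc_name = eland_name
        # names without a known length (e.g. NM or QC entries) are left out
        if ucsc_name in chrom_sizes:
            rows.append(f"{ucsc_name}\t{chrom_sizes[ucsc_name]}\t{eland_name}")
    return rows


# Made-up demo input: three reads, two on chr7.fa and one unmatched (NM).
demo_lines = [
    "\t".join(["r1"] + [""] * 9 + ["chr7.fa", "", "100", "F"]),
    "\t".join(["r2"] + [""] * 9 + ["NM"]),
    "\t".join(["r3"] + [""] * 9 + ["chr7.fa", "", "250", "R"]),
]
demo_sizes = {"chr7": 1000}  # made-up length, for illustration only

for row in build_chrmap(demo_lines, demo_sizes):
    print(row)
```

Writing the returned rows to `chrmap.txt`, one per line, gives a file in the layout the script reads as its first argument.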

**__sequence** · 06-08-2011, 07:00 AM

mgogol, thank you for posting. I tried this script, and it returns an empty file. Maybe the problem is that it requires an additional file as input?

# Requires tab delim file of chromosome or contig names
# (eland fa match files) in the format:
# UCSC_chr_name chr_length eland_name

I have all chromosomes in one input file. Do I still need to create this additional file with chromosome names? It is not clear to me how.

**apfejes** · 06-08-2011, 10:07 PM

Sorry for the slow reply - I'm currently away at a conference.

The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

There should be several work flows here, depending on the starting format.

http://sourceforge.net/apps/mediawiki/vancouvershortr/index.php?title=ConvertToBed

If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
Cheers

**__sequence** · 06-09-2011, 03:03 AM

Originally posted by apfejes:

The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

There should be several work flows here, depending on the starting format.

http://sourceforge.net/apps/mediawiki/vancouvershortr/index.php?title=ConvertToBed

If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.

Hi apfejes,

Thank you for your reply. I tried the following command:

java -jar conversion_util/ConvertToBed.jar -aligner eland -input "input_dir/name" -output "output_dir" -name "name" -noprepend

As a result I got the following error:

Version: Initializing class ElandIterator $Revision: 2933 $
Error: Line 1 has an invalid read:
Error: Mismatches is less than 0

**apfejes** · 06-09-2011, 03:30 AM

You need to put the log file in a directory that exists and for which you have write permissions. If the directory you've given above does not exist, then it will not be able to create the log file.

**__sequence** · 06-09-2011, 03:39 AM

Oops, I just edited my post above. I sorted out the directory issue, but there is still an error:
Error: Line 1 has an invalid read:
Error: Mismatches is less than 0

**apfejes** · 06-09-2011, 05:00 AM

That is some serious spam above - and worse, copied from my own blog!

Anyhow, can you paste the first line of the file? There's probably something simple going wrong, e.g., you haven't followed the work flow to remove unmapped reads.

http://sourceforge.net/apps/mediawiki/vancouvershortr/index.php?title=WorkFlows

**__sequence** · 06-09-2011, 06:02 AM

Originally posted by apfejes:

That is some serious spam above - and worse, copied from my own blog!

Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, eg, you haven't followed the work flow to remove unmapped reads.

http://sourceforge.net/apps/mediawik...itle=WorkFlows

I tried using

grep "U[012]" Input.eland > Input.um.eland

as suggested in your manual, but it returns an empty file. Maybe my file is not in the standard Eland export format? I have been told that this is the output from Eland.

Here is what the first line looks like:

XX-XXXX01 21 1 1 1065 918 0 1 NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG BMMMNVWTTVb____b_bb__b_b__bQQ______Y chr7.fa 128316527 R A35 111 Y

Is it Eland Export format?
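
One way to check this is to split a line on tabs and compare it with the columns that export2bed.pl reads earlier in the thread (index 10 for the chromosome, 12 for the position, 13 for the strand, 14 for the match descriptor, 21 for the filter flag). The Python sketch below does that and also tests for the `U[012]` pattern used in the grep. The 22-column total, the `describe_line` name, and the blank columns restored in `sample` are assumptions, since the forum may have collapsed the original tabs.

```python
import re

# Column indexes are the ones export2bed.pl reads; the 22-column total
# is an assumption about the export layout.
EXPORT_COLUMNS = 22
FILTER = 21


def describe_line(line):
    """Summarize one tab-delimited alignment line."""
    fields = line.rstrip("\n").split("\t")
    looks_like_export = (
        len(fields) == EXPORT_COLUMNS and fields[FILTER] in ("Y", "N")
    )
    return {
        "n_fields": len(fields),
        "looks_like_export": looks_like_export,
        # the grep above keeps only lines that contain U0, U1 or U2
        "has_u_code": re.search(r"U[012]", line) is not None,
    }


# The sample line from this post, re-joined with tabs. The empty strings
# stand for blank columns that may have been hidden when the forum
# collapsed whitespace (an assumption, not something visible in the post).
sample = "\t".join([
    "XX-XXXX01", "21", "1", "1", "1065", "918", "0", "1",
    "NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG",
    "BMMMNVWTTVb____b_bb__b_b__bQQ______Y",
    "chr7.fa", "", "128316527", "R", "A35", "111",
    "", "", "", "", "", "Y",
])

print(describe_line(sample))
```

If the reconstruction is right, the sample has 22 fields and contains no `U0`, `U1`, or `U2` code, which would be consistent with the grep returning an empty file.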

**ECO** · 06-09-2011, 06:10 AM

Sorry about the spam...the guy used an interesting strategy of spamming with Anthony's content to get past the filters. Grrrrrr.

Eland-to-Bed algorithm