  • joseph
    Member
    • Feb 2008
    • 39

    Eland-to-Bed algorithm

    Can anybody share an algorithm that takes Eland files and makes BED files out of them?
    Thanks
    Joseph
  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    #2
    If you don't get any help, post a few lines of the Eland file and I can try to slap something together.


    • jlli
      Member
      • Jun 2008
      • 19

      #3
      ElandtoBed.jar, in the FindPeaks package (http://www.bcgsc.ca/platform/bioinfo/software/findpeaks)


      • ECO
        --Site Admin--
        • Oct 2007
        • 1360

        #4
        java...*shudders*


        • apfejes
          Senior Member
          • Feb 2008
          • 236

          #5
          Hey - Java's not THAT bad. I like that I can get 600%+ CPU usage with it, without any explicit multi-threading.

          Anyhow, I have much better translators, now, in the Vancouver Short Read Analysis Package.... but they're still java. :P

          The manuals are a work in progress. If anyone would like to give them a try, I'll update that part of the manual.
          The more you know, the more you know you don't know. —Aristotle


          • __sequence
            Member
            • Jun 2011
            • 13

            #6
            apfejes, could you please tell me what the latest downloadable version of your program is, and give an example command line for it? I need to convert a single file (whole genome, not divided into chromosomes), as produced by the Eland export, to .bed.


            • mgogol
              Senior Member
              • Mar 2008
              • 197

              #7
              perl script export2bed.pl

              Here's a script from a colleague that I've used before.

              Code:
              #!/usr/bin/perl
              # Program to convert eland export format to BED format
              # Chris Seidel, June 2009
              #
              # Requires tab delim file of chromosome or contig names 
              # (eland fa match files) in the format:
              # UCSC_chr_name chr_length eland_name
              # corrects for alignments that go off the ends of the chrs
              # negative bases are trimmed to 1, 
              # bases > chr_length are set to chr_length
              # (I know the former exist, I don't know if the latter exist)
              # results are not sorted, but can be sorted in linux by:
              # sort -o infile.bed -k 1,1 -k 2,2n infile.bed
              # (sort in place, first column, then by second column numeric)
              
              die("usage: $0 chrmap.txt eland_export.txt") unless(scalar(@ARGV) == 2);
              
              # create output filename
              $outfile = $ARGV[1];
              $outfile =~ s/\.txt$/\.bed/;
              open(FOUT, ">$outfile") || die("can't open output file: $outfile");
              
              # get info on chromosomes
              open(cmap, $ARGV[0]) || die("no chromosome name mapping file!");
              %chrmap = ();
              while($line = <cmap>){
                  chomp($line);
                  ($newval, $size, $oldval) = split(/\t/, $line);
                  $chrmap{$oldval} = $newval;
                  $chrsize{$oldval} = $size;
              }
              
              # open input file
              open(fp, $ARGV[1]) || die("can't open eland file");
              
              $lines = 0;
              while(<fp>){
                  chomp;
                  ++$lines;
                  @bits = split(/\t/);
                  # skip reads that didn't pass filtering
                  next if($bits[21] eq "N");
                  # get match name
                  $seqname = $bits[10];
                  # skip No Matches or QC failures
                  # next if($seqname =~ /NM|QC/);
                  # skip repeat matches
                  # next if($seqname =~ /\d+:\d+:\d+/);
                  # we're only interested in sequences that match our chrs
                  next unless(exists($chrmap{$seqname}));
              
                  $seqlen = length($bits[8]);
                  $start = $bits[12];
                  $end = $start + $seqlen - 1;
                  $strand = $bits[13];
              
                  # parse match descriptor
                  $n = ($bits[14] =~ tr/ACGTN//);
                  # skip reads beyond a certain threshold
                  next if($n > 2);
                  $read_code = "U".$n;
              
                  # correct for alignments off the chromosome ends
                  if( $start <= 0 ){
                      print STDERR "start less than or equal to 0:   ", $start, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $start = 1;
                  }
              
                  if($end > $chrsize{$seqname}){
                      print STDERR "end greater than chr end $chrsize{$seqname}:   $end, diff: ", $end - $chrsize{$seqname}, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $end = $chrsize{$seqname};
                  }
              
                  if($strand eq "F"){
                      $strand = "+";
                      $color = "0,0,255";
                  }
                  else{
                      $strand = "-";
                      $color = "255,0,0";
                  }
              
                  $score = 0;
                  print FOUT join("\t", $chrmap{$seqname}, $start, $end, $read_code, $score, $strand, $start, $end, $color), "\n";
              
                  # give some feedback
                  print STDERR "$lines processed\n" if(!($lines % 100000));
              }
              
              close(FOUT);
              print STDERR "output file: $outfile\n";
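
For readers who would rather follow the logic in Python, the two corrections the script above applies to each read can be sketched roughly as below. This is only an illustration, not part of Chris Seidel's script: the function names `count_mismatches` and `clamp_to_chromosome` are made up here, and the descriptor handling assumes the simple case where each mismatch shows up as one reference-base letter (A, C, G, T or N) between run lengths.

```python
from typing import Tuple


def count_mismatches(descriptor: str) -> int:
    """Count reference-base letters in a match descriptor such as 'A35'."""
    return sum(1 for ch in descriptor if ch in "ACGTN")


def clamp_to_chromosome(start: int, read_len: int, chr_len: int) -> Tuple[int, int]:
    """Mirror the script: end = start + length - 1, then trim to [1, chr_len]."""
    end = start + read_len - 1
    return max(start, 1), min(end, chr_len)


# A read hanging off the left end is trimmed to start at base 1.
print(count_mismatches("A35"), clamp_to_chromosome(-3, 36, 1000))
```

As in the Perl version, a read with more than two counted mismatches would then be skipped, and the count becomes the `U0`/`U1`/`U2` label in the name column.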


              • __sequence
                Member
                • Jun 2011
                • 13

                #8
                mgogol, thank you for posting. I tried this script, and it returns an empty file. Maybe the problem is that it requires an additional file as input?

                Requires tab delim file of chromosome or contig names
                # (eland fa match files) in the format:
                # UCSC_chr_name chr_length eland_name
                I have all chromosomes in one input file. Do I still need to create this additional file with the chromosome names? It is not clear to me how.
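
One possible way to build that mapping file, sketched in Python under several assumptions: each reference chromosome sits in its own FASTA file, the Eland name is that file's name (for example `chr7.fa`), and the UCSC name is the same name without the `.fa` suffix. The helper names `fasta_length` and `chrmap_row` are hypothetical and are not part of the Perl script.

```python
import os
import tempfile


def fasta_length(path):
    """Total number of sequence characters in a FASTA file (header lines skipped)."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def chrmap_row(fasta_path):
    """Return 'UCSC_chr_name<TAB>chr_length<TAB>eland_name' for one FASTA file."""
    eland_name = os.path.basename(fasta_path)  # assumed to match the export file
    ucsc_name = eland_name[:-3] if eland_name.endswith(".fa") else eland_name
    return "\t".join([ucsc_name, str(fasta_length(fasta_path)), eland_name])


# Small self-check with a throwaway FASTA file named chr7.fa.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "chr7.fa")
    with open(demo_path, "w") as out:
        out.write(">chr7\nACGTACGT\nACGT\n")
    demo_row = chrmap_row(demo_path)

print(demo_row)
```

Writing one such row per reference FASTA file into `chrmap.txt` would give the three-column file the script's usage line asks for.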


                • apfejes
                  Senior Member
                  • Feb 2008
                  • 236

                  #9
                  Sorry for the slow reply - I'm currently away at a conference.

                  The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                  There should be several work flows here, depending on the starting format.

                  Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.


                  If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                  Cheers


                  • __sequence
                    Member
                    • Jun 2011
                    • 13

                    #10
                    Originally posted by apfejes View Post
                    The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                    There should be several work flows here, depending on the starting format.



                    If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                    Hi apfejes,

                    Thank you for your reply. I tried the following format: java -jar conversion_util/ConvertToBed.jar -aligner eland -input "input_dir/name" -output "output_dir" -name "name" -noprepend

                    As a result I got the following error:

                    Version: Initializing class ElandIterator $Revision: 2933 $
                    Error: Line 1 has an invalid read:
                    Error: Mismatches is less than 0
                    Last edited by __sequence; 06-09-2011, 03:22 AM.


                    • apfejes
                      Senior Member
                      • Feb 2008
                      • 236

                      #11
                      You need to put the log file in a directory that exists and for which you have write permissions. If the directory you've given above does not exist, then it will not be able to create the log file.


                      • __sequence
                        Member
                        • Jun 2011
                        • 13

                        #12
                        Oops, I just edited my post above. I figured out the directory problem, but there is still an error:
                        Error: Line 1 has an invalid read:
                        Error: Mismatches is less than 0


                        • apfejes
                          Senior Member
                          • Feb 2008
                          • 236

                          #13
                          That is some serious spam above - and worse, copied from my own blog!

                          Anyhow, can you paste the first line of the file? There's probably something simple going wrong, e.g., you haven't followed the work flow to remove unmapped reads.



                          • __sequence
                            Member
                            • Jun 2011
                            • 13

                            #14
                            Originally posted by apfejes View Post
                            That is some serious spam above - and worse, copied from my own blog!

                            Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, eg, you haven't followed the work flow to remove unmapped reads.

                            http://sourceforge.net/apps/mediawik...itle=WorkFlows
                            I tried using
                            grep "U[012]" Input.eland > Input.um.eland
                            as suggested in your manual, but it returns an empty file. Maybe my file is not in the standard Eland export format? I have been told that this is the output from Eland.

                            Here is what the first line looks like:

                            XX-XXXX01 21 1 1 1065 918 0 1 NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG BMMMNVWTTVb____b_bb__b_b__bQQ______Y chr7.fa 128316527 R A35 111 Y
                            Is it Eland Export format?
                            Last edited by __sequence; 06-09-2011, 06:19 AM.
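
As a rough check, the line above can be read using the column positions that the Perl script in post #7 relies on (read in column 9, match name in 11, position in 13, strand in 14, match descriptor in 15, filter flag in 22). The Python sketch below is an illustration only. It assumes the original file was tab-separated and that the forum collapsed the empty columns, so the line is rebuilt by hand; `FIELDS`, `EXAMPLE` and `export_to_bed` are names made up here. It leaves out the script's end clipping and mismatch threshold.

```python
import re

# The posted line, rebuilt as 22 tab-separated fields (empty columns assumed).
FIELDS = [
    "XX-XXXX01", "21", "1", "1", "1065", "918", "0", "1",
    "NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG",
    "BMMMNVWTTVb____b_bb__b_b__bQQ______Y",
    "chr7.fa", "", "128316527", "R", "A35", "111",
    "", "", "", "", "", "Y",
]
EXAMPLE = "\t".join(FIELDS)


def export_to_bed(line, chrom_names):
    """Turn one export line into (chrom, start, end, name, score, strand) or None."""
    bits = line.rstrip("\n").split("\t")
    # Skip reads that failed filtering or that match an unknown reference name.
    if bits[21] == "N" or bits[10] not in chrom_names:
        return None
    start = int(bits[12])
    end = start + len(bits[8]) - 1
    mismatches = sum(ch in "ACGTN" for ch in bits[14])
    strand = "+" if bits[13] == "F" else "-"
    return (chrom_names[bits[10]], start, end, "U%d" % mismatches, 0, strand)


print(export_to_bed(EXAMPLE, {"chr7.fa": "chr7"}))
print(re.search(r"U[012]", EXAMPLE))  # no U0/U1/U2 token anywhere in this line
```

The rebuilt line contains no `U0`, `U1` or `U2` token, which may be why the `grep` above produced an empty file: that filter appears to target a different Eland output layout than the one shown here.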


                            • ECO
                              --Site Admin--
                              • Oct 2007
                              • 1360

                              #15
                              Sorry about the spam...the guy used an interesting strategy of spamming with Anthony's content to get past the filters. Grrrrrr.

