  • joseph
    Member
    • Feb 2008
    • 39

    Eland-to-Bed algorithm

    Can anybody share an algorithm that takes eland files and makes bed files out of them?
    Thanks
    Joseph
  • ECO
    --Site Admin--
    • Oct 2007
    • 1360

    #2
    If you don't get any help...post a few lines of the eland file and I can try to slap something together.

    • jlli
      Member
      • Jun 2008
      • 19

      #3
      ElandtoBed.jar in the FindPeaks package (http://www.bcgsc.ca/platform/bioinfo/software/findpeaks)

      • ECO
        --Site Admin--
        • Oct 2007
        • 1360

        #4
        java...*shudders*

        • apfejes
          Senior Member
          • Feb 2008
          • 236

          #5
          Hey - Java's not THAT bad. I like that I can get 600%+ CPU usage with it, without any explicit multi-threading.

          Anyhow, I have much better translators, now, in the Vancouver Short Read Analysis Package.... but they're still java. :P

          The manuals are a work in progress. If anyone would like to give them a try, I'll update that part of the manual.
          The more you know, the more you know you don't know. —Aristotle

          • __sequence
            Member
            • Jun 2011
            • 13

            #6
            apfejes, could you please tell me what the latest downloadable version of your program is, and give an example command line for it? I need to convert a single file (whole genome, not divided into chromosomes), as produced by the Eland export, to BED format.

            • mgogol
              Senior Member
              • Mar 2008
              • 197

              #7
              perl script export2bed.pl

              Here's a script from a colleague that I've used before.

              Code:
              #!/usr/bin/perl
              # Program to convert eland export format to BED format
              # Chris Seidel, June 2009
              #
              # Requires tab delim file of chromosome or contig names 
              # (eland fa match files) in the format:
              # UCSC_chr_name chr_length eland_name
              # corrects for alignments that go off the ends of the chrs
              # negative bases are trimmed to 1, 
              # bases > chr_length are set to chr_length
              # (I know the former exist, I don't know if the latter exist)
              # results are not sorted, but can be sorted in linux by:
              # sort -o infile.bed -k 1,1 -k 2,2n infile.bed
              # (sort in place, first column, then by second column numeric)
              
              die("usage: $0 chrmap.txt eland_export.txt") unless(scalar(@ARGV) == 2);
              
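              # create output filename; if the input does not end in .txt the
              # substitution leaves the name unchanged, and opening it for
              # writing would truncate the input file, so refuse to continue
              $outfile = $ARGV[1];
              $outfile =~ s/\.txt$/\.bed/;
              die("input file name must end in .txt: $ARGV[1]") if($outfile eq $ARGV[1]);
              open(FOUT, ">$outfile") || die("can't open output file: $outfile");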
              
              # get info on chromosomes
              open(cmap, $ARGV[0]) || die("no chromosome name mapping file!");
              %chrmap = ();
              while($line = <cmap>){
                  chomp($line);
                  ($newval, $size, $oldval) = split(/\t/, $line);
                  $chrmap{$oldval} = $newval;
                  $chrsize{$oldval} = $size;
              }
              
              # open input file
              open(fp, $ARGV[1]) || die("can't open eland file");
              
              $lines = 0;
              while(<fp>){
                  chomp;
                  ++$lines;
                  @bits = split(/\t/);
                  # skip reads that didn't pass filtering
                  next if($bits[21] eq "N");
                  # get match name
                  $seqname = $bits[10];
                  # skip No Matches or QC failures
                  # next if($seqname =~ /NM|QC/);
                  # skip repeat matches
                  # next if($seqname =~ /\d+:\d+:\d+/);
                  # we're only interested in sequences that match our chrs
                  next unless(exists($chrmap{$seqname}));
              
                  $seqlen = length($bits[8]);
                  $start = $bits[12];
                  $end = $start + $seqlen - 1;
                  $strand = $bits[13];
              
                  # parse match descriptor: count the letters (mismatched bases)
                  $n = ($bits[14] =~ tr/ACGTN//);
                  # skip reads beyond a certain threshold
                  next if($n > 2);
                  $read_code = "U".$n;
              
                  # correct for alignments off the chromosome ends
                  if( $start <= 0 ){
                      print STDERR "start less than or equal to 0:   ", $start, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $start = 1;
                  }
              
                  if($end > $chrsize{$seqname}){
                      print STDERR "end greater than chr end $chrsize{$seqname}:   $end, diff: ", $end - $chrsize{$seqname}, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $end = $chrsize{$seqname};
                  }
              
                  if($strand eq "F"){
                      $strand = "+";
                      $color = "0,0,255";
                  }
                  else{
                      $strand = "-";
                      $color = "255,0,0";
                  }
              
                  $score = 0;
                  print FOUT join("\t", $chrmap{$seqname}, $start, $end, $read_code, $score, $strand, $start, $end, $color), "\n";
              
                  # give some feedback
                  print STDERR "$lines processed\n" if(!($lines % 100000));
              }
              
              close(FOUT);
              print STDERR "output file: $outfile\n";
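
One thing to watch in the output: BED intervals are 0-based and half-open (the start counts from 0 and the end is exclusive), whereas the script above writes the match position straight into the start column. A minimal sketch of the conversion, assuming the Eland position is the 1-based leftmost base of the alignment. `to_bed_interval` is a hypothetical helper, not part of export2bed.pl, and the chromosome length is a placeholder.

```perl
use strict;
use warnings;

# Convert a 1-based, inclusive alignment (start plus read length) into a
# 0-based, half-open BED interval, clamped to the chromosome bounds.
# Hypothetical helper for illustration; not part of export2bed.pl.
sub to_bed_interval {
    my ($start, $read_len, $chr_len) = @_;
    my $end = $start + $read_len - 1;      # last aligned base, 1-based inclusive
    $start = 1        if $start < 1;       # trim alignments hanging off the left end
    $end   = $chr_len if $end > $chr_len;  # trim alignments hanging off the right end
    return ($start - 1, $end);             # BED: 0-based start, exclusive end
}

# 200000000 is a placeholder length, not a real chromosome size.
my ($bed_start, $bed_end) = to_bed_interval(128316527, 36, 200000000);
print join("\t", "chr7", $bed_start, $bed_end), "\n";
```

With this convention the interval length is simply end minus start, which is what most BED-consuming tools expect.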

              • __sequence
                Member
                • Jun 2011
                • 13

                #8
                mgogol, thank you for posting. I have tried this script, and it returns an empty file. Maybe the problem is that it requires an additional file as input?

                Requires tab delim file of chromosome or contig names
                # (eland fa match files) in the format:
                # UCSC_chr_name chr_length eland_name
                I have all chromosomes in one input file. Do I still need to create this additional file with chromosome names? It is not clear to me how.
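
Going by the script's header comment, the mapping file is one tab-separated line per reference sequence: UCSC name, length, and the name as it appears in the chromosome column of the export file. A rough sketch that writes those lines, assuming the export names are FASTA file names such as chr7.fa, that the UCSC name is the same string without the .fa suffix, and that you supply the chromosome lengths yourself. `chrmap_lines` is a hypothetical helper and the lengths below are made up.

```perl
use strict;
use warnings;

# Build mapping lines in the layout the script's header describes:
# UCSC_chr_name <tab> chr_length <tab> eland_name
# $lengths is a hash ref mapping the Eland reference name to its length.
sub chrmap_lines {
    my ($lengths) = @_;
    my @lines;
    for my $eland_name (sort keys %$lengths) {
        (my $ucsc_name = $eland_name) =~ s/\.fa$//;   # assumption: chr7.fa -> chr7
        push @lines, join("\t", $ucsc_name, $lengths->{$eland_name}, $eland_name);
    }
    return @lines;
}

# Placeholder lengths, not real chromosome sizes.
my %lengths = ('chr1.fa' => 5000, 'chr7.fa' => 4000);
print "$_\n" for chrmap_lines(\%lengths);
```

Redirecting the output to chrmap.txt would give a file that can be passed as the first argument to the script.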

                • apfejes
                  Senior Member
                  • Feb 2008
                  • 236

                  #9
                  Sorry for the slow reply - I'm currently away at a conference.

                  The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                  There should be several work flows here, depending on the starting format.

                  Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.


                  If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                  Cheers

                  • __sequence
                    Member
                    • Jun 2011
                    • 13

                    #10
                    Originally posted by apfejes
                    The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                    There should be several work flows here, depending on the starting format.

                    If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                    Hi apfejes,

                    Thank you for your reply. I tried the following command: java -jar conversion_util/ConvertToBed.jar -aligner eland -input "input_dir/name" -output "output_dir" -name "name" -noprepend

                    As a result I got the following error:

                    Version: Initializing class ElandIterator $Revision: 2933 $
                    Error: Line 1 has an invalid read:
                    Error: Mismatches is less than 0
                    Last edited by __sequence; 06-09-2011, 03:22 AM.

                    • apfejes
                      Senior Member
                      • Feb 2008
                      • 236

                      #11
                      You need to put the log file in a directory that exists and for which you have write permissions. If the directory you've given above does not exist, then it will not be able to create the log file.
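
A quick way to check that before launching the converter, sketched in Perl. `check_output_dir` is a hypothetical helper, and the path is just the example name from the command above.

```perl
use strict;
use warnings;

# Return an empty string if $dir exists, is a directory and is writable,
# otherwise a short description of the problem.
sub check_output_dir {
    my ($dir) = @_;
    return "does not exist"     unless -e $dir;
    return "is not a directory" unless -d $dir;
    return "is not writable"    unless -w $dir;
    return "";
}

my $dir = "output_dir";   # example path only
my $problem = check_output_dir($dir);
print $problem ? "$dir $problem\n" : "$dir looks usable\n";
```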

                      • __sequence
                        Member
                        • Jun 2011
                        • 13

                        #12
                        Oops, I just edited my post above. I sorted out the directory, but there is still an error:
                        Error: Line 1 has an invalid read:
                        Error: Mismatches is less than 0

                        • apfejes
                          Senior Member
                          • Feb 2008
                          • 236

                          #13
                          That is some serious spam above - and worse, copied from my own blog!

                          Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, e.g., you haven't followed the work flow to remove unmapped reads.


                          • __sequence
                            Member
                            • Jun 2011
                            • 13

                            #14
                            Originally posted by apfejes
                            That is some serious spam above - and worse, copied from my own blog!

                            Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, eg, you haven't followed the work flow to remove unmapped reads.

                            http://sourceforge.net/apps/mediawik...itle=WorkFlows
                            I tried using
                            grep "U[012]" Input.eland > Input.um.eland
                            as suggested in your manual, but it returns an empty file. Maybe my file is not in the standard Eland export format? I have been told that this is the output from Eland.

                            Here is what the first line looks like:

                            XX-XXXX01 21 1 1 1065 918 0 1 NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG BMMMNVWTTVb____b_bb__b_b__bQQ______Y chr7.fa 128316527 R A35 111 Y
                            Is it Eland Export format?
                            Last edited by __sequence; 06-09-2011, 06:19 AM.
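
The pasted line looks consistent with the column positions the Perl script in post #7 reads: sequence in column 9, chromosome in 11, position in 13, strand in 14, match descriptor in 15, filter flag in 22. It also contains no U0/U1/U2 token for grep to match. That script derives such a code itself by counting the letters in the match descriptor. A small sketch of that step, assuming the real file is tab-separated with 22 columns and that the empty columns were collapsed when the line was pasted. `parse_export_line` is a hypothetical helper.

```perl
use strict;
use warnings;

# Pull out the columns the script in post #7 uses (0-based indices
# 8, 10, 12, 13, 14 and 21) and derive a U<n> code from the descriptor.
sub parse_export_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line, -1;            # -1 keeps trailing empty columns
    return unless @f == 22;                   # assumption: export lines have 22 columns
    my $mismatches = ($f[14] =~ tr/ACGTN//);  # letters in the descriptor
    return {
        chr    => $f[10],
        start  => $f[12],
        strand => ($f[13] eq 'F' ? '+' : '-'),
        code   => "U$mismatches",
        passed => ($f[21] eq 'Y' ? 1 : 0),
    };
}

# The pasted line, rebuilt with tabs and with the empty columns restored.
my $line = join("\t",
    'XX-XXXX01', 21, 1, 1, 1065, 918, 0, 1,
    'NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG',
    'BMMMNVWTTVb____b_bb__b_b__bQQ______Y',
    'chr7.fa', '', 128316527, 'R', 'A35', 111, '', '', '', '', '', 'Y');

my $rec = parse_export_line($line);
print join("\t", @{$rec}{qw(chr start strand code)}), "\n" if $rec;
```

If the sketch parses the first few lines of the real file, the script in post #7 should be able to read it too, provided the chromosome names in the mapping file match column 11 exactly.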

                            • ECO
                              --Site Admin--
                              • Oct 2007
                              • 1360

                              #15
                              Sorry about the spam...the guy used an interesting strategy of spamming with Anthony's content to get past the filters. Grrrrrr.
