  • Eland-to-Bed algorithm

    Can anybody share an algorithm that takes eland files and makes BED files out of them?
    Thanks
    Joseph

  • #2
    If you don't get any help, post a few lines of the eland file and I can try to slap something together.



    • #3
      ElandtoBed.jar in the FindPeaks package (http://www.bcgsc.ca/platform/bioinfo/software/findpeaks)



      • #4
        java...*shudders*



        • #5
          Hey - Java's not THAT bad. I like that I can get 600%+ CPU usage with it, without any explicit multi-threading.

          Anyhow, I have much better translators, now, in the Vancouver Short Read Analysis Package.... but they're still java. :P

          The manuals are a work in progress. If anyone would like to give them a try, I'll update that part of the manual.
          The more you know, the more you know you don't know. —Aristotle



          • #6
            apfejes, could you please tell me what the latest downloadable version of your program is, and give an example command line for it? I need to convert a single file (whole genome, not divided into chromosomes), as provided by the Eland export, to .bed.



            • #7
              perl script export2bed.pl

              Here's a script from a colleague that I've used before.

              Code:
              #!/usr/bin/perl
              # Program to convert eland export format to BED format
              # Chris Seidel, June 2009
              #
              # Requires tab delim file of chromosome or contig names 
              # (eland fa match files) in the format:
              # UCSC_chr_name chr_length eland_name
              # corrects for alignments that go off the ends of the chrs
              # negative bases are trimmed to 1, 
              # bases > chr_length are set to chr_length
              # (I know the former exist, I don't know if the latter exist)
              # results are not sorted, but can be sorted in linux by:
              # sort -o infile.bed -k 1,1 -k 2,2n infile.bed
              # (sort in place, first column, then by second column numeric)
              
              die("usage: $0 chrmap.txt eland_export.txt\n") unless(scalar(@ARGV) == 2);
              
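              # create output filename; append .bed if the input name does not
              # end in .txt, so the output can never overwrite the input file
              $outfile = $ARGV[1];
              $outfile =~ s/\.txt$/.bed/ or $outfile .= ".bed";
              open(FOUT, ">$outfile") || die("can't open output file: $outfile");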
              
              # get info on chromosomes
              open(cmap, $ARGV[0]) || die("no chromosome name mapping file!");
              %chrmap = ();
              while($line = <cmap>){
                  chomp($line);
                  ($newval, $size, $oldval) = split(/\t/, $line);
                  $chrmap{$oldval} = $newval;
                  $chrsize{$oldval} = $size;
              }
              
              # open input file
              open(fp, $ARGV[1]) || die("can't open eland file");
              
              $lines = 0;
              while(<fp>){
                  chomp;
                  ++$lines;
                  @bits = split(/\t/);
                  # skip reads that didn't pass filtering
                  next if($bits[21] eq "N");
                  # get match name
                  $seqname = $bits[10];
                  # skip No Matches or QC failures
                  # next if($seqname =~ /NM|QC/);
                  # skip repeat matches
                  # next if($seqname =~ /\d+:\d+:\d+/);
                  # we're only interested in sequences that match our chrs
                  next unless(exists($chrmap{$seqname}));
              
                  $seqlen = length($bits[8]);
                  $start = $bits[12];
                  $end = $start + $seqlen - 1;
                  $strand = $bits[13];
              
                  # parse match descriptor: each base letter marks one mismatch
                  $n = ($bits[14] =~ tr/ACGTN//);
                  # skip reads beyond a certain threshold
                  next if($n > 2);
                  $read_code = "U".$n;
              
                  # correct for alignments off the chromosome ends
                  if( $start <= 0 ){
                      print STDERR "start less than or equal to 0:   ", $start, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $start = 1;
                  }
              
                  if($end > $chrsize{$seqname}){
                      print STDERR "end greater than chr end $chrsize{$seqname}:   $end, diff: ", $end - $chrsize{$seqname}, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $end = $chrsize{$seqname};
                  }
              
                  if($strand eq "F"){
                      $strand = "+";
                      $color = "0,0,255";
                  }
                  else{
                      $strand = "-";
                      $color = "255,0,0";
                  }
              
                  $score = 0;
                  print FOUT join("\t", $chrmap{$seqname}, $start, $end, $read_code, $score, $strand, $start, $end, $color), "\n";
              
                  # give some feedback
                  print STDERR "$lines processed\n" if(!($lines % 100000));
              }
              
              close(FOUT);
              print STDERR "output file: $outfile\n";
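
For readers who would rather not run Perl, the per-line logic of the script above can be sketched in Python. This is a minimal sketch, not the author's code: the function name `export_line_to_bed` and the `max_mismatches` parameter are made up for illustration, and the column indices are simply the ones the Perl script reads. Unlike the Perl version, it skips lines with fewer than 22 tab-separated fields instead of processing them. Like the script, it writes the eland position unchanged as the BED start. BED start coordinates are 0-based, so if eland positions are 1-based (which the script's "trimmed to 1" comment suggests), subtracting 1 from the start would be needed for exact coordinates.

```python
# Column indices (0-based, tab-separated) copied from the Perl script above.
READ, MATCH_NAME, POSITION, STRAND, DESCRIPTOR, FILTER = 8, 10, 12, 13, 14, 21


def export_line_to_bed(line, chrmap, chrsize, max_mismatches=2):
    """Return the BED fields for one export line, or None if the read is skipped."""
    bits = line.rstrip("\r\n").split("\t")
    if len(bits) <= FILTER or bits[FILTER] == "N":
        return None  # short line, or the read did not pass filtering
    name = bits[MATCH_NAME]
    if name not in chrmap:
        return None  # no match, repeat match, or a name missing from the map
    start = int(bits[POSITION])
    end = start + len(bits[READ]) - 1
    # Each base letter in the match descriptor marks one mismatch.
    mismatches = sum(bits[DESCRIPTOR].count(base) for base in "ACGTN")
    if mismatches > max_mismatches:
        return None
    start = max(start, 1)          # clamp alignments that run off the left end
    end = min(end, chrsize[name])  # clamp alignments that run off the right end
    if bits[STRAND] == "F":
        strand, color = "+", "0,0,255"
    else:
        strand, color = "-", "255,0,0"
    return [chrmap[name], start, end, "U%d" % mismatches, 0,
            strand, start, end, color]


# Synthetic example line (not real data): a 36-base read on the reverse strand.
fields = [""] * 22
fields[READ] = "ACGT" * 9
fields[MATCH_NAME] = "chr7.fa"
fields[POSITION] = "100"
fields[STRAND] = "R"
fields[DESCRIPTOR] = "A35"
fields[FILTER] = "Y"
bed = export_line_to_bed("\t".join(fields), {"chr7.fa": "chr7"}, {"chr7.fa": 1000})
print("\t".join(str(value) for value in bed))
```

The two dictionaries play the role of `%chrmap` and `%chrsize` in the Perl script and would be filled from the same mapping file.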



              • #8
                mgogol, thank you for posting. I have tried this script, and it returns an empty file. Maybe the problem is that it requires an additional file as input?

                # Requires tab delim file of chromosome or contig names
                # (eland fa match files) in the format:
                # UCSC_chr_name chr_length eland_name
                I have all chromosomes in one input file. Do I still need to create this additional file with chromosome names? It is not clear to me how.
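
One way to build the mapping file the script asks for is sketched below in Python. It assumes the eland match names are the UCSC names plus `.fa` (as in `chr7.fa`, which appears in a line posted later in this thread). The helper names `match_names` and `chrmap_lines` are hypothetical, and the chromosome lengths must come from the genome build the reads were aligned to. The length in the example is a placeholder, not a real chromosome size.

```python
def match_names(export_lines):
    """Collect the distinct values of the match-name column (index 10)."""
    names = set()
    for line in export_lines:
        bits = line.rstrip("\r\n").split("\t")
        if len(bits) > 10 and bits[10]:
            names.add(bits[10])
    return names


def chrmap_lines(chrom_sizes, suffix=".fa"):
    """Build 'UCSC_chr_name<TAB>chr_length<TAB>eland_name' lines."""
    return ["%s\t%d\t%s%s" % (name, length, name, suffix)
            for name, length in sorted(chrom_sizes.items())]


# Placeholder length for illustration only; use the real sizes for your build.
for row in chrmap_lines({"chr7": 1000}):
    print(row)
```

Running `match_names` over the export file first shows which names actually occur in it. Entries that are not chromosomes (for example the no-match and repeat-match values that the script's commented-out filters refer to) can simply be left out of the map, since the script skips any name it does not find there.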



                • #9
                  Sorry for the slow reply - I'm currently away at a conference.

                  The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                  There should be several work flows here, depending on the starting format.

                  Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.


                  If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                  Cheers



                  • #10
                    Originally posted by apfejes
                    The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                    There should be several work flows here, depending on the starting format.

                    Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.


                    If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                    Hi apfejes,

                    Thank you for your reply. I tried the following command: java -jar conversion_util/ConvertToBed.jar -aligner eland -input "input_dir/name" -output "output_dir" -name "name" -noprepend

                    As a result I got the following error:

                    Version: Initializing class ElandIterator $Revision: 2933 $
                    Error: Line 1 has an invalid read:
                    Error: Mismatches is less than 0
                    Last edited by __sequence; 06-09-2011, 03:22 AM.



                    • #11
                      You need to put the log file in a directory that exists and for which you have write permissions. If the directory you've given above does not exist, then it will not be able to create the log file.



                      • #12
                        Oops, I just edited my post above. I figured out the directory issue, but there is still an error:
                        Error: Line 1 has an invalid read:
                        Error: Mismatches is less than 0



                        • #13
                          That is some serious spam above - and worse, copied from my own blog!

                          Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, e.g., you haven't followed the work flow to remove unmapped reads.

                          Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.



                          • #14
                            Originally posted by apfejes
                            That is some serious spam above - and worse, copied from my own blog!

                            Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, eg, you haven't followed the work flow to remove unmapped reads.

                            http://sourceforge.net/apps/mediawik...itle=WorkFlows
                            I tried using
                            grep "U[012]" Input.eland > Input.um.eland
                            as suggested in your manual, but it returns an empty file. Maybe my file is not in the standard Eland export format? I have been told that this is the output from Eland.

                            Here is what the first line looks like:

                            XX-XXXX01 21 1 1 1065 918 0 1 NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG BMMMNVWTTVb____b_bb__b_b__bQQ______Y chr7.fa 128316527 R A35 111 Y
                            Is it Eland Export format?
                            Last edited by __sequence; 06-09-2011, 06:19 AM.
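
A quick way to check a file against the layout that the Perl script in #7 expects is sketched below in Python. It is a heuristic under stated assumptions, not a format specification: the 22-column, tab-separated layout and the column indices are taken from that script, and the function name `looks_like_eland_export` is made up for illustration. The line as pasted shows 16 values, which would be consistent with 22 columns if the empty ones were collapsed when the tabs were lost in posting. The pasted line carries a match descriptor (`A35`) rather than a `U0`/`U1`/`U2` code (the script in #7 builds that code itself), which would explain why the `grep` finds nothing.

```python
def looks_like_eland_export(line):
    """Heuristic check against the column layout the Perl script in #7 reads."""
    bits = line.rstrip("\r\n").split("\t")
    if len(bits) < 22:
        return False  # the script reads column index 21, so 22 fields are needed
    filter_ok = bits[21] in ("Y", "N")
    strand_ok = bits[13] in ("F", "R", "")  # may be empty for unmatched reads
    position_ok = bits[12] == "" or bits[12].lstrip("-").isdigit()
    return filter_ok and strand_ok and position_ok


# Synthetic line (not real data) with the fields the script uses filled in.
fields = [""] * 22
fields[8] = "ACGT" * 9
fields[10] = "chr7.fa"
fields[12] = "100"
fields[13] = "R"
fields[14] = "A35"
fields[21] = "Y"
print(looks_like_eland_export("\t".join(fields)))  # tab-separated, 22 fields
print(looks_like_eland_export(" ".join(fields)))   # tabs lost, one field only
```

Running the check on the first few lines of the real file (read with its tabs intact, not copied from a web page) would show whether the script's column indices apply to it.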



                            • #15
                              Sorry about the spam...the guy used an interesting strategy of spamming with Anthony's content to get past the filters. Grrrrrr.
