  • Eland-to-Bed algorithm

    Can anybody share an algorithm that takes Eland files and makes BED files out of them?
    Thanks
    Joseph

  • #2
    If you don't get any help, post a few lines of the Eland file and I can try to slap something together.


    • #3
      ElandtoBed.jar in the FindPeaks package (http://www.bcgsc.ca/platform/bioinfo/software/findpeaks)


      • #4
        java...*shudders*


        • #5
          Hey - Java's not THAT bad. I like that I can get 600%+ CPU usage with it, without any explicit multi-threading.

          Anyhow, I now have much better translators in the Vancouver Short Read Analysis Package... but they're still Java. :P

          The manuals are a work in progress. If anyone would like to give them a try, I'll update that part of the manual.
          The more you know, the more you know you don't know. —Aristotle


          • #6
            apfejes, could you please tell me what the latest downloadable version of your program is, and give an example command line for it? I need to convert a single file (whole genome, not divided into chromosomes), as provided by the Eland export, to .bed.


            • #7
              perl script export2bed.pl

              Here's a script from a colleague that I've used before.

              Code:
              #!/usr/bin/perl
              # Program to convert eland export format to BED format
              # Chris Seidel, June 2009
              #
              # Requires tab delim file of chromosome or contig names 
              # (eland fa match files) in the format:
              # UCSC_chr_name chr_length eland_name
              # corrects for alignments that go off the ends of the chrs
              # negative bases are trimmed to 1, 
              # bases > chr_length are set to chr_length
              # (I know the former exist, I don't know if the latter exist)
              # results are not sorted, but can be sorted in linux by:
              # sort -o infile.bed -k 1,1 -k 2,2n infile.bed
              # (sort in place, first column, then by second column numeric)
              
              die("usage: $0 chrmap.txt eland_export.txt") unless(scalar(@ARGV) == 2);
              
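              # create output filename (input must end in .txt, or it would be overwritten)
              $outfile = $ARGV[1];
              die("input file name must end in .txt: $ARGV[1]") unless($outfile =~ s/\.txt$/\.bed/);
              open(FOUT, ">$outfile") || die("can't open output file: $outfile");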
              
              # get info on chromosomes
              open(cmap, $ARGV[0]) || die("no chromosome name mapping file!");
              %chrmap = ();   # empty list; {} would store a hash reference as a key
              %chrsize = ();
              while($line = <cmap>){
                  chomp($line);
                  ($newval, $size, $oldval) = split(/\t/, $line);
                  $chrmap{$oldval} = $newval;
                  $chrsize{$oldval} = $size;
              }
              
              # open input file
              open(fp, $ARGV[1]) || die("can't open eland file");
              
              $lines = 0;
              while(<fp>){
                  chomp;  # strip only the trailing newline; chop removes any last character
                  ++$lines;
                  @bits = split(/\t/);
                  # skip reads that didn't pass filtering
                  next if($bits[21] eq "N");
                  # get match name
                  $seqname = $bits[10];
                  # skip No Matches or QC failures
                  # next if($seqname =~ /NM|QC/);
                  # skip repeat matches
                  # next if($seqname =~ /\d+:\d+:\d+/);
                  # we're only interested in sequences that match our chrs
                  next unless(exists($chrmap{$seqname}));
              
                  $seqlen = length($bits[8]);
                  $start = $bits[12];
                  $end = $start + $seqlen - 1;
                  $strand = $bits[13];
              
                  # parse match descriptor: count the mismatch letters it contains
                  $n = ($bits[14] =~ tr/ACGTN//);
                  # skip reads beyond a certain threshold
                  next if($n > 2);
                  $read_code = "U".$n;
              
                  # correct for alignments off the chromosome ends
                  if( $start <= 0 ){
                      print STDERR "start less than or equal to 0:   ", $start, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $start = 1;
                  }
              
                  if($end > $chrsize{$seqname}){
                      print STDERR "end greater than chr end $chrsize{$seqname}:   $end, diff: ", $end - $chrsize{$seqname}, "\n";
                      print STDERR join("\t", @bits), "\n";
                      $end = $chrsize{$seqname};
                  }
              
                  if($strand eq "F"){
                      $strand = "+";
                      $color = "0,0,255";
                  }
                  else{
                      $strand = "-";
                      $color = "255,0,0";
                  }
              
                  $score = 0;
                  print FOUT join("\t", $chrmap{$seqname}, $start, $end, $read_code, $score, $strand, $start, $end, $color), "\n";
              
                  # give some feedback
                  print STDERR "$lines processed\n" if(!($lines % 100000));
              }
              
              close(FOUT);
              print STDERR "output file: $outfile\n";
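A note on coordinates for anyone loading the output into a genome browser: BED intervals use a 0-based start and an exclusive end, while the script above writes the Eland position unchanged as the start. Below is a minimal Python sketch of the conversion and the end-of-chromosome clamping, assuming Eland export positions are 1-based. The function name and the chromosome length of 1000 are made up for illustration.

```python
def eland_to_bed_interval(position, read_length, chrom_length):
    """Convert a 1-based Eland start into a clamped BED (start, end) pair."""
    start = position - 1       # BED starts are 0-based
    end = start + read_length  # BED ends are exclusive
    start = max(start, 0)      # trim alignments that run off the left end
    end = min(end, chrom_length)  # trim alignments that run off the right end
    return start, end


if __name__ == "__main__":
    print(eland_to_bed_interval(10, 36, 1000))   # (9, 45)
    print(eland_to_bed_interval(-5, 36, 1000))   # (0, 30)
    print(eland_to_bed_interval(990, 36, 1000))  # (989, 1000)
```

With this convention, end minus start equals the read length whenever no clamping is needed.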


              • #8
                mgogol, thank you for posting. I tried this script, and it returns an empty file. Maybe the problem is that it requires an additional file as input?

                # Requires tab delim file of chromosome or contig names
                # (eland fa match files) in the format:
                # UCSC_chr_name chr_length eland_name
                I have all chromosomes in one input file. Do I still need to create this additional file with chromosome names? It is not clear how.
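One way the mapping file could be built, sketched in Python. This assumes the reference is available as one FASTA file per chromosome, and that Eland reports the FASTA file name (for example chr7.fa, as in the export line posted later in this thread) while the UCSC name is the same without the extension. The script name make_chrmap.py and the function names are hypothetical.

```python
import os
import sys


def fasta_length(path):
    """Count sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def chrmap_rows(fasta_paths):
    """Yield (ucsc_name, length, eland_name) for each FASTA file."""
    for path in fasta_paths:
        eland_name = os.path.basename(path)          # e.g. chr7.fa
        ucsc_name = os.path.splitext(eland_name)[0]  # e.g. chr7
        yield ucsc_name, fasta_length(path), eland_name


if __name__ == "__main__":
    # Hypothetical usage: python make_chrmap.py chr1.fa chr2.fa > chrmap.txt
    for row in chrmap_rows(sys.argv[1:]):
        print("\t".join(str(value) for value in row))
```

The third column has to match column 11 of the export file exactly; otherwise the Perl script skips every read and writes an empty BED file.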


                • #9
                  Sorry for the slow reply - I'm currently away at a conference.

                  The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                  There should be several work flows here, depending on the starting format.

                  Download Vancouver Short Read Analysis Package for free. This package contains code for use with Short Read DNA Sequencing technologies, and includes packages for ChIP-Seq, Whole Transcriptome Shotgun Sequencing, Whole Genome Shotgun Sequencing, SNP Detection, Transcript expression and file conversion.


                  If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                  Cheers
                  The more you know, the more you know you don't know. —Aristotle


                  • #10
                    Originally posted by apfejes
                    The latest versions can be found as part of the Vancouver Short Read Analysis Package: http://vancouvershortr.sourceforge.net/

                    There should be several work flows here, depending on the starting format.


                    If that doesn't work for you, please let me know, and I'll provide more explicit information when I return to the office.
                    Hi apfejes,

                    Thank you for your reply. I tried the following command: java -jar conversion_util/ConvertToBed.jar -aligner eland -input "input_dir/name" -output "output_dir" -name "name" -noprepend

                    As a result I got the following error:

                    Version: Initializing class ElandIterator $Revision: 2933 $
                    Error: Line 1 has an invalid read:
                    Error: Mismatches is less than 0
                    Last edited by __sequence; 06-09-2011, 03:22 AM.


                    • #11
                      You need to put the log file in a directory that exists and for which you have write permissions. If the directory you've given above does not exist, it will not be able to create the log file.
                      The more you know, the more you know you don't know. —Aristotle


                      • #12
                        Oops, I just edited my post above. I figured out the directory issue, but there is still an error:
                        Error: Line 1 has an invalid read:
                        Error: Mismatches is less than 0


                        • #13
                          That is some serious spam above - and worse, copied from my own blog!

                          Anyhow, can you paste the first line of the file? There's probably something simple going wrong, e.g., you haven't followed the work flow to remove unmapped reads.

                          The more you know, the more you know you don't know. —Aristotle


                          • #14
                            Originally posted by apfejes
                            That is some serious spam above - and worse, copied from my own blog!

                            Anyhow, can you paste the first line of the file? There's probably something simple that's going wrong, eg, you haven't followed the work flow to remove unmapped reads.

                            http://sourceforge.net/apps/mediawik...itle=WorkFlows
                            I tried using
                            grep “U[012]” Input.eland > Input.um.eland
                            as suggested in your manual, but it returns an empty file. Maybe my file is not in the standard Eland export format? I have been told that this is the output from Eland.

                            Here is what the first line looks like:

                            XX-XXXX01 21 1 1 1065 918 0 1 NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG BMMMNVWTTVb____b_bb__b_b__bQQ______Y chr7.fa 128316527 R A35 111 Y
                            Is it Eland Export format?
                            Last edited by __sequence; 06-09-2011, 06:19 AM.
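One quick way to check the format is to count the tab-separated columns. The Python sketch below uses the same column indexes as the Perl script in #7 and assumes a standard export line has 22 columns. The example is the line above with tabs restored by hand, so the positions of the empty columns are a guess. The script in #7 builds its U0/U1/U2 codes from the match descriptor; the line above contains no such code, so a grep for U[012] would not match it.

```python
# Column indexes (0-based) as used by the Perl script in post #7.
READ, CHROM, POS, STRAND, DESCRIPTOR, FILTER = 8, 10, 12, 13, 14, 21
EXPECTED_COLUMNS = 22  # assumption: standard export lines have 22 columns


def looks_like_export(line):
    """Return True if a line has the shape the script in #7 expects."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_COLUMNS:
        return False
    return fields[STRAND] in ("F", "R", "") and fields[FILTER] in ("Y", "N")


def count_mismatches(descriptor):
    """Count the letters in a match descriptor, as the Perl tr/// does."""
    return sum(1 for ch in descriptor if ch in "ACGTN")


# The posted line, rebuilt with tabs. Empty strings stand for columns that
# look blank in the post (contig and paired-read fields); this is a guess.
example = "\t".join([
    "XX-XXXX01", "21", "1", "1", "1065", "918", "0", "1",
    "NTCAAAAACCCAGCGAACATCATTCTTTGGCTAGGG",
    "BMMMNVWTTVb____b_bb__b_b__bQQ______Y",
    "chr7.fa", "", "128316527", "R", "A35", "111",
    "", "", "", "", "", "Y",
])

if __name__ == "__main__":
    fields = example.split("\t")
    print("export-like:", looks_like_export(example))
    print("chromosome:", fields[CHROM], "position:", fields[POS])
    print("mismatches:", count_mismatches(fields[DESCRIPTOR]))
```

If the real file gives a different column count, the indexes in the Perl script would need adjusting before it can produce any output.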


                            • #15
                              Sorry about the spam...the guy used an interesting strategy of spamming with Anthony's content to get past the filters. Grrrrrr.
