  • sratoolkit v.2.3.1

    Has anyone tried the sratoolkit (v.2.3.1) that is currently available from NCBI SRA? The development kit user guide refers to *two* Perl scripts (config-assistant.pl and reference-assistant.pl), but I can only see one (config-assistant.pl) in the download tarball for CentOS.

    I am trying to extract files from a recent SRA accession and the process is failing because reference files are not available.

    The main announcement on the SRA page seems to indicate that downloading the necessary data/reference files should be automatic. Has anyone managed to get this to work?

  • #2
    Update:

    After posting the original thread, I noticed that this page has notes saying one should run a Java jar file (on Linux and OS X) that will automatically set up the download environment. This jar file seems to be present only in the pre-compiled tarball (if one builds the toolkit from source, there is no corresponding jar file).

    Unfortunately, neither the pre-compiled binary nor the build from source seems to work with SRA files (including the test dataset SRR390728) at the moment.

    Code:
    fastq-dump.2.3.1 err: manager not found while constructing path within virtual file system module - failed SRR390728.sra
    Written 0 spots total
    The odd thing is that it still creates a fastq file (I can't tell whether it is complete).
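
One rough way to check whether that fastq file is complete is to count its lines: a plain FASTQ record occupies four lines, so a line count that is not a multiple of four suggests truncation. This is only a minimal sketch, assuming the output uses the unwrapped four-line layout. The function name `fastq_records` and the file name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Count FASTQ records, assuming exactly four lines per record
# (header, sequence, "+", qualities) with no wrapped sequence lines.
# Prints the record count; fails if the line count is not a multiple of 4.
fastq_records() {
    [ -r "$1" ] || { echo "$1: not readable" >&2; return 1; }
    # Some wc builds pad the number with spaces, so strip whitespace.
    lines=$(wc -l < "$1" | tr -d '[:space:]')
    if [ $((lines % 4)) -ne 0 ]; then
        echo "$1: $lines lines, not a multiple of 4" >&2
        return 1
    fi
    echo $((lines / 4))
}

# Usage (hypothetical file name):
# fastq_records SRR390728.fastq
```

A multiple of four does not prove the file is complete, but the record count can at least be compared with the "Written N spots" line that fastq-dump prints.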

    Time to contact "sra@ncbi". If they respond, I will update this thread.

    • #3
      Try building from source.

      There's always little nagging stuff with SRA.

      • #4
        Originally posted by Richard Finney
        Try building from source.

        There's always little nagging stuff with SRA.
        Already tried that. No go.

        • #5
          I have an old bzip tarball of sra_sdk-2.1.6, if you still need it.


          -bash-3.00$ cat sra_sdk-2.1.6/reference-assistant.pl
          Code:
          
          #!/usr/local/bin/perl -w
          ################################################################################
          use strict;
          
          use File::Basename;
          use File::Spec;
          
          sub println { print @_; print "\n"; }
          
          my $MSWIN;
          ++$MSWIN if ($^O =~ /mswin/i);
          
          print "Checking refseq configuration... ";
          my $VDB_CONFIG = find_bin("vdb-config");
          die "not found" unless ($VDB_CONFIG);
          println "OK";
          
          print "Checking align-info... ";
          my $ALIGN_INFO = find_bin("align-info");
          die "not found" unless ($ALIGN_INFO);
          println "found";
          
          my $WGET;
          print "Checking wget... ";
          my $out = `wget -h 2>&1`;
          if ($? == 0) {
            println "found";
            $WGET = "wget -O";
          } else {
            println "not found";
          }
          unless ($WGET) {
            print "Checking curl...";
            $out = `curl -h 2>&1`;
            if ($? == 0) {
              println "found";
              $WGET = "curl -o";
            } else {
              println "not found";
            }
          }
          unless ($WGET) {
            print "Checking ./wget... ";
            my $cmd = dirname($0) ."/wget";
            $out = `$cmd -h 2>&1`;
            if ($? == 0) {
              println "found";
              $WGET = "$cmd -O";
            } else {
              println "not found.\nCannot continue.";
              exit 1;
            }
          }
          
          my $refseq_dir = simple_refseq_path();
          
          if ($#ARGV > -1) {
              foreach (@ARGV) {
                  load($_);
              }
          } else {
              while (1) {
                  my $f = ask("Enter cSRA file name (Press Enter to exit)");
                  last unless ($f);
                  load($f);
              }
          }
          
          sub ask {
              my ($prompt) = @_;
              print "$prompt: ";
              my $in = <STDIN>;
              chomp $in;
              return $in;
          }
          
          sub load {
              my ($f) = @_;
              println "Determining $f external dependencies...";
              my $cmd = "$ALIGN_INFO $f";
              my @info = `$cmd`;
              my $refs = 0;
              if ($?) {
                  println "$f: failed";
              } else {
                  my $ok = 0;
                  my $ko = 0;
                  foreach (@info) {
                      chomp;
                      my @r = split /,/;
                      if ($#r >= 3) {
                          my ($seqId, $remote) = ($r[0], $r[3]);
                          ++$refs;
                          if ($remote eq 'remote') {
                              print "Downloading $seqId... ";
                              my $cmd = "$WGET \"$refseq_dir/$seqId\""
                                  . " http://ftp-trace.ncbi.nlm.nih.gov/sra/refseq/$seqId"
                                  . " 2>&1";
                              `$cmd`;
                              if ($?) {
                                  println "failed";
                                  ++$ko;
                              }
                              else {
                                  println "OK";
                                  ++$ok;
                              }
                          }
                      }
                  }
                  print "All " . $refs . " references were checked (";
                  print "$ko failed, " if ($ko);
                  println "$ok downloaded)";
              }
          }
          
          sub simple_refseq_path {
              my %refseq;
              $refseq{s} = refseq_config('servers');
              $refseq{v} = refseq_config('volumes');
              $refseq{p} = refseq_config('paths');
          
              if (   ($refseq{s} && !$refseq{v})
                  || ($refseq{v} && !$refseq{s}))
              {   die "Invalid configuration"; }
          
              if ($refseq{s} && $refseq{v}) {
                  if ((index($refseq{s}, ":") != -1) || (index($refseq{v}, ":") != -1)) {
                      die "Unexpected '$refseq{s}/$refseq{v}'";
                  } else {
                      return "$refseq{s}/$refseq{v}";
                  }
              } elsif ($refseq{p}) {
                  return PATH_VDB2WIN($refseq{p});
              } else {
                  print "Cannot find configuration. Please run 'config-assistant.pl'\n";
                  exit 1;
              }
          }
          
          sub refseq_config {
              my ($nm) = @_;
              my $v = `$VDB_CONFIG refseq/$nm 2>&1`;
              if ($?) {
                  if ($v =~ /path not found while opening node/) {
                      $v = '';
                  } else {
                      die $!;
                  }
              } else {
                  $v =~ /<$nm>(.*)<\/$nm>/;
                  die "Invalid 'refseq/$nm' configuration" unless ($1);
                  $v = $1;
              }
              return $v;
          }
          
          sub find_bin {
            my ($name) = @_;
          
            my $basedir = dirname($0);
          
            # built from sources
            if (-e File::Spec->catfile($basedir, "Makefile")) {
              my $f = File::Spec->catfile($basedir, "build");
              $f = File::Spec->catfile($f, "Makefile.env");
              if (-e $f) {
                my $try = `make -s bindir -C $basedir 2>&1`;
                if ($? == 0) {
                  chomp $try;
                  $try = File::Spec->catfile($try, $name);
                  my $tmp = `$try -h 2>&1`;
                  if ($? == 0) {
                    return $try;
                  }
                }
              }
            }
          
            # try the same directory as the script
            my $try = File::Spec->catfile($basedir, $name);
            my $tmp = `$try -h 2>&1`;
            if ($? == 0) {
              return $try;
            }
          
            # check from PATH
            $try = "$name";
            $tmp = `$try -h 2>&1`;
            if ($? == 0) {
              return $try;
            }
          
            return 0;
          }
          
          sub WIN_TRANSLATE {
            ($_) = @_;
            return $_ unless($MSWIN);
            tr|/|\\|;
            return $_;
          }
          
          sub PATH_VDB2WIN {
            ($_) = @_;
            return $_ unless($MSWIN);
            $_ = WIN_TRANSLATE($_);
            s/^\\([a-zA-Z])\\/$1:\\/;
            return $_;
          }
          
          ################################################################################
          # EOF #

          The SRA documentation appears to be out of sync with the code.

          • #6
            After some communication with SRA support, here is the new protocol to follow.

            Each user needs to run the "configuration-assistant.perl" script that is present in the "bin" directory (if you compile from source), or use the Java jar found in the precompiled tarball (do one or the other).

            While running the Perl script you will reach a point where it asks "Would you like to test SRA files for remote reference dependencies? [y/N]". Choose "Yes" (the default answer is No); a step or two later it will prompt you for an SRA accession number. Have a test accession handy if you are not working with a specific one. If everything is set up correctly, the script will download the reference files it needs on the fly and store them in the directory that you designate when you first run the script. A group-writable directory can be used for this purpose if multiple people need to dump SRA data.
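
For the shared-directory case, something along these lines could be used to prepare the directory before running the configuration script. It is only a sketch: the function name `make_shared_refdir`, the example path, and the group name are all hypothetical, and the setgid bit is my own addition so that new files inherit the directory's group.

```shell
#!/bin/sh
# Create a directory that members of its group can write to.
# $1 = directory to create, $2 = optional group name to assign.
make_shared_refdir() {
    dir=$1
    group=${2:-}
    mkdir -p "$dir" || return 1
    if [ -n "$group" ]; then
        chgrp "$group" "$dir" || return 1
    fi
    # g+rwx: group members may read, write, and enter the directory.
    # g+s:   files created inside inherit the directory's group.
    chmod g+rwxs "$dir"
}

# Usage (hypothetical path and group name):
# make_shared_refdir /data/shared/sra-refseq seqgroup
```

The resulting path is what you would then enter when the configuration script asks where to store reference files.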

            PS: The SRA# that I was originally working with turned out to have a corrupt .sra file at source. SRA is going to fix that problem.
            Last edited by GenoMax; 03-28-2013, 12:19 PM.

            • #7
              Dear all, try using the ABSOLUTE PATH of that SRA file. The error means the file could not be found.

              • #8
                Yeah, using an absolute path works.
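
For anyone scripting this, the absolute-path workaround from #7 could be wrapped in a small helper. This is a sketch only: `abs_path` is a made-up function name, and it assumes the directory containing the .sra file exists.

```shell
#!/bin/sh
# Print the absolute path of a file without relying on realpath/readlink -f.
abs_path() {
    # Resolve the containing directory in a subshell; CDPATH is cleared so
    # that cd cannot jump elsewhere or echo a path of its own.
    dir=$(CDPATH= cd -- "$(dirname "$1")" && pwd) || return 1
    # ${dir%/} avoids a double slash when the directory is "/".
    echo "${dir%/}/$(basename "$1")"
}

# Usage: pass fastq-dump a full path instead of a bare file name.
# fastq-dump "$(abs_path SRR390728.sra)"
```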
