SEQanswers

Old 03-27-2013, 05:39 AM   #1
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982
sratoolkit v.2.3.1

Has anyone tried the sratoolkit (v.2.3.1) that is currently available from NCBI SRA? The development kit user guide seems to refer to *two* Perl scripts (config-assistant.pl and reference-assistant.pl), but I can see only one (config-assistant.pl) in the download tarball for CentOS.

I am trying to extract files from a recent SRA accession, and the process fails because the reference files are not available.

The main announcement on the SRA page seems to indicate that the necessary data/reference files should be downloaded automatically. Has anyone managed to get this to work?
Old 03-27-2013, 10:56 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982

Update:

After posting the original message, I noticed that this page has notes saying that one should run a Java jar file (on Linux and OS X) that automatically sets up the download environment. This jar file seems to be present only in the pre-compiled tarball (if one builds the toolkit from source, there is no corresponding jar file).

Unfortunately, neither version (pre-compiled binary or built from source) seems to work with SRA files (including this test dataset, SRR390728) at the moment.

Code:
fastq-dump.2.3.1 err: manager not found while constructing path within virtual file system module - failed SRR390728.sra
Written 0 spots total
The odd thing is that it still creates a fastq file (I can't tell whether it is complete).
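
One rough way to check whether the fastq file that was written contains anything is to count its records. The following is a minimal sketch, assuming a plain, uncompressed FASTQ file with four lines per record; the function name and the file name in the usage note are hypothetical. A clean count only shows that the file is not truncated mid-record; it does not prove that the dump is complete.

```python
def count_fastq_records(path):
    """Count records in a FASTQ file that uses four lines per record.

    Raises ValueError when the line count is not a multiple of four,
    which suggests the file was truncated.
    """
    lines = 0
    with open(path, "r") as handle:
        for _ in handle:
            lines += 1
    if lines % 4 != 0:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4
```

For example, `count_fastq_records("SRR390728.fastq")` returning 0 would be consistent with the "Written 0 spots total" message above.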

Time to contact "sra@ncbi". If they respond, I will update this thread.
Old 03-27-2013, 11:26 AM   #3
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

Try building from source.

There's always little nagging stuff with SRA.
Old 03-27-2013, 11:37 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982

Quote:
Originally Posted by Richard Finney
"Try building from source.

There's always little nagging stuff with SRA."

Already tried that. No go.
Old 03-27-2013, 01:11 PM   #5
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 700

I have an old bzip tarball of sra_sdk-2.1.6, if you still need it.


-bash-3.00$ cat sra_sdk-2.1.6/reference-assistant.pl
Code:

#!/usr/local/bin/perl -w
################################################################################
use strict;

use File::Basename;
use File::Spec;

sub println { print @_; print "\n"; }

my $MSWIN;
++$MSWIN if ($^O =~ /mswin/i);

print "Checking refseq configuration... ";
my $VDB_CONFIG = find_bin("vdb-config");
die "not found" unless ($VDB_CONFIG);
println "OK";

print "Checking align-info... ";
my $ALIGN_INFO = find_bin("align-info");
die "not found" unless ($ALIGN_INFO);
println "found";

my $WGET;
print "Checking wget... ";
my $out = `wget -h 2>&1`;
if ($? == 0) {
  println "found";
  $WGET = "wget -O";
} else {
  println "not found";
}
unless ($WGET) {
  print "Checking curl...";
  $out = `curl -h 2>&1`;
  if ($? == 0) {
    println "found";
    $WGET = "curl -o";
  } else {
    println "not found";
  }
}
unless ($WGET) {
  print "Checking ./wget... ";
  my $cmd = dirname($0) ."/wget";
  $out = `$cmd -h 2>&1`;
  if ($? == 0) {
    println "found";
    $WGET = "$cmd -O";
  } else {
    println "not found.\nCannot continue.";
    exit 1;
  }
}

my $refseq_dir = simple_refseq_path();

if ($#ARGV > -1) {
    foreach (@ARGV) {
        load($_);
    }
} else {
    while (1) {
        my $f = ask("Enter cSRA file name (Press Enter to exit)");
        last unless ($f);
        load($f);
    }
}

sub ask {
    my ($prompt) = @_;
    print "$prompt: ";
    my $in = <STDIN>;
    chomp $in;
    return $in;
}

sub load {
    my ($f) = @_;
    println "Determining $f external dependencies...";
    my $cmd = "$ALIGN_INFO $f";
    my @info = `$cmd`;
    my $refs = 0;
    if ($?) {
        println "$f: failed";
    } else {
        my $ok = 0;
        my $ko = 0;
        foreach (@info) {
            chomp;
            my @r = split /,/;
            if ($#r >= 3) {
                my ($seqId, $remote) = ($r[0], $r[3]);
                ++$refs;
                if ($remote eq 'remote') {
                    print "Downloading $seqId... ";
                    my $cmd = "$WGET \"$refseq_dir/$seqId\""
                        . " http://ftp-trace.ncbi.nlm.nih.gov/sra/refseq/$seqId"
                        . " 2>&1";
                    `$cmd`;
                    if ($?) {
                        println "failed";
                        ++$ko;
                    }
                    else {
                        println "OK";
                        ++$ok;
                    }
                }
            }
        }
        print "All " . $refs . " references were checked (";
        print "$ko failed, " if ($ko);
        println "$ok downloaded)";
    }
}

sub simple_refseq_path {
    my %refseq;
    $refseq{s} = refseq_config('servers');
    $refseq{v} = refseq_config('volumes');
    $refseq{p} = refseq_config('paths');

    if (   ($refseq{s} && !$refseq{v})
        || ($refseq{v} && !$refseq{s}))
    {   die "Invalid configuration"; }

    if ($refseq{s} && $refseq{v}) {
        if ((index($refseq{s}, ":") != -1) || (index($refseq{v}, ":") != -1)) {
            die "Unexpected '$refseq{s}/$refseq{v}'";
        } else {
            return "$refseq{s}/$refseq{v}";
        }
    } elsif ($refseq{p}) {
        return PATH_VDB2WIN($refseq{p});
    } else {
        print "Cannot find configuration. Please run 'config-assistant.pl'\n";
        exit 1;
    }
}

sub refseq_config {
    my ($nm) = @_;
    my $v = `$VDB_CONFIG refseq/$nm 2>&1`;
    if ($?) {
        if ($v =~ /path not found while opening node/) {
            $v = '';
        } else {
            die $!;
        }
    } else {
        $v =~ /<$nm>(.*)<\/$nm>/;
        die "Invalid 'refseq/$nm' configuration" unless ($1);
        $v = $1;
    }
    return $v;
}

sub find_bin {
  my ($name) = @_;

  my $basedir = dirname($0);

  # built from sources
  if (-e File::Spec->catfile($basedir, "Makefile")) {
    my $f = File::Spec->catfile($basedir, "build");
    $f = File::Spec->catfile($f, "Makefile.env");
    if (-e $f) {
      my $try = `make -s bindir -C $basedir 2>&1`;
      if ($? == 0) {
        chomp $try;
        $try = File::Spec->catfile($try, $name);
        my $tmp = `$try -h 2>&1`;
        if ($? == 0) {
          return $try;
        }
      }
    }
  }

  # try the same directory as the script
  my $try = File::Spec->catfile($basedir, $name);
  my $tmp = `$try -h 2>&1`;
  if ($? == 0) {
    return $try;
  }

  # check from PATH
  $try = "$name";
  $tmp = `$try -h 2>&1`;
  if ($? == 0) {
    return $try;
  }

  return 0;
}

sub WIN_TRANSLATE {
  ($_) = @_;
  return $_ unless($MSWIN);
  tr|/|\\|;
  return $_;
}

sub PATH_VDB2WIN {
  ($_) = @_;
  return $_ unless($MSWIN);
  $_ = WIN_TRANSLATE($_);
  s/^\\([a-zA-Z])\\/$1:\\/;
  return $_;
}

################################################################################
# EOF #
The SRA documentation appears out of sync with the code.
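
For readers who do not use Perl, the core of the `load` routine above can be restated as a short Python sketch. It assumes only what the script itself shows: each line of align-info output is comma-separated, field 0 is the sequence ID, field 3 is the location flag, and references flagged as remote are fetched from the refseq URL used in the script. The function names are hypothetical, and the sketch lists the URLs without downloading anything.

```python
# Base URL copied from the Perl script above.
REFSEQ_URL = "http://ftp-trace.ncbi.nlm.nih.gov/sra/refseq/"


def remote_references(align_info_lines):
    """Return the sequence IDs that align-info flags as 'remote'.

    Mirrors the Perl logic: split on commas, require at least four
    fields, and compare the fourth field with the string 'remote'.
    """
    remote = []
    for line in align_info_lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 4 and fields[3] == "remote":
            remote.append(fields[0])
    return remote


def download_urls(align_info_lines):
    """Build the URL the script would request for each remote reference."""
    return [REFSEQ_URL + seq_id for seq_id in remote_references(align_info_lines)]
```

The lines passed in would come from running align-info on the .sra file, as the script does with backticks.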
Old 03-28-2013, 12:16 PM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,982

After some communication with SRA support, here is the new protocol to follow.

Each user needs to run the "configuration-assistant.perl" script that is present in the "bin" directory (if you compile from source), or use the Java jar found in the precompiled tarball (do one or the other).

While running the Perl script, you will reach a point where it asks "Would you like to test SRA files for remote reference dependencies? [y/N]". Choose "Yes" (the default answer is No). A step or two later you will be prompted for an SRA accession number, so have a test SRA# handy (if you are not working with a specific one). If everything is set up correctly, the script will download the reference files it needs on the fly and store them in the directory that you designate the first time you run the script. A group-writable directory can be used for this purpose if multiple people need to dump SRA data.
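
Preparing the shared reference directory before running the configuration script could look roughly like this. It is a minimal sketch for a POSIX system; the function name and the example path in the usage note are hypothetical, and it only adds group permissions on top of whatever mode the directory already has.

```python
import os
import stat


def make_group_writable_dir(path):
    """Create path if needed and grant the group read/write/execute access."""
    os.makedirs(path, exist_ok=True)
    # Keep the existing permission bits and add the group bits on top.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRWXG)
    return path
```

For example, `make_group_writable_dir("/shared/sra_refseq")` could be run once, and that directory then given to the configuration script when it asks where to store reference files. The directory's group still has to be one that all users who dump SRA data belong to.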

PS: The SRA# that I was originally working with turned out to have a corrupt .sra file at the source. SRA is going to fix that problem.

Last edited by GenoMax; 03-28-2013 at 12:19 PM.
Old 07-11-2013, 01:45 PM   #7
giorgifm
Member
 
Location: Columbia University Medical Center

Join Date: Aug 2011
Posts: 35

Dear all, try using the ABSOLUTE PATH of that SRA file. The error means the file could not be found.
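
If fastq-dump is called from a script, the absolute-path workaround can be sketched as below. This is only an illustration: the function name is hypothetical, and the tool is invoked in its simplest form (`fastq-dump <file>`) with no extra options.

```python
import os


def fastq_dump_command(sra_file, tool="fastq-dump"):
    """Build a fastq-dump argument list that uses the absolute path of sra_file."""
    absolute = os.path.abspath(sra_file)
    # Fail early with a clear error instead of the toolkit's 'manager not found'.
    if not os.path.isfile(absolute):
        raise FileNotFoundError(absolute)
    return [tool, absolute]
```

The returned list can be passed to `subprocess.run(fastq_dump_command("SRR390728.sra"), check=True)`.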
Old 04-03-2015, 04:38 PM   #8
student-t
Member
 
Location: Garvan Institute

Join Date: Mar 2015
Posts: 16

Yeah, using an absolute path works.

Tags
fastq-dump, sratoolkit
