  • Installing a command/program to a remote computer

    I downloaded fastq-dump. While it does the job on my personal computer, it runs very slowly.

    I have access to a remote computer that can get the job done faster. However, upon logging in, I realized that fastq-dump has not been installed there.

    Is there a possibility I can "transfer" the program from my computer to the remote computer?

  • #2
    Yes, of course.

    If your personal computer is running the same operating system as your server, you can just transfer the precompiled binaries from your computer to the server via SFTP.
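Whether the two machines match can be checked before copying anything. The following is a minimal sketch, assuming SSH access to the server; `user@server`, the `~/sratoolkit` directory, and the helper name `same_platform` are placeholders for illustration, not part of the original post.

```shell
# Compare two "kernel architecture" strings as printed by `uname -sm`.
# Returns 0 (true) when they are identical, 1 otherwise.
same_platform() {
    [ "$1" = "$2" ]
}

# Hypothetical usage (user@server and ~/sratoolkit are placeholders):
#   local_platform=$(uname -sm)
#   remote_platform=$(ssh user@server uname -sm)
#   if same_platform "$local_platform" "$remote_platform"; then
#       scp -r ~/sratoolkit user@server:~/
#   else
#       echo "Platforms differ; download the server's build instead."
#   fi
```

A matching kernel and architecture is only a rough proxy. A binary built for a different Linux distribution may still fail on missing libraries, so downloading the build made for the server remains the safer route.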

    It is unlikely, however, that your server is running the same operating system as your personal computer. In that case, download the binaries built for the server's operating system from NCBI (http://www.ncbi.nlm.nih.gov/Traces/s...?view=software). You can use the wget command to place them directly on your server, rather than taking the extra step of downloading them onto your computer and then transferring them to your server.

    Here is an example of a set of commands to install fastq-dump in your home directory on your server. In this example, the server is running the CentOS operating system.

    wget --no-check-certificate http://ftp-trace.ncbi.nlm.nih.gov/sr...linux64.tar.gz
    tar xzvf sratoolkit.2.3.5-2-centos_linux64.tar.gz
    cd sratoolkit.2.3.5-2-centos_linux64/bin/
    ./fastq-dump

    To avoid having to enter the full path to fastq-dump, you should add its directory to your $PATH environment variable in your .bash_profile file (if you are using the Bash shell).

    PATH=$PATH:$HOME/sratoolkit.2.3.5-2-centos_linux64/bin/
    export PATH

    You'll need to log out and log back in, or type the following command, for the new settings to take effect.

    source ~/.bash_profile
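If .bash_profile gets sourced several times, the plain assignment above appends the same directory again each time. A small sketch of one way around that, where the helper name `add_to_path` is made up for this example:

```shell
# Append a directory to PATH only if it is not already present,
# so sourcing ~/.bash_profile repeatedly does not grow PATH.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH: do nothing
        *) PATH="$PATH:$1" ;;        # otherwise append it
    esac
    export PATH
}

add_to_path "$HOME/sratoolkit.2.3.5-2-centos_linux64/bin"

# Confirm the shell now finds the program (prints its full path if found).
command -v fastq-dump || echo "fastq-dump not found on PATH"
```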
    Last edited by blancha; 04-27-2014, 04:33 AM.


    • #3
      Space problem too.

      Thanks for the reply, blancha. I am looking into transferring it via the helpful steps you provided.

      The other problem I am running into is that although the fastq-dump command works on my personal computer, the fastq files are huge (~27GB each). They quickly filled up my disk.

      My next step is to convert the fastq files to BAM files. Do you think these BAM files will be smaller than the fastq files, so that each time I create a BAM file, I could delete its fastq and have enough room on my personal computer?

      I plan to use the program DESeq to analyze ~10 BAM files. So, if there is a more efficient and space-saving way for me to convert all the .sra files to BAM files, I would consider alternatives too...


      • #4
        If the fastq files in question aren't compressed then the BAM files will definitely be smaller (if the fastq files are gzipped, then the BAM files will probably be a bit bigger, since they contain added information). If you really want, you can even delete the BAM files after counting (with featureCounts or htseq-count), since it's the counts that are used by DESeq (use DESeq2 rather than DESeq).
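Compressing the fastq files is itself a quick way to recover space. A rough sketch, assuming gzip is installed; the function name `compress_fastq` is invented for this example:

```shell
# Compress a FASTQ file in place and report the size before and after.
# gzip replaces reads.fastq with reads.fastq.gz.
compress_fastq() {
    fq="$1"
    before=$(wc -c < "$fq" | tr -d ' ')
    gzip "$fq" || return 1
    after=$(wc -c < "$fq.gz" | tr -d ' ')
    echo "$fq: $before bytes -> $after bytes"
}

# Hypothetical usage on one of the dumped files:
#   compress_fastq SRRXXXX.fastq
# fastq-dump can also write compressed output directly:
#   fastq-dump --gzip SRRXXXX.sra
```

Whether a downstream aligner reads gzipped fastq directly depends on the tool, so it is worth checking its manual before deleting the uncompressed copy.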


        • #5
          dpryan, many thanks for your reply.

          I think the fastq files are not compressed, because there is no extension. They are just named SRRXXXX.fastq. So, hopefully the BAM files will be smaller. Is that the right thinking?
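The file name alone does not prove anything either way, so the content can be tested instead. A minimal sketch, assuming gzip is available; `is_gzipped` is a made-up helper name:

```shell
# Return 0 if the file holds valid gzip data, whatever its name says.
# gzip -t tests the data without writing any output. Reading from stdin
# avoids any dependence on the file's suffix. A plain-text file is
# rejected almost immediately; a real gzip file is read in full.
is_gzipped() {
    gzip -t < "$1" 2>/dev/null
}

# Hypothetical usage:
#   if is_gzipped SRRXXXX.fastq; then echo "compressed"; else echo "plain text"; fi
```

Looking at the first four lines with `head -n 4 SRRXXXX.fastq` is another quick check: an uncompressed fastq file shows a readable record starting with `@`.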

          When I look at the manual for DESeq, it seems that only the SAM files are needed as input for htseq-count. Should I even use BAM files? I was thinking of using the following two steps after doing a bit of research. Do you think these are good programs?

          - Picard 'FastqToSam'
          - Galaxy 'SamToBam' (only if I need BAM files in the first place).

          Also, I did not even look into DESeq2. I will check that out as well. Thanks for the advice. Would it be a huge problem if I tried DESeq first? (I have read the manual, and this is mostly for me to get my feet wet with the process, not for publishing or anything!)


          • #6
            The SAM files would be even bigger than the fastq files, so you really don't want to have them around!

            FastqToSam will be absolutely useless for you and the BAM files produced from that will also be completely useless. You don't want to convert file formats, but rather map the reads to the genome with a tool like tophat2 or STAR. I can't give any advice on doing that using Galaxy.

            If you're just starting off then that's all the more reason to just go directly to DESeq2. It's actually easier to use than DESeq and the results will be better.


            • #7
              You're skipping an essential step.
              You need to generate the BAM files by aligning the reads to a reference genome first.
              Picard's FastqToSam will just convert the FASTQ files to the unaligned SAM format.
              You cannot run htseq-count on these unaligned SAM files.
              In fact, I'm not even sure when this unaligned SAM format would be useful, although there must be cases where the program is useful, since it exists.

              If this is RNA-Seq data, you could use TopHat to align the reads to the reference genome.
              If it is ChIP-Seq data, you could use bowtie2 directly to align the reads to the reference genome.

              I concur with dpryan that you should be using DESeq2 and not DESeq.

              RNA-Seq data analysis steps
              1. Align to reference genome with TopHat
              2. Count aligned reads with htseq-count
              3. Calculate differential expression with DESeq2.

              I'm skipping some extra quality control steps.
              I first verify the quality of the FASTQ files with FastQC.
              If necessary, I trim low quality bases and adapter sequences.
              I also check the quality of the alignment with RNA-SeQC.
              Since this data was found online, the quality control steps may already have been performed and may not be necessary.
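The first two analysis steps above can be strung together per sample so that only the small count table stays on disk, which also addresses the space problem raised earlier in the thread. This is a rough sketch under several assumptions: single-end reads, a Bowtie2 index and a GTF annotation already on hand (`genome_index` and `genes.gtf` are placeholder names), and helper names (`run`, `process_sample`) invented for this example. It is not a tested pipeline.

```shell
# Process one SRA run end to end, keeping only the count table.
# Set DRY_RUN=1 to print the commands instead of executing them.
# INDEX (Bowtie2 index prefix) and GTF (annotation file) are placeholders.
INDEX=${INDEX:-genome_index}
GTF=${GTF:-genes.gtf}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

process_sample() {
    srr="$1"
    # 0. Dump compressed FASTQ from the .sra file (single-end assumed).
    run fastq-dump --gzip "$srr.sra" || return 1
    # 1. Align to the reference genome with TopHat.
    run tophat2 -o "${srr}_tophat" "$INDEX" "$srr.fastq.gz" || return 1
    run rm "$srr.fastq.gz"
    # 2. Count aligned reads; htseq-count writes the table to stdout.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "htseq-count -f bam ${srr}_tophat/accepted_hits.bam $GTF > $srr.counts.txt"
    else
        htseq-count -f bam "${srr}_tophat/accepted_hits.bam" "$GTF" \
            > "$srr.counts.txt" || return 1
    fi
    # The BAM file is optional once the counts exist.
    run rm -r "${srr}_tophat"
}

# Hypothetical usage, previewing the commands first:
#   DRY_RUN=1 process_sample SRRXXXX
```

The resulting count tables are what DESeq2 takes as input in step 3. The `-s` (strandedness) option of htseq-count should be set to match the library preparation; the default assumes a stranded protocol.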
