  • Installing a command/program to a remote computer

    I downloaded fastq-dump. While it does the job on my personal computer, it runs very slowly.

    I have access to a remote computer that can get the job done faster. However, upon logging in, I realized that fastq-dump has not been installed there.

    Is there a possibility I can "transfer" the program from my computer to the remote computer?

  • #2
    Yes, of course.

    If your personal computer is running the same operating system as your server, you can just transfer the precompiled binaries from your computer to the server via SFTP.

    It is unlikely, however, that your server is running the same operating system as your personal computer. You should therefore download the appropriate binaries for the server's operating system from NCBI (http://www.ncbi.nlm.nih.gov/Traces/s...?view=software). You can use the wget command to place them directly on your server, rather than taking the extra step of downloading them onto your computer and then transferring them to your server.
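If you are not sure whether the two machines match, a quick check along these lines may help. This is only a sketch: `describe_platform` is a made-up helper name, and `/etc/os-release` is present on most current Linux distributions but not on every system. Run it once on your personal computer and once on the server, then compare the output.

```shell
# Print the kernel name and machine architecture of the current host,
# for example "Linux x86_64".
describe_platform() {
    printf '%s %s\n' "$(uname -s)" "$(uname -m)"
}

describe_platform

# Where available, this file names the distribution and its version,
# which tells you which SRA Toolkit build to download.
if [ -r /etc/os-release ]; then
    grep -E '^(NAME|VERSION_ID)=' /etc/os-release
fi
```

If both lines agree between the two machines, copying the precompiled binaries over is likely to work. Otherwise, download the build that matches the server.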

    Here is an example of a set of commands to install fastq-dump in your home directory on your server. In this example, the server is running the CentOS operating system.

    wget --no-check-certificate http://ftp-trace.ncbi.nlm.nih.gov/sr...linux64.tar.gz
    tar xzvf sratoolkit.2.3.5-2-centos_linux64.tar.gz
    cd sratoolkit.2.3.5-2-centos_linux64/bin/
    ./fastq-dump

    To avoid having to enter the full path to fastq-dump, you should add the path to your $PATH environment variable in your .bash_profile file (if you are using the BASH shell).

    PATH=$PATH:$HOME/sratoolkit.2.3.5-2-centos_linux64/bin/
    export PATH

    You'll need to log out and log back in, or type the following command, for the new settings to take effect.

    source ~/.bash_profile
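
As a slightly more defensive variant of the PATH lines above, the sketch below appends the directory only if it is not already listed, so sourcing the profile several times does not pile up duplicates. `add_to_path` is a hypothetical helper name, and the directory assumes the archive was unpacked in your home directory as in the example.

```shell
# Append a directory to PATH only if it is not already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

add_to_path "$HOME/sratoolkit.2.3.5-2-centos_linux64/bin"

# Report where the shell now finds fastq-dump, if anywhere.
if command -v fastq-dump >/dev/null 2>&1; then
    echo "fastq-dump found at: $(command -v fastq-dump)"
else
    echo "fastq-dump is not on PATH yet"
fi
```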
    Last edited by blancha; 04-27-2014, 04:33 AM.



    • #3
      Space problem too.

      Thanks for the reply, blancha. I am looking into transferring it via the helpful steps you provided.

      The other problem I am coming up with is that although the fastq-dump command works on my personal computer, the fastq files are huge (~27GB for each one). It quickly took over my space.

      My next step is to convert the fastq files to BAM files. Do you think these BAM files will be smaller than the fastq files? That way, each time I create a BAM file, I could delete its fastq file and have enough room on my personal computer.

      I plan to use the program DESeq to analyze ~10 BAM files. So, if there is a more efficient and space-saving way for me to convert all the .sra files to BAM files, I would consider alternatives too...



      • #4
        If the fastq files in question aren't compressed then the BAM files will definitely be smaller (if the fastq files are gzipped, then the BAM files will probably be a bit bigger, since they contain added information). If you really want, you can even delete the BAM files after counting (with featureCounts or htseq-count), since it's the counts that are used by DESeq (use DESeq2 rather than DESeq).
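
To get a feel for how much space compression saves, here is a minimal sketch that gzips a FASTQ file and prints the size before and after. `compress_fastq` is a hypothetical helper, and the demo file is synthetic and highly repetitive, so real reads will compress less well. fastq-dump also has a `--gzip` option that writes compressed output directly, which avoids ever having the uncompressed file on disk.

```shell
# Compress a FASTQ file with gzip and report the size before and after.
# The original file is kept; delete it yourself once the .gz is verified.
compress_fastq() {
    fq="$1"
    gzip -c "$fq" > "$fq.gz" || return 1
    gzip -t "$fq.gz" || return 1      # integrity check of the new archive
    echo "original:   $(($(wc -c < "$fq"))) bytes"
    echo "compressed: $(($(wc -c < "$fq.gz"))) bytes"
}

# Build a tiny synthetic FASTQ file so the sketch runs anywhere.
demo="$(mktemp)"
i=0
while [ "$i" -lt 1000 ]; do
    printf '@read%s\nACGTACGTACGTACGTACGT\n+\nIIIIIIIIIIIIIIIIIIII\n' "$i" >> "$demo"
    i=$((i + 1))
done

compress_fastq "$demo"
```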



        • #5
          dpryan, many thanks for your reply.

          I think the fastq files are not compressed, because there is no extension. They are just named SRRXXXX.fastq. So, hopefully the BAM files will be smaller. Is that the right thinking?

          When I look at the manual for DESeq, it seems that only the SAM files are needed as input for htseq-count. Should I even use BAM files? I was thinking of using the following two steps after doing a bit of research. Do you think these are good programs?

          - Picard 'FastqToSam'
          - Galaxy 'SamToBam' (only if I need BAM files in the first place).

          Also, I did not even look into DESeq2. I will check that out as well. Thanks for the advice. Would it be a huge problem if I tried DESeq first? I have read the manual, and this is mostly for me to get my feet wet with the process, not for publishing or anything!



          • #6
            The SAM files would be even bigger than the fastq files, so you really don't want to have them around!

            FastqToSam will be absolutely useless for you and the BAM files produced from that will also be completely useless. You don't want to convert file formats, but rather map the reads to the genome with a tool like tophat2 or STAR. I can't give any advice on doing that using Galaxy.

            If you're just starting off then that's all the more reason to go directly to DESeq2. It's actually easier to use than DESeq and the results will be better.



            • #7
              You're skipping an essential step.
              You need to generate the BAM files by aligning the reads to a reference genome first.
              Picard's FastqToSam will just convert the FASTQ files to the unaligned SAM format.
              You cannot run htseq-count on these unaligned SAM files.
              In fact, I'm not even sure when this unaligned SAM format would be useful, although there must be cases where the program is useful, since it exists.

              If this is RNA-Seq data, you could use TopHat to align the reads to the reference genome.
              If it is ChIP-Seq data, you could use bowtie2 directly to align the reads to the reference genome.

              I concur with dpryan that you should be using DESeq2 and not DESeq.

              RNA-Seq data analysis steps
              1. Align to reference genome with TopHat
              2. Count aligned reads with htseq-count
              3. Calculate differential expression with DESeq2.
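
The first two steps above can be sketched for a single sample roughly as follows. Treat it as an illustration under stated assumptions rather than a tested pipeline: the sample name, the Bowtie2 index prefix `genome_index` and the annotation file `genes.gtf` are hypothetical placeholders, `-s no` assumes an unstranded library, and `process_sample` is a made-up helper. By default the script only prints the commands (`DRY_RUN=1`). Set `DRY_RUN=0` once the tools and files are in place. Step 3 is done in R with DESeq2 and is not shown here.

```shell
# DRY_RUN=1 prints the commands instead of running them (the default here).
DRY_RUN="${DRY_RUN:-1}"

# Align one sample with TopHat2, then count reads per gene with htseq-count.
# Arguments: sample name, Bowtie2 index prefix, GTF annotation file.
process_sample() {
    ps_sample="$1"
    ps_index="$2"
    ps_gtf="$3"
    ps_outdir="tophat_${ps_sample}"
    ps_bam="${ps_outdir}/accepted_hits.bam"   # TopHat's alignment output file

    if [ "$DRY_RUN" = "1" ]; then
        echo "tophat2 -o ${ps_outdir} ${ps_index} ${ps_sample}.fastq"
        echo "htseq-count -f bam -r pos -s no ${ps_bam} ${ps_gtf} > ${ps_sample}.counts.txt"
        return 0
    fi

    # Step 1: align the reads to the reference genome.
    tophat2 -o "$ps_outdir" "$ps_index" "${ps_sample}.fastq" || return 1
    # Step 2: count aligned reads per gene (-s no = unstranded, an assumption).
    htseq-count -f bam -r pos -s no "$ps_bam" "$ps_gtf" > "${ps_sample}.counts.txt" || return 1
}

process_sample SRR0000000 genome_index genes.gtf
```

Once the counts file for a sample exists and looks sane, the FASTQ and BAM files for that sample can be deleted to free up space before moving on to the next one.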

              I'm skipping some extra quality control steps.
              I first verify the quality of the FASTQ files with FastQC.
              If necessary, I trim low quality bases and adapter sequences.
              I also check the quality of the alignment with RNA-SeQC.
              Since this data was found online, the quality control steps may already have been performed and may not be necessary.
