  • Multiple fastq alignment with bowtie2 in server

    Hi!
    I'm trying to map multiple SRA files (>6500) with bowtie2 against my reference genome. I am running a SLURM script on a server. Mapping a single file works fine, but when I run a bash loop I always get the following error:

    "path/to/slurm_script: line 16: path/to/file1.fastq: Permission denied"

    Here is my slurm script

    #!/bin/bash
    #BATCH --job-name=ERR1135336.clean.reads.Assembly
    #SBATCH -N 1 # Number of nodes, not cores
    #SBATCH -t 2-00:00:00 # Walltime
    #SBATCH --ntasks-per-node 40 # Number of cores
    #SBATCH --output=out-%j.log # Output (console)
    #SBATCH --partition=test # Queue

    module use /gpfs/shared/modulefiles_local
    module use /gpfs/shared/modulefiles_local/bio
    module load bio/bowtie2/2.3.4

    for i in $(path/to/*.fastq)
    do
    bowtie2 -x PC_805 --threads 40 -U ${i} -S path/to/${i%%.fastq}.sam
    done


    I am not sure whether this is really a permission issue or a bash scripting issue.

    Output of ls -l for the directory from which I am running the SLURM job:

    drwxr-xr-x 2 chayan.roy domain users 4096 Apr 23 10:14 PC_805


    Output of ls -l for the directory where I am storing my fastq files:

    drwxr-xr-x 22 chayan.roy domain users 4096 Apr 22 14:44 HMP_2017

    Any help will be much appreciated

    Thanks

  • #2
    You can't run a bash script inside one SLURM job and expect the jobs to be parallelised. Instead, you should run a bash script on the command line that in turn submits multiple individual SLURM jobs.

    "path/to/" I assume this is a real path on your system that you are obfuscating here? If not, you need to have a real value there.

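On the error in the original post: it almost certainly comes from the loop header rather than from file permissions. `$(path/to/*.fastq)` is command substitution, so bash expands the glob and then tries to execute the first FASTQ file as a program, which fails with "Permission denied" because the file is not executable. Iterating over the glob directly avoids that. Two smaller things in the same script: the first directive is spelled `#BATCH`, so SLURM treats it as an ordinary comment and the job name is never set, and `${i%%.fastq}` still contains the input directory, so prefixing it with `path/to/` doubles the path. A minimal sketch of a corrected loop, under some assumptions: `FASTQ_DIR`, `OUT_DIR`, `RUN` and the `sam_path` helper are names made up for illustration, and by default the loop only prints the commands it would run.

```shell
#!/bin/bash
# Hypothetical locations; substitute the real paths.
FASTQ_DIR="${FASTQ_DIR:-path/to}"
OUT_DIR="${OUT_DIR:-alignments}"
# Dry run by default: commands are printed, not executed.
# Export RUN="" to actually run bowtie2.
RUN="${RUN-echo}"

sam_path() {
    # Drop the directory and the .fastq suffix, then put the .sam in OUT_DIR.
    local base
    base=$(basename "$1" .fastq)
    printf '%s/%s.sam\n' "$OUT_DIR" "$base"
}

# Loop over the glob itself, not over $(...), which would try to execute a file.
for fq in "$FASTQ_DIR"/*.fastq; do
    [ -e "$fq" ] || continue    # the glob matched nothing
    $RUN bowtie2 -x PC_805 --threads 40 -U "$fq" -S "$(sam_path "$fq")"
done
```

This still aligns the files one after another inside a single job, so it fixes the error but not the runtime.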


    • #3
      Thanks for your prompt response.

      If I understood correctly, I have to submit a SLURM array of >6500 jobs? This particular server has 56 nodes, each with 40 threads, and every single job takes more than 3 hours. Is there any other way to make it faster?

      p.s. I have shortened the long real path in my post.



      • #4
        If you want true parallelization then yes, you would need to submit 6500 jobs to the queue. You are likely not the only user, so most of them will pend, but they will finish eventually.
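One way to submit that many jobs without writing 6500 scripts is a SLURM job array in which each task picks one file from a list. The following is only a sketch under stated assumptions: `LIST` (a text file with one FASTQ path per line), `OUT_DIR` and the `nth_line` helper are hypothetical names, the 6-hour walltime is a guess based on the ">3 hours" mentioned above, and the array range and the `%20` throttle are placeholders.

```shell
#!/bin/bash
#SBATCH --job-name=bowtie2_array
#SBATCH --nodes=1                 # each array task is an independent one-node job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=40        # matches --threads below
#SBATCH --time=0-06:00:00         # per-task walltime (assumed; adjust to your data)
#SBATCH --output=bowtie-%A_%a.out
#SBATCH --array=0-999%20          # 1000 tasks, at most 20 running at the same time

# Hypothetical names: LIST holds one FASTQ path per line, OUT_DIR receives the SAM files.
LIST="${LIST:-jobs}"
OUT_DIR="${OUT_DIR:-alignments}"

nth_line() {
    # Print line ($2 + 1) of file $1, so array task 0 maps to the first line.
    sed -n "$(( $2 + 1 ))p" "$1"
}

# Only do work when running as an array task and the list is readable.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ] && [ -r "$LIST" ]; then
    fq=$(nth_line "$LIST" "$SLURM_ARRAY_TASK_ID")
    mkdir -p "$OUT_DIR"
    bowtie2 -x PC_805 --threads 40 -U "$fq" -S "$OUT_DIR/$(basename "$fq" .fastq).sam"
fi
```

Many clusters cap the size of a single array (the `MaxArraySize` setting), so 6500 files may have to be split over several lists and submissions.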



        • #5
          Hi,
          Instead of giving the path in the for loop, you can first add a serial-number prefix to all your fastq files and then try something like this:

          # assumes the files were renamed to 1_.fastq ... 6500_.fastq
          for i in $(seq 1 6500);
          do
          bowtie2 -x PC_805 --threads 40 -U path/to/${i}_.fastq -S path/to/${i}_.fastq.sam;
          done

          Hoping it will help.
          Last edited by archana87; 04-29-2019, 02:10 PM.



          • #6
            Hi

            I am running parallel jobs now, but every task fails with the following error, and I am not sure whether it comes from my array script or from something else.

            Slurm Array

            #!/bin/bash

            #SBATCH --job-name=Bowtie_Array # Job name
            #SBATCH --nodes=12               # Number of nodes
            #SBATCH --ntasks-per-node=40     # CPUs per node (MAX=40 for CPU nodes and 80 for GPU)
            #SBATCH --output=bowtie-%A_%a.out  # Standard output (log file)
            #SBATCH --partition=test        # Partition/Queue
            #SBATCH --time=7-00:00:00          # Maximum walltime
            #SBATCH --array=0-12        # job array index

            module use /cm/shared/modulefiles_local
            module use /gpfs/shared/modulefiles_local/bio
            module load bio/bowtie2/2.3.4

            names=($(cat jobs))

            echo ${names[${SLURM_ARRAY_TASK_ID}]}

            bowtie2 --threads 40 -x /gpfs/scratch/chayan.roy/Pc_project/HGM_Genomes/Index/PC_1969.fasta -U ${names[${SLURM_ARRAY_TASK_ID}]} -S alignments/${names[${SLURM_ARRAY_TASK_ID}]}.sam

            Error message

            SRR1789035.fastq
            /gpfs/shared/apps_local/bowtie2/2.3.4.3/bin/bowtie2-align-s: error while loading shared libraries: libtbb.so.2: cannot open shared object file: No such file or directory
            (ERR): Description of arguments failed!
            Exiting now ...

            Any help?



            • #7
              Did you download the bowtie2 binaries or compile the program yourself? It looks like the Threading Building Blocks (TBB) library is missing on your cluster. See the section on "building from source" in the manual.
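To confirm which shared libraries cannot be resolved, `ldd` can be run on the binary named in the error message. A small sketch follows; `missing_libs` and `BIN` are names introduced here for illustration, and the check is best run on a compute node, since the login node may have libraries that the compute nodes lack.

```shell
#!/bin/bash
missing_libs() {
    # Read `ldd` output on stdin and print the names of unresolved libraries.
    awk '/not found/ { print $1 }'
}

# Path copied from the error message in the post above.
BIN="${BIN:-/gpfs/shared/apps_local/bowtie2/2.3.4.3/bin/bowtie2-align-s}"

if [ -x "$BIN" ]; then
    ldd "$BIN" | missing_libs
fi
```

If `libtbb.so.2` is listed, the library is either not installed on that node or not on the dynamic linker's search path.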



              • #8
                I don't have installation access. I have just asked the admins, but I know they will take a month to respond. In the meantime I am trying to bypass it using Anaconda. Do let me know if there is a better way to do it.

                Thanks



                • #9
                  If you use the conda option, make sure to remove "module load bio/bowtie2/2.3.4" from your script.

                  Hopefully your home directory is available on all cluster nodes because conda will install programs in your home directory by default.
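A rough sketch of the conda route, assuming conda is already installed in the home directory and the bioconda channel is reachable from the cluster. The environment name `bowtie2` and the two helper functions are made up for illustration; nothing runs until they are called.

```shell
#!/bin/bash
ENV_NAME="${ENV_NAME:-bowtie2}"   # hypothetical environment name

create_env() {
    # One-off step, typically on the login node: install bowtie2 from bioconda.
    conda create -y -n "$ENV_NAME" -c conda-forge -c bioconda bowtie2
}

activate_env() {
    # Inside the job script: make `conda activate` usable in a non-interactive shell.
    . "$(conda info --base)/etc/profile.d/conda.sh"
    conda activate "$ENV_NAME"
}
```

In the job script, calling `activate_env` would take the place of the `module load` line, before bowtie2 is invoked.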

