  • sbdk82
    Member
    • Jul 2014
    • 26

    STAR Aligner

    I am running STAR to align wheat RNA-Seq data against an Ensembl reference. The reference file is 4 GB, and the genome directory created in the first step is 42 GB. The mapping step has taken more than 50 hours, and some jobs have been running for more than 75 hours.

    I used 5 nodes with 100 GB each on our university cluster. Here is the script I used:
    Code:
    #!/bin/sh
    #SBATCH --job-name=STAR
    #SBATCH --nodes=5
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=120:00:00
    #SBATCH --mem=100g
    #SBATCH --error=<Error File Name>
    #SBATCH --output=<Output File Name>
    
    cd  /Dir_PATH/STAR
    
    ./STAR_2.4.0b/STAR --genomeDir /Dir_PATH/STAR/index --readFilesIn /File_PATH/L001_R1_001.fastq,/File_PATH/L002_R1_001.fastq /File_PATH/L001_R2_001.fastq,/File_PATH/_L002_R2_001.fastq --outFileNamePrefix /Dir_PATH/<Prefix_Name>/ --runThreadN 10

    I ran BWA-MEM on the same data and it took less than 10 hours to complete the mapping. Am I doing something wrong, or do I need to choose different parameters?
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    That seems quite odd. You're giving each of the nodes different files, yes?


    • sbdk82
      Member
      • Jul 2014
      • 26

      #3
      I think it uses a total of 500 GB (100 GB x 5 nodes) for this job. It does not distribute different files to different nodes.


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        You're just running the same thing on all of the nodes. If you're doing the same with bwa mem then that's happening there as well.


        • dpryan
          Devon Ryan
          • Jul 2011
          • 3478

          #5
          I'll add that the I/O overhead of loading the index and constantly overwriting the same output files could cause a slowdown (I limit STAR to 4 concurrent instances on our cluster when outputting to SAM, since otherwise I can't guarantee that the drives can keep up if any other jobs are running).


          • sbdk82
            Member
            • Jul 2014
            • 26

            #6
            So I should try with this?

            Code:
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --time=120:00:00
            #SBATCH --mem=100g
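
A fuller single-node version might look like the sketch below. It is only a sketch under assumptions that are not from this thread: it adds `--cpus-per-task=10` so that the allocation matches the 10 STAR threads (on some clusters a task may otherwise be given a single core), and it wraps the call in a hypothetical helper, `star_command`, that prints the command rather than running it. The paths are the placeholders from the first post.

```shell
#!/bin/sh
#SBATCH --job-name=STAR
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10   # assumption: request as many cores as STAR threads
#SBATCH --time=120:00:00
#SBATCH --mem=100g

# Hypothetical helper: prints the STAR command for one read pair (dry run).
# $1 = mate 1 FASTQ, $2 = mate 2 FASTQ, $3 = output prefix.
# The thread count follows the SLURM allocation and falls back to 1
# when SLURM_CPUS_PER_TASK is not set (for example outside a job).
star_command() {
    echo ./STAR_2.4.0b/STAR --genomeDir /Dir_PATH/STAR/index \
        --readFilesIn "$1" "$2" \
        --outFileNamePrefix "$3" \
        --runThreadN "${SLURM_CPUS_PER_TASK:-1}"
}

# Placeholder paths; remove the "echo" inside star_command to actually run STAR.
star_command /File_PATH/L001_R1_001.fastq /File_PATH/L001_R2_001.fastq /Dir_PATH/out/
```

Tying `--runThreadN` to `SLURM_CPUS_PER_TASK` keeps the thread count and the requested cores from drifting apart when one of them is edited later.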


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              Sure, though you don't need to specify --ntasks-per-node when you just use one node. For reference, here is the start of mine:

              Code:
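              #!/bin/bash
              #SBATCH -J STAR-align
              #SBATCH -t 4:00:00
              #SBATCH -N 4
              #SBATCH -A ryand
              #SBATCH --exclusive
              #SBATCH --partition=work
              # Shell commands start here; sbatch ignores any #SBATCH line after the first command.
              nNodes=4    # keep in sync with -N above
              BIN=$WORK/bin
              # Launch one slurm_STAR.sh instance per node, in the background.
              for i in $(seq "$nNodes")
              do
                  j=$((i - 1))    # 0-based node offset within the allocation
                  srun -N 1 --relative "$j" "$BIN/slurm_STAR.sh" "$j" "$nNodes" &
              done
              wait    # block until every instance has finished
              # Remove leftover STAR files from the working directory.
              rm Aligned.out.sam Log.out Log.progress.out
              rm -rf _STARtmp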
              The slurm_STAR.sh shell script will align every Nth pair of fastq files (or single fastq file, as appropriate) in a preset directory. Every instance is run on an individual node. Note that I highly recommend using --exclusive if that's not otherwise the default on your cluster.
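
The "every Nth pair" selection could be sketched roughly as follows. This is not the actual slurm_STAR.sh, which is not shown in the thread; the helper names `select_every_nth` and `mate2_for` are hypothetical, and the `_R1_`/`_R2_` naming is assumed from the file names in the first post.

```shell
#!/bin/sh
# Hypothetical sketch of the file-selection logic only; not the real slurm_STAR.sh.

# Print every <stride>-th argument, starting at 0-based index <offset>.
# Usage: select_every_nth <offset> <stride> file...
# The body runs in a subshell so its variables do not leak into the caller.
select_every_nth() (
    offset=$1
    stride=$2
    shift 2
    idx=0
    for f in "$@"; do
        if [ $((idx % stride)) -eq "$offset" ]; then
            printf '%s\n' "$f"
        fi
        idx=$((idx + 1))
    done
)

# Derive the mate 2 file name from a mate 1 name (assumes "_R1_" in the name).
# Returns non-zero if the name does not contain "_R1_".
mate2_for() {
    case $1 in
        *_R1_*) printf '%s\n' "${1%%_R1_*}_R2_${1#*_R1_}" ;;
        *) return 1 ;;
    esac
}

# Example: with 4 workers, worker 1 takes the 2nd, 6th, ... file of the listing:
#   select_every_nth 1 4 /File_PATH/*_R1_001.fastq
```

Each worker would then run STAR once per selected pair, giving every pair its own `--outFileNamePrefix` so that instances do not overwrite each other's output.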


              • sbdk82
                Member
                • Jul 2014
                • 26

                #8
                Thanks! I will try that.
