  • STAR Aligner

    I am running STAR to align wheat RNA-Seq data against an Ensembl reference. The reference file is 4 GB, and the genome directory created in the first step is 42 GB. The mapping step has taken more than 50 hours, and some jobs have been running for more than 75 hours.

    I used 5 nodes with 100 GB each on our university cluster. Here is the script I used:
    HTML Code:
    #!/bin/sh
    #SBATCH --job-name=STAR
    #SBATCH --nodes=5
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=120:00:00
    #SBATCH --mem=100g
    #SBATCH --error=<Error File Name>
    #SBATCH --output=<Output File Name>
    
    cd  /Dir_PATH/STAR
    
    ./STAR_2.4.0b/STAR --genomeDir /Dir_PATH/STAR/index  --readFilesIn  /File_PATH/L001_R1_001.fastq,/File_PATH/L002_R1_001.fastq File_PATH/L001_R2_001.fastq,/File_PATH/_L002_R2_001.fastq --outFileNamePrefix /Dir_PATH/<Prefix_Name>/ --runThreadN 10

    I ran BWA-MEM on the same data and it took less than 10 hours to complete the mapping. Am I doing something wrong, or do I need to choose different parameters?

  • #2
    That seems quite odd. You're giving each of the nodes different files, yes?


    • #3
      I think it uses a total of 500 GB (100 GB x 5 nodes) for this job. It does not distribute different files to different nodes.


      • #4
        You're just running the same thing on all of the nodes. If you're doing the same with bwa mem then that's happening there as well.


        • #5
          I'll add that the I/O overhead of loading the index and of constantly overwriting the same output files could cause a slowdown. (I limit STAR to 4 concurrent instances on our cluster when outputting to SAM, since otherwise I can't guarantee that the drives can keep up if any other jobs are running.)
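
One way to express that kind of cap under SLURM is a job array with a concurrency throttle. The following is only a minimal sketch and not the setup described above: the sample list `samples.txt`, the array size of 16, the `--cpus-per-task` value, the `_R1_001.fastq`/`_R2_001.fastq` naming, and the `GENOME_DIR`, `FASTQ_DIR` and `OUT_DIR` variables are all hypothetical, and `STAR` is assumed to be on the PATH.

```shell
#!/bin/bash
#SBATCH --job-name=STAR-array
#SBATCH --array=0-15%4          # hypothetical: 16 samples, at most 4 running at once
#SBATCH --nodes=1
#SBATCH --cpus-per-task=10      # assumption: matches --runThreadN below
#SBATCH --mem=100g

# Print the sample name stored on line (task_id + 1) of a list file.
sample_for_task() {
    task_id=$1
    list_file=$2
    sed -n "$((task_id + 1))p" "$list_file"
}

# Only align when SLURM has actually started this script as an array task.
if [ -n "${SLURM_ARRAY_TASK_ID:-}" ]; then
    # samples.txt is a hypothetical file with one sample prefix per line.
    sample=$(sample_for_task "$SLURM_ARRAY_TASK_ID" samples.txt)
    mkdir -p "$OUT_DIR/$sample"
    STAR --genomeDir "$GENOME_DIR" \
         --readFilesIn "$FASTQ_DIR/${sample}_R1_001.fastq" "$FASTQ_DIR/${sample}_R2_001.fastq" \
         --outFileNamePrefix "$OUT_DIR/$sample/" \
         --runThreadN 10
fi
```

Because every array task writes under its own `--outFileNamePrefix`, the instances no longer overwrite each other's `Aligned.out.sam`, and the `%4` suffix keeps the number of simultaneous writers bounded.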


          • #6
            So I should try with this?

            HTML Code:
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --time=120:00:00
            #SBATCH --mem=100g


            • #7
              Sure, though you don't need to specify --ntasks-per-node when you just use one node. For reference, here is the start of mine:

              Code:
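              #!/bin/bash
              #SBATCH -J STAR-align
              #SBATCH -t 4:00:00
              #SBATCH -N 4
              #SBATCH -A ryand
              #SBATCH --exclusive
              #SBATCH --partition=work
              # Keep nNodes in sync with -N above. It is set after the #SBATCH block
              # because sbatch ignores directives that follow the first command.
              nNodes=4
              BIN=$WORK/bin
              # Launch one wrapper per node; each gets its 0-based index and the node count.
              for i in $(seq $nNodes)
              do
                  j=$(($i-1))
                  srun -N 1 --relative $j $BIN/slurm_STAR.sh $j $nNodes &
              done
              # Block until every background srun has finished.
              wait
              # Clean up STAR output files left in the working directory.
              rm Aligned.out.sam Log.out Log.progress.out
              rm -rf _STARtmp

              The slurm_STAR.sh shell script aligns every Nth pair of fastq files (or single fastq file, as appropriate) in a preset directory. Each instance runs on its own node. I highly recommend using --exclusive if that's not already the default on your cluster.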
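
The contents of slurm_STAR.sh are not shown in this thread, so the following is only a guess at what such a wrapper might look like. It is a minimal sketch that assumes paired-end files named `*_R1_001.fastq`/`*_R2_001.fastq`, a `STAR` binary on the PATH, and hypothetical `FASTQ_DIR`, `GENOME_DIR`, `OUT_DIR` and `THREADS` environment variables; it does not handle the single-end case.

```shell
#!/bin/bash
# Hypothetical slurm_STAR.sh. Usage: slurm_STAR.sh <offset> <stride>

# Print every <stride>-th argument, starting at 0-based position <offset>.
select_every_nth() {
    nth_offset=$1
    nth_stride=$2
    shift 2
    nth_idx=0
    for nth_item in "$@"; do
        if [ $((nth_idx % nth_stride)) -eq "$nth_offset" ]; then
            printf '%s\n' "$nth_item"
        fi
        nth_idx=$((nth_idx + 1))
    done
}

# Align the R1/R2 pairs assigned to this instance, one output directory per pair.
align_assigned_pairs() {
    select_every_nth "$1" "$2" "$FASTQ_DIR"/*_R1_001.fastq |
    while IFS= read -r r1; do
        r2=${r1%_R1_001.fastq}_R2_001.fastq
        sample=$(basename "$r1" _R1_001.fastq)
        mkdir -p "$OUT_DIR/$sample"
        # stdin is redirected so STAR cannot consume the file list being read.
        STAR --genomeDir "$GENOME_DIR" \
             --readFilesIn "$r1" "$r2" \
             --outFileNamePrefix "$OUT_DIR/$sample/" \
             --runThreadN "${THREADS:-10}" < /dev/null
    done
}

# Run only when called with both arguments, as in the srun line of the batch script.
if [ "$#" -eq 2 ]; then
    align_assigned_pairs "$1" "$2"
fi
```

Since the shell expands the glob in sorted order, every instance sees the same file list, so instance `$j` of `$nNodes` picks a disjoint subset without any coordination between nodes.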


              • #8
                Thanks !!! I will try that
