  • STAR Aligner

    I am running STAR to align wheat RNA-Seq data against the Ensembl reference. The reference file is 4 GB, and the genome directory created in the first step is 42 GB. The mapping step has taken more than 50 hours, and some jobs have been running for more than 75 hours.

    I used 5 nodes with 100 GB each on our university cluster. Here is the script I used:
    Code:
    #!/bin/sh
    #SBATCH --job-name=STAR
    #SBATCH --nodes=5
    #SBATCH --ntasks-per-node=1
    #SBATCH --time=120:00:00
    #SBATCH --mem=100g
    #SBATCH --error=<Error File Name>
    #SBATCH --output=<Output File Name>
    
    cd  /Dir_PATH/STAR
    
    ./STAR_2.4.0b/STAR --genomeDir /Dir_PATH/STAR/index  --readFilesIn  /File_PATH/L001_R1_001.fastq,/File_PATH/L002_R1_001.fastq File_PATH/L001_R2_001.fastq,/File_PATH/_L002_R2_001.fastq --outFileNamePrefix /Dir_PATH/<Prefix_Name>/ --runThreadN 10

    I ran BWA-MEM on the same data and it took less than 10 hours to complete the mapping. Am I doing something wrong, or do I need to choose different parameters?

  • #2
    That seems quite odd. You're giving each of the nodes different files, yes?


    • #3
      I think it uses a total of 500 GB (100 GB x 5 nodes) for this job. It does not distribute different files to different nodes.


      • #4
        You're just running the same thing on all of the nodes. If you're doing the same with bwa mem then that's happening there as well.


        • #5
          I'll add that the I/O overhead of loading the index and of the output constantly overwriting itself could cause a slowdown. (I limit STAR to 4 concurrent instances on our cluster when outputting to SAM, since otherwise I can't guarantee that the drives can keep up if any other jobs are running.)
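
How the 4-instance limit is enforced isn't stated in the post. As a minimal sketch of one way to cap concurrency from a plain shell script, the hypothetical helper below (`run_in_batches` is an illustrative name, not part of STAR or SLURM) starts commands in batches and waits for each batch to finish before starting the next, so no more than the given number run at once.

```shell
#!/bin/bash
# Hypothetical helper: run the given command strings with at most <max>
# running at the same time. This is simple batch-wise limiting, not a
# sliding window: a new batch starts only after the whole previous one ends.
run_in_batches() {
    local max="$1"
    shift
    local running=0
    local cmd
    for cmd in "$@"; do
        sh -c "$cmd" &              # start the command in the background
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait                    # block until the current batch is done
            running=0
        fi
    done
    wait                            # wait for the final, possibly partial, batch
}
```

Calling `run_in_batches 4` followed by one quoted STAR command line per sample would keep at most four alignments writing to disk at a time. The trade-off is that a single slow sample holds up the rest of its batch.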


          • #6
            So I should try with this?

            Code:
            #SBATCH --nodes=1
            #SBATCH --ntasks-per-node=1
            #SBATCH --time=120:00:00
            #SBATCH --mem=100g


            • #7
              Sure, though you don't need to specify --ntasks-per-node when you just use one node. For reference, here is the start of mine:

              Code:
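              #!/bin/bash
              # sbatch stops reading #SBATCH lines at the first command,
              # so all directives go before any variable assignment.
              #SBATCH -J STAR-align
              #SBATCH -t 4:00:00
              #SBATCH -N 4
              #SBATCH -A ryand
              #SBATCH --exclusive
              #SBATCH --partition=work
              nNodes=4   # keep in sync with -N above
              BIN=$WORK/bin
              # Start one wrapper per node; instance j handles every nNodes-th sample.
              for i in $(seq $nNodes)
              do
                  j=$(($i-1))
                  srun -N 1 --relative $j $BIN/slurm_STAR.sh $j $nNodes &
              done
              wait   # block until every background srun has finished
              # Remove leftover STAR files from the working directory
              rm Aligned.out.sam Log.out Log.progress.out
              rm -rf _STARtmp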

              The slurm_STAR.sh shell script will align every Nth pair of fastq files (or single fastq file, as appropriate) in a preset directory. Every instance is run on an individual node. Note that I highly recommend using --exclusive if that's not otherwise the default on your cluster.
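
The slurm_STAR.sh script itself was not posted. The block below is a hypothetical stand-in that sketches the "every Nth pair" idea under some assumptions: paired-end files follow the `*_R1_*` / `*_R2_*` naming seen earlier in the thread, and `FASTQ_DIR`, `GENOME_DIR`, and `OUT_DIR` are environment variables invented here for illustration. Only STAR options already used in the thread appear.

```shell
#!/bin/bash
# Hypothetical stand-in for slurm_STAR.sh; the real script was not posted.
# Usage: slurm_STAR.sh <instance index, 0-based> <number of instances>
# FASTQ_DIR, GENOME_DIR and OUT_DIR are assumed environment variables.

# Print every <stride>-th argument, starting at 0-based position <offset>.
pick_every_nth() {
    local offset="$1"
    local stride="$2"
    shift 2
    local k=0
    local f
    for f in "$@"; do
        if [ $((k % stride)) -eq "$offset" ]; then
            printf '%s\n' "$f"
        fi
        k=$((k + 1))
    done
}

# Derive the R2 mate path from an R1 path by swapping the first "_R1_".
# Assumes "_R1_" does not also occur in the directory part of the path.
mate_of() {
    printf '%s\n' "$1" | sed 's/_R1_/_R2_/'
}

# Align only when both arguments are given, so the functions above
# can also be sourced and tried on their own.
if [ "$#" -eq 2 ]; then
    # Shell globs expand in sorted order, so every instance sees the same list.
    pick_every_nth "$1" "$2" "$FASTQ_DIR"/*_R1_*.fastq | while IFS= read -r r1; do
        sample=$(basename "$r1" .fastq)
        STAR --genomeDir "$GENOME_DIR" \
             --readFilesIn "$r1" "$(mate_of "$r1")" \
             --outFileNamePrefix "$OUT_DIR/$sample." \
             --runThreadN 10 < /dev/null
    done
fi
```

With 4 instances, instance 1 would take the 2nd, 6th, 10th, ... R1 file in sorted order, and each sample gets its own output prefix so instances do not overwrite one another's files.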


              • #8
                Thanks!!! I will try that.
