  • 2-pass speed about 7 M/hr

    hello,

    Is it possible that the mapping speed for the 2nd pass drops to 7 M/hr when 900,000 new splice sites and a comprehensive gene model (GENCODE v19) are used for index generation?
    The first pass was ~100-fold faster (700 M/hr).

    my concrete syntax:
    /home/ws/SW_install/STAR/STAR/source/STAR --runThreadN 31 --outSAMstrandField intronMotif --outSAMtype BAM SortedByCoordinate --genomeDir $indices --readFilesCommand zcat --readFilesIn /home/ws/data/PatientData/$b/m*/$s/t*/date*/$f1 /home/ws/data/PatientData/$b/m*/$s/t*/date*/$f2 --outFileNamePrefix $name

    with 31 cores on a 192 GB Scientific Linux 7 workstation.

    Can I do something to improve speed?

    dietmar



    • Originally posted by neokao
      Thanks, Alex.

      I don't know what's causing the problem for multi-threading on my Mac mini.
      I got the same error even after I rebooted the computer and started the mapping freshly.

      Anyways, I went ahead and started the mapping one by one with --runThreadN 1 already.
      It is weird that even with --runThreadN 1 in code like this:

      > STAR --genomeDir ./GenomeDir/ --readFilesIn ./BGI_RNAseq_data_2015/01.fq --runThreadN 1

      , it sometimes worked but sometimes did not.

      For the same .fq file, it could give the Killed: 9 error, and when I reran the exact same command, it went through successfully. Very strange.

      My .fq files do have distinct prefixes and are numbered with two digits as described before: 01.fq, 02.fq, etc. Could you shed more light on --readFilesIn XX.fq --outFileNamePrefix XX ? Thanks.
      You can map each of the FASTQ files in a separate directory, e.g. 01/, 02/ ... The output files in all directories will have the same names, such as 01/Aligned.out.sam, 02/Aligned.out.sam ...
      Alternatively, you can run all STAR jobs in one directory but with different prefixes corresponding to your FASTQ files, i.e.
      STAR --readFilesIn 01.fastq --outFileNamePrefix 01_
      STAR --readFilesIn 02.fastq --outFileNamePrefix 02_
      In this case the output files will have the specified prefix for each of the runs, i.e.
      01_Aligned.out.sam, 02_Aligned.out.sam ...

      I suspect that there is some problem with RAM management as STAR takes almost all of the available RAM.
      Can you try to reboot your machine - I heard that this helps some Mac systems to "declutter" RAM?
      Also, please run "top" command while running STAR to see how much memory is being used.

      Cheers
      Alex
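
The per-file prefix scheme described above can be scripted. The following is a minimal sketch rather than a tested pipeline: it only prints the STAR commands (a dry run) so they can be inspected before anything is executed. The helper name `star_cmd_for` is hypothetical; the `./GenomeDir/` path, the `.fq` suffix and `--runThreadN 1` are taken from the commands quoted in this thread.

```shell
# Dry-run sketch: build one STAR command per FASTQ file, each with its own
# output prefix (01.fq -> 01_), so that runs sharing a directory do not
# overwrite each other's output. star_cmd_for is a hypothetical helper name.
star_cmd_for() {
    fq="$1"
    # Strip the directory and the .fq suffix, then append "_" as the prefix.
    prefix="$(basename "$fq" .fq)_"
    printf 'STAR --genomeDir ./GenomeDir/ --readFilesIn %s --outFileNamePrefix %s --runThreadN 1\n' \
        "$fq" "$prefix"
}

# Print the commands for two example file names.
for fq in 01.fq 02.fq; do
    star_cmd_for "$fq"
done
```

Once the printed commands look right, the output of the loop can be piped to `sh` to run the jobs one after another.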



      • Originally posted by dietmar13
        hello,

        Is it possible that the mapping speed for the 2nd pass drops to 7 M/hr when 900,000 new splice sites and a comprehensive gene model (GENCODE v19) are used for index generation?
        The first pass was ~100-fold faster (700 M/hr).

        my concrete syntax:
        /home/ws/SW_install/STAR/STAR/source/STAR --runThreadN 31 --outSAMstrandField intronMotif --outSAMtype BAM SortedByCoordinate --genomeDir $indices --readFilesCommand zcat --readFilesIn /home/ws/data/PatientData/$b/m*/$s/t*/date*/$f1 /home/ws/data/PatientData/$b/m*/$s/t*/date*/$f2 --outFileNamePrefix $name

        with 31 cores on a 192 GB Scientific Linux 7 workstation.

        Can I do something to improve speed?

        dietmar
        Hi Dietmar,

        There have been some reports of a slowdown in the 2nd pass. In one case it was caused by splice junctions (likely false positives) in the mitochondrial genome: https://groups.google.com/d/msg/rna-...Y/0jSn0vy0ccgJ.
        If filtering out chrM junctions does not help, please send me the list of junctions from the 1st pass and a few million reads for testing.

        Cheers
        Alex
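
Filtering out the chrM junctions, as suggested above, could look like the sketch below. It assumes the first-pass junctions sit in a tab-separated file whose first column is the chromosome name (as in STAR's SJ.out.tab) and that the mitochondrial chromosome is named chrM. Both are assumptions about the setup, not details given in this thread, and `drop_chrM_junctions` is a hypothetical helper name.

```shell
# Sketch: print every junction line whose first (chromosome) column is not
# chrM. The awk program is single-quoted, so its $1 is the first field of a
# line, while the shell's "$1" is the input file passed to the function.
drop_chrM_junctions() {
    awk -F '\t' '$1 != "chrM"' "$1"
}

# Tiny demonstration on made-up, truncated junction lines (not real data).
demo="$(mktemp)"
printf 'chr1\t100\t200\t1\nchrM\t300\t400\t1\nchr2\t500\t600\t2\n' > "$demo"
drop_chrM_junctions "$demo"
rm -f "$demo"
```

The filtered output would be redirected to a new file, and that file used in place of the unfiltered junction list when the genome index for the 2nd pass is generated.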



        • Originally posted by alexdobin
          I suspect that there is some problem with RAM management as STAR takes almost all of the available RAM.
          Can you try to reboot your machine - I heard that this helps some Mac systems to "declutter" RAM?
          Also, please run "top" command while running STAR to see how much memory is being used.

          Cheers
          Alex
          I guess so too. I mapped these 20 .fq files manually, with occasional Killed: 9 errors. I found that a failed run usually goes through if I run the EXACT same command again (even without rebooting OS X). However, I am now really stuck on the biggest .fq file (~6.6 GB). For that particular .fq file, I get an Abort trap: 6 error at the "..... Started sorting BAM" step. It happens every time (I have tried 6-7 times so far, even after a fresh reboot). I did not see anything unusual in top.
          I also tried --limitIObufferSize 100000000 but still got the Abort trap: 6 error.
          It is frustrating since this is the last file to map. The particular log.out file is attached here. Thanks for the advice.
          Last edited by neokao; 03-31-2015, 07:05 AM.



          • Originally posted by neokao
            I guess so too. I mapped these 20 .fq files manually, with occasional Killed: 9 errors. I found that a failed run usually goes through if I run the EXACT same command again (even without rebooting OS X). However, I am now really stuck on the biggest .fq file (~6.6 GB). For that particular .fq file, I get an Abort trap: 6 error at the "..... Started sorting BAM" step. It happens every time (I have tried 6-7 times so far, even after a fresh reboot). I did not see anything unusual in top.
            I also tried --limitIObufferSize 100000000 but still got the Abort trap: 6 error.
            It is frustrating since this is the last file to map. The particular log.out file is attached here. Thanks for the advice.
            Please try the latest STAR release https://github.com/alexdobin/STAR/re...ag/STAR_2.4.0k - I have improved the BAM sorting and it now should require less RAM. Also, it may be safer to use a separate BAM sorting limit for RAM, say --limitBAMsortRAM 10000000000 .

            Cheers
            Alex
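
The value suggested above for --limitBAMsortRAM, 10000000000, is 10 GB written out in bytes (taking 1 GB as 10^9 bytes). As a minimal sketch, such a value can be derived from a GB figure instead of counting zeros by hand. The helper name `gb_to_bytes` is hypothetical, and the 10 GB figure is simply the one from the post, not a recommendation for any particular machine.

```shell
# Sketch: convert a whole number of gigabytes (10^9 bytes each) into bytes.
# gb_to_bytes is a hypothetical helper name.
gb_to_bytes() {
    echo $(( $1 * 1000000000 ))
}

# Build the BAM-sorting options for a 10 GB sorting limit and print them.
limit="$(gb_to_bytes 10)"
echo "--outSAMtype BAM SortedByCoordinate --limitBAMsortRAM $limit"
```

The printed options would be appended to the usual STAR command line.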



            • Originally posted by alexdobin
              Please try the latest STAR release https://github.com/alexdobin/STAR/re...ag/STAR_2.4.0k - I have improved the BAM sorting and it now should require less RAM. Also, it may be safer to use a separate BAM sorting limit for RAM, say --limitBAMsortRAM 10000000000 .

              Cheers
              Alex
              I tried that particular .fq file using old STAR with SAM output and then it went through.
              I still want to test your new version.
              (Thanks for your new code.) However, I got an error when I tried to compile it:
              clang: error: no such file or directory: 'htslib/libhts.a'
              make: *** [STARforMac] Error 1

              (I did install gcc on my OSX Yosemite)



              • @neokao: You need to install the new htslib library that is part of the samtools package: http://sourceforge.net/projects/samt...iles/samtools/



                • I did have SAMtools installed. Under my NGS folder, I have a samtools-1.2 folder and a STAR-STAR_2.4.0k folder. I ran make STARforMac in the source directory (in STAR-STAR_2.4.0k).
                  Any advice? Thanks.



                  • You could make a "htslib" directory in your STAR source and copy that file in there.
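
A minimal sketch of that suggestion, assuming htslib has already been built so that a libhts.a file exists somewhere under the samtools folder. The helper name `copy_libhts` and both directory arguments are hypothetical; the actual locations depend on where samtools and STAR were unpacked.

```shell
# Sketch: find libhts.a under a samtools directory and copy it into an
# "htslib" subdirectory of the STAR source directory.
# copy_libhts is a hypothetical helper name.
copy_libhts() {
    samtools_dir="$1"
    star_src="$2"
    # Take the first libhts.a found anywhere below the samtools directory.
    lib="$(find "$samtools_dir" -type f -name libhts.a | head -n 1)"
    if [ -z "$lib" ]; then
        echo "libhts.a not found under $samtools_dir" >&2
        return 1
    fi
    mkdir -p "$star_src/htslib" && cp "$lib" "$star_src/htslib/"
}

# Example call (paths are placeholders, adjust to the local layout):
#   copy_libhts /path/to/samtools-1.2 /path/to/STAR-STAR_2.4.0k/source
```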



                    • Compilation on Mac is tricky because the default compiler, clang, does not support OpenMP (OMP), which STAR uses. Please try to compile with
                      make STARforMacStatic CXX=/path/to/gcc



                      • Originally posted by alexdobin
                        Compilation on Mac is tricky because the default compiler, clang, does not support OpenMP (OMP), which STAR uses. Please try to compile with
                        make STARforMacStatic CXX=/path/to/gcc
                        Following your suggestion, I got a different error:
                        /bin/sh: /path/to/gcc: No such file or directory
                        make: *** [Depend.list] Error 127

                        I did install gcc.
                        gcc --version
                        Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
                        Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)
                        Target: x86_64-apple-darwin14.1.0
                        Thread model: posix

                        Thanks.



                        • Your gcc should be in /usr/bin. Confirm that by

                          Code:
                          $ which gcc
                          When you compile STAR do

                          Code:
                          $ make STARforMacStatic CXX=/usr/bin/gcc



                          • Yes. My gcc is there.
                            So I ran make STARforMacStatic CXX=/usr/bin/gcc but still got an error:

                            clang: error: unsupported option '-static-libgcc'
                            make: *** [STARforMacStatic] Error 1

                            Thanks folks.

                            Originally posted by GenoMax
                            Your gcc should be in /usr/bin. Confirm that by

                            Code:
                            $ which gcc
                            When you compile STAR do

                            Code:
                            $ make STARforMacStatic CXX=/usr/bin/gcc



                            • See if the second answer in this thread helps: http://stackoverflow.com/questions/1...-osx-mavericks

                              Otherwise you will have to wait for Alex to respond.



                              • The Mac's /usr/bin/gcc (which is on the PATH so you can invoke it with simply gcc) symlinks to clang, so you are still trying to compile with clang.

                                When you installed (configured) true gcc, did you use --prefix option? You need to find installation path for the true gcc.
                                You need to be able to check the version:
                                /path/to/gcc/g++ -v

                                and it should say something like (not Apple LLVM clang etc):

                                Using built-in specs.
                                Target: x86_64-redhat-linux
                                Configured with: ../configure --prefix=/opt/hpc
                                Thread model: posix
                                gcc version 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)


                                If it works, compile STAR with (it has to be g++, not gcc - I made a mistake in the previous post):
                                make STARforMacStatic CXX=/path/to/g++

                                Cheers
                                Alex
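
The version check described above can be sketched as a small script. It treats any version text mentioning "clang" or "LLVM" as clang, which is a heuristic based on the outputs quoted in this thread and not an official detection method. The helper names `looks_like_clang` and `compiler_kind` are hypothetical.

```shell
# Sketch: decide whether a compiler's version text looks like clang.
# Reads the text on stdin; succeeds if it mentions clang or LLVM.
looks_like_clang() {
    grep -qiE 'clang|LLVM'
}

# Classify a compiler given by name or path as "missing", "clang" or
# "not-clang", based on what its --version output says.
compiler_kind() {
    command -v "$1" >/dev/null 2>&1 || { echo missing; return 1; }
    if "$1" --version 2>&1 | looks_like_clang; then
        echo clang
    else
        echo not-clang
    fi
}

# Demonstration with the version line quoted earlier in this thread.
if printf 'Apple LLVM version 6.0 (clang-600.0.57) (based on LLVM 3.5svn)\n' | looks_like_clang; then
    echo "this compiler is clang, not true gcc"
fi
```

For a real check, `compiler_kind /path/to/g++` would be run with the actual installation path; only a compiler reported as not-clang is worth passing to make via CXX.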
