  • Why is the samtools multi-threaded argument undocumented?

    There are a bunch of discussion threads about the samtools multi-threading argument (-@). However, I can't seem to find any official documentation for it. Does anyone know why that is? Is it safe to use? Does it work for only a subset of samtools commands?

    On a related note, are there other undocumented arguments that might be useful?

  • #2
    Weird - I never knew about that option. Maybe that's what was absorbed from the psamtools project that disappeared. It appears to be an option for 'view' and 'sort'. I just tried it with 'view': samtools does in fact show > 100% CPU use, and it converted SAM to BAM much faster than without it, so I guess it's legit. I tried it with 'sort' as well and it worked fine - much faster.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
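
A rough way to repeat the "> 100% CPU" check from a script is to compare the CPU time a child process consumed with the wall-clock time it took. The sketch below is only an illustration: the helper names `samtools_view_cmd` and `cpu_percent` are hypothetical, the file names are placeholders, and it assumes a Unix-like system (Python's `resource` module) and a samtools build that accepts -@ for 'view' as described above. Only 'view' is shown because the 'sort' output syntax differs between samtools versions.

```python
import resource
import shlex
import subprocess
import sys
import time


def samtools_view_cmd(sam_path, bam_path, threads):
    """Build a SAM-to-BAM `samtools view` command that uses the -@ option."""
    # -@ is the thread-count option discussed in this thread; -b asks for BAM
    # output and -o names the output file. Assumption: older samtools builds
    # may additionally need -S to accept SAM input.
    return ["samtools", "view", "-@", str(threads), "-b", "-o", bam_path, sam_path]


def cpu_percent(cmd):
    """Run cmd and return (user + system CPU time) / wall time as a percentage.

    A value well above 100 suggests that more than one core was busy.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return 100.0 * cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    # Print the samtools command that would be measured (placeholder paths).
    print(shlex.join(samtools_view_cmd("in.sam", "out.bam", threads=4)))

    # Stand-in child process so the sketch runs without samtools installed;
    # swap in the samtools command above to measure a real conversion.
    demo = [sys.executable, "-c", "sum(range(10**6))"]
    print(f"{cpu_percent(demo):.0f}% CPU")
```

Passing the arguments as a list rather than as one shell string also keeps the '@' character literal, with no quoting needed.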


    • #3
      There is an open pull request to rename this to -p, since the '@' sign causes trouble for some scripting languages like Perl.

      (This was opened a while ago, so it may be too late to change without breaking existing scripts.)


      • #4
        There are at least two different multi-threading implementations in Samtools: Heng Li's and Nils Homer's. The latter appears to be more efficient, as it multi-threads decoding too, but it's less clear how to limit it to, say, 500% CPU (as it will use the same number of threads for encoding as for decoding). Nils' version takes a -n parameter between "samtools" and the subcommand, e.g. samtools -n 4 flagstat a.bam.

        Samtools is undergoing a lot of changes at the moment though with the migration to htslib, and in time the multi-threading issues will be addressed too.
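
The difference in where the thread option goes can be sketched as follows. This assumes the -@ option discussed earlier in the thread sits after the subcommand, and that -n sits before it as described for Nils Homer's version; the function names are hypothetical, and which subcommands accept either option depends on the build you have.

```python
import shlex


def subcommand_threads_cmd(subcommand, threads, *args):
    """Thread option placed after the subcommand: samtools <sub> -@ N ..."""
    return ["samtools", subcommand, "-@", str(threads), *args]


def global_threads_cmd(subcommand, threads, *args):
    """Thread option placed before the subcommand: samtools -n N <sub> ..."""
    return ["samtools", "-n", str(threads), subcommand, *args]


if __name__ == "__main__":
    # Only builds and prints the command lines; nothing is executed.
    print(shlex.join(subcommand_threads_cmd("view", 4, "-b", "a.sam")))
    print(shlex.join(global_threads_cmd("flagstat", 4, "a.bam")))
```

Keeping the two layouts in separate helpers makes it harder to pass one build's option to the other by accident.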


        • #5
          Originally posted by jkbonfield

          Samtools is undergoing a lot of changes at the moment though with the migration to htslib, and in time the multi-threading issues will be addressed too.
          Is there an ETA for this?


          • #6
            Not that I know of. I believe the first thing planned is a new official release of the reorganised code base, so some unknown length of time after that for threading investigations.

            We also plan to add CRAM support, but right now I'm still working on that as part of Staden io_lib's SAM/BAM/CRAM code.


            • #7
              It's not samtools, but maybe still of interest: I have released a new version of the Staden Package's io_lib (aka libstaden-read in some Linux distributions), which contains a multi-threaded BAM/CRAM/SAM converter called scramble.

              The threading is still a work in progress and should therefore be considered experimental. It works well for BAM (comparable to Nils Homer's implementation) and less well for CRAM (maybe a 6x speed-up at most, varying with the data set). SAM I/O is as yet single-threaded, although the SAM reading and writing is far faster than in Samtools.

              The code hasn't been tested on Windows yet. Older io_lib releases work there, but the use of pthreads has almost certainly broken that. I still need to back-port it to Windows, or at least to the MinGW/Msys environment.

              I'm NOT planning on adding things like sorting, and the basic experiments with merging and pileup are really just demonstration / testing tools for myself. Fundamentally, this library was written first and foremost to provide an I/O layer for Gap5 (and Gap4, xgap, etc. before that). Some of the code will make its way into Samtools later on though - specifically the CRAM implementation.
