  • A faster read sorter

    Hello,
    I recently developed a fast stream-based .SAM sorter, streamsorter. It reads .SAM output from a stream (for instance, produced by bwa or another aligner), sorts the reads as they come in, and emits everything to standard out. Because this happens during alignment, and sorting is much faster than aligning, sorting takes essentially no extra time.

    Here's the basic usage for use with bwa:

    (long bwa command to create alignment) | sorter > sorted.sam

    Or if you prefer bam output:

    (long bwa command to create alignment) | sorter | samtools view -Sb - > sorted.bam

    For big alignments this can easily shave a substantial amount of time from the pipeline. Enjoy! Let me know if you find a bug or have other feedback.
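The idea described in this post can be sketched in a few lines of Python. This is only an illustration, not streamsorter's actual code: the function name `sort_sam_lines` is made up here, it assumes coordinate ordering (reference order taken from the `@SQ` header lines, then position), and it holds every read in memory. Note that any sorter has to see the last read before it can emit the first one, since the last read may sort to the front; the time saving comes from doing the bookkeeping while the aligner is still running.

```python
def sort_sam_lines(lines):
    """Return SAM lines with the header first and alignments in coordinate order."""
    header = []
    records = []
    ref_rank = {}  # reference name -> position in the @SQ header

    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header lines are passed through unchanged, in their original order.
            header.append(line)
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_rank.setdefault(field[3:], len(ref_rank))
        else:
            records.append(line)

    def coordinate_key(record):
        fields = record.split("\t")
        rname, pos = fields[2], int(fields[3])  # SAM columns 3 (RNAME) and 4 (POS)
        unmapped = rname == "*"                 # reads without a reference go last
        # References missing from the header sort after the known ones.
        return (unmapped, ref_rank.get(rname, len(ref_rank)), pos)

    records.sort(key=coordinate_key)
    return header + records
```

To use it in a pipe, feed it `sys.stdin` and print each returned line. The sketch does not rewrite the `@HD` sort-order tag, which a real tool would probably want to do.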
    Last edited by brofallon; 04-25-2013, 02:02 PM.

  • #2
    How is this better than doing this via samtools? i.e. aligner -> {SAM} -> samtools view -> {uncompressed BAM} -> samtools sort -> {sorted BAM}

    Code:
    $ alignment arg1 arg2 ... | samtools view -S -u - | samtools sort - prefix



    • #3
      Huh... that -u option on samtools view does seem to speed things up a bit. Streamsorter is still a bit faster, for two reasons. First, it doesn't depend on piping everything through samtools view to make an uncompressed BAM that samtools sort can read. Second, samtools sort doesn't create BAMs very quickly. Streamsorter allows plugging in a faster bamifier:

      Time to align, sort, and bamify a ~500MB fastq:
      BWA mem + samtools : 643 seconds
      BWA mem + sorter : 594 seconds
      BWA mem + sorter + samtools-mt to create bam: 495 seconds

      Where samtools-mt is multithreaded samtools with 4 threads.
      OK, not a huge speedup. But we look for everything we can get around here....
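To put the three timings above on a common scale, here is a small calculation of the run time saved relative to the plain samtools pipeline. The numbers are copied from this post; the helper name `time_saved` is just for illustration.

```python
# Wall-clock times in seconds, copied from the benchmark in this post.
timings = {
    "BWA mem + samtools": 643,
    "BWA mem + sorter": 594,
    "BWA mem + sorter + samtools-mt": 495,
}

baseline = timings["BWA mem + samtools"]


def time_saved(seconds, reference=baseline):
    """Fraction of the reference run time saved by a faster pipeline."""
    return (reference - seconds) / reference


for name, seconds in timings.items():
    print(f"{name}: {seconds} s, {time_saved(seconds):.1%} saved")
```

That works out to roughly 7.6% saved with the sorter alone and 23.0% with the sorter plus multithreaded BAM creation, on this one ~500MB test.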



      • #4
        Try some larger files - the speedup may be more visible then.

        I'd have to look at the code, but in principle 'samtools sort' could be extended to accept SAM as an input option.
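As a rough idea of what accepting either format would involve, a tool could peek at the first bytes of its input. This is a sketch under assumptions, not how samtools is written: the function name is hypothetical, and it relies only on the fact that BAM files are BGZF-compressed (so they start with the gzip magic bytes), while SAM is plain text.

```python
GZIP_MAGIC = b"\x1f\x8b"  # BGZF blocks are gzip members, so BAM starts with these bytes


def sniff_alignment_format(first_bytes):
    """Guess 'bam' or 'sam' from the first bytes of an alignment stream."""
    if first_bytes.startswith(GZIP_MAGIC):
        return "bam"
    # Anything else is treated as text SAM (header lines or alignment records).
    return "sam"
```

One limitation of this guess: a gzip-compressed SAM file would also be reported as "bam", so a real implementation would need to check the decompressed contents as well.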



        • #5
          Just wanted to say that novosort (novocraft.com) is probably the fastest read sorter I have seen.

          In second place is OpenGE's mergesort function, which is open source (git://github.com/adaptivegenome/openge.git)
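For readers unfamiliar with the general technique, a merge sort over large inputs usually sorts fixed-size runs and then merges them. The sketch below shows that generic idea only; it is not OpenGE's or novosort's code, the function name is made up, and it keeps the runs in memory where a real tool would typically spill them to temporary files.

```python
import heapq


def merge_sort_in_runs(records, run_size, key=None):
    """Sort records by sorting fixed-size runs, then k-way merging the runs."""
    if run_size < 1:
        raise ValueError("run_size must be at least 1")

    runs = []
    run = []
    for record in records:
        run.append(record)
        if len(run) == run_size:
            # Each full run is sorted on its own.
            runs.append(sorted(run, key=key))
            run = []
    if run:
        runs.append(sorted(run, key=key))

    # heapq.merge lazily combines the already-sorted runs into one sorted stream.
    return list(heapq.merge(*runs, key=key))
```

For example, `merge_sort_in_runs([5, 3, 9, 1, 7, 2, 8], run_size=3)` sorts three runs and merges them into one ordered list.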
