  • #31
    Originally posted by StaciaWyman
    Good morning. I get a "page not found" error when I go to the above link. Is there an updated one? Thanks!
    Stacia

    Check out the different branches available, as well as the commits:
    https://github.com/nh13/samtools


    If you are brave, I have also been working on getting this into Picard:
    https://github.com/nh13/picard (a set of Java tools for working with next-generation sequencing data in the BAM format)


    The Picard developers are more receptive to a patch than the samtools developers.


    • #32
      Hi Nils, why don't you use pigz/unpigz, which are parallelized gzip/gunzip? They take the same arguments as normal gzip, and -p sets the number of threads.


      • #33
        Do you mean I should use the pigz API? Of course I could compress a SAM file with pigz, but the advantage of a BAM file (which is block-gzip compressed, i.e. BGZF) is that it can be indexed, which allows random retrieval by genomic coordinates.

        Can you give an example of what you mean?
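To make the random-access point concrete, here is a minimal Python sketch of the block-compression idea behind BAM. It is an illustration under simplifying assumptions, not samtools' BGZF code: the data is cut into fixed-size blocks, each block is compressed as its own gzip member, and an index records where each block starts. Reading a range then only requires decompressing the blocks that cover it, whereas a single gzip stream (such as the output of gzip or pigz) generally has to be decompressed from the start. Real BGZF also stores each block's compressed size in a "BC" gzip extra field and addresses data with a 64-bit virtual offset, (compressed block offset << 16) | offset within the uncompressed block; this sketch omits the extra field and keeps the two offsets as a tuple. The names `write_blocked_gzip` and `read_range` are hypothetical.

```python
import bisect
import gzip

# Hypothetical block size; BGZF limits each block to 64 KiB of uncompressed data.
BLOCK_SIZE = 65536


def write_blocked_gzip(data: bytes, block_size: int = BLOCK_SIZE):
    """Compress `data` as independent gzip members, one per block.

    Returns the concatenated compressed bytes and an index of
    (uncompressed_start, compressed_offset) pairs, one per block.
    Simplified: no BGZF "BC" extra field, no EOF marker block.
    """
    out = bytearray()
    index = []
    for start in range(0, len(data), block_size):
        index.append((start, len(out)))
        out += gzip.compress(data[start:start + block_size])
    return bytes(out), index


def read_range(blob: bytes, index, start: int, length: int) -> bytes:
    """Return data[start:start + length], decompressing only the blocks that cover it."""
    starts = [u for u, _ in index]
    i = bisect.bisect_right(starts, start) - 1  # block containing `start`
    first = index[i][0]
    buf = bytearray()
    while i < len(index) and len(buf) < (start - first) + length:
        c_begin = index[i][1]
        c_end = index[i + 1][1] if i + 1 < len(index) else len(blob)
        buf += gzip.decompress(blob[c_begin:c_end])  # one self-contained member
        i += 1
    offset = start - first
    return bytes(buf[offset:offset + length])


if __name__ == "__main__":
    data = b"ACGT" * 50000
    blob, index = write_blocked_gzip(data)
    print(read_range(blob, index, 100000, 8))  # b'ACGTACGT'
    # The concatenated members are still an ordinary multi-member gzip stream.
    assert gzip.decompress(blob) == data
```

The index here plays the role that a .bai index plays for BAM, except that a real index maps genomic coordinates, not raw byte positions, to virtual offsets.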


        • #34
          Any update on the mpileup?


          • #35
            Originally posted by ersgupta
            Any update on the mpileup?

            No. The individual tools were not made multithreaded, only the reading/writing of SAM/BAM files, which can be a bottleneck.


            • #36
              Guys, we have a working version of a faster mergesort for BAMs:



              The source is also there, but if you want to test speed, you can grab the binary to make things easy. We implemented SAM-to-BAM conversion, mergesort, mark duplicates, and some other routines.

              Would love feedback on whether it is faster than what others are doing...


              • #37
                For a non-computer-savvy person like myself, I am confused about whether I should update beyond samtools 0.1.18 to the new multithreaded versions. My confusion stems mainly from not knowing whether they work and how to download them. Is there a web page anywhere that documents the changes being made, when they are considered working and safe to use, and where to download them?


                • #38
                  Hi Nils,

                  I gave it a try (0.1.18-r572) and had mixed results.

                  Success: converting SAM to BAM (samtools import) on a 102 GB SAM file gives a ~10X speedup on a 24-core (HT) machine with 192 GB RAM, and the output BAM file (27 GB) matches one generated by the general non-multithreaded release (0.1.18 r982:295), checked with diff.

                  Failure: sort fails with the error "failed to create threads" when it attempts to merge the intermediate sorted BAM files. Running samtools merge on the same set of files also fails with the same error. I tried -n 6, 12 and 24 with no success. The general non-multithreaded release completes the sort and merge successfully.

                  Suggestions?


                  • #39
                    https://github.com/nh13/samtools ?

                    How is this project going?


                    • #40
                      Originally posted by Richard Finney
                      https://github.com/nh13/samtools ?

                      How is this project going?

                      I haven't taken a look at the sort problem yet, and not much else has happened.


                      • #41
                        We have a parallelized mergesort; the source is on GitHub. See my post a couple of posts up. Would love feedback, suggestions, etc.


                        • #42
                          Originally posted by adaptivegenome
                          We have a parallelized mergesort; the source is on GitHub. See my post a couple of posts up. Would love feedback, suggestions, etc.

                          Can you plug the "pbgzf.c" source code into bamtools? I have been playing around with bamtools and I quite like it. The bottleneck in reading/writing BAM files is the compression/decompression.
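The thread does not show how pbgzf.c is written, so what follows is only a sketch of the general idea, not that code: because each BGZF-style block is compressed independently, blocks can be handed to a pool of worker threads and the results concatenated in their original order. In CPython this gives real parallelism because the zlib module releases the GIL while compressing. The function names, the 64 KiB default block size and the default thread count are assumptions for illustration, and the output is a plain multi-member gzip stream rather than valid BGZF (no "BC" extra field, no EOF marker block).

```python
import gzip
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_block(block: bytes, level: int = 6) -> bytes:
    """Compress one block as a standalone gzip member (wbits=31 selects the gzip wrapper)."""
    c = zlib.compressobj(level, zlib.DEFLATED, 31)
    return c.compress(block) + c.flush()


def parallel_blocked_compress(data: bytes, block_size: int = 65536, threads: int = 4) -> bytes:
    """Split `data` into blocks, compress them concurrently, and join them in order."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in input order, so block order is preserved
        # even though blocks may finish compressing out of order.
        return b"".join(pool.map(compress_block, blocks))


if __name__ == "__main__":
    sam_like = b"read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n" * 100000
    bgz_like = parallel_blocked_compress(sam_like, threads=8)
    # The multi-member gzip output decompresses back to the original bytes.
    assert gzip.decompress(bgz_like) == sam_like
    print(len(sam_like), "->", len(bgz_like), "bytes")
```

Decompression parallelizes the same way once block boundaries are known, which is why independent blocks help both reading and writing.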


                          • #43
                            Originally posted by nilshomer
                            Can you plug the "pbgzf.c" source code into bamtools? I have been playing around with bamtools and I quite like it. The bottleneck in reading/writing BAM files is the compression/decompression.

                            You are correct. We built a multithreaded version of a combined merge and sort from bamtools, and after a lot of work we got a 5X speedup over the serial implementation. In comparison, novosort offers a 10X speedup, probably because we are still stuck with bamtools' serial I/O.

                            We are now replacing the serial I/O with parallel I/O, but it's taken a bit of time. We should have something soon.

                            One other thing: we also included MarkDuplicates (optionally) as part of mergesort, which speeds things up as well...
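For readers wondering how "merge" and "sort" fit together: a coordinate sort of a large file is typically done by sorting chunks that fit in memory into runs and then k-way merging those runs, which is the same merge step post #38 reports failing. The sketch below is a toy in Python under stated assumptions: `Read` is a hypothetical minimal record (real BAM records carry far more), runs are kept in memory rather than written to temporary BAM files, and `flag_duplicates` is a deliberately naive stand-in that only compares reference, position and strand. It is not Picard's MarkDuplicates algorithm, which also accounts for mate pairs, clipping and base qualities. It does show why folding duplicate flagging into the merge is cheap: the merged stream is already coordinate-sorted, so duplicates can be flagged in the same pass.

```python
import heapq
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical minimal alignment record used only for illustration."""
    ref_id: int
    pos: int
    reverse: bool
    name: str


def coord_key(r: Read) -> Tuple[int, int]:
    return (r.ref_id, r.pos)


def sort_in_runs(reads: Iterable[Read], run_size: int) -> List[List[Read]]:
    """Sort fixed-size chunks independently (real tools spill these runs to temp files)."""
    runs, run = [], []
    for r in reads:
        run.append(r)
        if len(run) == run_size:
            runs.append(sorted(run, key=coord_key))
            run = []
    if run:
        runs.append(sorted(run, key=coord_key))
    return runs


def merge_runs(runs: List[List[Read]]) -> Iterator[Read]:
    """Lazily k-way merge runs that are each already sorted by coordinate."""
    return heapq.merge(*runs, key=coord_key)


def flag_duplicates(sorted_reads: Iterable[Read]) -> Iterator[Tuple[Read, bool]]:
    """Naive duplicate flag: same ref_id, pos and strand as an earlier read.

    Relies on coordinate-sorted input, so only reads at the current position
    need to be remembered. NOT Picard's MarkDuplicates logic.
    """
    current = None
    seen_strands = set()
    for r in sorted_reads:
        if coord_key(r) != current:
            current = coord_key(r)
            seen_strands.clear()
        yield r, r.reverse in seen_strands
        seen_strands.add(r.reverse)


if __name__ == "__main__":
    reads = [Read(i % 2, (i * 7) % 10, i % 3 == 0, f"r{i}") for i in range(20)]
    for read, is_dup in flag_duplicates(merge_runs(sort_in_runs(reads, run_size=6))):
        print(read.name, read.ref_id, read.pos, "dup" if is_dup else "")
```

Because `heapq.merge` is lazy, the merged stream can be fed straight into duplicate flagging and then into output compression without materializing the whole sorted file.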


                            • #44
                              Originally posted by adaptivegenome
                              You are correct. We built a multithreaded version of a combined merge and sort from bamtools, and after a lot of work we got a 5X speedup over the serial implementation. In comparison, novosort offers a 10X speedup, probably because we are still stuck with bamtools' serial I/O.

                              We are now replacing the serial I/O with parallel I/O, but it's taken a bit of time. We should have something soon.

                              One other thing: we also included MarkDuplicates (optionally) as part of mergesort, which speeds things up as well...

                              Could you use the implementation that I made to do parallel I/O?


                              • #45
                                Oh, I see what you are saying. Yes, let me check it out and see if I can figure it out!
