This topic is closed.
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    Introducing BBMap, a new short-read aligner for DNA and RNA

    BBMap will be publicly released soon, pending confirmation with LBL's legal department.

    In the meantime feel free to look at these graphs of its performance:



    Note that this is a 50MB powerpoint file. It contains graphs of relative performance of BBMap and other short read aligners (bwa, bowtie2, gsnap, smalt) mapping synthetic data.

    EDIT:
    This thread is now closed; please use this one to post questions.
    Last edited by Brian Bushnell; 11-10-2014, 12:09 PM.
  • kopi-o
    Senior Member
    • Feb 2008
    • 319

    #2
    Looks very impressive! Can it beat STAR (speed and accuracy wise) for RNA-seq though? (RNA-seq is listed as one of the use cases towards the end)


    • Brian Bushnell
      Super Moderator
      • Jan 2014
      • 2709

      #3
      I have compared it to tophat, which it greatly outperforms in speed and has higher sensitivity on real RNA-seq data. I have not yet compared it to STAR - I tried to but was unable to get STAR to run without core-dumping so I gave up. I may have compiled it wrong; I'll try again eventually.

      However, I don't have a really good tool for generating and evaluating synthetic RNA-seq data, so it's harder to quantify. The closest I can get is to generate synthetic DNA reads with very large deletions, which is not quite the same thing since RNA-seq data has other strange artifacts and the introns are not distributed randomly.


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        It'd be great if you could get in touch with the authors of this paper and just use their test datasets. That would allow comparisons against most of the popular aligners out there.


        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Originally posted by dpryan:
          It'd be great if you could get in touch with the authors of this paper and just use their test datasets. That would allow comparisons against most of the popular aligners out there.
          Data is available here:



          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            Originally posted by dpryan:
            It'd be great if you could get in touch with the authors of this paper and just use their test datasets. That would allow comparisons against most of the popular aligners out there.
            Thanks for the suggestion; I'll look into that!


            • dietmar13
              Senior Member
              • Mar 2010
              • 107

              #7
              Rum

              Why is RUM always neglected when comparing RNA-seq mappers?
              In my hands RUM outperforms other pipelines, e.g. tophat, in sensitivity, especially for spliced reads...

              RUM: RNA-Seq Unified Mapper (GitHub repository: itmat/rum)


              RUM is rather slow, but on multithreaded servers it maps in a tolerable time (compared with sample and library generation and data interpretation).

              dietmar


              • Corydoras
                Member
                • Jan 2014
                • 20

                #8
                Hi Brian,

                I got a file with cleaned sequence data and I want to assemble this de-novo using velvet. Due to the nature of the sequencing and the library protocol, my kmer coverage is quite variable and I wanted to use BBnorm to normalize the coverage a bit to aid the assembly. Am I correct that BBnorm is the right thing to use for this?

                Anyway, currently trying to give it a go and I got this error message:

                bbmap$ sh bbnorm.sh in=Fowleri_combined.fastq out=normFowleri.fastq target=15
                bbnorm.sh: 104: bbnorm.sh: Bad substitution
                bbnorm.sh: 112: bbnorm.sh: [[: not found
                bbnorm.sh: 112: bbnorm.sh: [[: not found
                bbnorm.sh: 118: bbnorm.sh: source: not found
                bbnorm.sh: 119: bbnorm.sh: parseXmx: not found
                bbnorm.sh: 120: bbnorm.sh: [[: not found
                bbnorm.sh: 123: bbnorm.sh: freeRam: not found
                java -ea -Xmxm -cp /home/martin/Downloads/bbmap/current/ jgi.KmerNormalize bits=32 in=Fowleri_combined.fastq
                Invalid maximum heap size: -Xmxm
                Could not create the Java virtual machine.

                Any ideas?

                Many thanks,

                Sarah


                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  Sarah,

                  Yes, BBNorm is the correct tool.

                  I'm not sure, but I suspect that your shell is not bash. You could retry the command with "bash" instead of "sh", which may work. But the easier thing is just to skip the shellscript and invoke java manually:

                  java -ea -Xmx14g -cp /home/martin/Downloads/bbmap/current/ jgi.KmerNormalize bits=32 in=Fowleri_combined.fastq out=normFowleri.fastq target=15

                  That command would work if you had 16g of RAM. Just set the -Xmx parameter to about 85% of however much RAM is on the machine. If you don't know, you should be able to find out like this on a Linux system:

                  cat /proc/meminfo

                  ...then look at the first line, "MemTotal".
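The 85% rule above can be sketched as a small shell helper. This is only an illustration: xmx_from_kb is a hypothetical function name, not part of BBTools, and it assumes a Linux system where MemTotal is reported in kB.

```shell
# Sketch: suggest a Java -Xmx value at about 85% of physical RAM.
# xmx_from_kb is a hypothetical helper, not something shipped with BBTools.

# Convert a MemTotal value in kB into an -Xmx flag expressed in megabytes.
xmx_from_kb() {
    local total_kb=$1
    echo "-Xmx$(( total_kb * 85 / 100 / 1024 ))m"
}

# MemTotal is the first line of /proc/meminfo on Linux, in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2; exit}' /proc/meminfo)
    echo "Suggested flag: $(xmx_from_kb "$total_kb")"
fi
```

The printed flag can be pasted in place of -Xmx14g in the java command above. The value is given in megabytes so that integer arithmetic does not round a whole gigabyte away.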

                  However, 15x is a fairly low target depth. For velvet I would suggest at least 30x for an optimal assembly, unless you just don't have enough data.

                  -Brian


                  • Corydoras
                    Member
                    • Jan 2014
                    • 20

                    #10
                    Hi Brian,

                    That worked like a charm, thank you! The normalization also greatly improved the assemblies and the kmer-coverage distribution looks much nicer. I was just wondering: by default, bbnorm will use a kmer of 31. But for my assembly I am using 41. The assembly works fine, but is it advisable to normalize the coverage using a kmer of 41?

                    Thanks,

                    Sarah


                    • Brian Bushnell
                      Super Moderator
                      • Jan 2014
                      • 2709

                      #11
                      Sarah,

                      It might be better to normalize using a kmer length of 41, but BBNorm only supports a maximum of 31. In practice, it should make very little difference, though. Using long kmers is important for assembly, as it helps span short repeats that would otherwise cause contigs to terminate. But normalization is much less sensitive to that issue, and very long kmers can cause problems in the presence of errors. With k=31, a 100bp read with 1 error could yield 31 kmers with a depth of 1, out of a total of 70 kmers - in that case, the median depth would not be impacted. With k=63, the same read has only 38 kmers in total, and all 38 could span the error, thus having a depth of 1, so the median depth of the read would look like 1 instead of its correct value. And BBNorm normalizes based on the median kmer depth of a read.
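The kmer arithmetic can be checked with a short shell sketch. It assumes a single error per read and a plain median over per-kmer depths; kmer_error_impact is a hypothetical name and this is not BBNorm's own code.

```shell
# kmer_error_impact READ_LEN K
# Prints the total number of kmers in the read, the largest number of them
# that one sequencing error can fall into, and whether a plain median of
# the per-kmer depths could be dragged down by that error.
kmer_error_impact() {
    local read_len=$1 k=$2
    local total=$(( read_len - k + 1 ))
    # One error is covered by at most k kmers, and never by more than exist.
    local hit=$(( k < total ? k : total ))
    # The median is safe only while error-free kmers are a strict majority.
    local verdict="median unaffected"
    if [ $(( 2 * hit )) -ge "$total" ]; then
        verdict="median can drop to 1"
    fi
    echo "k=$k total=$total hit=$hit -> $verdict"
}

kmer_error_impact 100 31   # k=31 total=70 hit=31 -> median unaffected
kmer_error_impact 100 63   # k=63 total=38 hit=38 -> median can drop to 1
```

For a 100bp read the tipping point is where twice the number of affected kmers reaches the total, which is why k=31 is comfortable and k=63 is not.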

                      It's a lot more computationally efficient to use a max kmer length of 31, so that's how I designed it. I've tried shorter kmers down to about k=25 and not noticed an appreciable difference in normalization or error correction.

                      As for your prior (deleted) post, sorry for not responding - I think the problem was that you were running Java 6 instead of Java 7. Most of the programs in BBTools work fine in Java 6 but it looks like BBNorm requires Java 7 (or higher).
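To see which Java is being picked up, something like the following may help. It is a sketch only: java_major is a hypothetical helper, and the parsing assumes the usual `java -version` output with the version in double quotes ("1.N.x" for Java 8 and earlier, "N.x" afterwards).

```shell
# java_major VERSION_STRING: print the major Java version number.
# Handles the old "1.N.x" scheme and the newer "N.x" scheme.
java_major() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
    esac
}

# `java -version` writes to stderr; the version is the first quoted field.
if command -v java >/dev/null 2>&1; then
    ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
    major=$(java_major "$ver")
    if [ "${major:-0}" -ge 7 ] 2>/dev/null; then
        echo "Java $ver should be new enough for BBNorm"
    else
        echo "Java $ver looks older than 7 (or could not be parsed)"
    fi
fi
```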


                      • Corydoras
                        Member
                        • Jan 2014
                        • 20

                        #12
                        Hi Brian,

                        Thanks so much for that explanation. I thought I wouldn't be able to go past 31 but it is best to double check.

                        Sorry as well for just deleting my post (and bombarding you with simple questions, new to the world of NGS!), I played around with updating the Java on our Linux machine and that did the trick.

                        Thanks again for your help! And the fantastic and easy to use script!!

                        Sarah


                        • muol
                          Member
                          • Jun 2012
                          • 10

                          #13
                          Hi Brian,

                          Is there an option to set read quality encoding in bbnorm? I had to set qin=33 in bbduk for some Illumina 1.9 paired end libraries, but this option doesn't seem to exist in bbnorm (used BBMap v. 32.32 for Java 7).

                          Thanks
                          Olaf


                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            #14
                            Olaf,

                            It's there, I just forgot to document it; sorry! I'll add that to the shellscript in the next release. I think that all of the programs in the package that read fastq input allow the "qin" flag.
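For anyone unsure which value to pass to qin, a rough heuristic is to look at the lowest quality character in the file: Phred+33 data (such as Illumina 1.8 and later) can contain characters below ';' (ASCII 59), while the older +64 encodings cannot. The sketch below assumes a plain 4-line-per-record FASTQ. guess_qin is a hypothetical helper, not part of BBTools, and a file containing only high qualities may be misreported as 64.

```shell
# guess_qin FASTQ_FILE: print 33 or 64 as a guess at the quality offset.
guess_qin() {
    awk '
        BEGIN {
            # Build a character -> ASCII code lookup for printable characters.
            for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            # Every 4th line is a quality string; track the lowest character.
            n = length($0)
            for (j = 1; j <= n; j++) {
                ch = substr($0, j, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END { print (min < 59 ? 33 : 64) }
    ' "$1"
}
```

With hypothetical file names, the result could then be passed along as `bbnorm.sh in=reads.fastq out=normalized.fastq qin=$(guess_qin reads.fastq)`.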

                            -Brian


                            • muol
                              Member
                              • Jun 2012
                              • 10

                              #15
                              Indeed, just tried it and it works well with bbnorm.

                              Thanks
                              Olaf
