  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    Because your literal adapter also matches the beginning of the read, the entire sequence to the right of that match is removed (ktrim=r). If you change the sequence at the beginning by one base, you will see that the initial part is retained as you expect.
    Last edited by GenoMax; 07-20-2017, 06:59 AM.
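The behavior described above can be illustrated with a minimal Python sketch. It is a simplified model, not BBDuk's implementation: it assumes exact k-mer matching only, and the function name `ktrim_right` and its parameters are hypothetical. It cuts the read at the leftmost k-mer shared with the adapter and drops everything to the right of that point.

```python
def ktrim_right(read: str, adapter: str, k: int) -> str:
    """Simplified model of right-trimming: cut at the LEFTMOST adapter
    k-mer hit and discard that k-mer plus everything to its right.
    Exact matching only; this is an illustration, not BBDuk's code."""
    if k <= 0 or len(adapter) < k:
        raise ValueError("k must be positive and no longer than the adapter")
    # All k-mers contained in the adapter sequence.
    adapter_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    # Scan the read from left to right and stop at the first hit.
    for start in range(len(read) - k + 1):
        if read[start:start + k] in adapter_kmers:
            return read[:start]
    return read  # no hit: the read is left untouched
```

With an adapter that occurs both at the start and at the end of a read, this sketch returns an empty string. Changing the first base of the read moves the leftmost hit to the second copy, so the initial part is kept.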


    • Stimpsky
      Junior Member
      • Nov 2009
      • 2

      Ah OK, that explains it, never mind. I thought ktrim=r would go from right to left and stop at the first hit. Thanks for the quick reply!


      • cuencam
        Junior Member
        • Aug 2017
        • 5

        Hi Brian,
        I would like to switch from SolexaQA to BBDuk for read trimming and filtering. However, since we work on bacterial strain typing, we would like the option to keep only trimmed reads in which no individual base has a quality below a defined threshold (instead of using average region quality). Could this be done with BBDuk?

        Thanks!


        • Brian Bushnell
          Super Moderator
          • Jan 2014
          • 2709

          Hi cuencam,

          It is currently not possible to do this, other than discarding all reads that have any undefined (quality 0) bases with "maxns=0". I never saw a reason to discard all reads with a single base below a specified cutoff. It would be simple enough to add (or implement via a custom script), but can you explain why you're doing it? The "average region quality" method is the best at maximizing coverage while minimizing the total number of errors.

          Edit - anyway, it was quick to add, so it will be in the next release as the "mbq" ("minbasequality") flag.
          Last edited by Brian Bushnell; 08-14-2017, 10:44 AM.
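For readers who want to try the "custom script" route mentioned above, here is a minimal Python sketch of a per-base quality check. It assumes Phred+33 encoded quality strings; the function names are hypothetical, and the arithmetic mean is used only for contrast (BBDuk may compute average quality differently).

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def passes_min_base_quality(quality: str, threshold: int, offset: int = 33) -> bool:
    """True only if EVERY base has a quality of at least `threshold`."""
    return all(q >= threshold for q in phred_scores(quality, offset))


def mean_quality(quality: str, offset: int = 33) -> float:
    """Plain arithmetic mean of the Phred scores (illustration only)."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0
```

A read with quality string `IIIII#IIII` has a high mean score but fails a per-base threshold of 20 because of the single `#` (Phred 2), which is the difference between the two filtering styles discussed here.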


          • cuencam
            Junior Member
            • Aug 2017
            • 5

            Hi Brian,
            Thanks for such a quick response, and implementing this so fast!
            Our interest is that during SNV calling in low-coverage regions or low-abundance taxa (in metagenomics), base quality can be more important than coverage. This way we can properly assess different alleles and avoid creating artifacts.
            Cheers!

            Edit - Do you have an estimated next release date?
            Last edited by cuencam; 08-15-2017, 12:53 AM. Reason: Adding a question


            • cuencam
              Junior Member
              • Aug 2017
              • 5

              Hi Brian,
              Along the same lines as my previous question, what is the rationale for using maq=10? We are interested in de novo assembly of metagenomic data, and we were worried that low-quality bases at the ends of the reads might feed artificial k-mers into the assembler (SPAdes). I read that you recommend read normalization, but since our coverage is highly unequal (due to unequal species abundance, not sequencing artifacts), we are worried that this might introduce more biases than it solves.

              We were thinking of using your newly implemented "mbq" option to ensure that all bases have a minimum quality of 20. Do you believe that this is a good alternative?


              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                "maq=10" is to throw away really junky reads. The only way to really verify whether a setting is beneficial is to actually test it, unfortunately. But personally, I think "mbq=20" would be too aggressive (particularly if your sequencing run had a single low-quality cycle, in which case it would discard all of the data)... if you really want to get rid of the low-quality trailing bases, I'd suggest quality-trimming instead (qtrim=r trimq=14 or something like that). SPAdes is pretty robust with respect to low-quality data anyway; the biggest problem is that low-quality reads balloon the k-mer space, which can make it run out of memory.

                The main advantage of normalization with metagenomes, in fact, is that it removes a lot of data, which allows SPAdes to run on datasets that it can't otherwise handle. It's not strictly beneficial, and if you can assemble a metagenome without normalization, that may be better - sometimes normalization improves the assembly, sometimes it doesn't.
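To make the contrast between trimming and whole-read discarding concrete, here is a naive right-end quality trim in Python. It is only a sketch under stated assumptions: Phred+33 encoding, a hypothetical function name, and a deliberately simple rule (drop trailing bases while they are below the threshold). BBDuk's own trimming algorithm differs from this naive version.

```python
from typing import Tuple


def quality_trim_right(seq: str, quality: str, trimq: int,
                       offset: int = 33) -> Tuple[str, str]:
    """Naive right-end trim: remove trailing bases whose Phred score is
    below `trimq`, stopping at the first base that meets the threshold."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    end = len(seq)
    # Walk inward from the 3' end while the last kept base is low quality.
    while end > 0 and ord(quality[end - 1]) - offset < trimq:
        end -= 1
    return seq[:end], quality[:end]
```

Under this rule, a read with a low-quality tail loses only the tail, and a read with one low-quality base in the middle is kept whole, whereas a per-base minimum filter such as the one discussed above would discard that read entirely.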


                • cuencam
                  Junior Member
                  • Aug 2017
                  • 5

                  Thanks for this response! I'm pretty sure that your excellent user support is only comparable to the high quality of your tools!

                  I will implement quality-trimming at a higher threshold and then test. I do agree that mbq=20 is too harsh for assembly (but probably useful for SNV calling).
                  Cheers


                  • EssigSchurke
                    Junior Member
                    • Jul 2013
                    • 5

                    Hi Brian,

                    I tried to filter out reads longer than 10 bp. I used the following command:

                    Code:
                    bbduk.sh -in=input.fq -out=output.fq -maxlength=10
                    However, nothing happens: I get the same number of reads as in the input, even though all reads are longer than 10 bp.
                    I used the latest version of BBDuk, 37.53.

                    Test Input:

                    Code:
                    @test
                    ACTGGACTTGGAGTCAGAAGGC
                    +
                    b\\[\ZZ[][a]_]]cbbbabc
                    Code:
                    Input:                  	1 reads 		22 bases.
                    Total Removed:          	0 reads (0.00%) 	0 bases (0.00%)
                    Result:                 	1 reads (100.00%) 	22 bases (100.00%)


                    • jazz710
                      Member
                      • Oct 2012
                      • 41

                      The BBDuk commands don't have '-' before them. Your command should read:

                      bbduk.sh in=input.fq out=output.fq maxlength=10

                      Give that a shot?


                      • EssigSchurke
                        Junior Member
                        • Jul 2013
                        • 5

                        With or without "-" does not matter; I get the same results.


                        • cuencam
                          Junior Member
                          • Aug 2017
                          • 5

                          Hi EssigSchurke
                          The flag is minlength=10

                          The whole command is
                          bbduk.sh in=input.fq out=output.fq minlength=10

                          Edit:

                          I misread your question. The command provided by jazz710 is the appropriate one, and it works on my computer. You want to remove the long reads, correct?
                          Last edited by cuencam; 09-15-2017, 05:33 AM.


                          • EssigSchurke
                            Junior Member
                            • Jul 2013
                            • 5

                            Hi cuencam,

                            minlength=10 only filters out reads shorter than 10 bp. I want to filter out reads longer than 10 bp; 10 bp is just a dummy value for my test case.


                            • EssigSchurke
                              Junior Member
                              • Jul 2013
                              • 5

                              Yes, I want to exclude long reads. I tested the command provided by jazz710, but it produces the same result: the test read is still in the output.


                              • Brian Bushnell
                                Super Moderator
                                • Jan 2014
                                • 2709

                                Actually, all the BBTools strip off the leading "-", so you can put as many of them as you want.
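The effect of stripping leading dashes can be pictured with a small Python sketch. This is not BBTools code: the function name `parse_args` is hypothetical, and storing `None` for a token without "=" is an assumption made only for this illustration.

```python
from typing import Dict, List, Optional


def parse_args(args: List[str]) -> Dict[str, Optional[str]]:
    """Parse key=value style arguments, ignoring any number of leading
    dashes, so "-in=x", "--in=x" and "in=x" are treated identically."""
    params: Dict[str, Optional[str]] = {}
    for arg in args:
        # Drop every leading "-" before splitting on the first "=".
        key, sep, value = arg.lstrip("-").partition("=")
        # Assumption of this sketch: a bare token gets the value None.
        params[key] = value if sep else None
    return params
```

Under this model, `-maxlength=10` and `maxlength=10` produce the same parsed parameter, which is consistent with the report that adding or removing "-" made no difference.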

                                This is a bug. Thanks for the report! It looks like BBDuk only removes reads under minlen or over maxlen if they were trimmed; untrimmed sequences will pass regardless of their length. Sorry about that! Reformat actually works correctly in this case, though:

                                Code:
                                reformat.sh in=x.fq out=y.fq minlen=A maxlen=B
                                I'll fix BBDuk ASAP. Thanks again!
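As a stopgap independent of either tool, length filtering that applies to every read, trimmed or not, can be sketched in a few lines of Python. The sketch assumes a simple four-line-per-record FASTQ file; the function names and the `minlen`/`maxlen` parameter names are chosen for illustration and are not taken from BBTools source.

```python
from typing import Iterable, Iterator, Optional, Tuple

FastqRecord = Tuple[str, str, str, str]  # header, sequence, plus line, quality


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield four-line FASTQ records; blank lines between records are skipped."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        yield header, seq, plus, qual


def filter_by_length(records: Iterable[FastqRecord], minlen: int = 0,
                     maxlen: Optional[int] = None) -> Iterator[FastqRecord]:
    """Keep records with minlen <= length <= maxlen, whether or not the
    read was trimmed beforehand."""
    for record in records:
        length = len(record[1])
        if length < minlen:
            continue
        if maxlen is not None and length > maxlen:
            continue
        yield record
```

Applied to the 22-base test read posted earlier in the thread with `maxlen=10`, this sketch yields no records, which is the result the original command was expected to give.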

