  • antifolate
    Member
    • Aug 2015
    • 52

    How do you specify error rate in BBduk adapter trimming?

    Hello,

    I'm using BBduk to trim adapter sequences from my reads. From the help manual I see the editdistance and hammingdistance options, but they set a fixed number of mismatches, independent of the length of the match. Is it possible to specify an error rate (like 10%) that allows errors in the match depending on the length of the match?

    This feature is available in cutadapt, but I'd rather do it with bbduk if possible to shorten my pipeline.
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    For Illumina reads, you don't need to worry about the edit distance, just Hamming distance. BBDuk supports a variable Hamming distance and variable kmer length, so the error rate would be the number of mismatches divided by the kmer length. So, k=25 with hdist=2 would allow an 8% error rate. As would edist=2, if you need to allow edits.
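
A quick way to sanity-check the 8% figure: the error rate implied by a fixed mismatch budget is the number of allowed mismatches over the kmer length. The sketch below is plain arithmetic, not anything BBDuk exposes; the function names are made up for illustration, and it does not model the shorter kmers used with mink or the overlap-based trimming.

```python
def kmer_error_rate(k: int, hdist: int) -> float:
    """Fraction of mismatching bases tolerated within one kmer."""
    if k <= 0 or hdist < 0:
        raise ValueError("k must be positive and hdist non-negative")
    return hdist / k


def hdist_for_rate(k: int, max_rate: float) -> int:
    """Largest whole number of mismatches keeping hdist / k at or below max_rate."""
    # The small epsilon guards against float rounding, e.g. 25 * 0.08.
    return int(k * max_rate + 1e-9)


print(f"k=25 hdist=2 -> {kmer_error_rate(25, 2):.0%}")
print(f"k=25 at <=10% -> hdist={hdist_for_rate(25, 0.10)}")
```

Because hdist has to be a whole number, a 10% target at k=25 rounds down to hdist=2, the same setting as the 8% example.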

    With the "tbo" flag (for paired reads) the effective error rate allowed in the adapter portion is much higher (100%, technically), and with the "tpe" flag again a 100% error rate is allowed in 1 of 2 paired reads, so the error rate settings are not directly comparable since the methods are different.

    • antifolate
      Member
      • Aug 2015
      • 52

      #3
      Thanks Brian. I have a question about edit distance: if I allow say editdistance=2, would it catch potentially informative bases after the found adapter? For example, if I have the adapter "ADAPTER" and the sequences:

      ADAPTERactg
      ADAPTcERactg
      cADtAPTERactg

      would they become:

      tg
      ctg
      actg

      or does the edit distance only apply before the last base in the adapter? As you can tell, in the first and second examples, we lost part of the regular sequence. The last example is what I'd like to use the edit distance for.

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        Assuming you are doing left-trimming, then in the above example, yes, those results are correct. You would get tg, ctg, and actg respectively.
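
The three outcomes in posts #3 and #4 match what you get when a fixed window of adapter length plus the allowed edit distance is removed from the left end (7 + 2 = 9 bases here). The toy below reproduces only that arithmetic for the made-up adapter "ADAPTER". It is an illustration, not BBDuk's implementation (BBDuk matches kmers), and the function name is hypothetical.

```python
def left_trim_fixed_window(read: str, adapter: str, edist: int) -> str:
    """Drop len(adapter) + edist bases from the left end of the read."""
    return read[len(adapter) + edist:]


reads = ["ADAPTERactg", "ADAPTcERactg", "cADtAPTERactg"]
for read in reads:
    print(read, "->", left_trim_fixed_window(read, "ADAPTER", 2))
```

Only the third read, which actually uses both edits, keeps all four of its non-adapter bases; the first two lose two bases and one base respectively.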

        • antifolate
          Member
          • Aug 2015
          • 52

          #5
          Thanks for the help.

          • quattrinia
            Junior Member
            • Jun 2013
            • 5

            #6
            I have a question regarding adapter trimming and whether you have further recommendations for adapter removal?

            I have TruSeq indexed adapters and MiSeq reads (2x300).

            I used ktrim=r k=21 mink=11 hdist=2 tpe tbo with the following results.

            Added 5659609 kmers; time: 2.105 seconds.
            Memory: max=115011m, free=104210m, used=10801m

            Input is being processed as paired
            Started output streams: 0.022 seconds.
            Processing time: 30.272 seconds.

            Input: 5927766 reads 1784257566 bases.
            KTrimmed: 1190226 reads (20.08%) 56818182 bases (3.18%)
            Trimmed by overlap: 47342 reads (0.80%) 301726 bases (0.02%)
            Result: 5927544 reads (100.00%) 1727137658 bases (96.80%)

            FastQC reports that adapters have been removed, but kmer content is still fairly high. If I set k=15, approximately 50% of the reads are trimmed. This seems more appropriate, but I am also concerned that I am removing too much. Any thoughts?

            Thanks in advance~
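
The percentages in the BBDuk summary above can be re-derived from the raw counts, which is a handy check when comparing runs with different k or hdist. The parser below is a sketch that assumes the summary lines keep the "Name: N reads ... M bases" shape shown in this post; the function names are made up for illustration.

```python
import re

BBDUK_SUMMARY = """\
Input: 5927766 reads 1784257566 bases.
KTrimmed: 1190226 reads (20.08%) 56818182 bases (3.18%)
Trimmed by overlap: 47342 reads (0.80%) 301726 bases (0.02%)
Result: 5927544 reads (100.00%) 1727137658 bases (96.80%)
"""

# Label, read count, then the last integer that directly precedes " bases".
LINE_RE = re.compile(r"^\s*([^:]+):\s+(\d+) reads.*?(\d+) bases")


def parse_summary(text: str) -> dict:
    """Map each summary label to a (reads, bases) tuple."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1).strip()] = (int(match.group(2)), int(match.group(3)))
    return counts


def percent(part: int, whole: int) -> float:
    return 100.0 * part / whole


counts = parse_summary(BBDUK_SUMMARY)
in_reads, in_bases = counts["Input"]
for label in ("KTrimmed", "Trimmed by overlap", "Result"):
    reads, bases = counts[label]
    print(f"{label}: {percent(reads, in_reads):.2f}% reads, {percent(bases, in_bases):.2f}% bases")
```

In this run the bases removed by kmer trimming and by overlap trimming add up exactly to the difference between Input and Result, so about 3.2% of bases were removed even though 20% of reads were touched.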

            • Brian Bushnell
              Super Moderator
              • Jan 2014
              • 2709

              #7
              What kind of library/experiment is this? Also, what does the quality profile look like? Posting the FastQC results would be helpful, for example. Running BBMerge for an insert size histogram could also be useful, to see how many adapter sequences you should expect:
              Code:
              bbmerge.sh reads=1m in1=r1.fq in2=r2.fq ihist=ihist.txt vloose mininsert=15 outa=adapters.fa
              Only pairs with insert size shorter than 300bp should contain adapters. If you run that, please post the screen output and both of the output files (adapters.fa and ihist.txt).
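
To turn the insert-size histogram into an expected adapter rate, one could sum the pairs whose insert is shorter than the read length. The sketch below assumes ihist.txt is a whitespace-separated file with '#'-prefixed header lines followed by insert-size and count columns; that layout is an assumption, so check it against your own file. The function name and the tiny inline sample are hypothetical.

```python
from typing import Iterable


def fraction_with_adapter(lines: Iterable[str], read_len: int) -> float:
    """Fraction of histogram pairs whose insert size is below read_len."""
    short = total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        size_str, count_str = line.split()[:2]
        size, count = int(size_str), int(count_str)
        total += count
        if size < read_len:
            short += count
    return short / total if total else 0.0


# Made-up sample data, not taken from the thread.
sample = ["#InsertSize\tCount", "250\t10", "299\t5", "300\t20", "350\t15"]
print(fraction_with_adapter(sample, read_len=300))

# With a real file (path is a placeholder):
# with open("ihist.txt") as handle:
#     print(fraction_with_adapter(handle, read_len=300))
```

The histogram only covers pairs that BBMerge could join, so the result is an estimate for those pairs rather than for the whole library.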

              2x300 runs often have extremely low quality, particularly in low-diversity libraries like 16S amplicons. It's harder to detect adapters with a very high error rate, so you may need more aggressive settings, but k=15 and hdist=2 with the full set of Illumina adapter sequences will probably get you a lot of false positives.

              • quattrinia
                Junior Member
                • Jun 2013
                • 5

                #8
                Hi,

                Thanks for the quick reply. This is a de novo whole genome sequencing project of an invertebrate, run with TruSeq adapters and a MiSeq Reagent Kit v3. This species was multiplexed with 8 others. The index barcode for this species is TGACCA.

                Here is the outcome of the bbmerge
                Pairs: 1000000
                Joined: 539856 53.986%
                Ambiguous: 419335 41.934%
                No Solution: 40700 4.070%
                Too Short: 109 0.011%

                Avg Insert: 329.2
                Standard Deviation: 87.7
                Mode: 317

                Insert range: 15 - 595
                90th percentile: 441
                75th percentile: 384
                50th percentile: 326
                25th percentile: 275
                10th percentile: 233
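
The percentiles above already bracket the 300bp read length: it falls between the 25th percentile (275) and the median (326). A rough linear interpolation, sketched below, puts a number on it. This is only an approximation, it applies only to the pairs BBMerge joined, and the helper name is made up.

```python
def percentile_rank(value: float, table: list) -> float:
    """Linearly interpolate the percentile rank of value.

    table holds (percentile, insert_size) pairs with strictly increasing sizes.
    """
    table = sorted(table, key=lambda pair: pair[1])
    for (p_lo, v_lo), (p_hi, v_hi) in zip(table, table[1:]):
        if v_lo <= value <= v_hi:
            return p_lo + (value - v_lo) / (v_hi - v_lo) * (p_hi - p_lo)
    raise ValueError("value lies outside the tabulated percentiles")


# Percentiles reported by BBMerge in this post.
bbmerge_percentiles = [(10, 233), (25, 275), (50, 326), (75, 384), (90, 441)]
print(f"{percentile_rank(300, bbmerge_percentiles):.1f}% of joined pairs are below 300bp")
```

By this estimate, a bit over a third of the joined pairs have inserts shorter than the read length and would be expected to contain adapter sequence.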

                Adapters
                >Read1_adapter
                AGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAANCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAA
                >Read2_adapter
                N

                Ihist is attached as well as QC output before trimming and adapter removing. Please let me know if I can provide anything else.
                Attached Files

                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #9
                  Oh, that's not good. You have a lot of adapter sequence, but unfortunately the read quality dropped to zero at the end and is generally terrible. That will make the data very hard to use for a quality assembly. You'll definitely need to do extensive quality-trimming.

                  If this data was generated for you by a paid facility, they should replace it for you at no cost; and if it was generated internally, Illumina should replace the reagents at no cost, because it fails their specifications (assuming nothing went wrong in library creation). If you really need to use it, though, try quality-trimming to Q15 or so (qtrim=rl trimq=15). You can do more aggressive adapter trimming if you want, with say "hdist=3 hdist2=2" to yield an overall command of:

                  Code:
                  bbduk.sh (files) ktrim=r k=21 mink=11 hdist=3 hdist2=2 tpe tbo qtrim=r trimq=15
                  For the adapter sequences, I suggest you download the latest version of BBTools (36.14) and run BBMerge again; the version you are using is a little older and has a bug that was preventing read 2's adapter-sequence from being determined. Using the actual adapter sequences of your reads is better for specificity than using all of the possible adapter sequences that come bundled with BBMap. Normally this does not matter, but when you go all the way to hdist=3 specificity becomes more important.

                  Most of the adapter sequences will be removed via quality-trimming anyway, though; the reason they are not recognized is that the quality is low, so after the low-quality sequence is removed, only high-quality adapters would be present, which would be recognized.
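
For a sense of what trimq=15 means in error terms: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q15 is roughly a 3% chance per base of a wrong call, while Q30 is 0.1%. A small helper, with a name made up for illustration:

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


for q in (15, 30):
    print(f"Q{q}: {phred_to_error_prob(q):.4f}")
```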

                  • quattrinia
                    Junior Member
                    • Jun 2013
                    • 5

                    #10
                    Thanks for the information. The percentage of bases with Q>30 for each sample was 59-70%. I have seen much worse than these data, so I thought this was OK! Thanks for the insight, I will definitely follow up.

                    • quattrinia
                      Junior Member
                      • Jun 2013
                      • 5

                      #11
                      Oh, and would you recommend quality trimming before adapter removal?

                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #12
                        At the same time. When you tell BBDuk to do both in one pass, it does quality-trimming after adapter-trimming, which is optimal. If you do quality-trimming first (with left-trim, as in qtrim=rl) you can't use the tbo and tpe flags anymore because their assumptions are violated.
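
A minimal sketch of wiring the single-pass approach into a Python pipeline, reusing the trimming flags from the command in post #9. The file names and the function name are placeholders, and pointing ref at adapters.fa assumes you are using the BBMerge-derived adapters suggested earlier. Nothing is executed here; the function only builds the argument list.

```python
from typing import List, Optional


def build_bbduk_cmd(in1: str, in2: str, out1: str, out2: str, ref: str,
                    hdist: int = 3, hdist2: Optional[int] = None,
                    trimq: int = 15) -> List[str]:
    """Build one bbduk.sh call that adapter-trims and quality-trims in one pass."""
    cmd = [
        "bbduk.sh",
        f"in1={in1}", f"in2={in2}",
        f"out1={out1}", f"out2={out2}",
        f"ref={ref}",
        "ktrim=r", "k=21", "mink=11", f"hdist={hdist}",
    ]
    if hdist2 is not None:
        cmd.append(f"hdist2={hdist2}")  # separate mismatch budget for the short kmers
    # tpe/tbo rely on untrimmed pairs, so everything stays in a single command.
    cmd += ["tpe", "tbo", "qtrim=r", f"trimq={trimq}"]
    return cmd


# Placeholder file names, for illustration only.
cmd = build_bbduk_cmd("r1.fq", "r2.fq", "clean1.fq", "clean2.fq", "adapters.fa", hdist2=2)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True); keeping both trimming steps in one invocation leaves the ordering to BBDuk, as described above.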
