  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    Have you looked at the in-line help for "splitnextera.sh"? The adapters for Mate Pair libraries are in the "adapters.fa" file so you should be able to trim them as usual (Nextera_LMP_Read1_External_Adapter, Nextera_LMP_Read2_External_Adapter)

    • kokyriakidis
      Member
      • Jul 2018
      • 12

      Different Libraries

      Originally posted by GenoMax:
      Have you looked at the in-line help for "splitnextera.sh"? The adapters for Mate Pair libraries are in the "adapters.fa" file so you should be able to trim them as usual (Nextera_LMP_Read1_External_Adapter, Nextera_LMP_Read2_External_Adapter)
      The SplitNextera guide states that it is meant strictly for Nextera LMP libraries. Nextera Mate Pair is not the same as Mate Pair Library v2, and Mate Pair Library v2 does not have the LMP adapters you mentioned!

      From JGI site:
      "SplitNextera splits Nextera LMP libraries into subsets based on linker orientation. It is designed strictly for Nextera LMP (long-mate-pair) reads, not for normal libraries using a Nextera kit. Nextera LMP libraries must be split prior to further processing; they are not usable raw. Adapter-trimming should still be done on Nextera LMP libraries prior to splitting."

      Mate SamplePrep V2 Documentation:

      • jamie225
        Junior Member
        • Jul 2018
        • 2

        Hi. I am new here. How are you all?

        • jsena33
          Junior Member
          • Jul 2018
          • 2

          Hi All,

          Is it possible to match degenerate sequences like below, trim the sequences and place the degenerate sequences in the fastq header? I am attempting to trim an adapter with the following structure Adapter(21nt)-UMI(16nts)-Adapter(24nt) and place it in the fastq header.

          Matching degenerate sequences such as primers:
          bbduk.sh in=reads.fq out=matching.fq literal=ACGTTNNNNNGTC copyundefined k=13 mm=f

          Thank you for your help!
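A minimal Python sketch of one alternative outside BBDuk, assuming the adapters appear without errors. If `copyundefined` handles degenerate bases by making every unambiguous copy of the reference (an assumption here), a 16-N UMI would mean 4^16, about 4.3 billion, copies, so matching the Adapter-UMI-Adapter structure with a regular expression is likely more practical. The adapter sequences are placeholders, not the poster's, and `extract_umi` is a hypothetical helper:

```python
import re

# Placeholder adapters (hypothetical): substitute your real 21-nt and 24-nt sequences.
ADAPTER_5 = "ACACTCTTTCCCTACACGACG"     # 21 nt
ADAPTER_3 = "AGATCGGAAGAGCACACGTCTGAA"  # 24 nt
UMI_LEN = 16

# Adapter(21) - UMI(16 unknown bases) - Adapter(24); exact adapter matches only.
PATTERN = re.compile(ADAPTER_5 + "([ACGTN]{%d})" % UMI_LEN + ADAPTER_3)


def extract_umi(header, seq, qual):
    """Cut the adapter-UMI-adapter block out of a read and append the UMI to
    the read name. Records without an exact match are returned unchanged."""
    m = PATTERN.search(seq)
    if m is None:
        return header, seq, qual
    start, end = m.span()
    umi = m.group(1)
    # Remove the block from the sequence and the matching quality positions.
    new_seq = seq[:start] + seq[end:]
    new_qual = qual[:start] + qual[end:]
    # Put the UMI right after the read name, before any comment after the first space.
    name, sep, comment = header.partition(" ")
    return name + "_" + umi + sep + comment, new_seq, new_qual


seq = "TTTT" + ADAPTER_5 + "ACGTACGTACGTACGT" + ADAPTER_3 + "GGGG"
print(extract_umi("@read1 1:N:0", seq, "I" * len(seq)))
# ('@read1_ACGTACGTACGTACGT 1:N:0', 'TTTTGGGG', 'IIIIIIII')
```

Because the UMI is appended to the read name rather than stored elsewhere, it survives most downstream tools that keep read names intact.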

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            @jsena33: You should take a look at UMI tools for this type of application.

            • jsena33
              Junior Member
              • Jul 2018
              • 2

              Hi GenoMax,

              Thanks for the suggestion! I have used UMI tools, which works OK, but I am working with long reads that have a higher error rate (with an indel bias) than Illumina reads. The adapter and UMI are therefore likely to contain indels, so the structure may actually look like this: Adapter(19-21nt)-UMI(14-16nts)-Adapter(22-24nt).

              Thanks again for your advice!
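With indels in the adapters, an exact pattern will miss many reads. A sketch of an error-tolerant alternative in Python: locate each adapter by semi-global edit distance (Sellers' algorithm, which allows substitutions, insertions, and deletions) and take whatever lies between the two adapters as the UMI, so its length can vary. The placeholder adapters, the helper names `best_end` and `split_umi`, and the `max_err=3` cutoff are all hypothetical; this is not how BBDuk or UMI-tools work internally:

```python
# Placeholder adapters (hypothetical): substitute your real sequences.
ADAPTER_5 = "ACACTCTTTCCCTACACGACG"
ADAPTER_3 = "AGATCGGAAGAGCACACGTCTGAA"


def best_end(pattern, text):
    """Semi-global edit distance (Sellers' algorithm): all of pattern must align,
    but the alignment may start and end anywhere in text.
    Returns (distance, end) for the leftmost best end; text[:end] closes the match."""
    m = len(pattern)
    col = list(range(m + 1))      # distances against an empty stretch of text
    best = (col[m], 0)
    for j, base in enumerate(text, start=1):
        new = [0] * (m + 1)       # row 0 stays 0: the match may start here for free
        for i in range(1, m + 1):
            new[i] = min(col[i - 1] + (pattern[i - 1] != base),  # match / mismatch
                         col[i] + 1,                              # extra base in text
                         new[i - 1] + 1)                          # base missing from text
        col = new
        if col[m] < best[0]:
            best = (col[m], j)
    return best


def split_umi(seq, adapter5=ADAPTER_5, adapter3=ADAPTER_3, max_err=3):
    """Return the bases between the two adapters (variable length),
    or None if either adapter needs more than max_err edits."""
    d5, end5 = best_end(adapter5, seq)
    rest = seq[end5:]
    # Searching the reversed remainder with the reversed 3' adapter makes the
    # best 'end' mark where that adapter starts in the original orientation.
    d3, end_r = best_end(adapter3[::-1], rest[::-1])
    if d5 > max_err or d3 > max_err:
        return None
    start3 = end5 + len(rest) - end_r
    return seq[end5:start3]


# Adapter 1 missing one base, a 15-nt UMI, adapter 2 with one extra base.
read = ("TTTT" + ADAPTER_5[:10] + ADAPTER_5[11:] + "GTTAGCATGCAATGC"
        + ADAPTER_3[:5] + "T" + ADAPTER_3[5:] + "CCCCCCCC")
print(split_umi(read))  # GTTAGCATGCAATGC
```

Each adapter search is O(adapter length × read length), which is cheap for a few hundred bases. If a UMI base happens to match the neighbouring adapter base, the boundary can be ambiguous by one position, so some UMI-length jitter is expected with this approach.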

              • FlySquirrelFly
                Junior Member
                • Oct 2018
                • 2

                Hi all,

                Newcomer to RNA-seq/bbduk/the forum here.. I have a question that's probably really basic, but I have read through the bbduk docs, ctrl+F'ed "maq" and "minavgquality" through all 16 pages of this thread, and tried googling; all to no avail. So here I am.

                When using `minavgquality` (`maq`), I'm very puzzled as to how the "average quality" is calculated. I was filtering full-length reads (91bp) by average quality (no trimming involved), and was expecting a very straightforward calculation -- taking the unweighted mean of individual Phred scores across individual bases.

                For example, with

                @A00325:34:H3FM7DRXX:1:1101:1208:1047 2:N:0:0NACTCTAA
                AGTCGTACCGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACAAAGAAAAGTAAACTGCGTTTATACCAATGCGTCCGCGGACAGGCGTTT
                +
                FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,,F,,F,F,,,F,,:F,,,FF:F,F,:,,,,,,,FF:F::,,F:::F,:F

                There are 25 `,`, 10 `:`, and 56 `F`. Under the Illumina 1.8+ encoding scheme (Phred+33), I was expecting something like (25*(44-33)+10*(58-33)+56*(70-33))/(25+10+56)=2597/91=28.5.

                I was shocked when this read got filtered with an `maq` of 20.

                I looked through some of the source code mentioning `minAvgQuality`. In BBQC.java and RQCFilter2.java, the default `minAvgQuality` settings seem to be 8 and 5 respectively. This, plus the fact that all (!) my reads were filtered when I tried `maq=30`, made me suspect that bbduk calculates "average quality" differently somehow. Can someone please explain this? (Is this what the "Phred algorithm" alluded to by the bbduk doc is referring to?)

                (I read here during my googling attempt that "Calculating average Q (Phred) scores is a bad idea". But it's something that our lab routinely does and I think my PI would want me to do it anyways..)

                Command I was using (version 38.25):

                bbduk.sh in=raw.fastq out=raw_qual-pass.fastq outm=raw_qual-fail.fastq maq=20 ordered=t

                (also tried adding `k=91` since all my reads are 91bp, `qin=33`, `qout=33`; no difference whatsoever)

                Thanks!
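The arithmetic in the post is correct: 2597/91 is about 28.5. One explanation that would fit both observations (this read failing `maq=20`, and every read failing `maq=30`) is that the average is taken over per-base error probabilities rather than over the Phred scores themselves. That is an assumption; nobody in this thread confirms how BBDuk computes it. The Python sketch below compares the two calculations for a quality string with the same composition as the example read:

```python
import math


def phred_scores(qual, offset=33):
    """Decode an ASCII quality string (Illumina 1.8+ uses offset 33)."""
    return [ord(ch) - offset for ch in qual]


def mean_phred(qual):
    """Arithmetic mean of the Phred scores: the calculation expected in the post."""
    scores = phred_scores(qual)
    return sum(scores) / len(scores)


def mean_phred_by_error_prob(qual):
    """Average the per-base error probabilities, then convert back to Phred."""
    probs = [10 ** (-q / 10) for q in phred_scores(qual)]
    return -10 * math.log10(sum(probs) / len(probs))


# Same composition as the read in the post: 25 ',' (Q11), 10 ':' (Q25), 56 'F' (Q37).
qual = "," * 25 + ":" * 10 + "F" * 56
print(round(mean_phred(qual), 1))                # 28.5
print(round(mean_phred_by_error_prob(qual), 1))  # 16.5
```

Low-quality bases dominate a probability average: the 25 Q11 bases alone contribute about 98% of the summed error probability, so the result lands near 16.5 rather than 28.5.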

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  @FlySquirrelFly: It has become difficult to get a hold of Brian (due to his day job responsibilities) but I will flag your post for him to see if he can respond.

                  I recall from some past discussion that average quality is calculated as a rolling window average and as soon as it drops below your set value it will trim/filter the rest of the read.

                  You should also consider this:
                  Note - if neither ktrim nor kmask is set, the default behavior is kfilter.
                  All three are mutually exclusive.
                  You may want to explicitly set "qtrim=" if you only want to quality trim. You may also want to use
                  trimq=6 Regions with average quality BELOW this will be trimmed, if qtrim is set to something other than f.
                  Unless you are doing de novo work, there is generally no need to filter based on quality. If you have a good reference to align to, data as low as Q10 should still be usable.
                  Last edited by GenoMax; 10-04-2018, 04:44 AM.
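The trimq description quoted above ("regions with average quality BELOW this will be trimmed") matches a well-known trimming rule: the modified Mott algorithm that BWA documents for its -q option. Whether BBDuk's `qtrim` uses exactly this rule is an assumption; the Python sketch below only illustrates the idea for the 3' end. `mott_trim_right` is a hypothetical helper name, and qualities are plain Phred integers:

```python
def mott_trim_right(quals, trimq):
    """Return how many leading bases to keep after trimming the 3' end.

    Walk in from the 3' end summing (trimq - q). The cut goes where that running
    sum peaks; the scan stops once it turns negative (good bases have taken over)."""
    running = best = 0
    keep = len(quals)
    for i in range(len(quals) - 1, -1, -1):
        running += trimq - quals[i]
        if running < 0:
            break
        if running > best:
            best, keep = running, i
    return keep


quals = [30, 30, 30, 30, 5, 5, 2]
print(mott_trim_right(quals, trimq=10))  # 4 -> keep quals[:4]
```

Running the same function on the reversed list gives the 5' cut, which is what a trim of both ends (as with qtrim=rl) would need.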

                  • FlySquirrelFly
                    Junior Member
                    • Oct 2018
                    • 2

                    @GenoMax:

                    Thanks for your quick reply and for flagging the post for Brian! Much appreciated.

                    Indeed, since I did not set ktrim or kmask, kfilter should have been carried out (which is what I intended).

                    I wanted to filter based on the average quality of the full-length read, so I did not use `qtrim` or any of the related options.

                    I'm doing two separate analyses. One involves the canonical type of transcriptomic analysis (quantification of gene expression, differential expression analysis, etc). For that, like you said, there's probably no need to filter based on quality. The other involves doing some de novo assembly using the raw reads (for antibody V(D)J receptor). I figured that for the latter it'd probably be nice to have an extra layer of QC.

                    • dariober
                      Senior Member
                      • May 2010
                      • 311

                      bbduk with bzip2 input file

                      Hi, is bzip2 input supported by `bbduk.sh`? When I try it, bbduk seems to hang as shown below. (It would be good to have support for bzip2.)

                      Thanks!

                      Code:
                      bbduk.sh in=/scratch/dberaldi/Texas_Biobank/TCRBOA1-N-WEX.read1.fastq.bz2 out=stdout.fq
                      java -Djava.library.path=/home/db291g/test-setup-travis/downloads/bbmap/jni/ -ea -Xmx14666m -Xms14666m -cp /home/db291g/test-setup-travis/downloads/bbmap/current/ jgi.BBDukF in=/scratch/dberaldi/Texas_Biobank/TCRBOA1-N-WEX.read1.fastq.bz2 out=stdout.fq
                      Executing jgi.BBDukF [in=/scratch/dberaldi/Texas_Biobank/TCRBOA1-N-WEX.read1.fastq.bz2, out=stdout.fq]
                      Version 37.98 [in=/scratch/dberaldi/Texas_Biobank/TCRBOA1-N-WEX.read1.fastq.bz2, out=stdout.fq]
                      
                      NOTE: No reference files specified, no trimming mode, no min avg quality, no histograms - read sequences will not be changed.
                      0.028 seconds.
                      Initial:
                      Memory: max=14737m, free=14276m, used=461m
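If a given BBDuk version does not read .bz2 input directly (or stalls on it as above), one workaround is to decompress in a separate process and pipe plain FASTQ into BBDuk. `bzip2 -dc` does this on the command line; the Python sketch below does the same in a streaming fashion, so the whole file never sits in memory. The script name `stream_bz2.py` is hypothetical, and reading from `in=stdin.fq` is assumed to work the same way as the `out=stdout.fq` used in the command above:

```python
import bz2
import shutil
import sys


def stream_bz2(path, out=None):
    """Decompress a .bz2 file chunk by chunk into a binary stream (stdout by default)."""
    if out is None:
        out = sys.stdout.buffer
    with bz2.open(path, "rb") as fh:
        shutil.copyfileobj(fh, out)
    out.flush()


if __name__ == "__main__" and len(sys.argv) == 2:
    stream_bz2(sys.argv[1])
```

For example: python stream_bz2.py TCRBOA1-N-WEX.read1.fastq.bz2 | bbduk.sh in=stdin.fq out=stdout.fq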

                      • enma_ai
                        Junior Member
                        • Dec 2018
                        • 2

                        pls help

                        May I know how to fix this? This is the command I entered:

                        $ ./bbduk.sh -Xmx27g in1=~/NGS\ 10273-Raw\ Data\ NO.rep.1/NO.rep.1_1.fq in2=~/NGS\ 10273-Raw\ Data\ NO.rep.1/NO.rep.1_2.fq out1=adapter_trimmed1.fq out2=adapter_trimmed2.fq ref=~/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=rl trimq=10 minlen=36 mag=10 bhist=bhist.txt qhist=qhist.txt gchist=gchist.txt aqhist=aqhist.txt lhist=lhist.txt gcbins=auto

                        ..then this came up
                        java -ea -Xmx27g -Xms27g -cp /Users/uplb/Documents/AGC/bbmap/current/ jgi.BBDuk -Xmx27g in1=/Users/uplb/NGS 10273-Raw Data NO.rep.1/NO.rep.1_1.fq in2=/Users/uplb/NGS 10273-Raw Data NO.rep.1/NO.rep.1_2.fq out1=adapter_trimmed1.fq out2=adapter_trimmed2.fq ref=/Users/uplb/adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=rl trimq=10 minlen=36 mag=10 bhist=bhist.txt qhist=qhist.txt gchist=gchist.txt aqhist=aqhist.txt lhist=lhist.txt gcbins=auto
                        Executing jgi.BBDuk [-Xmx27g, in1=/Users/uplb/NGS, 10273-Raw, Data, NO.rep.1/NO.rep.1_1.fq, in2=/Users/uplb/NGS, 10273-Raw, Data, NO.rep.1/NO.rep.1_2.fq, out1=adapter_trimmed1.fq, out2=adapter_trimmed2.fq, ref=/Users/uplb/adapters.fa, ktrim=r, k=23, mink=11, hdist=1, tpe, tbo, qtrim=rl, trimq=10, minlen=36, mag=10, bhist=bhist.txt, qhist=qhist.txt, gchist=gchist.txt, aqhist=aqhist.txt, lhist=lhist.txt, gcbins=auto]
                        Version 38.33

                        Exception in thread "main" java.lang.RuntimeException: Unknown parameter 10273-Raw
                        at jgi.BBDuk.<init>(BBDuk.java:513)
                        at jgi.BBDuk.main(BBDuk.java:76)

                        Is this a Java problem? Thank you.

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

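                            @enma_ai: It is a bad idea to have spaces in your file/directory names (even though your OS may allow it). The error shows why: in the "Executing jgi.BBDuk" line, "in1=/Users/uplb/NGS 10273-Raw Data NO.rep.1/NO.rep.1_1.fq" has been split into separate arguments at each space, so "10273-Raw" is read as an unknown parameter. I suggest that you rename the directory "NGS 10273-Raw Data NO.rep.1" to something without spaces, such as "NGS_10273-Raw_Data_NO.rep.1", and see if that fixes the error.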
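If there are many files or folders to fix, a small Python sketch like this one renames everything under a given folder, replacing spaces with underscores. `replace_spaces` is a hypothetical helper; point it at a folder that holds only the sequencing data, because it renames every file and subfolder beneath it (the folder itself is left alone), and it refuses to overwrite an existing name:

```python
import sys
from pathlib import Path


def replace_spaces(root):
    """Rename every file and folder below root, replacing spaces with '_'.

    root itself is not renamed. Deepest paths are handled first, so renaming a
    folder never invalidates a path that is still waiting to be processed."""
    root = Path(root)
    paths = sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for p in paths:
        if " " in p.name:
            target = p.with_name(p.name.replace(" ", "_"))
            if target.exists():
                raise FileExistsError(f"refusing to overwrite {target}")
            p.rename(target)


if __name__ == "__main__" and len(sys.argv) == 2:
    replace_spaces(sys.argv[1])
```

After renaming, the in1= and in2= paths in the bbduk.sh command need to be updated to the new names.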

                            • Sebio
                              Junior Member
                              • Jul 2012
                              • 9

                              Order of adapters in adapter file matters for bbduk. Should it, though?

                              Hi guys,

                              I have been playing with bbduk and the ref option to submit a list of adapters/contaminants.

                              I found that the order of the adapters in the ref matters and depending on which adapter is first in this file bbduk might take one over the other.

                              For example, when I changed the order (as below), lots of reads were trimmed as "Bisulfite_R1", but when "Reverse_adapter" was first in the ref file, it was reported much more often on the same fq file.

                              >Bisulfite_R1
                              AGATCGGAAGAGCACACGTCTGAAC
                              >Reverse_adapter
                              AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG


                              Is this an intended behavior? It seems to me that this should not be the case.

                              Cheers,

                              Seb

                              • GenoMax
                                Senior Member
                                • Feb 2008
                                • 7142

                                @Seb: Brian does not seem to have time to respond to questions in this forum nowadays, but the different lengths of the adapters may have some bearing on this. Since the part you are looking for is identical (the first 25 bases, AGATCGGAAGAGCACACGTCTGAAC, are shared by both sequences), there is no need to add both copies. I assume you are trimming away sequence to the right (?) once that shared part is located?
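Why load order could matter: Bisulfite_R1 is an exact prefix of Reverse_adapter, so with k=23 (the value used elsewhere in this thread) every one of its k-mers also occurs in Reverse_adapter. If a k-mer table keeps only one reference name per k-mer, whichever sequence is loaded first gets the credit. The "first reference wins" policy in this Python sketch is a hypothetical stand-in; how BBDuk actually resolves shared k-mers is an assumption here:

```python
def build_kmer_table(refs, k):
    """Map each k-mer to the first reference name that contains it.

    'First reference wins' is a hypothetical policy chosen for illustration."""
    table = {}
    for name, seq in refs:
        for i in range(len(seq) - k + 1):
            table.setdefault(seq[i:i + k], name)
    return table


BISULFITE_R1 = "AGATCGGAAGAGCACACGTCTGAAC"
REVERSE_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCTCGTATGCCGTCTTCTGCTTG"

k = 23
first_bis = build_kmer_table([("Bisulfite_R1", BISULFITE_R1), ("Reverse_adapter", REVERSE_ADAPTER)], k)
first_rev = build_kmer_table([("Reverse_adapter", REVERSE_ADAPTER), ("Bisulfite_R1", BISULFITE_R1)], k)

# The same three k-mers, credited to a different adapter depending on file order.
shared = [BISULFITE_R1[i:i + k] for i in range(len(BISULFITE_R1) - k + 1)]
print([first_bis[x] for x in shared])  # ['Bisulfite_R1', 'Bisulfite_R1', 'Bisulfite_R1']
print([first_rev[x] for x in shared])  # ['Reverse_adapter', 'Reverse_adapter', 'Reverse_adapter']
```

Under this model the trim position for a read would be the same either way; only the adapter name in the statistics changes, which is consistent with keeping just one copy of the shared sequence.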
