  • bbduk.sh barcode filter

    Does the barcode filter in bbduk.sh (v37.66) only allow perfect matches or are mismatches allowed?
    Thanks,
    Lynn

  • #2
    What barcode filter are you referring to? Generally, the "hdist=N" parameter lets you allow up to N mismatches, or disallow them entirely with hdist=0.


    • #3
      I'm referring to these parameters:
      barcodefilter=t barcodes=TCTCGCGC
      As far as I can see from my results right now, only reads with exactly this sequence in the header are retained. I think this may be too stringent. Does the 'hdist' parameter affect the barcode given how short it is?


      • #4
        I see. I have not personally used this feature, since most barcode work is done at the bcl2fastq stage, where you can allow for errors in the index sequence.

        What exactly are you trying to do? Eliminate reads with some barcodes? I don't think the hdist= parameter applies to the barcodes; it is for mismatches in the main read. You may have to find another way to do this. Perhaps "demuxbyname.sh" would be a better option. Take a look at that.


        • #5
          Thanks for the reply. I have a set of large paired-end FASTQ files that were preprocessed by a sequencing core, and I suspect they were never demultiplexed, because the file is quite large and may have had its own lane or flowcell. The headers contain barcodes that are mostly TCTCGCGC, or that string with one or two mismatches, but there are also barcodes that are wildly different. I want to strip those out without stripping what are probably legitimate barcodes with 1 or 2 mismatches. When I tried demuxbyname.sh, it started writing out 2 × 80,000 files, two for every barcode variant present.


          • #6
            Use the code I have in this post to enumerate all the different barcodes present in your file. That should give you an idea of the complexity of the problem. Then choose the ones you want to demux (the ones that should actually belong to your samples, since you made them) and use only those with demuxbyname.sh.
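The code linked in that post is not reproduced here. As a stand-in, this is a minimal Python sketch that tallies the barcode found in every FASTQ header. It assumes Illumina CASAVA 1.8+ style headers, where the index sequence is the last colon-separated field (e.g. `@... 1:N:0:TCTCGCGC`); the function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Iterable


def barcode_from_header(header: str) -> str:
    """Return the index sequence from an Illumina-style FASTQ header.

    Assumes the barcode is the last colon-separated field, e.g.
    '@M00123:45:000000000-A1B2C:1:1101:15589:1331 1:N:0:TCTCGCGC'.
    """
    return header.rstrip().rsplit(":", 1)[-1]


def count_barcodes_from_lines(lines: Iterable[str]) -> Counter:
    """Tally barcodes over the header line of each 4-line FASTQ record."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 0:  # record layout: header, sequence, '+', qualities
            counts[barcode_from_header(line)] += 1
    return counts


def count_barcodes(path: str) -> Counter:
    """Count barcodes in a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_barcodes_from_lines(handle)
```

For example, `count_barcodes("sample_R1.fastq.gz").most_common(20)` (file name hypothetical) lists the 20 most frequent barcodes with their read counts, which is usually enough to see which variants are near-misses of the expected barcode.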


            • #7
              I have already looked at all the barcodes. I've got 76,959 different barcodes that are not exact matches, accounting for about 48 million reads. If I allow 1 mismatch, I could recover 16 million of those; if I allow 2 mismatches, I could recover 24 million.
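Recovery figures like these come from comparing each observed barcode with the expected one and summing read counts by number of mismatches. A minimal Python sketch of that tally, assuming a `Counter` that maps each observed barcode to its read count, and counting mismatches as Hamming distance (barcodes of a different length are simply skipped, which is an assumption). The function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate
from typing import Dict


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def recovered_reads(counts: Counter, expected: str, max_dist: int = 2) -> Dict[int, int]:
    """Map d -> reads recovered when allowing 1..d mismatches (exact matches excluded).

    Barcodes whose length differs from `expected` are skipped, since
    Hamming distance is only defined for equal-length strings.
    """
    per_dist = [0] * (max_dist + 1)  # per_dist[d] = reads exactly d mismatches away
    for barcode, n in counts.items():
        if len(barcode) != len(expected):
            continue
        d = hamming(barcode, expected)
        if d <= max_dist:
            per_dist[d] += n
    # per_dist[0] holds the exact matches; accumulate the rest for d = 1..max_dist.
    cumulative = list(accumulate(per_dist[1:]))
    return {d: cumulative[d - 1] for d in range(1, max_dist + 1)}
```

With `expected="TCTCGCGC"`, entry 1 corresponds to "allow 1 mismatch" and entry 2 to "allow 2 mismatches"; because the totals are cumulative, entry 2 always includes entry 1.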


              • #8
                I wrote a Perl script using the fuzzy-matching module Text::Fuzzy to pull all the entries with no more than two mismatches in the barcode. It's not much, but I can send it to anyone who is interested.
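The Perl script itself isn't posted. A rough Python analogue, assuming the same header layout (barcode as the last colon-separated field) and a cutoff of at most two mismatches against TCTCGCGC, reads R1 and R2 in lockstep and keeps a pair when the R1 barcode passes. One deliberate difference: Text::Fuzzy works with edit distance, whereas this sketch uses Hamming distance (substitutions only, equal-length barcodes). All names here are hypothetical.

```python
import gzip
from itertools import islice

EXPECTED = "TCTCGCGC"  # barcode discussed in this thread
MAX_MISMATCHES = 2


def hamming(a, b):
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def records(lines):
    """Yield FASTQ records as lists of 4 lines (header, sequence, '+', qualities)."""
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) < 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def barcode_ok(header, expected=EXPECTED, max_mm=MAX_MISMATCHES):
    """True if the header's barcode is within max_mm substitutions of expected."""
    bc = header.rstrip().rsplit(":", 1)[-1]
    return len(bc) == len(expected) and hamming(bc, expected) <= max_mm


def filter_pairs(r1_lines, r2_lines, expected=EXPECTED, max_mm=MAX_MISMATCHES):
    """Yield (r1, r2) record pairs whose R1 barcode passes; zip stops at the shorter file."""
    for rec1, rec2 in zip(records(r1_lines), records(r2_lines)):
        if barcode_ok(rec1[0], expected, max_mm):
            yield rec1, rec2


def _open(path, mode):
    """Open plain or gzip-compressed files transparently, in text mode."""
    return gzip.open(path, mode) if path.endswith(".gz") else open(path, mode)


def filter_files(in1, in2, out1, out2):
    """Write the passing read pairs from in1/in2 to out1/out2."""
    with _open(in1, "rt") as f1, _open(in2, "rt") as f2, \
            _open(out1, "wt") as o1, _open(out2, "wt") as o2:
        for rec1, rec2 in filter_pairs(f1, f2):
            o1.writelines(rec1)
            o2.writelines(rec2)
```

Usage would be something like `filter_files("in_R1.fastq.gz", "in_R2.fastq.gz", "out_R1.fastq.gz", "out_R2.fastq.gz")` (paths hypothetical). Deciding on R1 alone assumes both mates carry the same barcode in their headers, which is the usual Illumina layout.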


                • #9
                  In future just ask the sequence provider to re-do the demultiplexing with bcl2fastq. You are paying them for it anyway :-)
