  • What is the adapter trimming threshold of fastx_clipper?

    Hi everyone

    Does fastx_clipper have any simple documentation of its thresholds, for example the minimum adapter match length, the maximum number of mismatches allowed, or an adapter sequence identity threshold?

    I tried to find a manual or introduction but couldn't.

    thanks!

  • #2
    The help output of the fastx_clipper program has the information you need.

    Code:
    $ fastx_clipper -h
    usage: fastx_clipper [-h] [-a ADAPTER] [-D] [-l N] [-n] [-d N] [-c] [-C] [-o] [-v] [-z] [-i INFILE] [-o OUTFILE]
    Part of FASTX Toolkit 0.0.13.2 by A. Gordon ([email protected])
    
       [-h]         = This helpful help screen.
       [-a ADAPTER] = ADAPTER string. default is CCTTAAGG (dummy adapter).
       [-l N]       = discard sequences shorter than N nucleotides. default is 5.
       [-d N]       = Keep the adapter and N bases after it.
                      (using '-d 0' is the same as not using '-d' at all. which is the default).
       [-c]         = Discard non-clipped sequences (i.e. - keep only sequences which contained the adapter).
       [-C]         = Discard clipped sequences (i.e. - keep only sequences which did not contained the adapter).
       [-k]         = Report Adapter-Only sequences.
       [-n]         = keep sequences with unknown (N) nucleotides. default is to discard such sequences.
       [-v]         = Verbose - report number of sequences.
                      If [-o] is specified,  report will be printed to STDOUT.
                      If [-o] is not specified (and output goes to STDOUT),
                      report will be printed to STDERR.
       [-z]         = Compress output with GZIP.
       [-D]         = DEBUG output.
       [-M N]       = require minimum adapter alignment length of N.
                      If less than N nucleotides aligned with the adapter - don't clip it.
       [-i INFILE]  = FASTA/Q input file. default is STDIN.
       [-o OUTFILE] = FASTA/Q output file. default is STDOUT.
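
To make the -M rule in the help text concrete, here is a minimal Python sketch of a "minimum alignment length" check. It is an illustration under stated assumptions, not fastx_clipper's actual alignment routine: the help text above lists no mismatch or identity option, so the sketch uses exact matching only, and the function name `clip_adapter` and parameter `min_overlap` are hypothetical.

```python
def clip_adapter(read: str, adapter: str, min_overlap: int) -> str:
    """Clip an exactly matching adapter from the 3' end of a read.

    Simplified illustration: exact matches only, no mismatches or indels.
    This is NOT the alignment fastx_clipper performs internally.
    """
    # Case 1: the whole adapter occurs inside the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos] if len(adapter) >= min_overlap else read
    # Case 2: the read ends partway through the adapter.
    # Try the longest possible overlap first.
    for overlap in range(min(len(read), len(adapter) - 1), 0, -1):
        if read.endswith(adapter[:overlap]):
            # Mirror the -M idea: too short an overlap means "don't clip".
            return read[:-overlap] if overlap >= min_overlap else read
    return read


adapter = "CCTTAAGG"  # the dummy default adapter from the help screen
print(clip_adapter("ACGTACGTCCTTAAGGTT", adapter, 5))  # full adapter present
print(clip_adapter("ACGTACGTCCTT", adapter, 5))        # only 4 bases overlap
print(clip_adapter("ACGTACGTCCTT", adapter, 4))        # 4 bases now suffice
```

With a minimum of 5, the read ending in only 4 adapter bases is left alone; lowering the minimum to 4 clips it.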


    • #3
      Hi Senior Member

      Thanks for the information.

      I know the help screen shows the parameter information,

      but I think it is a bit vague. What is the maximum number of mismatches allowed in a matched adapter, or the adapter sequence identity threshold?

      thanks!


        • #5
          louis,

          I don't know the parameters for fastx, but I can tell you how to do that with BBDuk.

          hdist=X sets a maximum Hamming distance of X (default is 0) per kmer. k=X uses kmers of length X inside the read. mink=X allows kmers as short as X at the end of the read.

          So:

          bbduk.sh in=reads.fq out=trimmed.fq ref=truseq.fa ktrim=r k=25 mink=12 hdist=2

          ...will trim reads with up to 2 substitutions, and a minimum of 12 bases of adapter sequence. If you have a paired fragment library, you can get even better adapter trimming by adding the flags "tbo" and "tpe", which will look for overlaps and trim reads that have an insert size shorter than read length.
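
As a rough illustration of how k, mink and hdist interact when trimming to the right, here is a small Python sketch. It is not BBDuk's implementation: it compares windows by brute force, the function names `hamming` and `trim_right` are hypothetical, and the toy values (k=8, mink=4) are chosen only to keep the example short.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def trim_right(read: str, adapter: str, k: int, mink: int, hdist: int) -> str:
    """Trim from the first adapter-like k-mer to the 3' end of the read."""
    # Full-length adapter k-mers (the adapter is assumed to be at least k long).
    ref_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    # Inside the read: any k-mer within hdist of an adapter k-mer triggers a trim.
    for start in range(len(read) - k + 1):
        window = read[start:start + k]
        if any(hamming(window, ref) <= hdist for ref in ref_kmers):
            return read[:start]
    # At the read end: shorter windows, from k-1 down to mink,
    # compared against the start of the adapter.
    for length in range(min(k - 1, len(read)), mink - 1, -1):
        if hamming(read[-length:], adapter[:length]) <= hdist:
            return read[:-length]
    return read


adapter = "AGATCGGAAGAGC"
insert = "T" * 12
# One substitution in the adapter (G -> C): found with hdist=1, missed with hdist=0.
print(trim_right(insert + "AGATCGCAAGAGC", adapter, k=8, mink=4, hdist=1))
print(trim_right(insert + "AGATCGCAAGAGC", adapter, k=8, mink=4, hdist=0))
# Only 6 adapter bases at the read end: trimmed because 6 >= mink.
print(trim_right(insert + "AGATCG", adapter, k=8, mink=4, hdist=0))
```

The second call returns the read unchanged, which is the behaviour to expect when the allowed Hamming distance is too small for the errors present in the adapter.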

          Note that exponentially more memory is required for a higher Hamming distance, so I don't recommend going above 3.
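
To get a feel for that growth, the sketch below counts how many distinct k-mers lie within a given Hamming distance of a single k-mer. It assumes, purely for illustration, that a tool would need to store every such variant of each reference k-mer; the function name `kmer_neighbors` is hypothetical.

```python
from math import comb


def kmer_neighbors(k: int, hdist: int) -> int:
    """Count DNA k-mers within Hamming distance hdist of one fixed k-mer."""
    # Choose i positions to substitute; each takes one of the 3 other bases.
    return sum(comb(k, i) * 3 ** i for i in range(hdist + 1))


for d in range(4):
    print(d, kmer_neighbors(25, d))
# prints:
# 0 1
# 1 76
# 2 2776
# 3 64876
```

For k=25, going from hdist=2 to hdist=3 multiplies the number of variants per k-mer by more than 20.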
