  • fastq_quality_trimmer options

    Hi,
    I have a question about fastq_quality_trimmer.
    I have installed the FASTX Toolkit on my Linux machine and want to use it to trim bases with a quality score <28 from the 3' and 5' ends.
    I have paired-end data files with 101-base reads. Only the R2 file shows a drop in quality at the 3' and 5' ends; I have attached the corresponding quality box plot.
    How should I set the fastq_quality_trimmer parameters listed below to get a quality box plot as good as the one for R1 (attached)?

    fastq_quality_trimmer -h
    usage: fastq_quality_trimmer [-h] [-v] [-t N] [-l N] [-z] [-i INFILE] [-o OUTFILE]
    Part of FASTX Toolkit 0.0.13 by A. Gordon ([email protected])

    [-h] = This helpful help screen.
    [-t N] = Quality threshold - nucleotides with lower
    quality will be trimmed (from the end of the sequence).
    [-l N] = Minimum length - sequences shorter than this (after trimming)
    will be discarded. Default = 0 = no minimum length.
    [-z] = Compress output with GZIP.
    [-i INFILE] = FASTQ input file. default is STDIN.
    [-o OUTFILE] = FASTQ output file. default is STDOUT.
    [-v] = Verbose - report number of sequences.
    If [-o] is specified, report will be printed to STDOUT.
    If [-o] is not specified (and output goes to STDOUT),
    report will be printed to STDERR.

    I would appreciate your help.
    Thanks

  • #2
    In reply to giampe:
    Hi,
    First you need to decide on the quality threshold. If you try 28, then:

    fastq_quality_trimmer -v -t 28 -l 80 -i inputfile.fastq -o outputfile.fastq
    (You can remove -l 80 if you don't need a minimum remaining length. Be careful with this option, as it discards any sequence shorter than the set length. If that happens, you will end up with different numbers of reads in your paired R1 and R2 files and lose the pairing information.)
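
A quick way to see whether -l has upset the pairing is to compare read counts after trimming. This is only a minimal sketch: it assumes uncompressed FASTQ files with exactly four lines per read, and the file names in the example call are placeholders.

```shell
# Count reads in a FASTQ file, assuming exactly four lines per record
# (no wrapped sequence lines).
count_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Report whether two trimmed files still hold the same number of reads.
# Equal counts are necessary for intact pairing, but they do not prove it.
same_read_count() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads, $2: $n2 reads"
    [ "$n1" = "$n2" ]
}

# Example call (placeholder file names):
# same_read_count R1_trimmed.fastq R2_trimmed.fastq || echo "read counts differ"
```

If the counts differ, the two files can no longer be used together as a pair without re-synchronising them.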

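    I am not sure whether this tool trims from both ends. The help text says "from the end of the sequence", which suggests that only the 3' end is trimmed, so try it on a small subset of your data first.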
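
Trying the tool on a subset could look like the sketch below. It assumes an uncompressed FASTQ file with four lines per read; the subset size and all file names are placeholders, and the helper functions are made up for illustration.

```shell
# Write the first N reads of a FASTQ file to a new file.
# Assumes four lines per record, so N reads = 4 * N lines.
subset_fastq() {
    src=$1; n=$2; dst=$3
    head -n $((n * 4)) "$src" > "$dst"
}

# Print the shortest and longest read length in a FASTQ file as "min-max".
read_length_range() {
    awk 'NR % 4 == 2 {
             len = length($0)
             if (min == "" || len < min) min = len
             if (len > max) max = len
         }
         END { printf "%d-%d\n", min, max }' "$1"
}

# Example (placeholder names). Depending on your data, FASTX tools may also
# need -Q33 for Phred+33 qualities; that option is not in the help text above.
# subset_fastq R2.fastq 10000 R2_subset.fastq
# if command -v fastq_quality_trimmer >/dev/null 2>&1; then
#     fastq_quality_trimmer -v -t 28 -i R2_subset.fastq -o R2_subset_trimmed.fastq
#     read_length_range R2_subset_trimmed.fastq
# fi
```

The length range only shows that trimming happened. To see which end lost bases, compare a few reads with the same ID in the subset and in the trimmed output.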

    I would also recommend trying a program called Trimmomatic. It is versatile, much quicker, and keeps the pairing information.

    • #3
      I second Trimmomatic as being much better

      • #4
        Hi kennels,
        Thanks for your suggestion. I would like to try Trimmomatic, but I couldn't find an installation guide for Linux. I have just downloaded the binary folder from

        Downloading Trimmomatic
        Version 0.22: binary and source

        Sorry, but I'm a biologist, so I would be grateful if you could help me with the installation!

        • #5
          Hi,
          I completely understand your situation, as I was (and still am) pretty much in the same boat.
          Once you have downloaded the binary, the .jar file can simply be run from its folder without any installation. You do need Java installed on your Linux machine, though (try searching for 'install java linux' if you have trouble with this). There are example command lines on the Trimmomatic page, and you just need to follow them. The words in capital letters are the trimming steps you can adjust. I found it a little confusing at first, so here is an example command I use myself:

          Code:
          java -classpath /usr/local/bin/Trimmomatic-0.22/trimmomatic-0.22.jar org.usadellab.trimmomatic.TrimmomaticPE -threads 4 -phred33 -trimlog trimlog.txt read1.fastq.gz read2.fastq.gz read1.paired read1.unpaired read2.paired read2.unpaired ILLUMINACLIP:adapterseq.fa:2:40:15 TRAILING:20 HEADCROP:12 MINLEN:50
          Note that -trimlog takes a log file name as its argument (trimlog.txt is just an example name); without one, the first input file would be taken as the log file. You can use the compressed .gz files of your reads directly, which is convenient. Adjust the options to your liking.
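
The same invocation is easier to adjust when split across lines. The sketch below wraps it in a function that runs only when called; the jar path is the one from the command above, while the function name, the output names and trimlog.txt are placeholders. -trimlog expects a log file name as its argument, and the trimming steps are applied in the order given.

```shell
# Location of the Trimmomatic jar; adjust to wherever you unpacked it.
TRIMMOMATIC_JAR=${TRIMMOMATIC_JAR:-/usr/local/bin/Trimmomatic-0.22/trimmomatic-0.22.jar}

# Paired-end trimming. Arguments: R1 input, R2 input, adapter FASTA file.
run_trimmomatic_pe() {
    r1=$1; r2=$2; adapters=$3
    if ! command -v java >/dev/null 2>&1; then
        echo "java not found; install a Java runtime first" >&2
        return 1
    fi
    if [ ! -f "$TRIMMOMATIC_JAR" ]; then
        echo "jar not found: $TRIMMOMATIC_JAR" >&2
        return 1
    fi
    # ILLUMINACLIP fields: adapter file, seed mismatches,
    # palindrome clip threshold, simple clip threshold.
    java -classpath "$TRIMMOMATIC_JAR" org.usadellab.trimmomatic.TrimmomaticPE \
        -threads 4 -phred33 -trimlog trimlog.txt \
        "$r1" "$r2" \
        read1.paired read1.unpaired \
        read2.paired read2.unpaired \
        "ILLUMINACLIP:${adapters}:2:40:15" TRAILING:20 HEADCROP:12 MINLEN:50
}

# Example call (placeholder file names):
# run_trimmomatic_pe read1.fastq.gz read2.fastq.gz adapterseq.fa
```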

          I found the ILLUMINACLIP option a little hard to understand. Basically, you need a separate FASTA file containing the adapter sequences you want to clip off (if you don't want adapter clipping, leave ILLUMINACLIP out of your command). The headers of the adapter sequences must also be named according to the Trimmomatic instructions, e.g.
          >three-prime-adapter/1
          AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
          >five-prime-adapter/2
          ACTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG

          If you want palindrome clipping, the two adapter entries need the same Prefix name (the part before the /1 and /2); otherwise you can clip in simple mode as I have above. Have a read about the adapter FASTA file on their page for the exact naming rules.
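
If it helps, the adapter file can be created and checked from the shell. A small sketch using the two entries shown above; the function names are made up for illustration, and adapterseq.fa is the file name from the example command.

```shell
# Write the adapter FASTA file used by ILLUMINACLIP (entries as posted above).
write_adapter_fasta() {
    printf '%s\n' \
        '>three-prime-adapter/1' \
        'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC' \
        '>five-prime-adapter/2' \
        'ACTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG' > "$1"
}

# Print any header line whose name does not end in /1 or /2.
# No output means every header follows the naming used above.
check_adapter_names() {
    grep '^>' "$1" | grep -v '/[12]$'
}

# Example:
# write_adapter_fasta adapterseq.fa
# check_adapter_names adapterseq.fa
```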

          After running, you end up with the .paired files, which keep your pairing information, and the 'orphaned' reads (those whose mate was discarded) in the .unpaired files.
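
To convince yourself that the two .paired files really are in step, you can compare their read IDs record by record. A minimal sketch, assuming uncompressed four-line FASTQ records; the function name is made up, and the ID handling (dropping a trailing /1 or /2 and anything after the first space) is an assumption about your read headers.

```shell
# Compare read IDs record by record between two FASTQ files.
# Exits non-zero if the files differ in length or any ID pair mismatches.
pairs_in_sync() {
    awk '
        function read_id(h) {
            sub(/[ \t].*$/, "", h)   # drop everything after the first space
            sub(/\/[12]$/, "", h)    # drop a trailing /1 or /2
            return h
        }
        FNR % 4 != 1 { next }                         # header lines only
        NR == FNR    { id[FNR] = read_id($0); n1++; next }
        {
            n2++
            if (id[FNR] != read_id($0)) bad++
        }
        END {
            if (n1 != n2 || bad > 0) { print "out of sync"; exit 1 }
            print "in sync: " n1 " pairs"
        }' "$1" "$2"
}

# Example call with the output names used above:
# pairs_in_sync read1.paired read2.paired
```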

          Hope this helps.
          Last edited by Kennels; 01-10-2013, 03:12 PM.
