  • Understanding Trimmomatic Output

    Hello.

    For a specific Trimmomatic run, I have the following output, and I wish to compare this information with FastQC:

    Input Read Pairs: 39100817 Both Surviving: 15397090 (39.38%) Forward Only Surviving: 12799520 (32.73%) Reverse Only Surviving: 548505 (1.40%) Dropped: 10355702 (26.48%)


    My FastQC report shows 39100817 total input reads, and after trimming the QC report shows 15397090.

    So if I take the difference between these, I get 23703727.

    However, this does not match the Dropped count of 10355702 from Trimmomatic.

    So what does this Dropped output mean?

  • #2
    Hi Acrolombo

    Trimmomatic will (in this paired case) generate four files from your set of two files (FW & RV).
    First, a new pair of FW and RV files whose reads are still paired. This is indicated by "Both Surviving", and it is the same number you got out of FastQC.
    In addition to these paired sequences, some of your FW reads survived the trimming but their partner sequences didn't; these end up in a separate file (Forward Only Surviving). Conversely, some of the RV reads might survive while their cognate FW read was trimmed away completely. These go into the last file (Reverse Only Surviving).
    As RV reads most often have lower quality, you get more pairs where only the FW read survived than vice versa.
    "Dropped" counts the pairs where neither read survived, so the difference you calculated (23703727) is Forward Only + Reverse Only + Dropped.
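
    To check how the four categories add up, here is a minimal Python sketch that parses the summary line from the question and does the arithmetic. The function name `parse_summary` and the regular expression are my own illustration, not part of Trimmomatic; the numbers are copied from the post above.

```python
import re

# Summary line copied from the question above.
summary = (
    "Input Read Pairs: 39100817 Both Surviving: 15397090 (39.38%) "
    "Forward Only Surviving: 12799520 (32.73%) "
    "Reverse Only Surviving: 548505 (1.40%) Dropped: 10355702 (26.48%)"
)


def parse_summary(line):
    """Return {label: count} for every 'Label: <integer>' field in the line.

    Hypothetical helper for illustration; assumes the field layout shown above.
    """
    fields = re.findall(r"([A-Za-z ]+?):\s*(\d+)", line)
    return {label.strip(): int(count) for label, count in fields}


counts = parse_summary(summary)
total = counts["Input Read Pairs"]
both = counts["Both Surviving"]
fwd_only = counts["Forward Only Surviving"]
rev_only = counts["Reverse Only Surviving"]
dropped = counts["Dropped"]

# The four categories partition the input pairs.
categories_sum = both + fwd_only + rev_only + dropped
print("categories sum to input:", categories_sum == total)

# The difference seen between the two FastQC reports is every pair
# that did not survive as a pair, not only the "Dropped" ones.
not_both = total - both
print("input - both surviving:", not_both)
print("equals fwd only + rev only + dropped:",
      not_both == fwd_only + rev_only + dropped)
```

    So the FastQC difference and the Dropped figure are counting different things: the former includes the pairs that lost only one mate.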

    You might ask why: the issue is that if you were to drop some FW reads completely but keep their RV reads, your FW.fastq file would contain a different number of reads than the RV.fastq file. As there is still useful information in unpaired reads for many applications, these are moved to separate files.
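
    As a toy illustration of that bookkeeping (not Trimmomatic's actual code), the sketch below sorts already-trimmed read pairs into the four outputs plus a dropped count. The function `partition_pairs` and the `min_len` cutoff are assumptions standing in for whatever survival criterion your trimming steps apply.

```python
def partition_pairs(pairs, min_len=4):
    """Sort (forward, reverse) trimmed read pairs into four output lists.

    Toy model: a read "survives" if it is at least min_len bases long.
    min_len is a hypothetical stand-in for the real trimming criteria.
    """
    out = {"fw_paired": [], "rv_paired": [], "fw_unpaired": [], "rv_unpaired": []}
    dropped = 0
    for fw, rv in pairs:
        fw_ok = len(fw) >= min_len
        rv_ok = len(rv) >= min_len
        if fw_ok and rv_ok:
            # Both mates survive: written at the same position in both
            # paired files, so the files stay in sync.
            out["fw_paired"].append(fw)
            out["rv_paired"].append(rv)
        elif fw_ok:
            out["fw_unpaired"].append(fw)   # "Forward Only Surviving"
        elif rv_ok:
            out["rv_unpaired"].append(rv)   # "Reverse Only Surviving"
        else:
            dropped += 1                    # "Dropped": neither mate survives
    return out, dropped


# Four made-up pairs, one for each possible outcome.
toy_pairs = [
    ("ACGTAC", "TTGACA"),  # both survive
    ("ACGTAC", "TT"),      # forward only
    ("AC", "TTGACA"),      # reverse only
    ("A", "T"),            # dropped
]

files, n_dropped = partition_pairs(toy_pairs)
print({name: len(reads) for name, reads in files.items()}, "dropped:", n_dropped)
```

    The two paired lists always have the same length, which is the property downstream paired-end tools rely on.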

    Cheers
    björn
