  • cew85911
    Junior Member
    • Nov 2013
    • 1

    Adapter sequence trimming

    Hi all,

    I am new to next-gen bioinformatics. Been working on MiSeq data (targeted amplicon sequencing) for the past few weeks, using tools on Galaxy.

    Initially I found that the percentage of reads aligned by BWA (to reference hg19) was quite low (~50%). Just by eyeballing the data, I noted that the majority of the unmapped reads were 'contaminated' by the adapter sequence CTGTCTCTTATACACATCT (the library was Nextera). Intriguingly, the adapter sequence did not occur only at the 3' ends; some reads had it in the middle.
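
To put numbers on the eyeballed observation above, a minimal sketch like the following could tally where the adapter occurs in each read. It assumes exact matching only, so partial adapters at the 3' end and adapters containing sequencing errors are missed; the function names and example reads are hypothetical and not part of any Galaxy tool.

```python
from collections import Counter

# Nextera adapter sequence quoted in the post above.
ADAPTER = "CTGTCTCTTATACACATCT"


def classify_read(read, adapter=ADAPTER):
    """Classify a read by where the first exact adapter match sits."""
    idx = read.find(adapter)
    if idx == -1:
        return "none"
    if idx + len(adapter) == len(read):
        # The adapter runs exactly to the last base of the read.
        return "at_3prime_end"
    # The adapter is followed by further bases.
    return "internal"


def summarize(reads, adapter=ADAPTER):
    """Count how many reads fall into each category."""
    return Counter(classify_read(r, adapter) for r in reads)


# Hypothetical reads, for illustration only.
reads = [
    "ACGTACGTACGT",
    "ACGTACGT" + ADAPTER,
    "ACGT" + ADAPTER + "GGGGGGGG",
]
summary = summarize(reads)
print(summary)
```

A large "internal" count would be consistent with inserts shorter than the read length, where the read continues past the start of the adapter; the thread itself does not confirm that this is the cause here.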

    So I decided to remove the adapter using a tool called Clip on Galaxy (this improved the percentage of mapped reads a lot!) and compared the variant-calling (GATK) results from adapter-trimmed reads with those from untrimmed reads. I found that variant calling was actually worse with adapter-trimmed reads: mapping quality in particular was generally lower (e.g. a lot of MQ0 reads), and some true variants were skipped because read depth was too low. I wonder why this would happen and whether I was doing something wrong. I have read other threads, and someone suggested that adapter sequence removal is actually not necessary for reference-based alignment. Is this true even when the percentage of aligned reads is low?

    Any advice is greatly appreciated. Thanks!
  • relipmoc
    Member
    • Jul 2011
    • 58

    #2
    Generally speaking, MQ0 means that a read has multiple hits in the target genome. If MQ decreases from nonzero to zero after adapter trimming, those reads must have been over-trimmed. Although some people suggest that adapter removal is unnecessary for reference-based alignment, contaminants undeniably influence mapping quality and mapping ratio; the question is how strongly they affect the alignment software and how well the software copes with them.
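
One way to guard against the over-trimming described above is to discard reads that become very short after clipping, since short fragments are more likely to match several places in the genome. The sketch below is an illustration under stated assumptions, not how Clip or skewer works: it uses exact adapter matching, and the `min_len=30` threshold is a hypothetical value chosen for the example.

```python
# Nextera adapter sequence quoted in the original post.
ADAPTER = "CTGTCTCTTATACACATCT"


def trim_read(seq, qual, adapter=ADAPTER, min_len=30):
    """Clip a read at the first exact adapter match.

    Returns (seq, qual) after clipping, or None when the remaining
    read is shorter than min_len (a hypothetical threshold).
    """
    idx = seq.find(adapter)
    if idx != -1:
        # Drop the adapter and everything downstream of it.
        seq, qual = seq[:idx], qual[:idx]
    if len(seq) < min_len:
        return None
    return seq, qual


# Hypothetical reads, for illustration only.
long_insert = "ACGTA" * 7                      # 35 bases before the adapter
short_insert = "ACGTACGTAC"                    # 10 bases before the adapter

kept = trim_read(long_insert + ADAPTER + "GGGG", "I" * (35 + len(ADAPTER) + 4))
dropped = trim_read(short_insert + ADAPTER, "I" * (10 + len(ADAPTER)))
untouched = trim_read("ACGT" * 10, "I" * 40)   # no adapter, 40 bases
```

Here `kept` holds the 35-base insert with its qualities, `dropped` is `None` because only 10 bases remain, and `untouched` is returned unchanged.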

    For adapter trimming, I suggest skewer, a tool we developed ourselves. I did not set out to reinvent the wheel, but I found that none of the existing adapter trimmers met our requirements.

    • relipmoc
      Member
      • Jul 2011
      • 58

      #3
      Another comment: don't just pay attention to the percentage of mapped reads; pay more attention to the number of uniquely mapped reads. If that number improves after adapter trimming, the trimmed reads will improve downstream SNP calling most of the time.
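
The comparison suggested above could be sketched as follows, reading SAM-format text and counting mapped versus uniquely mapped reads. It assumes that "uniquely mapped" can be approximated by MAPQ at or above a threshold (`min_mapq=1`, i.e. anything other than MQ0); that cutoff and the function name are assumptions for illustration, and the SAM records are made up.

```python
def mapping_summary(sam_lines, min_mapq=1):
    """Count total, mapped, and (approximately) uniquely mapped reads."""
    total = mapped = unique = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # SAM column 2: bitwise FLAG
        mapq = int(fields[4])   # SAM column 5: MAPQ
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        total += 1
        if flag & 0x4:
            continue  # read is unmapped
        mapped += 1
        if mapq >= min_mapq:
            unique += 1
    return {"total": total, "mapped": mapped, "unique": unique}


# Hypothetical SAM records, for illustration only.
sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r4\t256\tchr1\t300\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
stats = mapping_summary(sam)
print(stats)
```

Running this once on the alignment of trimmed reads and once on the untrimmed alignment gives the two "unique" counts to compare.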
