  • Data Clean & Adapter Contamination

    Hello All,

    I am starting to analyse my RAD data set (Illumina HiSeq, 150 bp PE sequencing run). I used Trimmomatic to clean the data and then used grep to check for adapter contamination, including the reverse complement of the adapter sequences. Trimmomatic removed all of the adapters and cleaned my data set nicely.

    However, most of my adapter contamination occurred within the sequencing read rather than at its ends, and it is actually reverse-complement adapter sequence. I believe Trimmomatic handles within-read contamination by retaining the 5' end of the read up to the point where the contamination begins. My worry is that the retained part of the read will then no longer necessarily match its paired read (if this makes any sense at all).

    The way I imagine the adapter ends up within the read is that a small fragment carrying an adapter is ligated to another small fragment carrying adapters, so I have a read structured as follows:

    P1- read(- P1 -read)------ P2-read. The brackets indicate what Trimmomatic would trim. My concern is that I now have paired reads where the forward read does not belong to the reverse read: P1- read--- P2-read.

    I am trying to assemble these reads de novo into contigs and am worried that these 'trimmed contaminated sequences' could lead to false assemblies. However, altogether they make up only ~3% of the total reads. I assume some of these will have been discarded due to poor quality or failed demultiplexing, and most of them are different sequences. So as long as my assembly parameters are sufficiently strict, this shouldn't be much of a problem?

    I would value any opinions on whether I should find a solution to this problem or whether this is sufficiently small for me to just go ahead with the analysis anyway?

    Any ideas why I end up with so many sequences containing reverse-complement adapter sequence? For example, via grep I find 2 million reads containing the rc adapter sequence, but only 1000 reads containing the adapter in its forward orientation. Is this normal with RAD sequencing?
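The grep counts described above can also be reproduced in a few lines of Python. This is only a minimal sketch: the `ADAPTER` sequence is a made-up placeholder (not a real Illumina or RAD adapter), and `read_fastq_sequences` assumes a plain, uncompressed four-line-per-record FASTQ file.

```python
# Hypothetical adapter sequence, for illustration only; substitute your own.
ADAPTER = "ACGTTGCAAGGCTA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case ACGTN)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_adapter_hits(reads, adapter):
    """Count reads containing the adapter as written and as its reverse complement."""
    rc = reverse_complement(adapter)
    counts = {"forward": 0, "reverse_complement": 0}
    for read in reads:
        if adapter in read:
            counts["forward"] += 1
        if rc in read:
            counts["reverse_complement"] += 1
    return counts


def read_fastq_sequences(path):
    """Yield the sequence line of each record in an uncompressed 4-line FASTQ file."""
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:
                yield line.strip()


# Toy reads: one with the adapter as written, one with its reverse complement,
# and one with neither.
reads = [
    "TTTT" + ADAPTER + "GGGG",
    "CCCC" + reverse_complement(ADAPTER) + "AAAA",
    "ACGTACGTACGTACGT",
]
print(count_adapter_hits(reads, ADAPTER))
```

On real data, something like `count_adapter_hits(read_fastq_sequences("reads_R1.fastq"), ADAPTER)` would give the two counts in a single pass; the file name here is a placeholder. Like grep, this only finds exact matches, so reads with sequencing errors inside the adapter are not counted.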

    Thank you so much for any help in advance,

    Sarah

  • #2
    If the problematic reads are only 3% of the total, then you can remove or ignore them.

    Are you using software specific for RAD-Seq like Stacks or RADTools?


    • #3
      Thank you very much for your quick reply! It is much appreciated.
      Last edited by Corydoras; 06-04-2014, 07:49 AM.


      • #4
        Just in case anybody ever has a similar problem or is confused and stumbles across my post:

        Looking more closely at my files and at where the reverse-complement adapter contamination occurred, it became obvious that the rc sequences were simply adapter read-through, and everything that followed was nonsense, which Trimmomatic then removed perfectly. This means that in roughly 3% of cases my fragments were too short for the 150 bp HiSeq reads and the RAD size selection did not work perfectly, but considering it is only 3% and this was my first set of libraries, I am fairly happy with that.

        Above I stated that I was concerned the forward read would not match the reverse read. I now believe this is the case in only a couple of hundred fragments at most: those that really do consist of tiny adapter-carrying fragments ligated to other tiny adapter-carrying fragments. The majority of the contamination, however, appears in reverse-complement form and is read-through.

        This all obviously rests on the assumption that when read-through occurs, the forward reads will contain the reverse complement of the P2 adapter and the reverse reads will contain the reverse complement of the P1 adapter. Please feel free to point out if there is something wrong with my logic!
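The read-through assumption above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of the actual library: `P1` and `P2` are made-up sequences, the molecule is modelled as P1 + insert + reverse complement of P2, both reads are taken to start at the first insert base, and the junk bases a real sequencer would produce past the end of the molecule are ignored.

```python
# Hypothetical adapter sequences, for illustration only.
P1 = "ACCTGGATCAGT"
P2 = "GTTCAGCATGGA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case ACGTN)."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(insert, read_length):
    """Simulate a read pair from a molecule modelled as P1 + insert + rc(P2).

    The forward read runs through the insert into rc(P2); the reverse read runs
    through the insert's reverse complement into rc(P1).
    """
    forward_template = insert + reverse_complement(P2)
    reverse_template = reverse_complement(insert) + reverse_complement(P1)
    return forward_template[:read_length], reverse_template[:read_length]


def trim_at_adapter(read, adapter_rc):
    """Keep only the 5' part of the read before the first exact adapter match."""
    pos = read.find(adapter_rc)
    return read if pos == -1 else read[:pos]


# A 14 bp insert sequenced with 30 bp reads: both reads run into adapter.
insert = "GATTACAGATTACA"
fwd, rev = simulate_pair(insert, read_length=30)

fwd_trimmed = trim_at_adapter(fwd, reverse_complement(P2))
rev_trimmed = trim_at_adapter(rev, reverse_complement(P1))

print("forward read :", fwd)
print("reverse read :", rev)
# After trimming, the two reads still describe the same insert.
print("pair consistent:", reverse_complement(rev_trimmed) == fwd_trimmed)
```

In this toy model the adapter only ever shows up in reverse-complement form, and trimming at the match leaves a forward and reverse read that still cover the same fragment. That is the situation described in this post, as opposed to the chimeric ligation case from the original question.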
