SEQanswers

05-29-2014, 08:21 AM   #1
Corydoras
Member
 
Location: Norwich

Join Date: Jan 2014
Posts: 20
Data Clean & Adapter Contamination

Hello All,

I am starting to analyse my RAD data set (Illumina HiSeq, 150 bp PE sequencing run). I used Trimmomatic to clean the data and then checked for adapter contamination (including the reverse complement of the adapter sequences) using grep. Trimmomatic got rid of all of the adapters and cleaned my data set nicely.
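A grep-style check like the one described above can be sketched in Python. This is only an illustration, not the commands actually used: the adapter string and the toy reads are placeholders, the function names are made up for this example, and the parser assumes plain four-line FASTQ records with exact (mismatch-free) matching.

```python
import io

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_adapter_hits(fastq_handle, adapter):
    """Count reads containing the adapter as given and as its reverse complement.

    Assumes unwrapped FASTQ: exactly four lines per record, sequence on line 2.
    """
    adapter = adapter.upper()
    rc_adapter = reverse_complement(adapter)
    counts = {"total": 0, "forward": 0, "revcomp": 0}
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 1:  # skip header, '+' and quality lines
            continue
        seq = line.strip().upper()
        counts["total"] += 1
        if adapter in seq:
            counts["forward"] += 1
        if rc_adapter in seq:
            counts["revcomp"] += 1
    return counts


# Placeholder adapter fragment and toy reads (assumptions for illustration only).
ADAPTER = "AGATCGGAAGAGC"
TOY_FASTQ = "\n".join([
    "@r1", "ACGTACGTAGATCGGAAGAGCTTTT", "+", "I" * 25,  # adapter as given
    "@r2", "TTGCTCTTCCGATCTACGT", "+", "I" * 19,        # reverse complement
    "@r3", "ACGTACGTACGT", "+", "I" * 12,               # no adapter
]) + "\n"

counts = count_adapter_hits(io.StringIO(TOY_FASTQ), ADAPTER)
print(counts)
```

On real data the handle would come from `open(...)` on the FASTQ file instead of `io.StringIO`. Because matching is exact, adapters with sequencing errors or partial adapters at the 3' end of a read are not counted.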

However, most of my adapter contamination seems to have occurred within the sequencing read (and is actually the reverse complement of the adapter sequence). I believe Trimmomatic handles within-read contamination by retaining the 5' end of the read up to the point where the contamination begins. What I am worried about is that the retained part of the read will then not necessarily match the corresponding paired read anymore (if this makes any sense at all).

The way I imagine the adapter ends up within the read is that a small fragment with an adapter is ligated to another small fragment with adapters, so I have a read as follows:

P1-read(-P1-read)------P2-read. The brackets indicate what Trimmomatic would trim. My concern is that I now have paired reads where the forward read does not belong to the reverse read: P1-read---P2-read.
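The scenario in the diagram can be played through on toy sequences. The sketch below is a simplified model, not Trimmomatic's matching algorithm: it trims at the first exact occurrence of an adapter, and `LOCUS_A`, `LOCUS_B`, `INNER_ADAPTER` and `READ_LEN` are hypothetical values chosen only for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_at_adapter(read, adapter):
    """Keep only the 5' part of the read before the first exact adapter match."""
    idx = read.find(adapter)
    return read if idx == -1 else read[:idx]


# Hypothetical chimeric molecule: two unrelated small fragments joined via an
# internal adapter (the bracketed part of the diagram sits in the middle).
LOCUS_A = "ACGTTGCAAC"
LOCUS_B = "GGATCCATAG"
INNER_ADAPTER = "CTGTCTCTTA"  # placeholder, not a real adapter sequence
READ_LEN = 25

molecule = LOCUS_A + INNER_ADAPTER + LOCUS_B

# The forward read starts at one end, the reverse read at the other end
# (so the reverse read is the reverse complement of the molecule).
forward_read = molecule[:READ_LEN]
reverse_read = reverse_complement(molecule)[:READ_LEN]

forward_trimmed = trim_at_adapter(forward_read, INNER_ADAPTER)
reverse_trimmed = trim_at_adapter(reverse_read, reverse_complement(INNER_ADAPTER))

print("forward keeps:", forward_trimmed)
print("reverse keeps:", reverse_trimmed)
```

In this toy case the trimmed forward read is `LOCUS_A` and the trimmed reverse read is the reverse complement of `LOCUS_B`, so both mates survive trimming but come from different fragments, which is exactly the worry described above.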

I am trying to assemble these de novo into contigs and am worried that these 'trimmed contaminated sequences' could lead to false assemblies. However, altogether they make up ~3% of the total reads. I assume some of these will have been discarded due to poor quality or failed demultiplexing etc., and most of them are different sequences, so as long as my assemblies are created sufficiently strictly this shouldn't be much of a problem?

I would value any opinions on whether I should find a solution to this problem or whether this is sufficiently small for me to just go ahead with the analysis anyway?

Any ideas why I end up with so many reads containing the reverse complement (rc) of the adapter sequence? E.g. via grep I find 2 million reads containing rc adapter, but only 1000 reads containing non-rc adapter. Is this normal with RAD sequencing?

Thank you so much for any help in advance,

Sarah
05-29-2014, 08:28 AM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

If the problematic reads are only 3% of the total, then you can remove or ignore them.

Are you using software specific to RAD-Seq, like Stacks or RADTools?
05-29-2014, 08:38 AM   #3
Corydoras
Member
 
Location: Norwich

Join Date: Jan 2014
Posts: 20

Thank you very much for your quick reply! It is much appreciated.

Last edited by Corydoras; 06-04-2014 at 07:49 AM.
06-04-2014, 07:48 AM   #4
Corydoras
Member
 
Location: Norwich

Join Date: Jan 2014
Posts: 20

Just in case anybody ever has a similar problem or is confused and stumbles across my post:

Looking closer at my files and at where the reverse-complement adapter contamination occurred, it became obvious that the rc sequences were simply adapter read-through, and everything that followed was nonsense, which Trimmomatic then removed perfectly. This means that in roughly 3% of cases my fragments were too short for the 150 bp HiSeq reads and the RAD size selection did not work perfectly, but considering it is only 3% and it was my first set of libraries, I am fairly happy with that.
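Read-through on a too-short fragment can be sketched with toy sequences. This is a simplified model under stated assumptions, not real library chemistry: `P1`, `P2` and `INSERT` are hypothetical placeholder sequences, the library molecule is modelled as P1 + insert + reverse complement of P2, each read is assumed to start directly after its own adapter, and cycles past the end of the molecule are represented by `N` as a stand-in for nonsense bases.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(insert, p1, p2, read_len):
    """Simulate a read pair from a fragment modelled as p1 + insert + rc(p2)."""
    top = p1 + insert + reverse_complement(p2)
    bottom = reverse_complement(top)  # equals p2 + rc(insert) + rc(p1)
    # Each read starts right after its own adapter; pad with N beyond the
    # end of the molecule to stand in for nonsense bases.
    read1 = (top[len(p1):] + "N" * read_len)[:read_len]
    read2 = (bottom[len(p2):] + "N" * read_len)[:read_len]
    return read1, read2


def trim_at(read, adapter_seq):
    """Keep only the part of the read before the first exact adapter match."""
    idx = read.find(adapter_seq)
    return read if idx == -1 else read[:idx]


# Placeholder sequences (assumptions, not real RAD adapters).
P1 = "ACACTCTTTC"
P2 = "GTGACTGGAG"
INSERT = "TGCAGGATTACAGGCT"  # 16 bp fragment, shorter than the read length
READ_LEN = 40

read1, read2 = simulate_pair(INSERT, P1, P2, READ_LEN)

# Read-through shows up as the reverse complement of the opposite adapter.
trimmed1 = trim_at(read1, reverse_complement(P2))
trimmed2 = trim_at(read2, reverse_complement(P1))

print("read 1:", read1)
print("read 2:", read2)
print("mates still match:", trimmed1 == reverse_complement(trimmed2))
```

Under this model the adapter seen in each read is the reverse complement of the adapter at the far end of the fragment, and after trimming both mates cover the same insert, so the pair still belongs together.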

Above I stated that I was concerned the forward read would not match the reverse read. I now believe this is the case in a couple of hundred fragments at most, namely those that consist of tiny fragments with adapter ligated to other tiny fragments with adapter. The majority of the contamination, however, presents itself in reverse-complementary form.

This all obviously rests upon the assumption that when read-through occurs, it will appear as the reverse complement of the P2 adapter in the forward reads, and as the reverse complement of the P1 adapter in the reverse reads. Please feel free to point out if there is something wrong with my logic!

Tags
adapter contamination, rad sequencing, trimmomatic

All times are GMT -8.