05-29-2014, 08:21 AM   #1
Corydoras
Data Clean & Adapter Contamination

Hello All,

I am starting to analyse my RAD data set (Illumina HiSeq, 150 bp paired-end run). I used Trimmomatic to clean the data and then used grep to check for adapter contamination, including reverse-complement adapter sequence. Trimmomatic removed all of the adapters and cleaned my data set nicely.
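For anyone wanting to repeat the grep check on both strands in one pass, here is a minimal Python sketch. The function names are hypothetical, the adapter string in the usage note is only a placeholder (the post does not say which adapter was searched for), and matching is exact, like a plain grep, so adapters with sequencing errors are not counted.

```python
import gzip
from itertools import islice

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_adapter_hits(sequences, adapter):
    """Count reads containing the adapter or its reverse complement.

    Exact substring matching only, so this mirrors a plain grep.
    """
    rc_adapter = reverse_complement(adapter)
    counts = {"forward": 0, "reverse_complement": 0}
    for seq in sequences:
        if adapter in seq:
            counts["forward"] += 1
        if rc_adapter in seq:
            counts["reverse_complement"] += 1
    return counts


def fastq_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line record.
        for line in islice(handle, 1, None, 4):
            yield line.strip()
```

Usage would look something like `count_adapter_hits(fastq_sequences("reads_R1.fastq.gz"), "AGATCGGAAGAGC")`, where both the file name and the adapter sequence are assumptions to be replaced with the real ones.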

However, most of my adapter contamination occurred within the read rather than at its end, and it is actually reverse-complement adapter sequence. I believe Trimmomatic handles within-read contamination by retaining the 5' end of the read up to the point where the adapter starts. What worries me is that the retained part of the read will then no longer necessarily match its paired read (if this makes any sense at all).

The way I imagine the adapter ends up within the read is that a small fragment with an adapter is ligated to another small fragment with adapters, so I have a read that looks as follows:

P1- read(- P1 -read)------ P2-read

The brackets indicate what Trimmomatic would trim. My concern is that I now have paired reads where the forward read does not belong to the reverse read:

P1- read--- P2-read
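The chimera scenario above can be sketched as a toy model in Python. This is only an illustration of the behaviour as I understand it (keep the 5' part of the read up to the adapter), not Trimmomatic's actual algorithm; the function name, fragment sequences and adapter string are all made up for the example, and matching is exact.

```python
def trim_at_adapter(read, adapter):
    """Keep the 5' part of `read` up to the first exact adapter match.

    If the adapter is not found, the read is returned unchanged.
    """
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


# Toy chimera: two unrelated fragments joined by an internal adapter.
# All three sequences are invented placeholders.
frag_a = "ACGTTGCAACGT"
frag_b = "GGATCCAATTGG"
internal_adapter = "GCTCTTCCGATCT"
chimera = frag_a + internal_adapter + frag_b

# The forward read starts at the 5' end of the chimera, so after
# trimming only frag_a is left of it.
forward_trimmed = trim_at_adapter(chimera, internal_adapter)
```

In this toy case the trimmed forward read contains only `frag_a`, while the reverse read would start from the far end of the construct, i.e. in `frag_b`, which is exactly the mismatched pair I am worried about.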

I am trying to assemble these de novo into contigs and am worried that these trimmed, contaminated reads could lead to false assemblies. However, altogether they make up only ~3% of the total reads. I assume some of these will have been discarded due to poor quality or failed demultiplexing, and most of them are different sequences, so as long as my assembly settings are sufficiently strict this shouldn't be much of a problem?

I would value any opinions on whether I should find a solution to this problem, or whether it is small enough for me to just go ahead with the analysis anyway.

Any ideas why I end up with so many reads containing reverse-complement adapter sequence? E.g. via grep I find 2 million reads with rc adapter sequence, but only 1000 with non-rc adapter sequence. Is this normal with RAD sequencing?

Thank you so much for any help in advance,

Sarah