SEQanswers

11-22-2016, 11:47 AM   #1
Gazaldeep
Junior Member
 
Location: India

Join Date: Nov 2016
Posts: 4
Illumina paired-end adapter contamination problem

Hello everyone!

I have Illumina paired-end RNA-seq reads and want to proceed with adapter trimming.
I am confused about a few things:

1. Does the 5' end of both the forward and reverse reads start at the first base of the insert, or can there also be adapter contamination at the 5' end?
From what I have read online, there shouldn't be any adapter at the 5' end. However, the data I am analyzing has around 75 reads (out of 7 million in the forward read file) with adapter at the 5' end. 75 sequences isn't much, but I want to know what causes this.

2. For the forward reads, some 3' ends may contain the indexed adapter. Where this indexed adapter occurs within the sequence, I should delete the adapter and the sequence following it, right? And if the indexed primer is present at the 5' end, should the whole read be deleted (because that happens when there is no insert between the two adapters)?

3. Do the 5' ends of reverse reads contain barcode sequences or any part of the indexed adapter? I have 12,399 reads (out of 7 million) with a complete or partial indexed adapter at the 5' end, including a few that have it within the read.


I am new to RNA-seq data analysis and have gone through lots of tutorials and explanations online, but everything seems really confusing at this point.

My main concern is: where should I expect adapters in Illumina forward and reverse reads, respectively, and what should I do when I encounter unexpected adapters?
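To make the question concrete, here is a minimal Python sketch of idealized adapter read-through. It assumes the standard TruSeq read-through adapter sequences from Illumina's adapter documentation (the kit behind this dataset is not stated) and ignores sequencing errors and index reads. R1 starts at the first base of the insert and R2 starts at the other end of the insert; a read only runs into adapter when the insert is shorter than the read length, so adapter shows up at the 3' end of both reads. `revcomp` and `simulate_pair` are hypothetical helper names.

```python
from __future__ import annotations

# Assumed TruSeq read-through adapters (the actual kit is not known here).
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"  # seen at the 3' end of R1
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"  # seen at the 3' end of R2

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_pair(insert: str, read_len: int) -> tuple[str, str]:
    """Idealized (R1, R2) for one fragment: no errors, no index reads.

    R1 reads the insert from its first base; R2 reads the reverse
    complement, i.e. starts from the insert's last base. A read can only
    run into adapter after passing the end of the insert, so adapter can
    appear only at the 3' end of either read. Reads are cut to read_len.
    """
    r1 = (insert + ADAPTER_R1)[:read_len]
    r2 = (revcomp(insert) + ADAPTER_R2)[:read_len]
    return r1, r2


# Insert (120 bp) longer than the reads (50 bp): no adapter in either read.
r1, r2 = simulate_pair("ACGT" * 30, 50)
print("AGATCGGAAGAGC" in r1 or "AGATCGGAAGAGC" in r2)  # False

# Insert (10 bp) shorter than the reads (30 bp): both reads end in adapter.
r1, r2 = simulate_pair("TTGACCAGTA", 30)
print(r1)  # TTGACCAGTAAGATCGGAAGAGCACACGTC
print(r2)  # TACTGGTCAAAGATCGGAAGAGCGTCGTGT
```

In this idealized model, a 5' adapter hit cannot arise from normal read-through at all, which is why such reads usually point to a library artefact rather than something the trimmer should expect.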
11-22-2016, 12:34 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,140

Quote:
Originally Posted by Gazaldeep
1. Does the 5' end of both the forward and reverse reads start at the first base of the insert, or can there also be adapter contamination at the 5' end?
From what I have read online, there shouldn't be any adapter at the 5' end. However, the data I am analyzing has around 75 reads (out of 7 million in the forward read file) with adapter at the 5' end. 75 sequences isn't much, but I want to know what causes this.
There should be no adapter contamination at the 5' end if you are using standard Illumina kits.

Quote:
2. For the forward reads, some 3' ends may contain the indexed adapter. Where this indexed adapter occurs within the sequence, I should delete the adapter and the sequence following it, right? And if the indexed primer is present at the 5' end, should the whole read be deleted (because that happens when there is no insert between the two adapters)?
Barcodes/tag reads are never part of the actual read in Illumina sequencing. If you see tags in your sequence, then something is wrong. Reads with no insert (adapter dimers) should be taken care of during trimming.
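A minimal sketch of that 3' trimming logic, assuming exact matches only (real trimmers such as Cutadapt and BBDuk tolerate mismatches) and using the 13-bp prefix `AGATCGGAAGAGC` shared by the TruSeq adapters as a placeholder adapter. A full adapter anywhere in the read is cut along with everything after it; a partial adapter is only looked for at the very 3' end. A read whose adapter starts at position 0 (an adapter dimer), or that is shorter than a minimum length after trimming, is dropped. `trim_3prime`, `trim_or_discard` and the `min_len=20` cutoff are hypothetical.

```python
from typing import Optional

# Placeholder: 13-bp prefix shared by the TruSeq read-through adapters.
ADAPTER = "AGATCGGAAGAGC"


def trim_3prime(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove a 3' adapter and everything after it (exact matches only).

    min_overlap (must be >= 1) is the shortest adapter prefix that is
    trimmed when the adapter is cut off by the end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]  # full adapter found: drop it and what follows
    # No full match: look for the longest adapter prefix at the read's end.
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def trim_or_discard(read: str, adapter: str = ADAPTER, min_len: int = 20) -> Optional[str]:
    """Trim the 3' adapter; return None if too little insert is left.

    An adapter dimer (adapter at position 0) trims to an empty string,
    so it is always discarded.
    """
    trimmed = trim_3prime(read, adapter)
    return trimmed if len(trimmed) >= min_len else None


insert = "GATTACAGATTACAGATTACA"                       # 21 bp of insert
print(trim_or_discard(insert + "AGATCGGAAGAGCACAC"))  # GATTACAGATTACAGATTACA
print(trim_or_discard(insert + "AGATC"))              # GATTACAGATTACAGATTACA
print(trim_or_discard("AGATCGGAAGAGCACACGTCTGAACT"))   # None (adapter dimer)
```

`min_overlap` is a trade-off: at 3, a read that happens to end in `AGA` loses three genuine bases, while a larger value leaves short adapter fragments untrimmed.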

Quote:
My main concern is: where should I expect adapters in Illumina forward and reverse reads, respectively, and what should I do when I encounter unexpected adapters?
Use BBDuk from the BBMap suite (search for the BBMap thread here). It is straightforward to use, and @Brian includes all commercially used adapters in a file that ships with the package. Just point BBDuk to that file and scan/trim your data.
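For paired-end data, a BBDuk call modelled on the adapter-trimming example in the BBDuk guide might look like the sketch below, built as an argument list in Python so it can be handed to `subprocess.run`. The file names are placeholders, and `ref=adapters.fa` assumes you pass the path to the adapter file in the `resources` directory of your BBMap install; `bbduk_adapter_cmd` is a hypothetical helper.

```python
import shlex
from typing import List


def bbduk_adapter_cmd(in1: str, in2: str, out1: str, out2: str,
                      ref: str = "adapters.fa") -> List[str]:
    """Argument list for paired-end 3' adapter trimming with bbduk.sh.

    ktrim=r  trim a matching k-mer and everything to its right
    k=23     k-mer length used to match the adapter reference
    mink=11  also match shorter k-mers (down to 11) at the read tip
    hdist=1  allow one mismatch per k-mer
    tpe      trim both mates of a pair to the same length
    tbo      additionally trim adapters detected from pair overlap
    """
    return ["bbduk.sh",
            f"in1={in1}", f"in2={in2}", f"out1={out1}", f"out2={out2}",
            f"ref={ref}", "ktrim=r", "k=23", "mink=11", "hdist=1", "tpe", "tbo"]


cmd = bbduk_adapter_cmd("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                        "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz")
print(shlex.join(cmd))  # the shell command you would type
# To run it (requires BBMap on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Since `ktrim=r` only trims to the right of a hit, this handles 3' read-through; reads with adapter at the 5' end are left for a separate decision.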
11-22-2016, 09:02 PM   #3
Gazaldeep
Junior Member
 
Location: India

Join Date: Nov 2016
Posts: 4

Thanks for your reply!!

Quote:
Originally Posted by GenoMax
There should be no adapter contamination at the 5' end if you are using standard Illumina kits.
So the 72 reads with 5' adapter contamination should be deleted, right?

Quote:
Originally Posted by GenoMax
Barcodes/tag reads are never part of the actual read in Illumina sequencing. If you see tags in your sequence, then something is wrong. Reads with no insert (adapter dimers) should be taken care of during trimming.
The paired-end data I am trying to analyze was downloaded from DDBJ.

After searching online and reading your answer, I'm sure that I should delete the reads that have any adapter at the 5' end (be it the 5' adapter or the 3' adapter), and trim reads that have adapter at the 3' end or within the read.
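If reads with adapter at the 5' end are deleted, the mate has to go too: otherwise the R1 and R2 files fall out of sync, which breaks any tool that expects the Nth record in each file to be mates. A minimal sketch, assuming well-formed four-line FASTQ records, exact prefix matching, and a hypothetical 10-base minimum; `read_fastq`, `starts_with_any` and `filter_pairs` are illustrative names. (In Cutadapt, the `--pair-filter` option governs the same "drop the pair if either mate matches" behaviour.)

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, seq, qual) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual


def starts_with_any(seq: str, adapters: Sequence[str], min_len: int = 10) -> bool:
    """True if seq begins with the first min_len bases of any adapter."""
    return any(seq.startswith(a[:min_len]) for a in adapters)


def filter_pairs(r1: Iterable[Record], r2: Iterable[Record],
                 adapters: Sequence[str], min_len: int = 10
                 ) -> Tuple[List[Tuple[Record, Record]], int]:
    """Drop a pair if EITHER mate starts with adapter, keeping R1/R2 in sync.

    Note: zip() stops at the shorter input, so both files must have the
    same number of records.
    """
    kept: List[Tuple[Record, Record]] = []
    dropped = 0
    for rec1, rec2 in zip(r1, r2):
        if (starts_with_any(rec1[1], adapters, min_len)
                or starts_with_any(rec2[1], adapters, min_len)):
            dropped += 1
        else:
            kept.append((rec1, rec2))
    return kept, dropped


r1_lines = ["@p1/1", "ACGTACGTACGTACGT", "+", "IIIIIIIIIIIIIIII",
            "@p2/1", "AGATCGGAAGAGCACA", "+", "IIIIIIIIIIIIIIII"]
r2_lines = ["@p1/2", "TTTTGGGGCCCCAAAA", "+", "IIIIIIIIIIIIIIII",
            "@p2/2", "CCCCGGGGTTTTAAAA", "+", "IIIIIIIIIIIIIIII"]
kept, dropped = filter_pairs(read_fastq(r1_lines), read_fastq(r2_lines),
                             adapters=["AGATCGGAAGAGC"])
print([pair[0][0] for pair in kept], dropped)  # ['@p1/1'] 1
```

For a 7-million-read file you would stream kept pairs straight to two output files rather than collect them in a list.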

But I'm actually a bit confused about the Illumina sequencing steps.

Are the barcodes removed after sorting the reads into different files based on their barcodes? If so, the final files cannot contain the barcodes, but might they contain the constant part of the indexed adapter (which occurs before/after the barcode), or is the constant part removed along with the barcodes?
I want to be clear about the process.
11-22-2016, 09:08 PM   #4
Gazaldeep
Junior Member
 
Location: India

Join Date: Nov 2016
Posts: 4

I could just use a tool for trimming, but before that, I want to be clear about what's happening. Maybe I've got it all wrong?
11-23-2016, 04:56 AM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,140

Quote:
Originally Posted by Gazaldeep
Thanks for your reply!!

But I'm actually a bit confused about the Illumina sequencing steps.
Check this video out for clarification: https://www.youtube.com/watch?v=HMyCqWhwB8E

Quote:
Are the barcodes removed after sorting the reads into different files based on their barcodes? If so, the final files cannot contain the barcodes, but might they contain the constant part of the indexed adapter (which occurs before/after the barcode), or is the constant part removed along with the barcodes?
I want to be clear about the process.
Illumina sequencing actually proceeds as separate reads: four for dual-index (2D) barcodes, three for single-index (1D) barcodes.

Code:
R1 --> R2 (index 1) --> R3 (index 2) --> R4.
Illumina software keeps track of every cluster from R1 through R4. During base calling (conversion of BCL to FASTQ), the index read sequences from R2 (and R3) are used to demultiplex the data and are written into the header of each FASTQ record, so you end up with just R1/R2 files (the first and last reads above).
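A minimal sketch of where the index ends up, assuming CASAVA 1.8+/bcl2fastq-style headers of the form `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is filtered>:<control number>:<index>`, with the two indexes of a dual-index run joined by `+`. Data re-served by public archives may have rewritten headers, in which case the index field can be missing entirely, so the parser returns `None` rather than guessing. `ReadTag` and `parse_casava18_header` are hypothetical names, and the headers below are illustrative.

```python
from typing import NamedTuple, Optional, Tuple


class ReadTag(NamedTuple):
    read_number: int        # 1 or 2: which mate this FASTQ record is
    is_filtered: bool       # True when the flag is 'Y' (read failed filter)
    control_number: int
    index: Tuple[str, ...]  # one entry for single index, two for dual index


def parse_casava18_header(header: str) -> Optional[ReadTag]:
    """Parse '<read>:<is filtered>:<control>:<index>' after the first space.

    Returns None if the header does not follow that layout.
    """
    parts = header.rstrip("\n").split(" ", 1)
    if len(parts) != 2:
        return None
    fields = parts[1].split(":")
    if len(fields) != 4 or fields[1] not in ("Y", "N"):
        return None
    try:
        read_number, control = int(fields[0]), int(fields[2])
    except ValueError:
        return None
    return ReadTag(read_number, fields[1] == "Y", control,
                   tuple(fields[3].split("+")))


print(parse_casava18_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"))
# ReadTag(read_number=1, is_filtered=True, control_number=18, index=('ATCACG',))
```

The sequence line of the record carries only insert (plus any 3' read-through); the barcode lives in this header field, or in separate index-read files.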

It is also possible to write the index reads to their own files, so you end up with up to four files per sample. This is only needed for some applications (e.g. QIIME).
11-23-2016, 11:15 AM   #6
Gazaldeep
Junior Member
 
Location: India

Join Date: Nov 2016
Posts: 4

Thanks!!! Really helpful!!

In my reads, the 5' end is contaminated with the 5' adapter in 75 reads. Also, in 12,000 out of 7 million reads, the 5' adapter is present in the read. What do you suggest? Should I delete those reads, or should I just trim the adapter and the sequence preceding it at the 5' end? I'm using Cutadapt at present, but with any adapter removal tool I will have to specify whether and how I want to trim these reads.
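The two options in the question can be sketched as follows, assuming Cutadapt-like semantics for a 5' adapter (a full match anywhere removes the adapter plus everything before it; a partial match only counts at the very start of the read), exact matching only, and a placeholder adapter sequence. `find_5prime_cut`, `apply_5prime_policy` and the `min_len=20` cutoff are hypothetical. In Cutadapt itself, 5' adapters are given with `-g` (R1) and `-G` (R2), and `--discard-trimmed` discards reads in which an adapter was found instead of trimming them.

```python
from typing import Optional

ADAPTER_5P = "AGATCGGAAGAGC"  # placeholder: use the 5' contaminant you found


def find_5prime_cut(read: str, adapter: str = ADAPTER_5P, min_overlap: int = 3) -> int:
    """Position where the insert starts after a 5' adapter; 0 if none found.

    A full exact match anywhere removes the adapter and everything before
    it. Otherwise the longest adapter suffix (>= min_overlap, which must
    be >= 1) that the read starts with is treated as a partial adapter.
    """
    pos = read.find(adapter)
    if pos != -1:
        return pos + len(adapter)
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.startswith(adapter[-k:]):
            return k
    return 0


def apply_5prime_policy(read: str, adapter: str = ADAPTER_5P,
                        policy: str = "trim", min_len: int = 20) -> Optional[str]:
    """policy='trim' cuts the 5' adapter; policy='discard' drops the read.

    Reads without an adapter hit are returned unchanged; None means discard.
    """
    if policy not in ("trim", "discard"):
        raise ValueError(f"unknown policy: {policy!r}")
    cut = find_5prime_cut(read, adapter)
    if cut == 0:
        return read
    if policy == "discard":
        return None
    trimmed = read[cut:]
    return trimmed if len(trimmed) >= min_len else None


insert = "GATTACAGATTACAGATTACA"
read = "TTT" + ADAPTER_5P + insert
print(apply_5prime_policy(read))                    # GATTACAGATTACAGATTACA
print(apply_5prime_policy(read, policy="discard"))  # None
print(apply_5prime_policy("GAAGAGC" + insert))      # GATTACAGATTACAGATTACA
```

With only a few dozen affected reads out of millions, discarding changes almost nothing downstream, whereas trimming keeps the insert but relies on the adapter boundary being called correctly.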

Sorry if my questions are naive!

Tags
adapter contamination, adapter trimming, data preprocessing, illumina paired-end, rna-seq analysis

All times are GMT -8.

