SEQanswers

01-27-2012, 05:33 AM   #1
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31
Process to remove primers, adapters, etc. from Illumina data

Hi all,

I have some Illumina paired-end (100 bp) read data and have come across this very useful page: http://intron.ccam.uchc.edu/groups/t...Sequences.html

It lists adapter sequences and primer sequences. My question is: do I need to remove any sequences other than the ones on this page and the primers I used to amplify my cDNA, for example the index primers? Are the index primer sequences the same as the PE sequencing or PCR primers given in the link above, with just the index tag added?

Should I also reverse complement all of these and remove the reverse-complemented sequences as well?

Liz
01-27-2012, 05:41 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,800

It is always good to start with QC on your data, and there are several tools for this.

FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and the Fastx toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) are good ones to start with. The Fastx toolkit includes utilities that will remove the adapters if they are present in your sequences. Another alternative is cutadapt (http://code.google.com/p/cutadapt/).

Last edited by GenoMax; 01-27-2012 at 05:45 AM.
01-27-2012, 06:05 AM   #3
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

Hi, I'm aware of the Fastx toolkit and the other tools mentioned. My question is not how to remove adapters and primers, but whether I need to reverse complement the adapters and primers and remove the reverse-complemented sequences as well.
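
For anyone who wants to generate the reverse complements to test with, here is a minimal shell sketch. The `revcomp` helper is a hypothetical name used only for illustration (it is not part of any of the tools mentioned), and it assumes plain A/C/G/T input without ambiguity codes.

```shell
# Reverse complement a DNA sequence given as the first argument.
# Hypothetical helper for illustration; handles A/C/G/T only.
revcomp() {
  printf '%s\n' "$1" |
    awk '{
      out = ""
      # walk the sequence from the last base to the first
      for (i = length($0); i > 0; i--) {
        out = out substr($0, i, 1)
      }
      print out
    }' |
    tr 'ACGTacgt' 'TGCAtgca'   # swap each base for its complement
}

revcomp AGATCGGAAGAGC   # prints GCTCTTCCGATCT
```

The output can then be passed to whichever clipping tool is in use, in the same way as the forward sequence.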
01-27-2012, 06:29 AM   #4
fkrueger
Senior Member
 
Location: Cambridge, UK

Join Date: Sep 2009
Posts: 611

We find that using the first 13 bp of the Illumina adapter ('AGATCGGAAGAGC') efficiently removes adapter contamination from both files of a paired-end run (the adapters on both sides share this sequence before they fork, and any of the Illumina multiplex barcodes should lie further downstream of it).

A typical command for Cutadapt could be:

./cutadapt -f fastq -O $stringency -q 20 -a AGATCGGAAGAGC input_file.fastq

$stringency defines the minimum overlap with the adapter that is required before sequence is removed from the 3' end of a read; the default is 3, I believe. Because of the -q 20 option, this command removes poor-quality sequence as well as adapters from your FastQ file.
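
To illustrate what the overlap threshold means, here is a rough sketch that reports how many bases at the 3' end of a read match the start of the adapter. This is a simplified, exact-match illustration only and not how Cutadapt itself works (Cutadapt tolerates mismatches and also finds adapters inside the read); `adapter_overlap` is a hypothetical helper name.

```shell
# Print the length of the longest read suffix that exactly equals
# an adapter prefix (0 if none). Hypothetical helper for illustration.
adapter_overlap() {
  # usage: adapter_overlap READ ADAPTER
  awk -v read="$1" -v adapter="$2" 'BEGIN {
    best = 0
    max = length(read) < length(adapter) ? length(read) : length(adapter)
    for (k = 1; k <= max; k++) {
      # compare the last k bases of the read with the first k of the adapter
      if (substr(read, length(read) - k + 1) == substr(adapter, 1, k)) {
        best = k
      }
    }
    print best
  }'
}

adapter_overlap ACGTACGTAGAT AGATCGGAAGAGC   # prints 4
```

In this example the read ends in AGAT, so a required overlap of 3 would trim those four bases, while a required overlap of 5 would leave the read untouched.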

Just be careful with the option to discard sequences that become too short after trimming, because this can throw off the read-by-read order of the paired-end files, which many aligners require.
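
One way to keep the two files in step is to trim without discarding anything and then drop short reads pair by pair. The sketch below is only an illustration under some assumptions: both files use 4-line FASTQ records, both list the reads in the same order, and no reads have been removed from either file. `filter_pairs` and the file names are hypothetical.

```shell
# Keep a read pair only if BOTH mates are at least MIN_LEN bases long.
# Hypothetical helper; assumes 4-line FASTQ records in matching order.
filter_pairs() {
  # usage: filter_pairs R1.fastq R2.fastq MIN_LEN OUT1.fastq OUT2.fastq
  : > "$4"
  : > "$5"
  awk -v r2="$2" -v min="$3" -v o1="$4" -v o2="$5" '
    {
      # current line is the header of a record in file 1
      h1 = $0
      if ((getline s1) <= 0 || (getline p1) <= 0 || (getline q1) <= 0) exit
      # read the record at the same position in file 2
      if ((getline h2 < r2) <= 0 || (getline s2 < r2) <= 0 ||
          (getline p2 < r2) <= 0 || (getline q2 < r2) <= 0) exit
      if (length(s1) >= min && length(s2) >= min) {
        printf "%s\n%s\n%s\n%s\n", h1, s1, p1, q1 > o1
        printf "%s\n%s\n%s\n%s\n", h2, s2, p2, q2 > o2
      }
    }
  ' "$1"
}

# Example call (file names are placeholders):
# filter_pairs trimmed_1.fastq trimmed_2.fastq 20 paired_1.fastq paired_2.fastq
```

Because a pair is written or skipped as a unit, the two output files always contain the same reads in the same order.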

I hope this helps
01-27-2012, 07:12 AM   #5
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

Thanks! I'm trying to figure out how to QC data before trying to use it in Velvet and Trinity.
01-27-2012, 10:02 AM   #6
LizBent
Member
 
Location: Guelph, Ontario, Canada

Join Date: Jan 2012
Posts: 31

I've tried the 13-mer adapter end sequence with the Fastx toolkit (Clip), and it didn't remove any reads. However, when I use the full primer sequences, reads are clipped and removed. I'm going to try clipping the 13-mer sequence with Cutadapt, but I thought I would mention it in case anyone can explain how these programs differ in the way they work.
05-14-2012, 05:08 AM   #7
rahularjun86
Member
 
Location: Frankfurt(M), Germany

Join Date: Jan 2011
Posts: 58

Hi,
One could also try a simple grep to get a rough idea of how often the adapter sequence occurs (at the start of a line, at the end of a line, or anywhere in it).
Code:
grep -c "^GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG$" *.fastq
grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
You can also try the 13-mer sequence.
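
Note that grep -c counts every matching line in the file, not only sequence lines. If a count per read is wanted, something along these lines could be used. It is a rough sketch that assumes 4-line FASTQ records and exact matches; `adapter_fraction` is a hypothetical helper name.

```shell
# Report how many reads contain the adapter, looking only at the
# sequence line (2nd line) of each 4-line FASTQ record.
# Hypothetical helper for illustration; exact matches only.
adapter_fraction() {
  # usage: adapter_fraction ADAPTER FILE.fastq
  awk -v adapter="$1" '
    NR % 4 == 2 {
      total++
      if (index($0, adapter) > 0) hits++
    }
    END {
      if (total > 0) {
        printf "%d of %d reads (%.2f%%)\n", hits, total, 100 * hits / total
      } else {
        print "no reads found"
      }
    }
  ' "$2"
}

# Example call (file name is a placeholder):
# adapter_fraction AGATCGGAAGAGC reads_1.fastq
```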
Thanks,
Rahul
__________________
Rahul Sharma,
Ph.D
Frankfurt am Main, Germany

All times are GMT -8.

