SEQanswers


LizBent 01-27-2012 04:33 AM

Process to remove primers, adapters, etc. from Illumina data
 
Hi all,

I have some Illumina paired-end (100 bp) read data and have seen this very useful page: http://intron.ccam.uchc.edu/groups/t...Sequences.html

It lists some adapter and primer sequences. My question is: do I have to remove any sequences other than the ones on this page and the primers I used to amplify my cDNA, such as the index primers? Are the index primer sequences the same as the PE sequencing or PCR primers given in the link above, just with the index tag added?

Should I worry about reverse complementing all of these and removing those sequences as well?

Liz

GenoMax 01-27-2012 04:41 AM

It is always good to start with QC on your data. You will find there are several tools to do this.

FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/) and Fastx toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) are good ones to start with. There are utilities that will help you remove the adapters if they are present in your sequences. One more alternative is cutadapt (http://code.google.com/p/cutadapt/).

LizBent 01-27-2012 05:05 AM

Hi, I'm aware of the FastXtoolkit and other tools mentioned. My question is not how to remove adapters and primers, but more whether I need to reverse complement adapters and primers and remove the reverse complemented sequences as well.

fkrueger 01-27-2012 05:29 AM

We find that using the first 13 bp of the Illumina adapter ('AGATCGGAAGAGC') efficiently removes adapter contamination from both paired-end files (the adapters on both sides share this sequence before they fork, and any of the Illumina multiplex barcodes should be further downstream of that).
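To illustrate the point about the shared 13-mer, here is a small Python sketch. The two full adapter sequences are not given in this thread; they are an assumption on my part (the commonly published Illumina TruSeq read 1 and read 2 adapters), so check them against the documentation for your own library prep.

```python
import os

# ASSUMPTION: these sequences are not from the thread. They are the commonly
# published Illumina TruSeq read 1 / read 2 adapter sequences.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# os.path.commonprefix compares character by character, so it also works
# on plain strings, not just file paths.
shared = os.path.commonprefix([ADAPTER_R1, ADAPTER_R2])
print(shared, len(shared))  # AGATCGGAAGAGC 13
```

With these two sequences the adapters fork right after the 13-mer, which is why a single short adapter string can be used for both files.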

A typical command for Cutadapt could be

./cutadapt -f fastq -O $stringency -q 20 -a AGATCGGAAGAGC input_file.fastq

$stringency defines the minimum overlap with the adapter that is required before sequence is removed from the 3' end of a read; I believe the default is 3. Because of -q 20, this command removes poor-quality sequence as well as adapters from your FastQ file.
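To make the role of the overlap setting concrete, here is a deliberately simplified Python sketch of 3' adapter trimming. It is not how Cutadapt is implemented: it only does exact matching (Cutadapt also tolerates mismatches), always trims a full-length adapter match regardless of the overlap setting, and the function name and parameters are made up for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Simplified 3' adapter trimming with exact matching only.

    Hypothetical helper for illustration; not Cutadapt's algorithm.
    """
    # Case 1: the whole adapter occurs somewhere in the read.
    # Everything from the adapter start onwards is removed.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]

    # Case 2: the read ends with only the first k bases of the adapter.
    # Try the longest possible partial match first, and never accept
    # fewer than min_overlap bases.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]

    # No adapter found: return the read unchanged.
    return read


adapter = "AGATCGGAAGAGC"
print(trim_3prime_adapter("ACGTACGTAGATCGGAAGAGCTTTT", adapter))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTAGAT", adapter))               # ACGTACGT
print(trim_3prime_adapter("ACGTACGTA", adapter, min_overlap=1))   # ACGTACGT
print(trim_3prime_adapter("ACGTACGTA", adapter, min_overlap=3))   # ACGTACGTA
```

The last two calls show the trade-off: with an overlap of 1, any read that happens to end in 'A' loses a base, while a higher value leaves such chance matches alone at the cost of missing very short adapter remnants.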

Just be careful with the option that discards reads which become too short after trimming: it can throw off the read-by-read order of the two paired-end files, which many aligners require.
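One way around the ordering problem is to apply the length filter to both mates at once, so a pair is either kept whole or dropped whole. The Python sketch below shows the idea on in-memory records; the function name, the (name, sequence) tuple layout and the threshold are assumptions for illustration, and it presumes both files were trimmed without discarding anything, so they still list the reads in the same order.

```python
def filter_pairs(reads1, reads2, min_len=20):
    """Keep a pair only if BOTH mates are at least min_len long.

    reads1 and reads2 are lists of (name, sequence) tuples in the same
    order. Hypothetical helper for illustration.
    """
    if len(reads1) != len(reads2):
        raise ValueError("paired files must contain the same number of reads")

    kept1, kept2 = [], []
    for mate1, mate2 in zip(reads1, reads2):
        if len(mate1[1]) >= min_len and len(mate2[1]) >= min_len:
            kept1.append(mate1)
            kept2.append(mate2)
    return kept1, kept2


# Toy example: after trimming, read b/1 and read c/2 are too short.
trimmed1 = [("a/1", "ACGTACGT"), ("b/1", "AC"), ("c/1", "ACGTAC")]
trimmed2 = [("a/2", "TTTTGGGG"), ("b/2", "TTTTGGGG"), ("c/2", "T")]

kept1, kept2 = filter_pairs(trimmed1, trimmed2, min_len=4)
print([name for name, _ in kept1])  # ['a/1']
print([name for name, _ in kept2])  # ['a/2']
```

Filtering each file on its own would have left two reads in each file but paired a/1 with a/2 and c/1 with b/2, which is exactly the mismatch that confuses aligners.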

I hope this helps

LizBent 01-27-2012 06:12 AM

Thanks! I'm trying to figure out how to QC data before trying to use it in Velvet and Trinity.

LizBent 01-27-2012 09:02 AM

I've tried the 13-mer adapter end sequence with the FastXtoolkit (Clip), and it didn't remove any reads. However, when I use the full primer sequences, reads are clipped and removed. I'm going to try clipping the 13-mer sequence with CutAdapt, but I thought I would mention it in case anyone can tell me the difference between how these programs work.

rahularjun86 05-14-2012 04:08 AM

Hi,
A simple grep can also give a rough idea of how many reads contain adapter sequence.

grep -c "^GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq
grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG$" *.fastq
grep -c "GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG" *.fastq

You can also try the 13-mer sequence.
Thanks,
Rahul
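The same rough count can be done in Python, which also makes it easy to look for the reverse complement at the same time and so check empirically whether it is worth removing. This is only a sketch: it assumes plain four-line FASTQ records, uses exact matching, and the function names are made up. In a read that runs through a short insert, the adapter would normally be expected in the forward orientation at the 3' end, so reverse-complement hits should be rare, but counting both is a cheap way to see what your own data looks like.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_adapter_hits(fastq_lines, adapter):
    """Count reads containing the adapter or its reverse complement.

    Assumes 4-line FASTQ records, so the sequence is every 2nd line of 4.
    Hypothetical helper for illustration; exact matches only.
    """
    rc = reverse_complement(adapter)
    counts = {"forward": 0, "reverse_complement": 0}
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # skip header, '+' and quality lines
            continue
        seq = line.strip()
        if adapter in seq:
            counts["forward"] += 1
        if rc in seq:
            counts["reverse_complement"] += 1
    return counts


# Toy records; for a real file, pass an open file handle instead of a list.
records = [
    "@r1", "ACGTAGATCGGAAGAGCAA", "+", "I" * 19,
    "@r2", "GCTCTTCCGATCTACGT", "+", "I" * 17,
    "@r3", "ACGTACGT", "+", "I" * 8,
]
print(reverse_complement("AGATCGGAAGAGC"))            # GCTCTTCCGATCT
print(count_adapter_hits(records, "AGATCGGAAGAGC"))   # {'forward': 1, 'reverse_complement': 1}
```

Unlike the grep commands, this only looks at sequence lines, so a quality string can never be counted by accident.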

