SEQanswers


foolishbrat 01-26-2014 09:45 PM

Adapter trimming NEBNext Library / MiSeq
 
I have 51 bp single-end reads generated on a MiSeq using NEBNext Multiplex Oligos for Illumina.

The sample sheet looks like this:

Code:

IEMFileVersion,4
Investigator Name,FB
Experiment Name,WT10104
Date,11/27/2013
Workflow,GenerateFASTQ
Application,FASTQ Only
Assay,TruSeq Small RNA
Description,
Chemistry,Default

[Reads]
51

[Settings]
ReverseComplement,0

[Data]
Sample_ID,Sample_Name,Sample_Plate,Sample_Well,I7_Index_ID,index,Sample_Project,Description
HS130333-1,,,,RPI3,TTAGGC,,
HS130333-2,,,,RPI4,TGACCA,,
HS130333-3,,,,RPI5,ACAGTG,,

The Primer index manual can be found here.

So for the HS130333-1 file, according to the manual above, the primer/adapter with index is:
5'-CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-s-T-3'

The document indicated that the expected index primer sequence read is TTAGGC which is the reverse complement of GCCTAA.


My question: if I use `trim_galore` or `cutadapt` to trim the data, what should I pass to the -a parameter?

Is it the whole sequence above? Or just the first part, 5'-CAAGCAGAAGACGGCATACGAGAT?
Or GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-s-T-3'? (And what does 's' mean here?)

Or the reverse complement of each of the above?

fkrueger 01-27-2014 12:06 AM

Since your sequence ends in ... GATC-s-T, you need to use its reverse complement, thus starting with AGATC.. This means you should be able to run Trim Galore in its default mode without specifying any -a because it is using just that sequence anyway.

foolishbrat 01-27-2014 12:21 AM

Hi, Thanks.

But the reverse complement of

5'-CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC-s-T-3'

is this:
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTTAGGCATCTCGTATGCCGTCTTCTGCTTG

That is, it doesn't start with "AGATC", and the trim_galore default adapter AGATCGGAAGAGC is not a substring of the reverse complement above.

Did I miss anything?

Truly need your advice.

fkrueger 01-27-2014 12:24 AM

You need to add an A to the start of the reverse-complemented sequence; it comes from the A-tailing step in the Illumina library preparation protocol. Then the starts of both sequences will match up perfectly, which is what you want to use. HTH
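
One way to check this on the sequences posted above is a small Python sketch. It assumes that the "-s-T" at the 3' end stands for a final T base (with 's' being a modification marker rather than a base, which the thread does not confirm), and that the Trim Galore default is the AGATCGGAAGAGC quoted earlier. The helper `reverse_complement` is made up for this illustration and is not part of either tool.

```python
# Hypothetical helper for illustration; not part of Trim Galore or cutadapt.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

# Index primer from the first post, with "-s-T" written as a plain
# trailing T (assumption: 's' is a modification marker, not a base).
PRIMER = "CAAGCAGAAGACGGCATACGAGATGCCTAAGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT"

# Default adapter as quoted in this thread.
TRIM_GALORE_DEFAULT = "AGATCGGAAGAGC"

rc_with_t = reverse_complement(PRIMER)          # trailing T included
rc_without_t = reverse_complement(PRIMER[:-1])  # trailing T dropped

print(rc_with_t.startswith(TRIM_GALORE_DEFAULT))     # True
print(rc_without_t.startswith(TRIM_GALORE_DEFAULT))  # False
print(reverse_complement("GCCTAA"))                  # TTAGGC
```

Under that assumption, the leading A is simply the complement of the primer's 3' T, i.e. the base that pairs with the A-tail, and the index reads as TTAGGC in the same orientation.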

foolishbrat 01-27-2014 12:32 AM

Thanks a million!
So in the trimming process, I don't have to care about the index sequence "GCCTAA" or its reverse complement "TTAGGC". Am I right?

fkrueger 01-27-2014 12:36 AM

That's right, the index is only relevant to sort out different barcodes; for adapter trimming purposes it is sufficient to specify only the start of the adapter which all indexed adapters have in common. Good luck!
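
To illustrate why the index does not matter for trimming, here is a rough Python sketch. It is not how Trim Galore or cutadapt work internally (they tolerate mismatches and partial matches at the read end); it only does an exact search for the shared adapter start. The 10 nt insert is made up, and the sketch assumes that the adapters for the other two indexes differ from the one posted above only in the six index bases.

```python
# Illustrative sketch only: exact-match 3' adapter trimming.
# Real trimmers also handle mismatches and partial adapter matches.

ADAPTER_START = "AGATCGGAAGAGC"  # default adapter quoted in this thread
# Read-through sequence from the reverse complement posted above (plus the
# leading A), split around the index position.
COMMON = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
TAIL = "ATCTCGTATGCCGTCTTCTGCTTG"
READ_LENGTH = 51  # from the [Reads] section of the sample sheet

def simulate_read(insert: str, index: str) -> str:
    """Insert followed by adapter read-through, cut at the read length."""
    return (insert + COMMON + index + TAIL)[:READ_LENGTH]

def trim_adapter(read: str, adapter: str = ADAPTER_START) -> str:
    """Cut the read at the first exact occurrence of the adapter start."""
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]

insert = "ACGTTGCATG"  # made-up insert, short enough that the index is read
for index in ("TTAGGC", "TGACCA", "ACAGTG"):  # indexes from the sample sheet
    read = simulate_read(insert, index)
    print(index, index in read, trim_adapter(read) == insert)
```

All three simulated reads are cut back to the same insert, because the search stops at the shared adapter start before the index is ever reached.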

foolishbrat 01-27-2014 12:45 AM

Quote:

Originally Posted by fkrueger (Post 130750)
Since your sequence ends in ... GATC-s-T, you need to use its reverse complement, thus starting with AGATC.. This means you should be able to run Trim Galore in its default mode without specifying any -a because it is using just that sequence anyway.



You saved my life. One last question, if you don't mind.
How can I know whether or not to use the reverse complement of the adapter for trimming?

Are there any alternatives to the GATC-s-T ending?

fkrueger 01-27-2014 12:50 AM

I think that all Illumina adapters end in exactly this sequence; you just need to draw it out once and it will become very obvious (I am certain a sketch of this can be found in one of the other threads here on SeqAnswers). Small RNA adapters are different, but for all TruSeq adapters etc. you should be fine using the defaults.

foolishbrat 01-27-2014 01:06 AM

Thank you.


All times are GMT -8.