SEQanswers




Similar Threads
Thread Thread Starter Forum Replies Last Post
Trimming multiple adapters in a single run dena.dinesh Bioinformatics 9 11-28-2014 11:21 AM
Adapters trimming: Cutadapt vs Trimmomatic MafaldaSF Bioinformatics 8 03-20-2014 07:16 AM
Trimming Haloplex adapters jordi Bioinformatics 10 01-03-2014 06:41 AM
Adapter trimming and trimming by quality question alisrpp Bioinformatics 5 04-08-2013 05:55 PM
Please Help: What is the differences between standard trimming and adaptive trimming byou678 Bioinformatics 8 08-22-2011 01:05 PM

11-22-2015, 08:02 PM   #1
bluepoison
Junior Member
 
Location: Detroit

Join Date: Nov 2015
Posts: 4
Finding adapters for trimming

Hi All,

I am a total newbie in this field. I want to know what the community usually does for adapter trimming.

I have 50 bp single-end reads (Sanger / Illumina 1.9 quality encoding). My primary goal is to align the reads with Bismark and then extract methylation scores with methylKit. The FastQC report listed three overrepresented sequences. I then ran trim_galore with the default settings. trim_galore (which is essentially a wrapper around cutadapt) trimmed the universal adapter, but two overrepresented sequences are still left in the FastQC report.

I have read many posts about trimming over the last 3-4 days, but I am still confused. My takeaway so far is that FastQC tells us that there is adapter contamination, but it may not tell us the actual adapter sequence.

1. Is it a MUST to trim all the overrepresented sequences, or is trimming just the universal adapter fine?
2. What is the easiest way to find the sequences that need to be trimmed?
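To see what FastQC is counting, it can help to reproduce the check yourself. FastQC's "Overrepresented sequences" module flags any sequence that makes up more than 0.1% of the reads; the Python sketch below does a simplified version of that count. It assumes an uncompressed FASTQ with exactly four lines per record, and the function names and the 50-base prefix are illustrative choices of this sketch, not FastQC's implementation.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, List, TextIO, Tuple


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line (ignored)
        handle.readline()  # quality line (ignored)
        yield seq


def overrepresented(
    seqs: Iterable[str], min_fraction: float = 0.001, prefix_len: int = 50
) -> List[Tuple[str, int, float]]:
    """Return (sequence, count, fraction) for sequences above min_fraction of all reads.

    min_fraction=0.001 mirrors FastQC's 0.1% threshold; only the first
    prefix_len bases of each read are compared (an assumption of this sketch).
    """
    counts = Counter(s[:prefix_len] for s in seqs)
    total = sum(counts.values())
    if total == 0:
        return []
    hits = [(s, c, c / total) for s, c in counts.items() if c / total > min_fraction]
    # Most frequent sequences first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    demo = io.StringIO(
        "@r1\nAAAA\n+\nIIII\n"
        "@r2\nAAAA\n+\nIIII\n"
        "@r3\nAAAA\n+\nIIII\n"
        "@r4\nCCCC\n+\nIIII\n"
    )
    print(overrepresented(fastq_sequences(demo), min_fraction=0.3))
    # prints [('AAAA', 3, 0.75)]
```

On real data, something like `overrepresented(fastq_sequences(open("reads.fq")))` (where `reads.fq` is a placeholder file name) lists the most frequent sequences first, which you can then compare against your adapter list.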

Any help/suggestion is greatly appreciated.
11-22-2015, 10:39 PM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

1) The best practice is to trim the actual adapter sequences used in your library.
2) The best way to find that is to ask the people who made the library.

But, if you have paired reads, you can also find your adapter sequences with BBMerge like this:

bbmerge.sh in1=read1.fq in2=read2.fq outa=adapters.fa
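Why paired reads make this possible: when the insert is shorter than the read length, both reads run past the insert into adapter, and overlapping read 1 with the reverse complement of read 2 shows exactly where the insert ends. The Python sketch below illustrates that idea for a single, error-free pair. It is not BBMerge's algorithm; a real tool also has to tolerate sequencing errors and combine evidence from many pairs, and the function names and the `min_overlap` value are assumptions for illustration.

```python
from typing import Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_adapter_tails(
    read1: str, read2: str, min_overlap: int = 10
) -> Optional[Tuple[int, str, str]]:
    """Return (insert_length, adapter_tail_1, adapter_tail_2) for one read pair.

    Assumes equal-length, error-free reads. Returns None if no overlap of at
    least min_overlap bases is found, or if the reads overlap completely
    (insert length equals read length, so there is no adapter to extract).
    Inserts longer than the reads are not handled by this sketch.
    """
    if len(read1) != len(read2):
        raise ValueError("this sketch expects equal-length reads")
    n = len(read1)
    rc2 = revcomp(read2)
    # For an insert of length L <= n: read1 = insert + adapter1[:n-L] and
    # revcomp(read2) = revcomp(adapter2[:n-L]) + insert, so read1[:L] == rc2[n-L:].
    # Try the longest candidate insert first.
    for insert_len in range(n, min_overlap - 1, -1):
        if read1[:insert_len] == rc2[n - insert_len:]:
            if insert_len == n:
                return None
            return insert_len, read1[insert_len:], read2[insert_len:]
    return None
```

Collecting the two adapter tails over many short-insert pairs and taking a consensus would approximate the adapter sequences. With single-end reads there is no mate to overlap, which is why this shortcut is not available for them.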

BBDuk includes all standard Illumina adapters in "/resources/adapters.fa". If you do not know which adapters were used, and are unable to find out, I recommend using that as the reference.

Since you are using single-end reads, it's difficult to determine the adapter sequences empirically and automatically. So, unless you can get them from the people who made the library, I suggest using that reference.
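As a rough picture of what trimming against a reference like adapters.fa involves, here is a toy k-mer right-trimmer in the spirit of BBDuk's `ktrim=r` mode (trim from the first adapter k-mer to the end of the read), with short adapter fragments at the 3' end handled separately, loosely analogous to BBDuk's `mink`. It is a simplified sketch, not BBDuk's implementation: it allows no mismatches, and the `k`/`min_k` defaults are illustrative. For real data, use BBDuk itself with the adapters.fa reference.

```python
from typing import Iterable, List, Set


def adapter_kmers(adapters: Iterable[str], k: int) -> Set[str]:
    """Collect every length-k substring of the adapter sequences."""
    kmers: Set[str] = set()
    for adapter in adapters:
        for i in range(len(adapter) - k + 1):
            kmers.add(adapter[i:i + k])
    return kmers


def right_trim(read: str, adapters: List[str], k: int = 12, min_k: int = 6) -> str:
    """Trim a read from its first adapter k-mer hit to the 3' end (exact matches only)."""
    full = adapter_kmers(adapters, k)
    # 1) Scan full-length k-mers left to right; cut at the first position that hits.
    for i in range(len(read) - k + 1):
        if read[i:i + k] in full:
            return read[:i]
    # 2) Read-through adapter shorter than k can only sit at the 3' end, so
    #    check whether the read ends with an adapter prefix of min_k..k-1 bases.
    for length in range(k - 1, min_k - 1, -1):
        if length <= len(read) and any(a.startswith(read[-length:]) for a in adapters):
            return read[:-length]
    return read
```

The second step matters for short reads like 50 bp: a read whose insert is only slightly shorter than the read length carries just a few adapter bases, too few to contain a full k-mer.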

Last edited by Brian Bushnell; 11-22-2015 at 10:43 PM.
11-25-2015, 05:05 AM   #3
turnersd
Senior Member
 
Location: Charlottesville, VA

Join Date: May 2011
Posts: 112

Brian, thanks for parsing Illumina's PDF and making the adapters available. It looks like Illumina updated their adapter sequence document as of Nov 9, 2015. Are there any notable changes that aren't present in the BBDuk adapter sequence fasta?

https://support.illumina.com/downloa...ce-letter.html
Attached file: illumina-adapter-sequences_1000000002694-00.pdf (543.2 KB)
11-25-2015, 10:07 AM   #4
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Ah, thanks for notifying me... I'll look at it.
11-26-2015, 04:11 PM   #5
bluepoison
Junior Member
 
Location: Detroit

Join Date: Nov 2015
Posts: 4

Thanks a lot for the response, Brian.

I have single-end reads this time. Do you have any suggestions for the overrepresented sequences that do not match any known adapter ("No Hit", as reported by FastQC)?

11-26-2015, 10:56 PM   #6
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

@bluepoison: I suggest you try adapter removal using BBDuk and adapters.fa, and see whether FastQC still detects overrepresented sequences. If not, everything should be fine! But if it does, you may have a new adapter sequence, so please reply in that case.
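If FastQC still reports an overrepresented sequence after trimming, one quick check before concluding it is a new adapter is to compare it against adapters.fa on both strands, since FastQC's "No Hit" only means the sequence did not match FastQC's own built-in contaminant list. The Python sketch below does an exact k-mer comparison; the function names and `k=12` are illustrative assumptions, and it does not tolerate mismatches.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA text, e.g. an open adapters.fa file."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def matching_adapters(query: str, adapters: List[Tuple[str, str]], k: int = 12) -> List[str]:
    """Names of adapters that share at least one exact k-mer with query, on either strand."""
    query = query.upper()
    hits: List[str] = []
    for name, adapter in adapters:
        kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
        for strand in (query, revcomp(query)):
            if any(strand[i:i + k] in kmers for i in range(len(strand) - k + 1)):
                hits.append(name)
                break  # one hit per adapter is enough
    return hits
```

For example, `matching_adapters(seq, list(read_fasta(open("adapters.fa"))))` (point `open()` at your copy of the file) returns the names of any adapters sharing a 12-mer with `seq`. An empty list suggests the sequence is not one of the listed adapters; it may be an adapter missing from the file, or a genuinely abundant sequence from the sample itself.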

@turnersd: Unfortunately, there are a lot of new adapter indexes in the latest Illumina letter that you linked: dozens. They are for human-specific tests, like autism, cancer, and other possibly genetic disorders. And as always, Illumina makes no effort to indicate which indexes go with which adapters. So it now looks like a huge amount of effort to build a complete set of Illumina adapter sequences with their indexes.

JGI does not do any human sequencing, so none of that is relevant to us. But for everyone else out there, I really hope Illumina, or someone in the community, compiles a full list of the new human-specific adapter sequences. Because there are so many, and I have no way to empirically determine whether the new sequences are correct (since we don't use them), it's not really possible for me to generate them. Illumina would provide the full, indexed adapter sequences for trimming if they had the slightest concern for their end users, which they unfortunately do not appear to have.

So far, it's not clear to me which adapters go with the new indexes, or why they even need new indexes for cancer versus autism, etc. It seems like a marketing ploy. But the new indices probably only affect amplicon sequencing and are irrelevant to randomly sheared libraries.

Tags
adapter, overrepresented sequences, trimming





All times are GMT -8.

