SEQanswers


bluepoison 11-22-2015 07:02 PM

finding adapters for trimming
 
Hi All,

I am a total newbie in this field. I would like to know what the community's usual practice is for the adapter trimming step.

I have 50 bp single-end reads (Sanger / Illumina 1.9 encoding). My primary goal is to align the reads using Bismark and then extract methylation scores using 'methylKit'. There were three overrepresented sequences in the FastQC report. I then ran trim_galore with the default settings. trim_galore (which basically uses 'cutadapt') trimmed the universal adapter, but two overrepresented sequences still remain in the FastQC report.

I have read many posts about trimming over the last 3-4 days, but I am still confused. My understanding so far is that FastQC tells us about adapter contamination, but it may not tell us the actual adapter sequence.

1. Is it a MUST to trim all the overrepresented sequences, or is trimming just the universal adapter fine?
2. What is the easiest way to find the sequences that need to be trimmed?

Any help/suggestion is greatly appreciated.

Brian Bushnell 11-22-2015 09:39 PM

1) The best practice is to trim the actual adapter sequences used in your library.
2) The best way to find that is to ask the people who made the library.

But, if you have paired reads, you can also find your adapter sequences with BBMerge like this:

bbmerge.sh in1=read1.fq in2=read2.fq outa=adapters.fa

BBDuk includes all standard Illumina adapters in "/resources/adapters.fa". If you do not know which adapters were used, and are unable to find out, I recommend using that as the reference.

Since you are using single-end reads, it's difficult to determine the adapter sequences empirically in an automated way. So, unless you can get them from the people who made the library, I suggest using the BBDuk adapters.fa reference.
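
For single-end reads, a BBDuk adapter-trimming run against that reference might look roughly like the sketch below. The file names and the install path are placeholders (assumptions, not taken from this thread), and the k/mink/hdist values follow the adapter-trimming example in the BBDuk guide rather than anything tuned for this data set.

```shell
# Placeholders (assumptions): adjust to your own files and BBTools install location.
reads=reads.fq
trimmed=trimmed.fq
adapters=/path/to/bbmap/resources/adapters.fa

# ktrim=r trims the matched adapter and everything to its right (the 3' side).
# k=23 is the k-mer length, mink=11 allows shorter matches at the read end,
# hdist=1 tolerates one mismatch.
cmd="bbduk.sh in=$reads out=$trimmed ref=$adapters ktrim=r k=23 mink=11 hdist=1"
echo "$cmd"

# Run only when BBDuk is on the PATH and the input file exists,
# so the sketch is safe to paste as-is.
if command -v bbduk.sh >/dev/null 2>&1 && [ -f "$reads" ]; then
    $cmd
fi
```

Rerunning FastQC on the trimmed output is then a quick way to see whether any overrepresented sequences remain.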

turnersd 11-25-2015 04:05 AM

1 Attachment(s)
Brian - thanks for parsing Illumina's PDF and making the adapters available. It looks like as of Nov 9 2015 Illumina updated their adapter sequence document. Are there any notable changes that aren't present in the BBDuk adapter sequence fasta?

https://support.illumina.com/downloa...ce-letter.html

Brian Bushnell 11-25-2015 09:07 AM

Ah, thanks for notifying me... I'll look at it.

bluepoison 11-26-2015 03:11 PM

Thanks a lot for the response, Brian.

I have single-end reads this time. Do you have any suggestions for the overrepresented sequences that do not match any known adapter ("No Hit", as reported by FastQC)?

Brian Bushnell 11-26-2015 09:56 PM

@bluepoison: I suggest you try adapter removal using BBDuk and adapters.fa, and see if FastQC still detects overrepresented sequences. If not, everything should be fine! But if it does, you may have a new adapter sequence, so please reply in that case.
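
As a quick cross-check alongside FastQC, one can count how many reads still contain a given candidate sequence. This is only a minimal sketch using standard awk: demo.fq is a made-up three-read file so the snippet runs on its own, and the candidate shown (the common Illumina adapter prefix AGATCGGAAGAGC) is just an example to be replaced by a sequence copied from the FastQC report. It assumes a plain four-line-per-record FASTQ.

```shell
# Tiny made-up FASTQ so the sketch is self-contained; use your trimmed file instead.
cat > demo.fq <<'EOF'
@r1
ACGTACGTAGATCGGAAGAGC
+
IIIIIIIIIIIIIIIIIIIII
@r2
TTTTGGGGCCCCAAAATTTTG
+
IIIIIIIIIIIIIIIIIIIII
@r3
GGGAGATCGGAAGAGCACACG
+
IIIIIIIIIIIIIIIIIIIII
EOF

# Example candidate (assumption); paste a sequence from the FastQC report here.
candidate=AGATCGGAAGAGC

# In a four-line FASTQ record the sequence is line 2, so select lines where NR % 4 == 2.
total=$(awk 'NR % 4 == 2' demo.fq | wc -l | tr -d ' ')
hits=$(awk -v s="$candidate" 'NR % 4 == 2 && index($0, s) > 0' demo.fq | wc -l | tr -d ' ')
echo "$hits of $total reads contain $candidate"

# Most frequent whole-read sequences, roughly what the overrepresented-sequences module tallies.
awk 'NR % 4 == 2' demo.fq | sort | uniq -c | sort -rn | head -n 5
```

If the count for a candidate drops to zero after trimming but FastQC still flags other sequences, those are the ones worth posting here.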

@turnersd: Unfortunately... there are a lot of new adapter indexes in the latest Illumina letter that you linked - dozens. They are for human-specific tests, like autism, cancer, and other possibly genetic disorders. And as always, Illumina makes no effort to indicate which indexes go with which adapters. So, it now looks like a huge amount of effort to compile a full set of Illumina adapter sequences, complete with indexes.

JGI does not do any human sequencing, so none of that is relevant to us. But for everyone else out there - I really hope Illumina, or someone in the community, compiles a full list of the new human-specific adapter sequences. Because there are so many, and I have no way to empirically determine whether the new sequences are correct (since we don't use them), it's not really possible for me to generate them. Illumina would provide the full, indexed adapter-sequences for trimming if they had the slightest concern for their end users, which they unfortunately do not appear to have.

So far, it's not clear to me which adapters go with the new indexes, or why they even need new indexes for cancer versus autism, etc. Seems like a marketing ploy. But probably the new indexes only affect amplicon sequencing and are irrelevant to randomly-sheared libraries.


All times are GMT -8.