I have been given Illumina paired-end reads from a chicken genome. I do not have access to the experimental protocol or the adapter files, nor do I know which Illumina machine or sequencing adapters were used.
I'd like to perform a de novo assembly, but I want to trim adapters first if they haven't been trimmed already. I know there are a few tools for trimming paired-end FASTQ files, such as fastq-mcf and Trimmomatic, but they need an adapter .fa file or knowledge of which adapters were used (TruSeq2, TruSeq3, etc.). Also, is there a way to get an updated adapter list for Illumina sequences? The Trimmomatic version I just installed only ships Nextera, TruSeq2, and TruSeq3 .fa files.
How can I discover which adapter sequences were used?
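One way to narrow this down is to count how many reads contain short, well-known adapter probes. The sketch below assumes the 12-nt probes that FastQC's Adapter Content module is commonly described as using (Illumina Universal/TruSeq `AGATCGGAAGAG`, Nextera `CTGTCTCTTATA`, Illumina Small RNA 3' `TGGAATTCTCGG`); treat that list as an assumption and check it against the `adapter_list.txt` shipped with your FastQC install. The function names are hypothetical, not part of any tool.

```python
import gzip
import sys
from collections import Counter

# Assumed adapter probes (verify against FastQC's adapter_list.txt).
ADAPTER_PREFIXES = {
    "Illumina Universal (TruSeq)": "AGATCGGAAGAG",
    "Nextera Transposase": "CTGTCTCTTATA",
    "Illumina Small RNA 3'": "TGGAATTCTCGG",
}


def read_sequences(path):
    """Yield the sequence line of each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # line 2 of every record is the sequence
                yield line.strip()


def count_adapter_hits(sequences, prefixes=ADAPTER_PREFIXES, max_reads=None):
    """Return (reads examined, Counter of reads containing each probe)."""
    hits = Counter()
    total = 0
    for seq in sequences:
        if max_reads is not None and total >= max_reads:
            break
        total += 1
        for name, probe in prefixes.items():
            if probe in seq:
                hits[name] += 1
    return total, hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python adapter_scan.py reads_R1.fastq.gz
    total, hits = count_adapter_hits(read_sequences(sys.argv[1]), max_reads=1_000_000)
    for name in ADAPTER_PREFIXES:
        pct = 100.0 * hits[name] / total if total else 0.0
        print(f"{name}: {hits[name]} of {total} reads ({pct:.2f}%)")
```

A probe that shows up in a noticeable fraction of reads (and more often toward read ends) points to the adapter family to pass to the trimmer; near-zero counts for all probes may mean the reads were already trimmed or use an adapter not in the list.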
Thank you for your time.
PS - I considered looking at the overrepresented sequences reported by FastQC, but it only lets me look at k-mers of 5 to 10 nucleotides. Below are the encoding FastQC reported and a sample record from my forward file.
FastQC reported the following encoding:
Encoding Sanger / Illumina 1.9
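"Sanger / Illumina 1.9" means the quality line is Phred+33: each character's ASCII code minus 33 is its Phred score. A minimal sketch of that conversion (the function name is hypothetical):

```python
from typing import List


def decode_phred33(quality: str) -> List[int]:
    """Convert a Phred+33 (Sanger / Illumina 1.8+) quality string to scores."""
    scores = [ord(ch) - 33 for ch in quality]
    if any(s < 0 for s in scores):
        raise ValueError("character below '!' is not valid Phred+33")
    return scores


# The first characters of the sample quality line, "@CCFF", decode to [31, 34, 34, 37, 37].
print(decode_phred33("@CCFF"))
```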
@HWI-ST531R:257:H7R0WADXX:2:1101:1240:1994 1:N:0:GTGAAA
CTCACTTTGCCATGTTTCTATTTGAACAGATGATAATTTTACCTTTTGGGTGAAAAATAAAATACGCCTCTCTTTGCACTCTGTTATTTGCCAAAGTAGAG
+
@CCFFFFFHHHHHIIJJJJIJJJJJJJJJJJIIIIHJIEHIIJJJJIJJJBFHIJJJDIIJJEGHJJJIIJJJHHHHHHFFFFDFCDEEEEEDDDD@CDDC
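The header follows the CASAVA 1.8+ layout `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`, so it can be split into fields to see what it reveals about the run. This is a sketch; the class and function names are hypothetical, and any instrument inference from the ID (for example, `HWI-ST` prefixes are usually associated with HiSeq machines) is a heuristic, not a guarantee.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: int        # 1 = forward, 2 = reverse
    filtered: bool   # True if the read failed the chastity filter ("Y")
    control: int
    index: str       # sample barcode as read by the sequencer


def parse_casava18_header(line: str) -> IlluminaHeader:
    """Parse an Illumina CASAVA 1.8+ FASTQ header line."""
    ident, description = line.strip().lstrip("@").split(" ", 1)
    instrument, run, flowcell, lane, tile, x, y = ident.split(":")
    read, filtered, control, index = description.split(":")
    return IlluminaHeader(
        instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y),
        int(read), filtered == "Y", int(control), index,
    )


h = parse_casava18_header("@HWI-ST531R:257:H7R0WADXX:2:1101:1240:1994 1:N:0:GTGAAA")
print(h.instrument, h.flowcell, h.read, h.index)
```

Here the index field (`GTGAAA`) is the sample barcode read during indexing, not adapter sequence inside the read; adapter read-through shows up at the 3' end of reads whose insert is shorter than the read length, which is what the adapter scan looks for.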