Hello all. This seems to be a routinely discussed question with many answers around here; however, I could not use the answers given in other threads to solve my problem. I have some miRNA-seq data from an Illumina HiSeq platform. That's about all the information I have. I have not been able to identify the vendor who did the sequencing, so approaching them is out of the question.

My problem is as follows: I have single-end reads, 54 bases long, and I am trying to find a good way to trim them. I have no idea which adapter to use for trimming, so I have been looking at other posts on here trying to make sense of it. Long story short, as suggested in some posts, I checked the overrepresented sequences in my FastQC output, which reports these two sequences as adapter sequences in one sample:
AGCCGCCTGGATACCGCAGCTAGGAATAATGGAATTCTCGGGTGCCAAGG 189653 0.410031497 Illumina Small RNA Adapter 2 (100% over 21bp)
CGCGACCTCAGATCAGACGTGGCGACCCGTGGAATTCTCGGGTGCCAAGG 184505 0.398901475 Illumina Small RNA Adapter 2 (100% over 21bp)
and these 3 sequences as the adapter in a different sample.
AGCCGCCTGGATACCGCAGCTAGGAATAATGGAATTCTCGGGTGCCAAGG 189653 0.410031497 Illumina Small RNA Adapter 2 (100% over 21bp)
CGCGACCTCAGATCAGACGTGGCGACCCGTGGAATTCTCGGGTGCCAAGG 184505 0.398901475 Illumina Small RNA Adapter 2 (100% over 21bp)
TTGCTGTGATGACTATCTTAGGACACCTTTGGAATTCTCGGGTGCCAAGG 50032 0.108169635 Illumina Small RNA Adapter 2 (100% over 21bp)
Now, these are two different samples run in different lanes. I do not know whether the samples were pooled with an indexing adapter (although that is very likely, given the small total number of reads). After comparing the sequences listed above, I have deduced that TGGAATTCTCGGGTGCCAAGG, the 21 bases they all end with, is my Illumina adapter sequence. The problem is that I cannot find any mention of this being an adapter sequence in any of Illumina's official documents on their FTP, other than this document: http://support.illumina.com/content/...15061994-a.pdf. Is this the correct sequence?
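For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how I matched the sequences: it takes the three distinct overrepresented sequences from the FastQC output and finds the longest suffix they all share. The function names (`longest_common_suffix`, `trim_adapter`) and the `min_overlap` parameter are my own, purely for illustration; the trimming function only handles exact matches and is not a substitute for a real trimmer.

```python
def longest_common_suffix(seqs):
    """Return the longest string that every sequence in seqs ends with."""
    shared = []
    # Walk the sequences from their 3' ends, one column of bases at a time.
    for column in zip(*(s[::-1] for s in seqs)):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return "".join(reversed(shared))


def trim_adapter(read, adapter, min_overlap=8):
    """Remove an exact adapter match (full, or partial at the 3' end)."""
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Otherwise look for a truncated adapter running off the end of the read.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


# The three distinct overrepresented sequences reported by FastQC.
overrepresented = [
    "AGCCGCCTGGATACCGCAGCTAGGAATAATGGAATTCTCGGGTGCCAAGG",
    "CGCGACCTCAGATCAGACGTGGCGACCCGTGGAATTCTCGGGTGCCAAGG",
    "TTGCTGTGATGACTATCTTAGGACACCTTTGGAATTCTCGGGTGCCAAGG",
]

adapter = longest_common_suffix(overrepresented)
print(adapter, len(adapter))  # TGGAATTCTCGGGTGCCAAGG 21
print(trim_adapter(overrepresented[0], adapter))
```

Since this only catches exact matches, it will miss adapters containing sequencing errors, so for the actual trimming I would still want a dedicated tool that tolerates mismatches.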