03-19-2014, 02:17 PM   #7
arcolombo698
Senior Member
 
Location: Los Angeles

Join Date: Nov 2013
Posts: 142
Response

Hi,
I may have a very stupid question here... new to this...
I just got my reads and checked them with FastQC: among the "overrepresented sequences" are full adaptors (universal and indexed, 100% identity over 50 bp, so I assume it's the complete thing...)

> What is your data? Often you have to run the overrepresented sequences through BlastN to find out what they are. Be warned that mRNA libraries are smaller, so you should expect some sequences to repeat.


I see that the adaptor sequences people usually use are shorter, just the inner fragments of these, and when I use only those, yes, these "overrepresented" sequences are not there anymore. I used Trimmomatic with the step ILLUMINACLIP:Cadaptor31.fa:1:30:10:8:true and the following adaptor file:



>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT # this being the 3' end of the TruSeq Universal adaptor (5'-3')
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC # this being the 3' end of the TruSeq Indexed adaptor (5'-3')
>PE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT # this being the 3' end of the TruSeq Universal adaptor (5'-3')
>PE/1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
>PE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC # this being the 3' end of the TruSeq Indexed adaptor (5'-3')
>PE/2_rc
GATCGGAAGAGCACACGTCTGAACTCCAGTCAC
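A quick sanity check on the file above: the two _rc entries should be the exact reverse complements of PE/1 and PE/2. Here is a minimal Python sketch that tests this. The sequences are copied from the file; the helper name reverse_complement is just my own choice for illustration.

```python
# Sequences copied from the adaptor file in the post.
ADAPTERS = {
    "PE/1": "TACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "PE/1_rc": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA",
    "PE/2": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATC",
    "PE/2_rc": "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
}

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


# Compare each forward entry with its listed _rc partner.
rc_ok = {
    name: reverse_complement(ADAPTERS[name]) == ADAPTERS[name + "_rc"]
    for name in ("PE/1", "PE/2")
}

for name, ok in rc_ok.items():
    print(name, "matches its _rc entry:", ok)
```

Both pairs come out consistent, so the _rc entries in the file are the reverse complements of the forward entries.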


What is Trimmomatic doing exactly? When it detects an adaptor, does it remove everything up to the adaptor's 3' end? If not, I should still be seeing the remaining parts as overrepresented, right?


> I am not an expert, but as I understand it Trimmomatic clips the sequences listed in the adapter file you give it, and how much gets removed depends on the parameters that follow the adapter file name, plus any other steps you run, such as the sliding window. Because your file also lists the reverse complements, matches to those are clipped as well. You can check this with the "grep" command after Trimmomatic completes: if you grep for the adapters in the raw data file they appear, but if you grep for them in the trimmed file they are gone.
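The grep check described above can also be sketched in a few lines of Python. This is only an illustration: it assumes plain 4-line FASTQ records, looks for exact matches only, and runs on two tiny made-up reads held in memory instead of real files. The function name count_reads_with and the toy reads are hypothetical.

```python
import io

# Reverse complement of the universal adaptor, copied from the file in the post.
ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"


def count_reads_with(handle, motif):
    """Count FASTQ records whose sequence line contains motif exactly.

    Assumes strict 4-line records, so the sequence is line 2 of each record.
    """
    hits = 0
    for i, line in enumerate(handle):
        if i % 4 == 1 and motif in line:
            hits += 1
    return hits


def fastq(records):
    """Build toy FASTQ text from (name, sequence) pairs with dummy qualities."""
    return "".join(f"@{n}\n{s}\n+\n{'I' * len(s)}\n" for n, s in records)


# Made-up data: r1 carries the adaptor before trimming, r2 never does.
raw = fastq([("r1", "ACGTACGTAC" + ADAPTER), ("r2", "TTTTGGGGCCCC")])
trimmed = fastq([("r1", "ACGTACGTAC"), ("r2", "TTTTGGGGCCCC")])

raw_hits = count_reads_with(io.StringIO(raw), ADAPTER)
trimmed_hits = count_reads_with(io.StringIO(trimmed), ADAPTER)
print("raw:", raw_hits, "trimmed:", trimmed_hits)
```

For real data you would pass an open file handle instead of io.StringIO. Keep in mind that an exact-match count, like grep, misses adaptors containing sequencing errors and partial adaptors at the very end of a read.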


should I give it instead an adaptor file with the full sequences?
> The current behaviour is to retain from the start of the read to the base before the adapter starts, so the adapter and everything after it are removed.
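To make the "retain from the start of the read to the base before the adapter starts" rule concrete, here is a small Python sketch. It is not Trimmomatic's algorithm, which uses scored alignment and tolerates mismatches; it uses an exact string search purely to show which part of the read survives. The function name clip_at_adapter and the example read are made up.

```python
# Reverse complement of the universal adaptor, copied from the file in the post.
ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"


def clip_at_adapter(read: str, adapter: str) -> str:
    """Keep the read up to, but not including, the first exact adapter hit.

    Reads without a hit are returned unchanged. Exact matching is a
    simplification for illustration only.
    """
    pos = read.find(adapter)
    return read if pos == -1 else read[:pos]


# Hypothetical read: 10 bases of insert, the adaptor, then 4 trailing bases.
read = "ACGTACGTAC" + ADAPTER + "TTTT"
clipped = clip_at_adapter(read, ADAPTER)
print(clipped)
```

Only the 10 insert bases remain; the adaptor and the bases after it are both dropped, which is why the rest of the adaptor should not show up as overrepresented after trimming.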

> It is also possible, if you have short adapter sequences and a liberal match threshold, that false positives are causing a problem.
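To get a feel for how liberal a threshold is, here is a rough Python sketch. It rests on assumptions taken from my reading of the Trimmomatic manual: the ILLUMINACLIP fields are, in order, adapter file, seed mismatches, palindrome clip threshold, simple clip threshold, minimum adapter length and keepBothReads, and each matching base adds about log10(4), roughly 0.6, to the alignment score. Mismatch penalties are ignored here, and the field names in the code are my own labels.

```python
import math

# The step string as given in the post.
STEP = "ILLUMINACLIP:Cadaptor31.fa:1:30:10:8:true"

# Field order assumed from the Trimmomatic manual; the labels are illustrative.
FIELDS = (
    "step",
    "adapter_fasta",
    "seed_mismatches",
    "palindrome_clip_threshold",
    "simple_clip_threshold",
    "min_adapter_length",
    "keep_both_reads",
)
params = dict(zip(FIELDS, STEP.split(":")))

# Assumed score contribution of one matching base: log10(4), about 0.6.
SCORE_PER_MATCH = math.log10(4)


def bases_needed(threshold: float) -> int:
    """Smallest number of perfectly matching bases that reaches the threshold."""
    return math.ceil(threshold / SCORE_PER_MATCH)


simple_threshold = int(params["simple_clip_threshold"])
needed = bases_needed(simple_threshold)

# Chance that one fixed position matches that many bases at random,
# assuming uniform and independent bases.
random_match_probability = 0.25 ** needed

print("simple clip threshold:", simple_threshold)
print("perfectly matching bases needed:", needed)
print("random match probability per position:", random_match_probability)
```

Under these assumptions a simple clip threshold of 10 needs about 17 perfectly matching bases. Lowering the threshold, or using adaptor entries shorter than that, makes chance matches more likely.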