Hi,
I'm new here, and first of all I would like to thank you all for this wonderful forum. What I'm asking is probably not new here, but I've read a lot, ended up feeling lost, and decided to post my exact questions.
I have RNA-seq data from an Illumina HiSeq, but the library prep was done with the SureSelect Strand-Specific mRNA library prep protocol. So I have paired-end, strand-specific data.
While doing the adapter trimming, I faced the following doubts:
1. I don't have the exact SureSelect adapter sequences. From a quick analysis with a script of mine that counts duplicated sequences, and from the FastQC results, I can tell that they are similar to the TruSeq adapters, except for the indexes. Can anyone confirm this? Can I use the TruSeq sequences to trim the SureSelect adapters? What other solutions are there?
2. I know that, due to read-through, I should expect to find the indexed adapter at the 3' end of read 1, and the reverse complement of the universal adapter at the 3' end of read 2. But, due to adapter dimers, isn't it possible to find other combinations? For example, if a dimer contains two indexed adapters, isn't it possible to find the reverse complement of the indexed adapter at the 5' end of read 2? From your experience, does this happen often enough to justify trimming for it, given the potential loss of good data?
3. What is the best way to deal with paired-end data? So far I have been using Cutadapt's -b option to remove the adapters, without considering the paired-end information, and I'm getting what I think are nice results (but I don't have much experience). I'm considering Cutadapt's paired-end options, but I don't understand very well how Cutadapt uses the paired-end information. Would Trimmomatic, for example, be better, because it actually aligns the two reads against each other and uses that to work out where the real sequence ends and the adapter begins (palindrome mode)?
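To make question 3 concrete, here is a minimal Python sketch of the overlap idea as I understand it: when the insert is shorter than the read length, the first L bases of read 1 should equal the reverse complement of the first L bases of read 2, and everything after position L is adapter. This is only my simplified illustration, not Trimmomatic's or Cutadapt's actual algorithm. The function names, the thresholds, the toy insert and the 5-base adapter fragment "AGATC" are all made up for the example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def find_insert_length(read1, read2, min_overlap=10, max_mismatch_rate=0.1):
    """Guess the insert length of a read pair that shows read-through.

    Tries candidate lengths from longest to shortest and returns the first
    length L for which read1[:L] matches the reverse complement of read2[:L]
    within the allowed mismatch rate. Returns None if no length qualifies.
    The default thresholds are arbitrary values chosen for illustration.
    """
    n = min(len(read1), len(read2))
    for length in range(n, min_overlap - 1, -1):
        a = read1[:length]
        b = reverse_complement(read2[:length])
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches <= max_mismatch_rate * length:
            return length
    return None


# Toy example: a 25-base insert sequenced with 30-base reads, so each read
# runs 5 bases into an adapter (placeholder fragment, not a verified
# SureSelect sequence).
insert = "ACGTTGCAAGGCTTACCGATAGCTA"
adapter_fragment = "AGATC"
read1 = insert + adapter_fragment
read2 = reverse_complement(insert) + adapter_fragment

insert_length = find_insert_length(read1, read2)
if insert_length is not None:
    # Trim both reads back to the inferred insert length.
    trimmed1 = read1[:insert_length]
    trimmed2 = read2[:insert_length]
    print(insert_length, trimmed1, trimmed2)
```

If this picture is right, the adapter position is found from the two reads themselves, without needing the exact adapter sequence, which is why I am wondering whether it would help in my case.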
Thanks a lot in advance
Mafalda