Dear RNA-Seq analysis experts,
This is my first time analysing RNA-Seq data, so I would appreciate your help with a couple of issues.
My data consist of 8 samples sequenced using Illumina’s standard paired-end RNA-seq protocol (which, unfortunately, was NOT strand-specific at the time). The fragment size was 220 bp, which leaves an insert of around 100 bp after subtracting the adapters (2 × 58-60 bp). The two 60 bp reads of each pair therefore overlap by approximately 20 bp.
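To make the overlap arithmetic explicit, here is a minimal Python sketch. The function and variable names are my own and purely illustrative, and the 100 bp insert size is only the approximate figure quoted above:

```python
def inner_distance(insert_size: int, read_length: int) -> int:
    """Inner distance between mates: insert size minus both read lengths.

    A negative value means the two reads overlap by that many bases.
    """
    return insert_size - 2 * read_length


insert_size = 100  # bp, approximate insert left after removing adapters
read_length = 60   # bp, length of each read in a pair

distance = inner_distance(insert_size, read_length)
overlap = max(0, -distance)  # overlap exists only when the distance is negative

print(f"inner distance: {distance} bp, overlap: {overlap} bp")
```

With these numbers the inner distance comes out at -20 bp, which is the value referred to in question 1b.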
I would like to use the bowtie/tophat/cufflinks pipeline and have a couple of questions regarding the analysis:
1. Although this post was very informative, I cannot decide which of these is a better strategy for analysing reads in my case:
a. Assembling the paired-end reads into 100bp single reads before…
b. Directly using the paired-end reads in tophat. [This gives a negative (-20) inner distance between the mates, but version 1.0.13 onwards seems to be able to handle this scenario via the -r option; am I right?]
2. Since the Illumina protocol was not strand-specific, is it a good idea to convert (correct word?) either the resulting mapping data or the sequence reads (after an initial round of mapping) so that they all match a single strand of the genome? I wonder whether this strategy would help cufflinks assemble and quantify the transcripts better…
Thank you very much in advance for your help/suggestions/feedback…
Fred