Hi Jay, thanks for getting back to me so quickly. I have certainly considered PCR duplicates as a source of the variation, since we did PCR-amplify before sequencing. The problem is that my data are single-end Illumina reads, and I don't know how to tell whether a duplicate read is the result of PCR amplification or a genuine indication of another copy of the mRNA. I think that if I had paired-end reads, I could use the size distribution of the library to eliminate PCR duplicates. Perhaps I can apply an assumption about the library size to single-end reads too, but I need to think more about this. If anyone has other ideas on how to detect PCR duplicates in RNA-seq data, please let me know.
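For reference, the usual single-end heuristic (roughly what Picard MarkDuplicates and samtools markdup do for unpaired reads) treats reads as duplicates when they share chromosome, strand and 5' alignment position. Below is a minimal sketch of that idea, assuming a hypothetical `Read` record rather than real BAM parsing, and ignoring soft-clipping (the real tools use the unclipped 5' position). Note the caveat that motivates my question: in RNA-seq, highly expressed transcripts produce many genuinely independent reads with identical start sites, so this heuristic will over-call duplicates there.

```python
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal alignment record; real data would come from a BAM file.
    name: str
    chrom: str
    pos: int          # 0-based leftmost aligned reference position
    aligned_len: int  # number of reference bases spanned by the alignment
    reverse: bool     # True if the read maps to the minus strand


def five_prime_key(r: Read) -> tuple:
    # On the minus strand the read's 5' end is its rightmost aligned base.
    five_prime = r.pos + r.aligned_len - 1 if r.reverse else r.pos
    return (r.chrom, r.reverse, five_prime)


def flag_duplicates(reads: Iterable[Read]) -> List[str]:
    """Return names of reads whose (chrom, strand, 5' position) was already seen."""
    seen = set()
    dups = []
    for r in reads:
        key = five_prime_key(r)
        if key in seen:
            dups.append(r.name)
        else:
            seen.add(key)
    return dups


reads = [
    Read("r1", "chr1", 100, 50, False),
    Read("r2", "chr1", 100, 50, False),  # same start and strand as r1
    Read("r3", "chr1", 100, 50, True),   # minus strand, 5' end at 149
    Read("r4", "chr1", 120, 30, True),   # minus strand, 5' end also at 149
    Read("r5", "chr1", 101, 50, False),  # different start, kept
]
print(flag_duplicates(reads))  # ['r2', 'r4']
```

The first read seen at each key is kept and later ones are flagged, so the result depends on input order only in which copy is marked, not in how many.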
Also, I am probably missing something, but since you didn't find anything with free energy, why are you convinced that RNA secondary structure interfering with the reverse transcriptase could be a source? Do you think the algorithm for predicting free energy is not reliable enough? If the RNA is sheared before the cDNA is made, I would think that should eliminate secondary structure formation (though I could be wrong).
Finally, this may be a stupid question, but at what step is the GC coverage variation introduced?