**thinkRNA** · 08-10-2010, 02:24 PM

Hi Jay, thanks for getting back to me so quickly. I have certainly thought about PCR duplicates being the source of variation, as we did PCR-amplify before sequencing. The problem is that my data is single-end Illumina data, and I don't know how to tell whether a duplicate read is the result of PCR amplification or a genuine indication of another copy of the mRNA. I think if I had paired-end reads, I could use the size distribution of the library to eliminate PCR duplicates. Perhaps I can apply an assumption about the library size to single-end reads too, but I need to think more about this. If anyone has other ideas on how to detect PCR duplicates in RNA-seq data, please let me know.
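For single-end data, one rough way to look at this is to count how many reads share the same chromosome, strand and 5' start position. The sketch below is only an illustration: the function names, the tuple layout and the `max_copies` threshold are assumptions, not an established method. A pile-up at one start site can come from a highly expressed transcript just as well as from PCR, so this flags candidates rather than identifying true duplicates.

```python
from collections import Counter


def duplication_summary(alignments):
    """Summarise how many reads share an identical start site.

    `alignments` is an iterable of (chrom, five_prime_start, strand) tuples,
    one per aligned single-end read (hypothetical input layout).
    """
    counts = Counter(alignments)
    total = sum(counts.values())
    distinct = len(counts)
    # Reads beyond the first at each start site are *candidate* duplicates.
    extra = total - distinct
    return {
        "total": total,
        "distinct": distinct,
        "duplicate_fraction": extra / total if total else 0.0,
    }


def suspicious_sites(alignments, max_copies=10):
    """Return start sites with more than `max_copies` reads.

    `max_copies` is an arbitrary illustrative cutoff; a sensible value
    depends on sequencing depth and expression level.
    """
    counts = Counter(alignments)
    return {site: n for site, n in counts.items() if n > max_copies}


if __name__ == "__main__":
    reads = [("chr1", 100, "+")] * 3 + [("chr1", 105, "+"), ("chr1", 100, "-")]
    print(duplication_summary(reads))
    print(suspicious_sites(reads, max_copies=2))
```

Note that the strand is part of the key, so reads starting at the same coordinate on opposite strands are not counted together.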

Also, I am probably missing something, but since you didn't find anything with free energy, why are you convinced that RNA secondary structure interfering with the reverse transcriptase could be a source? Are you thinking that the algorithm for estimating free energy is not effective? If the RNA is sheared before creating the cDNA, I think that should eliminate secondary structure formation (though I could be wrong).
Finally, a stupid question: at what step is the GC coverage variation introduced?

**thinkRNA** · 08-10-2010, 02:34 PM

I guess you don't need to answer the last stupid question as I found a paper that explains it really well.

http://nar.oxfordjournals.org/cgi/content/full/36/16/e105

**jay** · 08-11-2010, 01:56 AM

I didn't get anywhere with free energy, possibly because of the sequence we are using, a 10 kb RNA genome. I found tools that would predict its shape, and tools that would predict free energy for shorter sequences, but I couldn't quickly find a tool online that would give me a per-base free energy estimate for such a long sequence. I gave up after a day or so of looking, as we were not interested in quantification per se. If you have any ideas for a good tool for this, I'd definitely be interested in giving it a go. You may be right about shearing controlling for secondary structures; I will talk to our experimentalist about this, as his thought was that secondary structure would be an issue.

**thinkRNA** · 08-11-2010, 08:56 AM

Have you tried mfold?

http://mfold.bioinfo.rpi.edu/

I haven't used it, but a colleague says it's one of the best out there.
EDIT: it takes a maximum of 1500 bases, so I guess you'll have to chop your RNA sequence (!)
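Chopping the sequence could look something like the sketch below, which splits it into overlapping windows of at most 1500 bases. The function name and the 500-base overlap are assumptions chosen for illustration. Folding each window separately also ignores any base pairing between distant regions, so the per-window results would only be an approximation for a 10 kb genome.

```python
def chop_sequence(seq, window=1500, overlap=500):
    """Split `seq` into overlapping windows of at most `window` bases.

    Returns a list of (start_offset, subsequence) pairs, with 0-based
    offsets, so per-window results can be mapped back to the full sequence.
    The default overlap is an arbitrary illustrative choice.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if not seq:
        return []
    step = window - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + window, len(seq))
        chunks.append((start, seq[start:end]))
        if end >= len(seq):
            break  # the last window reaches the end of the sequence
        start += step
    return chunks


if __name__ == "__main__":
    genome = "ACGU" * 2500  # stand-in for a 10 kb RNA genome
    for offset, chunk in chop_sequence(genome):
        print(offset, len(chunk))
```

Each window could then be submitted separately, keeping the offset so that positions can be translated back to genome coordinates.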

**Xi Wang** · 08-11-2010, 11:26 PM

There are two papers that study RNA-seq biases:
Modeling non-uniformity in short-read rates in RNA-Seq data. Genome Biology.
Biases in Illumina transcriptome sequencing caused by random hexamer priming. NAR.

They may help you with this topic.
