Hi guys, I'm trying to use RNA-seq to identify RNA editing sites in mouse. I'm a summer student with no real RNA-seq experience, and a few things have confused me so far. I'm mostly following the methods used in these papers: http://www.nature.com/ng/journal/v43...ll/ng.872.html and http://www.sciencemag.org/content/ea...07018.abstract. I have 42 bp single-end Illumina reads, which I aligned with Novoalign to the RefSeq known-gene sequences (exons only, no introns).
1) What quality score do people usually use as a cutoff when trimming reads? I trimmed both the 5' and 3' ends to a quality score of 30 and then discarded reads shorter than 35 bp, but I lost a lot of reads doing this. How beneficial is trimming reads in the first place?
2) When viewing my alignments in Tablet, I noticed some regions where all the aligned reads seemed to come from one strand only. Is there a reason this would happen? Our library prep was not strand-specific, so I would expect a roughly even split between the two strands (which I do see for the most part).
3) Does anybody know of software or papers that assign probabilities to RNA editing sites? Currently I'm just flagging any base where at least 10% of the reads differ from the reference, which seems rather unscientific.
4) After identifying candidate RNA editing sites, I noticed that instead of A-to-G changes being the most common, as I would expect, every type of conversion seems roughly equally common. Does anybody have an idea why? Could it be something in the library prep?
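For reference, the trimming in question 1 (cut each end back to Q30, then drop short reads) can be sketched as below. This is a simple "strip until the first good base" trimmer, not the running-sum algorithm some tools use. It assumes Phred+33 quality encoding (Illumina 1.8+); older Illumina pipelines used Phred+64, so the offset is a parameter. The function name `trim_read` is hypothetical.

```python
from typing import Optional, Tuple

# Assumption: Phred+33 (Sanger / Illumina 1.8+). Use 64 for Illumina 1.3-1.7.
PHRED_OFFSET = 33


def trim_read(seq: str, qual: str, min_q: int = 30, min_len: int = 35,
              offset: int = PHRED_OFFSET) -> Optional[Tuple[str, str]]:
    """Trim bases below min_q from both ends; discard reads left too short.

    Returns the trimmed (seq, qual) pair, or None if the read is dropped.
    """
    scores = [ord(c) - offset for c in qual]
    start, end = 0, len(scores)
    # Walk in from the 5' end until the first base at or above the cutoff.
    while start < end and scores[start] < min_q:
        start += 1
    # Walk in from the 3' end the same way.
    while end > start and scores[end - 1] < min_q:
        end -= 1
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    read = "A" * 42
    print(trim_read(read, "I" * 39 + "#" * 3))  # 39 bp left, read kept
    print(trim_read(read, "I" * 34 + "#" * 8))  # 34 bp left, read dropped
```

With 42 bp reads and a 35 bp minimum, a read survives only if at most 7 bases are trimmed from the two ends combined, so any read with a short low-quality stretch at either end is lost entirely.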
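To put a number on "all the reads come from one strand" (question 2), one option is an exact two-sided binomial test of the plus/minus read counts against an even 50/50 split, which is what a non-strand-specific protocol would predict. This is a sketch; the function names are hypothetical, and the plus/minus counts are assumed to come from the alignment (e.g. the reverse-strand flag in SAM output). Because `comb(n, k)` is converted to a float, this plain version overflows above roughly a thousand reads; `scipy.stats.binomtest` is an alternative for very deep regions.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def strand_balance_pvalue(plus: int, minus: int) -> float:
    """Two-sided exact binomial test of the plus-strand count against p = 0.5."""
    n = plus + minus
    if n == 0:
        return 1.0
    observed = binom_pmf(plus, n, 0.5)
    # Add up every outcome that is no more likely than the observed split;
    # the small relative tolerance keeps floating-point ties in the sum.
    total = sum(
        binom_pmf(k, n, 0.5)
        for k in range(n + 1)
        if binom_pmf(k, n, 0.5) <= observed * (1 + 1e-9)
    )
    return min(1.0, total)


if __name__ == "__main__":
    print(strand_balance_pvalue(12, 10))  # fairly balanced: p-value about 0.83
    print(strand_balance_pvalue(20, 0))   # all one strand: p-value about 1.9e-6
```

A low p-value only says the imbalance is unlikely under a 50/50 model; it does not say why the region is lopsided.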
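As one simple way to attach a probability to the 10% rule in question 3, the sketch below asks how likely it is to see at least the observed number of mismatching reads at a site if every mismatch were an independent sequencing error. It is not the method from the papers linked above. The 0.001 error rate is an assumption chosen to match the Q30 trimming cutoff (Q = -10 log10 of the error rate), and the model ignores mapping errors, SNPs and paralogous genes. All function names are hypothetical; the Bonferroni correction over `n_tests` sites is one conservative choice among several.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability of exactly k successes."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log1p(-p))


def error_only_pvalue(alt: int, depth: int, error_rate: float = 0.001) -> float:
    """P(at least `alt` of `depth` reads mismatch) if all mismatches were
    independent sequencing errors at `error_rate` (assumed, not measured).

    Terms are computed in log space so deep sites do not overflow.
    """
    if alt <= 0:
        return 1.0
    tail = sum(exp(log_binom_pmf(k, depth, error_rate))
               for k in range(alt, depth + 1))
    return min(1.0, tail)


def is_candidate(alt: int, depth: int, n_tests: int, alpha: float = 0.05,
                 error_rate: float = 0.001, min_frac: float = 0.10) -> bool:
    """Keep the existing 10% filter, but also require the mismatch count to be
    significant after a Bonferroni correction over `n_tests` tested sites."""
    if depth == 0 or alt / depth < min_frac:
        return False
    return error_only_pvalue(alt, depth, error_rate) < alpha / n_tests


if __name__ == "__main__":
    print(is_candidate(1, 5, n_tests=1))      # True: 1/5 reads, single test
    print(is_candidate(1, 5, n_tests=1000))   # False once 1000 sites are tested
    print(is_candidate(10, 50, n_tests=20000))  # True: 10/50 is very unlikely by error
```

The point of the comparison is that one mismatching read out of five passes a 10% filter but is not convincing once many sites are tested, while higher coverage at the same fraction is.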
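To check whether every conversion type really is about equally common (question 4), a tally like the following sketch may help. Here `sites` is a hypothetical list of (reference base, observed base, strand) tuples, for example the sites passing the 10% filter. If reads were aligned to RefSeq transcript sequences, as described above, the reference is already in transcript orientation and every strand is "+"; the complementing step only matters when sites are reported in genome coordinates, where a change on a minus-strand gene would otherwise show up as its complement (an A-to-G change would appear as T-to-C).

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tally_conversions(sites: Iterable[Tuple[str, str, str]],
                      strand_aware: bool = True) -> Counter:
    """Count reference->observed base changes as strings like 'A>G'."""
    counts: Counter = Counter()
    for ref, alt, strand in sites:
        ref, alt = ref.upper(), alt.upper()
        if strand_aware and strand == "-":
            # Report the change as it reads on the transcript.
            ref, alt = ref.translate(COMPLEMENT), alt.translate(COMPLEMENT)
        counts[f"{ref}>{alt}"] += 1
    return counts


def conversion_fractions(counts: Counter) -> Dict[str, float]:
    """Share of all candidate sites for each conversion type."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


if __name__ == "__main__":
    example = [("A", "G", "+"), ("T", "C", "-"), ("C", "T", "+")]
    counts = tally_conversions(example)
    print(counts.most_common())        # [('A>G', 2), ('C>T', 1)]
    print(conversion_fractions(counts))
```

If A>G does not stand out in such a tally, the candidate list is probably dominated by something other than A-to-I editing.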
Thanks for any help you can provide.