11-08-2012, 11:34 PM   #9
I like code
Location: San Diego, CA, USA

Join Date: Sep 2009
Posts: 438

I started learning to process RNA-Seq data in 2009. Early versions of TopHat would actually write out a file of totally naive expression values, and we would just compute fold changes on them and pick genes with more than a two-fold change as possibly interesting. One of the postdocs I was working with at the time would look through the list, sorted by fold change, and compare the fold changes and expression levels to what he could see in the bedGraph histograms we loaded into the genome browser. He would cross out genes that looked suspicious (ones with strange coverage that didn't make sense given the expression data). Over the course of several months, along with whatever else he was doing, those gene lists proved to be very useful in his project. Some of that data went to this:

I still think this basic approach is viable for gene-level analysis, even with all the work that has been done in the field. Things are better now that running more samples is cheaper, so we can have a few replicates to cut back on false positives. I think that's what will improve things: the cheaper it is per million reads per sample, the more replicates people can run, and then there's less need for modeling and estimation in DE and we get better mean expression estimates per condition.
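For anyone who wants to see what I mean by the basic approach, here is a rough sketch in Python. It is not what TopHat produced and not a replacement for a real DE tool. It assumes you already have some per-gene expression value for each sample (the `expr` layout, the `fold_changes` name, and the `pseudocount` are all my own choices for illustration). It averages the replicates per condition, takes a log2 fold change, and keeps genes with more than a two-fold change, sorted so the biggest changes are on top for checking against the browser.

```python
from math import log2
from statistics import mean


def fold_changes(expr, cond_a, cond_b, pseudocount=1.0, min_fold=2.0):
    """Naive per-gene fold changes between two conditions.

    expr        -- dict: gene -> {sample name: expression value}
    cond_a/b    -- lists of sample names (replicates) for each condition
    pseudocount -- added to both means to avoid dividing by zero (an assumption)
    min_fold    -- keep genes changing by more than this factor, up or down
    """
    threshold = log2(min_fold)
    rows = []
    for gene, values in expr.items():
        # Mean expression across the replicates of each condition.
        mean_a = mean(values[s] for s in cond_a)
        mean_b = mean(values[s] for s in cond_b)
        lfc = log2((mean_b + pseudocount) / (mean_a + pseudocount))
        if abs(lfc) > threshold:
            rows.append((gene, mean_a, mean_b, lfc))
    # Largest absolute change first, so the list can be reviewed by eye.
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up numbers, purely to show the input layout.
    expr = {
        "geneA": {"ctl1": 10, "ctl2": 12, "trt1": 50, "trt2": 54},
        "geneB": {"ctl1": 100, "ctl2": 98, "trt1": 105, "trt2": 101},
        "geneC": {"ctl1": 40, "ctl2": 44, "trt1": 8, "trt2": 6},
    }
    for gene, a, b, lfc in fold_changes(expr, ["ctl1", "ctl2"], ["trt1", "trt2"]):
        print(f"{gene}\t{a:.1f}\t{b:.1f}\t{lfc:+.2f}")
```

The manual step still matters: the list this gives you is only a set of candidates to look at against the coverage tracks.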

I agree it is a real Wild West kinda thing. I think the tools are looking better right now than in the past. TopHat has been working for me for years. I like using bowtie to align things quickly, and bowtie2 is great for gapped and local alignments, plus it's fast for long paired reads. Transcriptome assembly is still very mysterious. I'd think that someone would just release a tool that gives you all of the possible splice variants based on detected exons and junctions, but it seems like people can't help but try to make their software capable of making decisions for the researchers, bypassing that basic level of information. I guess that's the difference between publication-type tools and homemade tools.
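To be concrete about the "all possible splice variants" idea, this is the kind of homemade thing I have in mind, as a minimal sketch only. It assumes the exons of one gene are already numbered in transcript order and that each detected junction is given as a (donor exon, acceptor exon) index pair. The `enumerate_isoforms` name and the input format are made up for this example and don't come from any existing tool.

```python
def enumerate_isoforms(n_exons, junctions):
    """List every exon chain consistent with the detected junctions.

    n_exons   -- number of exons, indexed 0..n_exons-1 in transcript order
    junctions -- iterable of (donor_exon, acceptor_exon) index pairs
    Returns a list of tuples of exon indices, one tuple per candidate variant.
    """
    successors = {i: set() for i in range(n_exons)}
    has_incoming = set()
    for donor, acceptor in set(junctions):
        if not 0 <= donor < acceptor < n_exons:
            raise ValueError(f"bad junction: {(donor, acceptor)}")
        successors[donor].add(acceptor)
        has_incoming.add(acceptor)

    paths = []

    def walk(path):
        nxt = sorted(successors[path[-1]])
        if not nxt:
            # No junction leaves this exon, so the chain ends here.
            paths.append(tuple(path))
            return
        for j in nxt:
            walk(path + [j])

    # Start a chain at every exon that no junction leads into.
    for start in range(n_exons):
        if start not in has_incoming:
            walk([start])
    return paths


if __name__ == "__main__":
    # Four exons, with exon 1 optionally skipped.
    for variant in enumerate_isoforms(4, [(0, 1), (1, 2), (0, 2), (2, 3)]):
        print(variant)
```

The number of chains can blow up quickly for genes with many alternative junctions, which may be part of why the published tools try to pick a subset for you.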

I'll be digging into the eXpress data tomorrow. So far I like it but I need to spend a lot of time looking at the results and the raw data to see how much of it makes sense.
sdriscoll