Greetings all,
I like next-gen sequencing as much as the next person, but I find it curious that it might suffer from some limitations not seen in microarrays. I have two cases in mind:
(1) Longer reads seem great, but they make alignment to a reference genome more difficult: a longer read accumulates more mismatches and is more likely to cross multiple exon-exon boundaries or structural variants.
And (2) as has been pointed out in some papers, statistical tests for differential expression gain power as read counts increase, and longer genes and transcripts produce more reads at the same expression level. That creates a length bias: some genes might show up at the top of your differential list just because they are long.
Microarrays do not seem to have these problems, and maybe that is because they measure intensity values from equal-length probes. So my question is: would it make sense to detect expression levels from a set of probes, similar to how microarrays do things? Here is how I imagine it would go . . .
- Generate a list of "pseudo-probes" of the same length, as many as you want (hey, no need to worry about actually manufacturing a custom array). Ideally, each probe would uniquely identify some genomic feature. For example, a probe might cross an exon-exon boundary specific to a particular transcript, so any read aligning to that probe would be evidence for that transcript (see the sketch after this list).
- Align your reads to the probes. This should be pretty fast, and longer reads become an advantage rather than a nuisance, since they have a greater chance of covering one of your probes.
- Run your differential analysis on these probe counts (without having to worry about transcript-length bias) and relate them back to the genomic features of interest.
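
To make the first two steps concrete, here is a minimal sketch in Python of what probe building and read counting might look like. Everything in it is hypothetical: the probe length, the function names (make_junction_probe, count_probe_hits), and the toy exon sequences are just illustrations of the idea; a real implementation would index the probes and allow mismatches rather than scan for exact substrings.

    # Minimal sketch of the pseudo-probe idea. All names, lengths,
    # and sequences are illustrative assumptions, not a real tool.

    PROBE_LEN = 50  # fixed probe length (hypothetical choice)

    def make_junction_probe(left_exon_seq, right_exon_seq, probe_len=PROBE_LEN):
        """Build a probe straddling an exon-exon junction: half from the
        end of the upstream exon, half from the start of the downstream exon."""
        half = probe_len // 2
        return left_exon_seq[-half:] + right_exon_seq[:probe_len - half]

    def count_probe_hits(reads, probes):
        """Count, for each probe, how many reads contain it exactly.
        (A real version would allow mismatches and index the probe set.)"""
        counts = {name: 0 for name in probes}
        for read in reads:
            for name, probe in probes.items():
                if probe in read:
                    counts[name] += 1
        return counts

    # Toy example: two exons of one hypothetical transcript.
    exon1 = "ACGTACGTACGTACGTACGTACGTACGT"
    exon2 = "TTGACCTTGACCTTGACCTTGACCTTGA"
    probes = {"txA_junction_1_2": make_junction_probe(exon1, exon2, probe_len=20)}

    reads = [
        "GGG" + probes["txA_junction_1_2"] + "CCC",  # read spanning the junction
        "ACGTACGTACGTACGTACGT",                      # read entirely within exon1
    ]
    print(count_probe_hits(reads, probes))  # {'txA_junction_1_2': 1}

Only the junction-spanning read counts as evidence for the transcript, which is exactly the behavior you would want from a transcript-specific probe.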
Here are the advantages as I see them:
- Alignment would be quick, since you are only aligning against a fixed probe set
- Alignment would not get harder as read length increases. For example, the number of mismatches allowed within a probe could stay constant even as your reads get longer.
- Could eliminate the effect of gene length on statistical power, since every probe is the same length
- "Cross-hybridization" could be explicitly measured as the sequence similarity (such as edit distance) of two probes.
- Would not have to worry about modeling binding affinities for each probe, since we can read the sequence directly and check whether it matches the probe, instead of relying on the physical properties of binding between nucleotides.
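
Here is a small sketch of that cross-hybridization score: plain Levenshtein edit distance between two hypothetical probes. This is just one reasonable choice; a real probe designer might use alignment scores or shared k-mer counts instead, but the idea is the same: the smaller the distance, the more likely two probes will attract each other's reads.

    def edit_distance(a, b):
        """Classic dynamic-programming Levenshtein distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                cost = 0 if ca == cb else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution / match
            prev = curr
        return prev[-1]

    # Two probes differing at a single base are at high risk of
    # "cross-hybridizing", i.e., soaking up each other's reads.
    p1 = "GTACGTACGTTTGACCTTGA"
    p2 = "GTACGTACGATTGACCTTGA"
    print(edit_distance(p1, p2))  # 1 -> very similar; flag this probe pair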
Of course, you wouldn't be able to use this approach to find structural variation or novel isoforms. But if what you are interested in is differential expression of known transcripts (and maybe that is a good place to start), then why not make your alignment and analysis job easier?
Just some thoughts. I may be way off base, but I wanted to pitch the idea, and, all baseball analogies aside, I would be interested to hear others' comments and thoughts.
Thanks!
BAMseek