
**rfilbert** · 12-26-2012, 09:25 AM

Actually, it only calls more genes/transcripts because of better fit, not because of multiple testing. The reason is that Partek Flow is not picking the model with the best p-value; it is picking the model with the best fit. As for making biological sense, it absolutely does in my mind, as it does not assume that all genes/transcripts are influenced by the same biological factors, and it informs me which genes are influenced by which biological factors and how many genes are altered by which biological factors.

**jwfoley** · 01-07-2013, 07:15 AM

Originally posted by rfilbert

There are multiple options for normalization, but I believe the default option is to simply normalize to the total number of reads for each sample. I don't think any normalization based on the length of the transcript (like RPKM) matters as for this analysis you are comparing the same transcript in different groups of samples.

But if you want to do any sort of statistics, e.g. to say whether a difference is significant, you need a variance model for the sampling error of your read counts. That does depend on transcript length, for whole-transcript RNA-seq. doi:10.1186/1745-6150-4-14
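To make the length dependence concrete, here is a minimal sketch assuming pure Poisson sampling noise (a simplification; real data are overdispersed). The function names and the example abundance, lengths, and depth are hypothetical and chosen only for illustration. Under this assumption the expected count scales with transcript length, and the relative sampling error of a count with mean mu is 1/sqrt(mu), so a longer transcript at the same abundance is measured more precisely.

```python
import math

def expected_count(abundance, length_kb, depth_millions):
    """Expected read count for a transcript under whole-transcript RNA-seq.

    abundance is in reads per kilobase per million reads (an RPKM-like unit);
    all values here are illustrative assumptions, not real data.
    """
    return abundance * length_kb * depth_millions

def poisson_cv(mean_count):
    """Coefficient of variation of a Poisson count: sd / mean = 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(mean_count)

# Two transcripts with the same abundance but a 10-fold difference in length.
short_mean = expected_count(abundance=2.0, length_kb=0.5, depth_millions=20)
long_mean = expected_count(abundance=2.0, length_kb=5.0, depth_millions=20)

print(short_mean, poisson_cv(short_mean))
print(long_mean, poisson_cv(long_mean))
```

Length cancels out of the fold change for a single transcript, but it does not cancel out of the uncertainty on that fold change, which is what a significance test depends on.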

Originally posted by rfilbert

Actually, it only calls more genes/transcripts because of better fit, not because of multiple testing. The reason is that Partek Flow is not picking the model with the best p-value; it is picking the model with the best fit. As for making biological sense, it absolutely does in my mind, as it does not assume that all genes/transcripts are influenced by the same biological factors, and it informs me which genes are influenced by which biological factors and how many genes are altered by which biological factors.

It may make biological sense to you but it doesn't make statistical sense. "Influenced by different factors" doesn't mean "let's do a different kind of test for each row of the matrix". If you have a hypothesis about the influences, you should build it into your model.

**rfilbert** · 01-07-2013, 05:45 PM

Really? Are you sure? In my case, the samples are from a breast cancer study. Some of the samples are tumor and some are normal. Some are ER+ and some are ER-. Partek Flow tested each gene to see whether it was differentially expressed between tumor vs. normal, ER+ vs. ER-, or both (interaction) - or whether the gene was not affected by either factor. What problem do you have with that?

**rfilbert** · 01-07-2013, 05:49 PM

I guess you are saying that as a researcher, I can only have 1 hypothesis, and only about 1 factor, and it must be the same for every gene. That is not how a biologist thinks. We want to learn what is going on with the biology and not tell the biology what hypothesis it must answer.

**jb2** · 01-07-2013, 06:48 PM

Originally posted by joxcargator73

I believe that replicates are very important for good-quality results. RNA-seq is becoming cheaper and cheaper, but it is still quite expensive for small labs. In that case I also believe that RNA-seq without replicates could be used as a screen, with the hits then confirmed by replicated qRT-PCR, and you would base your conclusions on those results.

This is about the only way you could use a 1 vs. 1 experiment in a way that would be legitimate. One other potential option is to rank your genes by fold change and use something like Gene Set Enrichment Analysis to see if any major biological pathways jump out from that list. This could potentially give you a pathway to go after for validation with PCR. Either way, you aren't really guaranteed a positive result if you have no idea how high the variance is and you can't estimate the variance without biological replicates.

**rfilbert** · 01-07-2013, 06:55 PM

Well, you could estimate variance if you really wanted, and could rank genes by p-value and/or fold change. One way you could do that is to assume a Poisson distribution, where the variance is equal to the mean, and use something like a log-likelihood test. All that said, I certainly agree that replicates (that is, INDEPENDENT BIOLOGICAL REPLICATES) are required to estimate the variance within the biological population that you wish to make an inference about.
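For anyone who wants to see what the Poisson idea looks like in practice, here is a rough sketch of a likelihood ratio (G) test for one gene in a 1 vs. 1 comparison. It is only an illustration under the Poisson assumption; `poisson_lrt` and its arguments are hypothetical names, not part of any package. The null hypothesis is that both libraries share one expression rate per sequenced read.

```python
import math

def poisson_lrt(k1, k2, n1, n2):
    """Likelihood ratio test for equal Poisson rates in two libraries.

    k1, k2: read counts for one gene in sample 1 and sample 2.
    n1, n2: library sizes (total mapped reads) used as exposure.
    Returns (G statistic, p-value from a chi-square with 1 degree of freedom).
    """
    if k1 + k2 == 0:
        return 0.0, 1.0  # no reads at all: nothing to test

    # Null model: one shared rate, so expected counts follow library size.
    shared_rate = (k1 + k2) / (n1 + n2)
    expected = (shared_rate * n1, shared_rate * n2)

    # G = 2 * sum(k * log(k / expected)); a zero count contributes zero.
    g = 0.0
    for k, mu in zip((k1, k2), expected):
        if k > 0:
            g += 2.0 * k * math.log(k / mu)
    g = max(g, 0.0)  # guard against tiny negative values from rounding

    # Survival function of chi-square(1 df): P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value

print(poisson_lrt(10, 100, 1_000_000, 1_000_000))
print(poisson_lrt(100, 100, 1_000_000, 1_000_000))
```

Keep in mind that the resulting p-value reflects counting noise only. It says nothing about variation between individuals, which is why it is useful for ranking at best.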

**jwfoley** · 01-07-2013, 09:22 PM

Originally posted by rfilbert

I guess you are saying that as a researcher, I can only have 1 hypothesis, and only about 1 factor, and it must be the same for every gene. That is not how a biologist thinks. We want to learn what is going on with the biology and not tell the biology what hypothesis it must answer.

DESeq and other packages allow you to test multifactorial experimental designs.

Originally posted by rfilbert

One way you could do that is to assume a Poisson distribution, where the variance is equal to the mean, and use something like a log-likelihood test.

You may want to see the earlier responses in this thread.

**chadn737** · 01-07-2013, 09:36 PM

Originally posted by rfilbert

I guess you are saying that as a researcher, I can only have 1 hypothesis, and only about 1 factor, and it must be the same for every gene. That is not how a biologist thinks. We want to learn what is going on with the biology and not tell the biology what hypothesis it must answer.

That is not at all what jwfoley said. He said that having a multifactorial design is not justification for fitting each gene to a different set of statistical tests.

What Biological justification is there to fit each gene individually? What is the Biological explanation for this? I too am a Biologist and I know of no Biological reason to justify this. However, as a Biologist, I also know that the average Biologist knows little statistics and how a Biologist thinks is not justification for choosing your statistics.

One thing I think you are ignoring is that RNA-seq data, like all measures of gene expression is subject to its own kinds of biases and technical variation. The variation you see in RNA-seq data is not purely Biological and Biological reasoning cannot justify all the variation, particularly for genes with fewer reads.

**rfilbert** · 01-07-2013, 11:16 PM

In my case, some of the genes were differentially expressed between tumor and normal, while others were not, but some of them were differentially expressed between ER+ and ER-. As a fellow biologist, why is this so confusing for you? I also asked a professor of statistics at UCSD that I work with about this and he said he thought it made perfect sense. It certainly made sense in my research, and obviously made sense to the statisticians at Partek who have always been very helpful to me as well.

**chadn737** · 01-07-2013, 11:25 PM

Originally posted by rfilbert

In my case, some of the genes were differentially expressed between tumor and normal, while others were not, but some of them were differentially expressed between ER+ and ER-. As a fellow biologist, why is this so confusing for you? I also asked a professor of statistics at UCSD that I work with about this and he said he thought it made perfect sense. It certainly made sense in my research, and obviously made sense to the statisticians at Partek who have always been very helpful to me as well.

That is not what I am talking about, and it's not what jwfoley is talking about.

Having multiple factors is different than individually fitting each gene to one of five distributions which you claim Partek does. Both jwfoley and I have criticized the latter while you keep talking about the former.

As was pointed out to you earlier, a program like DESeq is capable of multifactorial designs like yours. Of course it makes biological sense that you would find differentially expressed genes under one set of factors and not another. That's not the issue. What I am asking you to support is your claim that fitting each gene individually to a different distribution makes biological sense, let alone statistical sense.

**rfilbert** · 01-07-2013, 11:43 PM

Well, if you want to debate this with the statisticians at Partek, feel free. I use it and like it, as do many of my colleagues. As you said, you are not a statistician either, and I don't know about jwfoley, but I have spoken with statisticians at Partek and I find them quite friendly, helpful, and knowledgeable.

Regarding the confusion between us, Partek Flow not only fits 5 different distributions to each gene, but also multiple factors, so for example, if my candidate models are:
1. Tumor status
2. ER status
3. Tumor + ER
4. Tumor + ER + Tumor*ER (interaction)
then they fit 5 (distributions) x 4 (model designs) = 20 statistical tests

Hopefully that clears up the confusion. Now, on to your argument that all genes or all transcripts follow the same distribution, I don't think I need to be a card carrying statistician to know that is ridiculous. If it was obvious which distribution they all follow, why hasn't the community agreed which distribution that is?
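To illustrate just the "model designs" half of this, here is a minimal sketch of the four candidate designs as design matrices, with one of them picked per gene by AIC. This is not Partek Flow's actual procedure, which is not described in this thread. It swaps in an ordinary least-squares fit on log expression as a stand-in, and the sample labels and expression values are made up.

```python
import numpy as np

# Hypothetical annotations for 8 samples: 1 = tumor / ER+, 0 = normal / ER-.
tumor = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
er = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
intercept = np.ones_like(tumor)

# The four candidate designs listed above, one column per model term.
designs = {
    "tumor": np.column_stack([intercept, tumor]),
    "er": np.column_stack([intercept, er]),
    "tumor+er": np.column_stack([intercept, tumor, er]),
    "tumor+er+tumor*er": np.column_stack([intercept, tumor, er, tumor * er]),
}

def gaussian_aic(design, y):
    """AIC (up to an additive constant) of a least-squares fit of y on design."""
    n, n_coef = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # +1 parameter for the residual variance.
    return n * np.log(rss / n) + 2 * (n_coef + 1)

def best_design(y):
    """Return the name of the lowest-AIC design and all scores for one gene."""
    scores = {name: gaussian_aic(X, y) for name, X in designs.items()}
    return min(scores, key=scores.get), scores

# Made-up log expression for one gene that tracks ER status only.
log_expr = np.array([5.0, 5.2, 7.1, 6.9, 5.1, 4.9, 7.0, 7.2])
winner, all_scores = best_design(log_expr)
print(winner)
```

The alternative raised earlier in the thread is to fit the full design (the last one) to every gene and test each term within it, which is the kind of multifactorial analysis DESeq supports.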
