**dpryan** · 12-05-2011, 10:20 AM

Originally posted by turnersd

My questions are:

1. How are tests for differential splicing (#5) different from tests for differential coding output (#6)?
2. How are the tests for differential gene expression summing over gene ids (#2) different from the tests for gene expression summing over CDS ids (#3)?
3. Tests #5-7 above are testing something fundamentally different than the tests for differential gene expression (tests #1-4). I'd like a good explanation of how these groups of tests differ. E.g. how does #3 (differential expression over CDS) differ from #6 (differential coding output).

Thanks very much in advance.

1. Different splice forms may have the same coding sequence. For example, the 5' UTR may be different.

2. A single gene may produce multiple splice forms containing different coding exons, resulting in different CDS.

3. I'm not familiar with these, although #7 isn't really like the others since it's concerned with the first exon only.

**billstevens** · 04-20-2012, 03:10 PM

Hey guys,

Does anyone have a good answer to turnersd's 3rd question? It does seem as if 5-7 are derived from 1-4. And it certainly seems like #3 is pretty much the same as #6.

**turnersd** · 04-21-2012, 03:47 AM

I believe 1-4 are grouping (or not grouping, in the case of isoforms) transcripts at the level of the gene (2), coding sequence (3), and transcription start site use (4), and testing for differential expression of these groups of transcripts between conditions. And I believe #5-7 are looking at whether there is significantly different TSS usage, or an imbalance in TSS usage overall? Does this make sense?

**billstevens** · 04-23-2012, 05:23 AM

Yes it makes sense. I'm just noticing there is a lot of overlap, which is good, I get that that allows you to use whichever method one is comfortable with. I guess the main issue I'm having is deciding which method to use. Does anyone know of any papers that actually use cuffdiff (that aren't published in PLOS One)?

**sdriscoll** · 04-23-2012, 04:26 PM

All of the *_exp.diff files cuffdiff produces are literally differential expression outputs. The remaining outputs test, for each pairwise comparison, whether there is a significant change in the balance of expression at a locus. Note that those files only include results for genes that have at least 2 splice variants.

For example, say a gene has two isoforms A and B and you have two conditions 1 and 2. In condition 1 the balance of expression at those isoforms is A=0.3 and B=0.7 where those sum up to 1 or 100% of the expression at that locus. Say cuffdiff finds that in condition 2 the balance has changed such that A=0.8 and B=0.2. Depending on the variability across replicates, of course, that change may end up being reported statistically significant. This result would be found in the splicing.diff file.
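To put a number on the A/B example above, here is a minimal sketch that computes the Jensen-Shannon divergence between the two isoform balances and takes its square root. The function name `js_divergence` is made up for illustration, and this only shows the kind of distance involved; it is not cuffdiff's actual test, which also has to account for variability across replicates.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two discrete distributions."""
    # Midpoint distribution between the two conditions.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Relative isoform abundances (A, B) from the example above.
cond1 = [0.3, 0.7]
cond2 = [0.8, 0.2]

js = js_divergence(cond1, cond2)
sqrt_js = math.sqrt(js)
print(f"JS divergence = {js:.3f}, sqrt(JS) = {sqrt_js:.3f}")
```

With log base 2 the divergence is bounded between 0 (identical balance) and 1 (completely different isoform usage), so it gives a scale-free idea of how far the balance moved, independent of the overall expression level of the gene.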

You can apply the same thinking to make sense of the cds.diff and promoters.diff files. Again, each of these files only tests genes with more than one isoform.

*Edit*

I guess you can think of those three files as a more general result than the corresponding *_exp.diff files. Since the *_exp.diff files specifically test each gene/cds/isoform/tss they don't give you a number that tells you if there's an overall change in the expression across a locus. Whether or not these files are interesting to you probably just depends on what it is you're looking for. If you're generally interested in genes that aren't necessarily differentially expressed but might be producing different amounts of the different proteins they code then you might start with the cds.diff file for your gene list.
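As a rough sketch of "starting with the cds.diff file for your gene list", the snippet below pulls out the genes flagged as significant. The rows are invented, and the column names (`gene`, `status`, `significant`) are an assumption about the tab-separated layout; check the header line of your own file before relying on them.

```python
import csv
import io

# Hypothetical rows laid out like a cds.diff file (tab-separated).
# Column names are assumed, not taken from this thread.
CDS_DIFF = (
    "test_id\tgene_id\tgene\tstatus\tp_value\tq_value\tsignificant\n"
    "XLOC_000001\tXLOC_000001\tgeneA\tOK\t0.0005\t0.01\tyes\n"
    "XLOC_000002\tXLOC_000002\tgeneB\tOK\t0.4\t0.8\tno\n"
    "XLOC_000003\tXLOC_000003\tgeneC\tOK\t0.001\t0.02\tyes\n"
    "XLOC_000004\tXLOC_000004\tgeneD\tNOTEST\t1\t1\tno\n"
)


def significant_genes(handle):
    """Return sorted gene names for rows that were tested and called significant."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sorted(
        {
            row["gene"]
            for row in reader
            if row["status"] == "OK" and row["significant"] == "yes"
        }
    )


print(significant_genes(io.StringIO(CDS_DIFF)))
```

For a real file you would pass an open file handle instead of the `io.StringIO` wrapper.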

**billstevens** · 04-30-2012, 10:00 AM

Hey guys,

So I've been playing with this stuff a bit more, and I was hoping you guys could shed some light on this.

So the splicing.diff file and the tss_group file are exactly the same, except that splicing.diff reports a Jensen-Shannon value and the tss_group file reports a p-value. I have no idea why the authors included both. More importantly, why is it even called splicing.diff? Both splicing.diff and tss_group measure differences between samples based on transcription start site, so shouldn't that really be differential promoter use?

**sdriscoll** · 04-30-2012, 10:58 AM

So splicing.diff, cds.diff and promoters.diff measure something different than each of the *_exp.diff files. Instead of differential expression, they measure the significance of the difference in expression balance at any given locus. Also, splicing.diff is telling us something about differential splicing even between multiple isoforms that have the same promoter. Therefore promoters.diff is more generalized than splicing.diff.

I can think of a possible example that makes these files make more sense. Say I have two samples, A and B, and I'm wondering if sample B tends to have different promoter usage than sample A. I could figure this out based on the output of isoform_exp.diff or tss_group_exp.diff, but the file promoters.diff tells me this directly. We get a p-value telling us if sample B has significantly different promoter usage at any gene locus relative to sample A.

The same scenario could come up for coding sequence. Is sample B producing significantly different proteins relative to sample A? The cds.diff file gives you that estimation. Now you don't have to parse the isoform_exp.diff file and figure out which ones are differentially expressed and which ones have CDS regions, etc.

As for splicing.diff, this file gives you a general measure of differential splicing between samples: in sample B, is there a significantly different balance of expression across isoforms at any gene locus relative to sample A? This isn't as specific as asking "which isoforms are differentially expressed"; it's just a general measurement. In other words, you can very quickly have a gene list for those genes that seem to be differentially spliced in sample B relative to sample A. Again, you could probably build this list by parsing isoform_exp.diff, but you'd have to filter out single-isoform genes and you'd also be buried in a file with 90,000 rows instead of one that's already summarized into 30,000 loci (or fewer).
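To illustrate what "parsing isoform_exp.diff" by hand would involve, here is a small sketch that groups isoform-level expression values by gene, drops single-isoform genes, and converts each isoform's expression into its share of the locus in each sample. The rows and IDs are invented, and the function name `isoform_balance` is hypothetical; real rows would come from the per-isoform expression values in your own file.

```python
from collections import defaultdict

# Hypothetical (gene_id, isoform_id, expression in A, expression in B) rows.
rows = [
    ("XLOC_1", "TCONS_1", 10.0, 20.0),  # both isoforms double: balance unchanged
    ("XLOC_1", "TCONS_2", 30.0, 60.0),
    ("XLOC_2", "TCONS_3", 30.0, 80.0),  # balance shifts from 0.3/0.7 to 0.8/0.2
    ("XLOC_2", "TCONS_4", 70.0, 20.0),
    ("XLOC_3", "TCONS_5", 5.0, 50.0),   # single-isoform gene: no balance to test
]


def isoform_balance(rows):
    """Map gene_id -> {isoform_id: (share in A, share in B)} for multi-isoform genes."""
    by_gene = defaultdict(list)
    for gene_id, isoform_id, expr_a, expr_b in rows:
        by_gene[gene_id].append((isoform_id, expr_a, expr_b))

    balance = {}
    for gene_id, isoforms in by_gene.items():
        if len(isoforms) < 2:
            continue  # a single isoform always has a share of 1.0
        total_a = sum(a for _, a, _ in isoforms)
        total_b = sum(b for _, _, b in isoforms)
        if total_a == 0 or total_b == 0:
            continue  # shares are undefined when the locus is not expressed
        balance[gene_id] = {
            iso: (a / total_a, b / total_b) for iso, a, b in isoforms
        }
    return balance


for gene_id, shares in isoform_balance(rows).items():
    print(gene_id, shares)
```

Note that in this toy data XLOC_1 would show up as differentially expressed at the isoform level even though its isoform balance is identical in both samples, which is the distinction between the *_exp.diff files and splicing.diff.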

Does that make sense?

**bbm** · 02-08-2020, 09:30 AM

splicing file from tophat


So in my case, I looked at splicing.diff and there is no significant hit.
When I dug into isoform_exp.diff, there are isoforms that are differentially expressed. If one gene_id appears twice or more, does it mean a splicing event occurred in that gene?
Thanks


What is the cuffdiff output really telling you?
