
  • cuffdiff 2.0: figure this one out...

    I've been running the new release of cuffdiff for the last day or two to see how it does. It certainly produces more conservative gene lists, and it seems to be better at throwing out genes that might be DE on average across replicates but whose expression is erratic from replicate to replicate. I love the new output files that let you "dig in" a little more than before to see what's going on.

    So that's what I'm doing and I found something strange.

    I've run a 3 vs 3 wt vs mutant test with cuffdiff. In the scatter plot of condition 2 vs condition 1, the overall plot looks good. There are several genes that stand out pretty far from the main body of the scatter, with FPKMs > 100 in one condition or the other, but they are not called significant. So I wanted to take a look at those. The first one was "Snora64" (I'm working with mouse).

    I tracked it down in several of the cuffdiff outputs:

    gene_exp.diff:
    Code:
    XLOC_009930	XLOC_009930	Snora64	chr17:24857007-24858872	wt	ko	OK	52.6625	172.027	1.70779	-0.171531	0.863806	0.999999	no
    isoform_exp.diff:
    Code:
    uc008ayc.1	XLOC_009930	Snora64	chr17:24857007-24858872	wt	ko	OK	52.6625	172.027	1.70779	-0.171531	0.863806	0.999999	no
    tss_group_exp.diff:
    Code:
    TSS13588	XLOC_009930	Snora64	chr17:24857007-24858872	wt	ko	OK	52.6625	172.027	1.70779	-0.171531	0.863806	0.999999	no
    Each of these files reports that this gene has an FPKM of 52.6625 in the wt condition and 172.027 in the ko condition. I figured there must be some wacky variance in this gene across the replicates in each condition, so I checked the new file genes.read_groups_tracking to see how the gene is expressed and how many reads it received in each replicate. This is where I get a little confused.

    genes.read_groups_tracking:
    Code:
    XLOC_009930	ko	0	2.00145	1.45042	1.50285	0.822565	-	OK	Snora64
    XLOC_009930	ko	1	0	0	0	0	-	OK	Snora64
    XLOC_009930	ko	2	1.00112	0.95579	0.99034	0.542051	-	OK	Snora64
    XLOC_009930	wt	0	0	0	0	0	-	OK	Snora64
    XLOC_009930	wt	1	1	1.14625	1.11053	0.607837	-	OK	Snora64
    XLOC_009930	wt	2	0	0	0	0	-	OK	Snora64
    This file shows the FPKM of this gene for each replicate in both conditions in the 7th column (one left of the '-' column). Those FPKMs are all less than 1. So why is the expression reported to be so high in every other file? Other genes with comparable expression in gene_exp.diff or genes.fpkm_tracking, when looked up in this file, match up pretty well. Based on what the coverage looks like across this gene's locus, I'd believe the values in this file over what is reported in the other files.
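To put numbers on that mismatch, here is a minimal Python sketch using only the values quoted above. It assumes a plain mean of the per-replicate FPKMs is a fair point of comparison, which may not be how cuffdiff arrives at its condition-level estimate. The variable names are mine, not cuffdiff's.

```python
import math
from statistics import mean

# Per-replicate FPKMs copied from the genes.read_groups_tracking rows
# above (7th column), in replicate order 0, 1, 2.
replicate_fpkm = {
    "wt": [0.0, 0.607837, 0.0],
    "ko": [0.822565, 0.0, 0.542051],
}

# Condition-level values copied from the gene_exp.diff row above.
reported_fpkm = {"wt": 52.6625, "ko": 172.027}
reported_log2_fc = 1.70779

# Assumption: a simple mean across replicates is used as the reference.
replicate_mean = {cond: mean(vals) for cond, vals in replicate_fpkm.items()}

# How many times larger the reported FPKM is than the replicate mean.
inflation = {
    cond: reported_fpkm[cond] / replicate_mean[cond] for cond in reported_fpkm
}

# Check that the reported fold change matches the reported FPKMs.
recomputed_log2_fc = math.log2(reported_fpkm["ko"] / reported_fpkm["wt"])

for cond in ("wt", "ko"):
    print(
        f"{cond}: replicate mean {replicate_mean[cond]:.4f}, "
        f"reported {reported_fpkm[cond]}, ratio {inflation[cond]:.0f}x"
    )
print(f"log2 fold change recomputed from reported FPKMs: {recomputed_log2_fc:.5f}")
```

The recomputed log2 fold change agrees with the reported 1.70779, so the .diff files are internally consistent. The disagreement is between those files and the per-replicate file, where both conditions come out more than 100-fold lower.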

    There are actually several of these mismatched expression values in my output, and most of them are the same type of gene (short, single-exon genes in intergenic regions between other genes). It's distracting to get odd expression values in the output like this. So why does it happen, and why is the more "correct" expression reported in genes.read_groups_tracking but a different, and much higher, expression level reported in the differential expression output files? I'm sure nobody can answer that one except Cole, but I think it's good to report odd findings like this.
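For anyone who wants to list every gene that shows this pattern, a rough Python sketch follows. It assumes both files are tab-separated with the column positions seen in the rows quoted above: gene id, gene name, sample names and FPKM values in columns 2, 3, 5, 6, 8 and 9 of gene_exp.diff, and id, condition and FPKM in columns 1, 2 and 7 of genes.read_groups_tracking. The function names and the `min_ratio` and `min_fpkm` thresholds are hypothetical choices for illustration, not anything cuffdiff defines.

```python
from collections import defaultdict
from statistics import mean


def replicate_means(tracking_text):
    """Mean FPKM per (gene id, condition) from genes.read_groups_tracking text."""
    values = defaultdict(list)
    for line in tracking_text.splitlines():
        fields = line.split("\t")
        try:
            fpkm = float(fields[6])  # 7th column, as in the rows quoted above
        except (IndexError, ValueError):
            continue  # skip a header line or a malformed row
        values[(fields[0], fields[1])].append(fpkm)
    return {key: mean(vals) for key, vals in values.items()}


def find_mismatches(diff_text, tracking_text, min_ratio=10.0, min_fpkm=1.0):
    """List (gene id, gene name, condition, reported FPKM, replicate mean)
    where the reported FPKM exceeds min_ratio times the replicate mean."""
    means = replicate_means(tracking_text)
    flagged = []
    for line in diff_text.splitlines():
        fields = line.split("\t")
        try:
            reported = {fields[4]: float(fields[7]), fields[5]: float(fields[8])}
        except (IndexError, ValueError):
            continue  # skip a header line or a malformed row
        gene_id, gene_name = fields[1], fields[2]
        for condition, value in reported.items():
            rep_mean = means.get((gene_id, condition))
            if rep_mean is None:
                continue  # gene or condition absent from the tracking file
            if value >= min_fpkm and value > min_ratio * rep_mean:
                flagged.append((gene_id, gene_name, condition, value, rep_mean))
    return flagged
```

Calling `find_mismatches(open("gene_exp.diff").read(), open("genes.read_groups_tracking").read())` would return one tuple per gene and condition where the two files disagree by more than the chosen ratio. The thresholds will likely need tuning for a given dataset.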
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */
