Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • using DEXSeq one gene with only one exon - problem

    Hello,

    I have posted the problem that I meet on the bioconductor mailing list bioc-sig- sequencing but I have not got any answer yet. So I will try here too !

    I am using DEXseq for exon differential analysis. The analysis seems to work
    well on genes with several exons, but I have a problem with genes with only
    one exon.
    Let me present you my dataset:

    pData(ecs)
    sizeFactor condition replicate type
    Raman_1 0.9638822 nonvir 1 paired-end
    HM1_1 1.4000715 vir 1 paired-end
    Raman_2 1.0314049 nonvir 2 paired-end
    Raman_3 1.0734133 nonvir 3 paired-end
    HM1_2 0.7801269 vir 2 paired-end
    HM1_3 0.9488409 vir 3 paired-end

    table( table (geneIDs(ecs)))
    1 2 3 4 5 6 7 8
    3149 893 874 206 100 42 18 4
    As you can see, I have 3149 genes (out of 5286) which have only 1 exon.
    There are 9291 exons.

    res1 <- DEUresultTable(ecs)

    > summary(res1)
    geneID exonID dispersion_CR_est
    EHI_051420 : 8 E001 :5286 Min. :0.000e+00
    EHI_070690 : 8 E002 :2137 1st Qu.:7.176e-03
    EHI_169670 : 8 E003 :1244 Median :1.693e-02
    EHI_175010+EHI_175000.1: 8 E004 : 370 Mean :5.546e-02
    EHI_005010 : 7 E005 : 164 3rd Qu.:3.641e-02
    EHI_042710 : 7 E006 : 64 Max. :1.300e+01
    (Other) :9245 (Other): 26 NA's :3.153e+03
    dispersion pvalue padjust
    Min. :2.010e-02 Min. : 0.0000 Min. : 0.0000
    1st Qu.:2.202e-02 1st Qu.: 0.1150 1st Qu.: 0.4597
    Median :2.628e-02 Median : 0.4054 Median : 0.8086
    Mean :3.229e+04 Mean : 0.4269 Mean : 0.6775
    3rd Qu.:4.214e-02 3rd Qu.: 0.7133 3rd Qu.: 0.9510
    Max. :1.000e+08 Max. : 1.0000 Max. : 1.0000
    NA's :3153.0000 NA's :3153.0000
    There are 3153 NA's values. When checking out what were those NA's values, I
    realized that they were all the genes with only one exon (and only 4 more
    genes in the other cases). I doubt that it is by chance... And they don't
    have weird read counts. Check for example EHI_000010, EHI_000410,
    EHI_000420, EHI_000430, EHI_000440:

    geneCountTable(ecs)[1:21,]
    Raman_1 HM1_1 Raman_2 Raman_3 HM1_2 HM1_3
    EHI_000010 391 1364 464 449 457 560
    EHI_000130 18806 8116 21890 24994 6509 14067
    EHI_000240 24057 21825 15365 26903 21101 27841
    EHI_000250 3374 2689 3938 4854 2310 3556
    EHI_000290 458 3721 437 550 1716 1054
    EHI_000410 896 1041 1086 978 905 745
    EHI_000420 140 220 244 117 140 78
    EHI_000430 583 2043 304 634 1398 1197
    EHI_000440 3103 15309 5358 3619 9152 10392
    EHI_000450 358 559 991 516 281 564
    EHI_000460.1 357 118 449 425 43 746
    EHI_000470.1 408 2657 297 397 946 909
    EHI_000480 842 1702 858 778 841 847
    EHI_000490 187 648 221 239 370 249
    EHI_000500 329 506 454 402 244 555
    EHI_000520 482 1615 883 440 925 751
    EHI_000530 1794 1588 1751 1966 1099 1860
    EHI_000540 2857 2504 3706 4152 1479 2547
    EHI_000550 1967 3999 1241 1904 2505 2448
    EHI_000560 152 434 239 165 273 175
    EHI_000570 1209 2142 2183 1213 1905 2662


    res1[1:20,]
    geneID exonID dispersion_CR_est dispersion pvalue
    EHI_000010:001 EHI_000010 E001 NA 0.02377712 NA
    EHI_000130:001 EHI_000130 E001 0.074669451 0.07466945 0.48673539
    EHI_000130:002 EHI_000130 E002 0.128972813 0.12897281 0.94279527
    EHI_000130:003 EHI_000130 E003 0.019602219 0.02031871 0.84327914
    EHI_000130:004 EHI_000130 E004 0.067337196 0.06733720 0.70429765
    EHI_000240:001 EHI_000240 E001 0.043753997 0.04375400 0.74939865
    EHI_000240:002 EHI_000240 E002 0.058562700 0.05856270 0.74939865
    EHI_000250:001 EHI_000250 E001 0.031440630 0.03144063 0.70444841
    EHI_000250:002 EHI_000250 E002 0.037457450 0.03745745 0.70444841
    EHI_000290:001 EHI_000290 E001 0.021113888 0.02396819 0.05100089
    EHI_000290:002 EHI_000290 E002 0.007626346 0.03471905 0.95676220
    EHI_000290:003 EHI_000290 E003 0.036889359 0.03688936 0.03057067
    EHI_000410:001 EHI_000410 E001 NA 0.02235347 NA
    EHI_000420:001 EHI_000420 E001 NA 0.03395573 NA
    EHI_000430:001 EHI_000430 E001 NA 0.02219522 NA
    EHI_000440:001 EHI_000440 E001 NA 0.02037263 NA
    EHI_000450:001 EHI_000450 E001 0.024495906 0.04492663 0.64717121
    EHI_000450:002 EHI_000450 E002 0.020356260 0.02483641 0.64717121
    EHI_000460.1:001 EHI_000460.1 E001 NA 0.02602179 NA
    EHI_000470.1:001 EHI_000470.1 E001 NA 0.02254333 NA
    padjust
    EHI_000010:001 NA
    EHI_000130:001 0.8639624
    EHI_000130:002 0.9927663
    EHI_000130:003 0.9756197
    EHI_000130:004 0.9503086
    EHI_000240:001 0.9611397
    EHI_000240:002 0.9611397
    EHI_000250:001 0.9503086
    EHI_000250:002 0.9503086
    EHI_000290:001 0.2890521
    EHI_000290:002 0.9962043
    EHI_000290:003 0.2087239
    EHI_000410:001 NA
    EHI_000420:001 NA
    EHI_000430:001 NA
    EHI_000440:001 NA
    EHI_000450:001 0.9278993
    EHI_000450:002 0.9278993
    EHI_000460.1:001 NA
    EHI_000470.1:001 NA
    I checked the model in the available documentation and I don't see why the test does
    not give a result in this case, ie why the Cox-Reid likelihood estimation cannot be computed. Has anyone got the same problem?

    Thanks for your help,
    Jane
    Last edited by Jane M; 08-04-2011, 04:21 AM.

  • #2
    I got a response finally. So, here is what I got, it could help someone:

    DEXSeq is not a tool to find differentially expressed genes, but removes the effect of overall changes to a gene's expression. Its purpose is to test whether the relative usage of exons in agene depends on a condition.

    Let's say you have a gene with three exons, a cassette exon flanked by consitutive exons. In one condition 20% of all the reads that fall onto the gene overlap with the middle exon, in the other condition, it is 40%. This is what DEXSeq looks for. If the counts for all exons of a gene increases by the sae factor from one condition to the other, the gene may be differentially expressed but not differentially spliced, and DEXSeq does not aim to detect the former.

    Hence, single-exon genes cannot be tested and are skipped.

    Comment

    Latest Articles

    Collapse

    • seqadmin
      Essential Discoveries and Tools in Epitranscriptomics
      by seqadmin




      The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
      Yesterday, 07:01 AM
    • seqadmin
      Current Approaches to Protein Sequencing
      by seqadmin


      Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
      04-04-2024, 04:25 PM

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by seqadmin, 04-11-2024, 12:08 PM
    0 responses
    58 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-10-2024, 10:19 PM
    0 responses
    54 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-10-2024, 09:21 AM
    0 responses
    46 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 04-04-2024, 09:00 AM
    0 responses
    55 views
    0 likes
    Last Post seqadmin  
    Working...
    X