Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DEXSeq output not convincing

    Hello,
    I'm running DEXSeq for finding out the differential exon usage between 3 Wild type and 3 transgenic rat data. I obtained the exon count using the scripts dexseq_count.py and the gff file using dexseq_prepare_annotation.py. After following all the steps given in your vignette,When I run the DEUresultTable command, I get a lot of NA's in the pvalue column and just 1's in the padjust column. Also after executing the estimateDispersions function, the column dispersion_CR_est is showing NULL. I have a feeling that I'm making a mistake in the exoncount data object but I'm not sure. Could someone throw some light on this?

    I could probably highlight the steps(in bold) I have done.

    annotationfile = file.path("/triton/ics/scratch/rnaseq/exoncount/Rattus_norvegicus.RGSC3.4.66_DEXSEQ.gff")

    annotationfile
    [1] "/triton/ics/scratch/rnaseq/exoncount/Rattus_norvegicus.RGSC3.4.66_DEXSEQ.gff"

    samples=data.frame(condition=c(rep("wild",3),rep("trans",3)),replicate=c(1,1,1,1,1,1),stringsAsFactors = TRUE,check.names = FALSE)
    row.names(samples) <- c("01_5b_WT.txt","02_6b_WT.txt","03_15b_WT.txt","04_4b_TG.txt","05_7b_TG.txt","06_16b_TG.txt")


    samples
    condition replicate
    01_5b_WT.txt wild 1
    02_6b_WT.txt wild 1
    03_15b_WT.txt wild 1
    04_4b_TG.txt trans 1
    05_7b_TG.txt trans 1
    06_16b_TG.txt trans 1

    countfiles = file.path("/triton/ics/scratch/rnaseq/exoncount",paste(row.names(samples))

    countfiles
    [1] "/triton/ics/scratch/rnaseq/exoncount/01_5b_WT.txt"
    [2] "/triton/ics/scratch/rnaseq/exoncount/02_6b_WT.txt"
    [3] "/triton/ics/scratch/rnaseq/exoncount/03_15b_WT.txt"
    [4] "/triton/ics/scratch/rnaseq/exoncount/04_4b_TG.txt"
    [5] "/triton/ics/scratch/rnaseq/exoncount/05_7b_TG.txt"
    [6] "/triton/ics/scratch/rnaseq/exoncount/06_16b_TG.txt"

    ecs = read.HTSeqCounts(countfiles,design=samples,flattenedfile=annotationfile)

    ExonCountSet (storageMode: environment)
    assayData: 236327 features, 6 samples
    element names: counts
    protocolData: none
    phenoData
    sampleNames: 01_5b_WT.txt 02_6b_WT.txt ... 06_16b_TG.txt (6 total)
    varLabels: sizeFactor condition replicate countfiles
    varMetadata: labelDescription
    featureData
    featureNames: ENSRNOG00000000001:001 ENSRNOG00000000001:002 ...
    ENSRNOG00000045521:001 (236327 total)
    fvarLabels: geneID exonID ... transcripts (13 total)
    fvarMetadata: labelDescription
    experimentData: use 'experimentData(object)'
    Annotation:

    sampleNames(ecs) = row.names(samples)

    ecs <- estimateSizeFactors(ecs)


    sizeFactors(ecs)
    01_5b_WT.txt 02_6b_WT.txt 03_15b_WT.txt 04_4b_TG.txt 05_7b_TG.txt
    1.6794610 0.5407073 0.8215439 1.7972423 0.8516481
    06_16b_TG.txt
    0.9878676

    ecs <- estimateDispersions(ecs)


    featureData(ecs)$dispersion_CR_est
    NULL

    ecs <- fitDispersionFunction(ecs)

    ecs <- testForDEU(ecs)

    res <- DEUresultTable(ecs)

  • #2
    Hi,

    I would start by checking that the ExonContSet (ecs) was built correctly.

    A couple of things to try

    design(ecs) - This should match what you intended.

    sampleNames(ecs) - the row names of the design output should match this

    table(geneIDs(ecs)) will show you how many genes are in the set.

    you can also try head(modelFrameforGene(ecs, "Pick some gene" )) of course replace the Pick some gene with the name of a gene from your genome.
    This will show if something for counts was read in.

    Also normally the values for replicate would be 1,2, and 3 for each sample type. I don't think having them all set to one will cause a problem, but I have never tried it.



    Craig

    Comment


    • #3
      Thank you Craig. I will look into it immediately and get back here.

      Comment


      • #4
        First of all: The command "featureData(ecs)$dispersion_CR_est" did not work because you are reading an outdated manual. The slot "dispersion_CR_est" is now called "dispBeforeSharing".

        Use the command "openVignette()" to get documentation that fits your installed package versions.

        In the current manual, you will see instructions to plot the estimated dispersions against the mean count value. Have a look, maybe your data is simply to noisy to get any results. (If you unsure how to interpret the plot, just post it here.)

        Also have a look at the count values plot for few genes (with plotDEXSeq(ecs, "my_favourite_geneID", norCounts = TRUE)). Does it look like you should detect something?

        Comment


        • #5
          Hello Craig,
          I checked my dataset with the commands that you mentioned. Everything looks fine. The counts were also read. I just did one change. I got information that my data had 3 replicates. So I re-ran the process by replacing the 1's with 1,2 and 3 for wild and 1,2 and 3 for trans.

          Comment


          • #6
            Hello Simon,
            Thanks for that. I downloaded the new vignette and proceeded as stated. Here is the plot between mean count value and dispersion.

            Could you interpret the plot ?

            I selected a particular gene for testing purpose. That gene coincidentally had data so I could visualise using the plotDEXSeq function. I'm trying to verify the result with the exon count to be sure that the output is valid.

            Comment


            • #7
              It looks like the image is not rendering. Here is it.
              Attached Files

              Comment


              • #8
                Hello Simon. I'm not sure if you received my other posts here. But I proceeded with my analysis and I have arrived at some results for which I need your help in the analysis. I sorted the dexseq output according to p value and visualised the exons.

                Gene ID: ENSRNOG00000017466
                Exon : E016
                p Value : 0.000166557183275073

                For the above mentioned gene, the fitted expression did not help me conclude anything so I went for the fitted splicing. I see that a lot of exons like E016,E017 and so on show similar results but none of their p values are even close to E016. In such a case, how can I make the required conclusion? Am I missing something in the analysis ? Is there another way to look at it?
                Attached Files

                Comment


                • #9
                  Note how one of your three wild-type samples looks completely different than the other two. It seems that the whole left half is overrepresented. Hence, DEXSeq considers these exons as too noisy to detect anything. I'm a bit surprised that E016 got a low p value, as it does not look that much better.

                  Is this the gene with the strongest effect that you got? Then, it were justified if you got no significant hits.

                  Check a few more genes to see if it is always one sample that behaves strangely, and if so, if it is always the same. (You can instruct the plotting function to use different colours for the replicates.) If it is one sample that is behaving oddly, you may want to see what happens if you repeat the analysis without it.

                  Comment


                  • #10
                    Hello Simon,
                    This is the third best gene available. I checked the top 5 and I see this behaviour very distinct only in 2 of them. I’m attaching the pdf of that for your reference. The remaining do not highlight a lot. I should also mention that all the adjusted p values were 1. I also read another post of yours connected with DESeq where you had mentioned about a correlation between high variance and the output. I have posted a graph showing the connection between mean value and dispersion in reply #7. Does the graph hint anything? I'm not able to interpret that one very well.
                    Attached Files

                    Comment


                    • #11
                      I forgot to mention the p values for those 5 genes. They are as follows.
                      ENSRNOG00000005330 E035 0.000108786
                      ENSRNOG00000009864 E005 0.000152295
                      ENSRNOG00000017466 E016 0.000166557
                      ENSRNOG00000005434 E046 0.000262485
                      ENSRNOG00000032391 E004 0.00050834

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Strategies for Sequencing Challenging Samples
                        by seqadmin


                        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                        03-22-2024, 06:39 AM
                      • seqadmin
                        Techniques and Challenges in Conservation Genomics
                        by seqadmin



                        The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                        Avian Conservation
                        Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                        03-08-2024, 10:41 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, Yesterday, 06:37 PM
                      0 responses
                      10 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, Yesterday, 06:07 PM
                      0 responses
                      9 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-22-2024, 10:03 AM
                      0 responses
                      51 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 03-21-2024, 07:32 AM
                      0 responses
                      67 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X