Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • Mudita
    Member
    • Oct 2012
    • 15

    Usage of gene aggregation option in DEXSeq

    Hello Everyone,

    I am using DEXSeq to find out differential usage of exons.
    I want to know about the usage of gene aggregation option. After reading DEXSeq tutorial, I could understand that if I provide -r yes, it would merge the genes which have overlapping exons. I do not want to do this, so I used -r no. However I could not find any diiference in output.
    Below i am posting the command which I am using as well as the output:

    command: python dexseq_prepare_annotation.py -r no hg19.ensemblfortophat.gtf output.gff

    Output:
    chr1 dexseq_prepare_annotation.py aggregate_gene 11869 14412 . + . gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 11869 11871 . + . transcripts "ENST00000456328"; exonic_part_number "001"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 11872 11873 . + . transcripts "ENST00000456328+ENST00000515242"; exonic_part_number "002"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 11874 12009 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "003"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12010 12057 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "004"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12058 12178 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "005"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12179 12227 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "006"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12595 12612 . + . transcripts "ENST00000518655"; exonic_part_number "007"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12613 12697 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "008"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12698 12721 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "009"; gene_id "ENSG00000223972"

    Please let me know if used command is incorrect.

    Thank you,

    Regards,
    Mudita
  • areyes
    Senior Member
    • Aug 2010
    • 165

    #2
    Hi @Mudita,

    The command looks good. If you look close enough to your gtf file you will notice that when using "-r yes", some of the fields "gene_id" from your gtf file have two gene identifiers concatenated by a "+".

    In case your file does not contain such instances, it means that your initial gtf file does not contain overlapping exons.

    Best regards,
    Alejandro

    Comment

    • Mudita
      Member
      • Oct 2012
      • 15

      #3
      Thank you so much for your reply.
      However, I just found the difference from non aggregate genes (-r no) to aggregate genes (-r yes).

      output with with -r yes:
      chr1 dexseq_prepare_annotation.py aggregate_gene 846815 850351 . + . gene_id "ENSG00000230699+ENSG00000241180"

      output with -r no:
      chr1 dexseq_prepare_annotation.py aggregate_gene 850329 850351 . + . gene_id "ENSG00000241180"

      I could find that when I use -r no, i do not find any gene with + sign. This is what I wanted.

      Thank you
      Mudita

      Comment

      • Mudita
        Member
        • Oct 2012
        • 15

        #4
        Dear Areyes,
        I would like to run paired sample analysis in DEXSeq.
        I could find from this thred (http://seqanswers.com/forums/showthread.php?t=13299) that following link should be useful.
        The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


        However, I could not understand.
        Could you please let me know which is the formula to modify and how to modify, in order to run a paired analysis.

        Thank you for your help.

        Regards,
        Mudita

        Comment

        • areyes
          Senior Member
          • Aug 2010
          • 165

          #5
          Hi Mudita,

          - what is the output of 'sessionInfo()'?
          - what is the output of 'design(ecs)', of your DEXSeq object?
          - what column of 'design(ecs)' would you like to see if it has an effect on differential exon usage?
          - what is the column of 'design(ecs)' that indicates the pairing of the samples?

          Alejandro

          Comment

          • Mudita
            Member
            • Oct 2012
            • 15

            #6
            I am sorry for late reply.
            Please find the response below.

            Output of sessionInfo::

            Platform: x86_64-unknown-linux-gnu (64-bit)

            locale:
            [1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
            [4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
            [7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
            [10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

            attached base packages:
            [1] parallel stats graphics grDevices utils datasets methods
            [8] base

            other attached packages:
            [1] DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0
            [4] BiocInstaller_1.12.0

            loaded via a namespace (and not attached):
            [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
            [4] GenomicRanges_1.14.4 hwriter_1.3 IRanges_1.20.6
            [7] RCurl_1.95-4.1 Rsamtools_1.14.3 statmod_1.4.18
            [10] stats4_3.0.2 stringr_0.6.2 tools_3.0.2
            [13] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0

            Output of design(ecs)::

            countFile condition libType
            19_N.counts control paired-end
            19_V.counts knockdown paired-end
            22_N.counts control paired-end
            22_V.counts knockdown paired-end
            33_N.counts control paired-end
            33_V.counts knockdown paired-end
            39_N.counts control paired-end
            39_V.counts knockdown paired-end
            65_N.counts control paired-end
            65_V.counts knockdown paired-end

            - what column of 'design(ecs)' would you like to see if it has an effect on differential exon usage?
            Between "control and knockdown"

            - what is the column of 'design(ecs)' that indicates the pairing of the samples?

            all N-V samples are paired samples. For example: 19_N and 19_V, 22_N and 22_V, 33_N and 33_V ans so on.

            Comment

            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              How about just using a design like:

              Code:
              countFile   condition group
              19_N.counts control   A
              19_V.counts knockdown A
              22_N.counts control   B
              22_V.counts knockdown B
              33_N.counts control   C
              33_V.counts knockdown C
              39_N.counts control   D
              39_V.counts knockdown D
              65_N.counts control   E
              65_V.counts knockdown E
              and then add "group" as a factor in the design. That seems to be the normal way this is handled. Make sure to put "condition" at the end of the design formula, since that's then used for graphs.

              Comment

              • Mudita
                Member
                • Oct 2012
                • 15

                #8
                Thank you Devon,

                I understand that I should apply this formula while calculating dispersion as well as testing for differential expression:
                formula=count ~ sample+group*exon+condition

                Regards,
                Mudita

                Comment

                • areyes
                  Senior Member
                  • Aug 2010
                  • 165

                  #9
                  Hi Mudita,

                  As a general comment, you seem to be using a vignette that does not correspond to the DEXSeq version that you have installed, this is not ideal. If you do browseVignettes("DEXSeq") in your R session you will find the vignette associated to the version of DEXSeq that you are using.

                  I would follow @dpryan suggestion in specifying the design, and then use the formulas:

                  formulaFullModel <- ~ sample + exon + group:exon + condition:exon
                  formulaReducedModel <- ~ sample + exon + group:exon

                  specify this formulas in estimateDispersions and testForDEU as the vignette indicates and that should be it!

                  Bests,
                  Alejandro

                  Comment

                  • Mudita
                    Member
                    • Oct 2012
                    • 15

                    #10
                    Dear Areyes,

                    I have tried as suggested. However, I am getting all NA values for p and p adjusted columns.
                    I am using the R script below.

                    #!/usr/bin/env Rscript
                    library(DEXSeq)

                    annotationfile = file.path("/media/gadgil/Data1/RNA_Seq/Index/Ensemble_GTF_file/Downloaded_6_Jan_2014/Prepare_annotations_for_DEXseq/for_DEXSeq_annotations_withr-no.gff")

                    sample <- data.frame(row.names = c( "19_N", "19_V", "22_N", "22_V", "33_N", "33_V", "39_N", "39_V", "65_N", "65_V"), countFile = c( "19_N.counts", "19_V.counts", "22_N.counts", "22_V.counts", "33_N.counts", "33_V.counts", "39_N.counts", "39_V.counts", "65_N.counts", "65_V.counts" ), condition = c("control", "knockdown", "control", "knockdown", "control", "knockdown", "control", "knockdown", "control", "knockdown" ), group=c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E"), libType = c( "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end" ) )

                    fullFilenames<- list.files("/media/gadgil/Seagate Expansion Drive/Data/OUTPUT/DEXSeq_output/",full.names=TRUE,pattern=".count")

                    ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = sample,flattenedfile = annotationfile)

                    formulafullmodel=~sample + exon + group:exon + condition:exon
                    ecs<- estimateSizeFactors(ecs)
                    library(parallel)
                    ecs<- estimateDispersions(ecs,formula=formulafullmodel, nCores=12)

                    ecs<- fitDispersionFunction(ecs)
                    ecs<- estimatelog2FoldChanges(ecs)

                    test<- testForDEU(ecs, formula=formulafullmodel, nCores=12)
                    res1<- DEUresultTable(test)

                    write.table(res1, file="/media/gadgil/Seagate Expansion Drive/Data/OUTPUT/DEXSeq_output/result_paired.txt")

                    Could you please let me know if this is correct.

                    Thank you,
                    Regards,
                    Mudita

                    Comment

                    • areyes
                      Senior Member
                      • Aug 2010
                      • 165

                      #11
                      Hi Mudita,

                      Are you getting an error message? I would almost bet that you are getting this error:

                      "Error in testForDEU: argument 2 matches multiple formal arguments"

                      You have to change your line:

                      test<- testForDEU(ecs, formula=formulafullmodel, nCores=12)

                      to

                      test<- testForDEU(ecs, formula0=~sample + exon + group:exon, formula1=formulafullmodel, nCores=12)

                      I think that should do the trick

                      Comment

                      • Mudita
                        Member
                        • Oct 2012
                        • 15

                        #12
                        yes, everything works now.
                        Thanks a lot for all your help.

                        Comment

                        • Mudita
                          Member
                          • Oct 2012
                          • 15

                          #13
                          hello again,

                          I am trying to generate plot in DEXSeq, but I do not see the distribution of expression.
                          could you please let me know why?

                          plot.ppt

                          Thank you,
                          Regards,
                          Mudita

                          Comment

                          • areyes
                            Senior Member
                            • Aug 2010
                            • 165

                            #14
                            Hi Mudita,

                            Could you include the code that you are using and the output of sessionInfo()?
                            Are you getting an error or a warning?
                            Does it happen with all the genes?
                            Does it happen also when your plotting device is a x11, pdf?

                            Comment

                            • Mudita
                              Member
                              • Oct 2012
                              • 15

                              #15
                              Thank you,
                              Please find below the response:

                              code to plot:
                              plotDEXSeq(test, "ENSG00000167613")

                              warning while generating the plot:

                              Warning message:
                              In segments(intervals[rango], matr[rango, j], intervals[rango + :
                              semi-transparency is not supported on this device: reported only once per page

                              Also, there was a warning while calculating the dispersion which I am trying to reproduce.

                              sessionInfo:


                              Platform: x86_64-unknown-linux-gnu (64-bit)

                              locale:
                              [1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
                              [4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
                              [7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
                              [10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

                              attached base packages:
                              [1] parallel stats graphics grDevices utils datasets methods
                              [8] base

                              other attached packages:
                              [1] DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0

                              loaded via a namespace (and not attached):
                              [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
                              [4] GenomicRanges_1.14.4 hwriter_1.3 IRanges_1.20.7
                              [7] RCurl_1.95-4.1 Rsamtools_1.14.3 statmod_1.4.18
                              [10] stats4_3.0.2 stringr_0.6.2 XML_3.98-1.1

                              Does it happen with all the genes?
                              I tried with 4 or 5 genes and it happened with all.

                              Does it happen also when your plotting device is a x11, pdf?
                              I just run this code "plotDEXSeq(test, "ENSG00000167613")" and generated image is copied in power point.

                              Comment

                              Latest Articles

                              Collapse

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              12 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              24 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 12:03 PM
                              0 responses
                              28 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-02-2026, 11:40 AM
                              0 responses
                              22 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...