Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Usage of gene aggregation option in DEXSeq

    Hello Everyone,

    I am using DEXSeq to find out differential usage of exons.
    I want to know about the usage of gene aggregation option. After reading DEXSeq tutorial, I could understand that if I provide -r yes, it would merge the genes which have overlapping exons. I do not want to do this, so I used -r no. However I could not find any diiference in output.
    Below i am posting the command which I am using as well as the output:

    command: python dexseq_prepare_annotation.py -r no hg19.ensemblfortophat.gtf output.gff

    Output:
    chr1 dexseq_prepare_annotation.py aggregate_gene 11869 14412 . + . gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 11869 11871 . + . transcripts "ENST00000456328"; exonic_part_number "001"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 11872 11873 . + . transcripts "ENST00000456328+ENST00000515242"; exonic_part_number "002"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 11874 12009 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "003"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12010 12057 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "004"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12058 12178 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "005"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12179 12227 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "006"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12595 12612 . + . transcripts "ENST00000518655"; exonic_part_number "007"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12613 12697 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000450305+ENST00000518655"; exonic_part_number "008"; gene_id "ENSG00000223972"
    chr1 dexseq_prepare_annotation.py exonic_part 12698 12721 . + . transcripts "ENST00000456328+ENST00000515242+ENST00000518655"; exonic_part_number "009"; gene_id "ENSG00000223972"

    Please let me know if used command is incorrect.

    Thank you,

    Regards,
    Mudita

  • #2
    Hi @Mudita,

    The command looks good. If you look close enough to your gtf file you will notice that when using "-r yes", some of the fields "gene_id" from your gtf file have two gene identifiers concatenated by a "+".

    In case your file does not contain such instances, it means that your initial gtf file does not contain overlapping exons.

    Best regards,
    Alejandro

    Comment


    • #3
      Thank you so much for your reply.
      However, I just found the difference from non aggregate genes (-r no) to aggregate genes (-r yes).

      output with with -r yes:
      chr1 dexseq_prepare_annotation.py aggregate_gene 846815 850351 . + . gene_id "ENSG00000230699+ENSG00000241180"

      output with -r no:
      chr1 dexseq_prepare_annotation.py aggregate_gene 850329 850351 . + . gene_id "ENSG00000241180"

      I could find that when I use -r no, i do not find any gene with + sign. This is what I wanted.

      Thank you
      Mudita

      Comment


      • #4
        Dear Areyes,
        I would like to run paired sample analysis in DEXSeq.
        I could find from this thred (http://seqanswers.com/forums/showthread.php?t=13299) that following link should be useful.
        The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


        However, I could not understand.
        Could you please let me know which is the formula to modify and how to modify, in order to run a paired analysis.

        Thank you for your help.

        Regards,
        Mudita

        Comment


        • #5
          Hi Mudita,

          - what is the output of 'sessionInfo()'?
          - what is the output of 'design(ecs)', of your DEXSeq object?
          - what column of 'design(ecs)' would you like to see if it has an effect on differential exon usage?
          - what is the column of 'design(ecs)' that indicates the pairing of the samples?

          Alejandro

          Comment


          • #6
            I am sorry for late reply.
            Please find the response below.

            Output of sessionInfo::

            Platform: x86_64-unknown-linux-gnu (64-bit)

            locale:
            [1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
            [4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
            [7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
            [10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

            attached base packages:
            [1] parallel stats graphics grDevices utils datasets methods
            [8] base

            other attached packages:
            [1] DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0
            [4] BiocInstaller_1.12.0

            loaded via a namespace (and not attached):
            [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
            [4] GenomicRanges_1.14.4 hwriter_1.3 IRanges_1.20.6
            [7] RCurl_1.95-4.1 Rsamtools_1.14.3 statmod_1.4.18
            [10] stats4_3.0.2 stringr_0.6.2 tools_3.0.2
            [13] XML_3.98-1.1 XVector_0.2.0 zlibbioc_1.8.0

            Output of design(ecs)::

            countFile condition libType
            19_N.counts control paired-end
            19_V.counts knockdown paired-end
            22_N.counts control paired-end
            22_V.counts knockdown paired-end
            33_N.counts control paired-end
            33_V.counts knockdown paired-end
            39_N.counts control paired-end
            39_V.counts knockdown paired-end
            65_N.counts control paired-end
            65_V.counts knockdown paired-end

            - what column of 'design(ecs)' would you like to see if it has an effect on differential exon usage?
            Between "control and knockdown"

            - what is the column of 'design(ecs)' that indicates the pairing of the samples?

            all N-V samples are paired samples. For example: 19_N and 19_V, 22_N and 22_V, 33_N and 33_V ans so on.

            Comment


            • #7
              How about just using a design like:

              Code:
              countFile   condition group
              19_N.counts control   A
              19_V.counts knockdown A
              22_N.counts control   B
              22_V.counts knockdown B
              33_N.counts control   C
              33_V.counts knockdown C
              39_N.counts control   D
              39_V.counts knockdown D
              65_N.counts control   E
              65_V.counts knockdown E
              and then add "group" as a factor in the design. That seems to be the normal way this is handled. Make sure to put "condition" at the end of the design formula, since that's then used for graphs.

              Comment


              • #8
                Thank you Devon,

                I understand that I should apply this formula while calculating dispersion as well as testing for differential expression:
                formula=count ~ sample+group*exon+condition

                Regards,
                Mudita

                Comment


                • #9
                  Hi Mudita,

                  As a general comment, you seem to be using a vignette that does not correspond to the DEXSeq version that you have installed, this is not ideal. If you do browseVignettes("DEXSeq") in your R session you will find the vignette associated to the version of DEXSeq that you are using.

                  I would follow @dpryan suggestion in specifying the design, and then use the formulas:

                  formulaFullModel <- ~ sample + exon + group:exon + condition:exon
                  formulaReducedModel <- ~ sample + exon + group:exon

                  specify this formulas in estimateDispersions and testForDEU as the vignette indicates and that should be it!

                  Bests,
                  Alejandro

                  Comment


                  • #10
                    Dear Areyes,

                    I have tried as suggested. However, I am getting all NA values for p and p adjusted columns.
                    I am using the R script below.

                    #!/usr/bin/env Rscript
                    library(DEXSeq)

                    annotationfile = file.path("/media/gadgil/Data1/RNA_Seq/Index/Ensemble_GTF_file/Downloaded_6_Jan_2014/Prepare_annotations_for_DEXseq/for_DEXSeq_annotations_withr-no.gff")

                    sample <- data.frame(row.names = c( "19_N", "19_V", "22_N", "22_V", "33_N", "33_V", "39_N", "39_V", "65_N", "65_V"), countFile = c( "19_N.counts", "19_V.counts", "22_N.counts", "22_V.counts", "33_N.counts", "33_V.counts", "39_N.counts", "39_V.counts", "65_N.counts", "65_V.counts" ), condition = c("control", "knockdown", "control", "knockdown", "control", "knockdown", "control", "knockdown", "control", "knockdown" ), group=c("A", "A", "B", "B", "C", "C", "D", "D", "E", "E"), libType = c( "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end" ) )

                    fullFilenames<- list.files("/media/gadgil/Seagate Expansion Drive/Data/OUTPUT/DEXSeq_output/",full.names=TRUE,pattern=".count")

                    ecs<- read.HTSeqCounts(countfiles = fullFilenames,design = sample,flattenedfile = annotationfile)

                    formulafullmodel=~sample + exon + group:exon + condition:exon
                    ecs<- estimateSizeFactors(ecs)
                    library(parallel)
                    ecs<- estimateDispersions(ecs,formula=formulafullmodel, nCores=12)

                    ecs<- fitDispersionFunction(ecs)
                    ecs<- estimatelog2FoldChanges(ecs)

                    test<- testForDEU(ecs, formula=formulafullmodel, nCores=12)
                    res1<- DEUresultTable(test)

                    write.table(res1, file="/media/gadgil/Seagate Expansion Drive/Data/OUTPUT/DEXSeq_output/result_paired.txt")

                    Could you please let me know if this is correct.

                    Thank you,
                    Regards,
                    Mudita

                    Comment


                    • #11
                      Hi Mudita,

                      Are you getting an error message? I would almost bet that you are getting this error:

                      "Error in testForDEU: argument 2 matches multiple formal arguments"

                      You have to change your line:

                      test<- testForDEU(ecs, formula=formulafullmodel, nCores=12)

                      to

                      test<- testForDEU(ecs, formula0=~sample + exon + group:exon, formula1=formulafullmodel, nCores=12)

                      I think that should do the trick

                      Comment


                      • #12
                        yes, everything works now.
                        Thanks a lot for all your help.

                        Comment


                        • #13
                          hello again,

                          I am trying to generate plot in DEXSeq, but I do not see the distribution of expression.
                          could you please let me know why?

                          plot.ppt

                          Thank you,
                          Regards,
                          Mudita

                          Comment


                          • #14
                            Hi Mudita,

                            Could you include the code that you are using and the output of sessionInfo()?
                            Are you getting an error or a warning?
                            Does it happen with all the genes?
                            Does it happen also when your plotting device is a x11, pdf?

                            Comment


                            • #15
                              Thank you,
                              Please find below the response:

                              code to plot:
                              plotDEXSeq(test, "ENSG00000167613")

                              warning while generating the plot:

                              Warning message:
                              In segments(intervals[rango], matr[rango, j], intervals[rango + :
                              semi-transparency is not supported on this device: reported only once per page

                              Also, there was a warning while calculating the dispersion which I am trying to reproduce.

                              sessionInfo:


                              Platform: x86_64-unknown-linux-gnu (64-bit)

                              locale:
                              [1] LC_CTYPE=en_IN LC_NUMERIC=C LC_TIME=en_IN
                              [4] LC_COLLATE=en_IN LC_MONETARY=en_IN LC_MESSAGES=en_IN
                              [7] LC_PAPER=en_IN LC_NAME=C LC_ADDRESS=C
                              [10] LC_TELEPHONE=C LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C

                              attached base packages:
                              [1] parallel stats graphics grDevices utils datasets methods
                              [8] base

                              other attached packages:
                              [1] DEXSeq_1.8.0 Biobase_2.22.0 BiocGenerics_0.8.0

                              loaded via a namespace (and not attached):
                              [1] biomaRt_2.18.0 Biostrings_2.30.1 bitops_1.0-6
                              [4] GenomicRanges_1.14.4 hwriter_1.3 IRanges_1.20.7
                              [7] RCurl_1.95-4.1 Rsamtools_1.14.3 statmod_1.4.18
                              [10] stats4_3.0.2 stringr_0.6.2 XML_3.98-1.1

                              Does it happen with all the genes?
                              I tried with 4 or 5 genes and it happened with all.

                              Does it happen also when your plotting device is a x11, pdf?
                              I just run this code "plotDEXSeq(test, "ENSG00000167613")" and generated image is copied in power point.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Current Approaches to Protein Sequencing
                                by seqadmin


                                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                                04-04-2024, 04:25 PM
                              • seqadmin
                                Strategies for Sequencing Challenging Samples
                                by seqadmin


                                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                                03-22-2024, 06:39 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 04-11-2024, 12:08 PM
                              0 responses
                              27 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 10:19 PM
                              0 responses
                              30 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-10-2024, 09:21 AM
                              0 responses
                              26 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 04-04-2024, 09:00 AM
                              0 responses
                              52 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X