Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Goseq for non-native goseq db

    hi

    I am new to Goseq and I am trying to implement it for a genome not present in the database. I have the list of genes and their associated GO terms. It's just that it's not clear to me how can I include this in the software .... the vignette has an explanation on the format of the data but it doesn't spcify where it should be entered.

    thanks

    Anna

  • #2
    Dear Anna,

    Did you find an answer for this, if so could you share.

    Thanks
    Swarbre

    Comment


    • #3
      Hi Anna,
      I rather suggest you to have a look at goseq manual
      http://www.bioconductor.org/packages.../man/goseq.pdf

      under the goseq function details

      Comment


      • #4
        error in goseq() gene2cat invalid

        Hi,
        I was using goseq to test on a non model species, when I was using goseq function, the gene2cat is given by manual input. For this, I gave a list in which the key is the gene name and the value is an array containing several go ids (because it doesn't work, I also tried category descriptions). However, an error occurred and saying that Error in goseq() Invaled category specified. Valid categories are GO:CC, GO:BP, GO:MF or KEGG.
        I wonder what exactly I should input the gene2cat.
        IF you happen to know the answer, I appreciate your reply a lot.
        Thanks.
        AliceRNA

        Comment


        • #5
          Hi AliceRNA,

          I suspect your error is not the GO descriptions but the test.cats argument. If you can't solve it maybe you should post your command.

          Comment


          • #6
            bad nullp plots for pwf

            Hi Oshlack,
            Thanks for your reply, I really appreciate it. I finally fixed the error, it is because when using the goseq function, the first argument is actually results from pwf, not genes. If this is corrected from the manual, then this error is gone.
            However I have another question, which is the nullp plots for my data doesn't show goodness of fit. In the manual, it shows many dots and they fit with the curve very well and there is a small number of bins(3 hundred). However, in my case, there are too many bins, and the dots on the plot is quite few, so that the dots lay far away from the curve. I don't trust the curve then.
            This leads to the fact that there is no enriched GO term. I also tried by not fitting a proportion of DE genes against length curve, still no padj smaller than 0.05. Is there any idea on how to play around the parameter and make a better curve which may give me significant results?
            Thanks.
            AliceRNA

            Comment


            • #7
              Hi AliceRNA,

              I don't understand what you are saying. How many DE genes do you have? The number of bins is the number of dots on the plot but the fit is actually on all the individual genes. The function that does the fit is makespline() so you could perhaps fiddle around with that and try changing the number of knots. It might be worth posting your picture.

              Cheers,
              Alicia

              Comment


              • #8
                goseq poor fitting curve

                Hi Alicia,
                I was working on a de novo transcriptome assebmly, so for one locus, we may have more than one contig. What I import is the total number of transcripts, and the length of transcript, not the gene. I wonder if that will make a difference.
                I have in total 46463 contigs in which there are only 738 differentially expressed contigs.
                I still couldn't figure out why.
                Thanks.
                AliceRNA
                Attached Files

                Comment


                • #9
                  I don't think it should matter that you are using transcripts rather than genes. You may need to fiddle with the makespline() function and put in more knots, say nKnots=10 and see if this make the figure look like a better fit.

                  Comment


                  • #10
                    Hello,
                    I'm trying to run the goseq on a not supported organism but unfortunately I already stuck for a while with the nullp function...
                    I have my DE genes in a vector:
                    head(degenes)

                    AT1G21660 AT1G11310 AT1G06220 AT1G01010 AT1G01020 AT1G01030
                    1 0 0 0 0 0

                    and also my gene_length:

                    head(gene_length)
                    AT1G21660 AT1G11310 AT1G06220 AT1G01010 AT1G01020 AT1G01030
                    3366 3447 4909 2268 2809 2065

                    both vectors have the same length but,
                    if I run now the nullp function:
                    pwf<-nullp(degenes,bias.data=gene_length)
                    Error in if (hi <= low) { : missing value where TRUE/FALSE needed

                    Its not too good explained in the goseq manual so I hope anybody can help me.
                    Cheers,
                    Julian

                    Comment


                    • #11
                      Hi SeqAnswer community,

                      I also have some trouble with GOSeq, or probably with the data input. I'm dealing with DE data from the livers of different treated sticklebacks (Gasterosteus aculeatus) and used tophat for mapping, Htseq for counting and DESeq for the DE Analysis. To get the results that are of interest for an biologist I performed a Analysis to get the Gene Ontology. For this I used GOSeq because of their argument of length bias correction. As the stickleback is not an supported model organism I take the gene length information and GO IDs from ENSEMBL/Biomart. I wrote a script to generate a all_genes list, a DE_genes list and a corresponding gene_length list (with one length per geneID; mean respectively medians calculated when there was more than one transcript). And as well a list of all transcript GO IDs corresponding to geneIDs.
                      The all_genes file contains all genes that are expressed in the set of interest (e.g. 20534 genes)
                      The DE_genes are the genes that are DE between the two conditions after FDR (e.g. 1578 genes).

                      Then I imported the list to GOSeq --> worked
                      de.genes <- scan("de_genes.txt", what=character() )

                      >assayed.genes <- scan("all_genes.txt", what=character() )

                      >gene.length=scan("gene_lengths.txt", what=numeric() )

                      >go.ids= read.table("goIDs.csv",header=FALSE)

                      Next the gene.vector (0 = not DE 1 = DE) was created from the gene files

                      >gene.vector = as.integer(assayed.genes %in% de.genes)

                      and the gene names were added

                      >names(gene.vector) = assayed.genes
                      then the Probability Weighting Function was calculated

                      >pwf=nullp(gene.vector,bias.data=gene.length)

                      The over and under expressed GO categories among DE genes with the wallenius approximation were calculated

                      >GO.wall=goseq(pwf,gene2cat=go.ids)

                      The over and under expressed GO categories among DE genes using random sampling were calculated

                      >GO.samp=goseq(pwf,gene2cat=go.ids,method="Sampling",repcnt=1000)

                      And plotted against each other
                      >plot(log10(GO.wall[,2]), log10(GO.samp[match(GO.samp[,1],GO.wall[,1]),2]),
                      xlab="log10(Wallenius p-values)", ylab="log10(Sampling p-values)", xlim=c(-3,0), )
                      >abline(0,1,col=3,lty=2)

                      Next the over and under expressed GO categories among DE genes without length bias were calculated
                      >GO.nobias=goseq(pwf,gene2cat=go.ids,method="Hypergeometric")

                      ...and plotted against the GO.wall

                      >plot(log10(GO.wall[,2]), log10(GO.nobias[match(GO.nobias[,1],GO.wall[,1]),2]), xlab="log10(Wallenius p-values)", ylab="log10(Hypergeometric p-values)", xlim=c(-3,0), ylim=c(-3,0))
                      >abline(0,1,col=3,lty=2)

                      The plots I got look weird to me. In the GOSeq vignette they argumented that the random sampling method and the wallenius aprox. method should be end with nearly the same results in contrast to the hypergeometric method. (Supported with the displayed plots). But in my plots it is vice versa!! (Plots are attached) Do you have a clue what has gone wrong, or has somebody an explanation for this?

                      One more thing I was concerned about: Why are there so few over/under represented GOs? I have just 4/3 GOs but 1578 DE genes in the DESeq analysis!

                      What I would appreciate is to get a script or input solution or that somebody see an error...

                      Thanks and cheers!
                      Alex

                      P.S. Is there a possibilty to get GO word clouds / networks or anything like this within the GOSeq script?
                      Attached Files

                      Comment


                      • #12
                        Hi there!
                        Does nobody have experienced the same or has a clue?
                        I would recommened eyery help, because I didn't solve it yet...

                        cheers
                        Alex

                        Comment


                        • #13
                          "I finally fixed the error, it is because when using the goseq function, the first argument is actually results from pwf, not genes. If this is corrected from the manual, then this error is gone."

                          Would you please explain it clearly, my email is : [email protected]
                          thanks a lot

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            Strategies for Sequencing Challenging Samples
                            by seqadmin


                            Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                            03-22-2024, 06:39 AM
                          • seqadmin
                            Techniques and Challenges in Conservation Genomics
                            by seqadmin



                            The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                            Avian Conservation
                            Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                            03-08-2024, 10:41 AM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 03-27-2024, 06:37 PM
                          0 responses
                          12 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 03-27-2024, 06:07 PM
                          0 responses
                          11 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 03-22-2024, 10:03 AM
                          0 responses
                          53 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 03-21-2024, 07:32 AM
                          0 responses
                          68 views
                          0 likes
                          Last Post seqadmin  
                          Working...
                          X