
  • GOSeq analysis problem with geneLenDataBase

    Hi!

    I am analysing my RNA-Seq data. When I reached the GOSeq step, I ran into a problem: the genome and gene ID combination I need is not supported. The error message:

    > pwf=nullp(genes,"mm10","refGene")
    Error in getlength(names(DEgenes), genome, id) :
    Length information for genome mm10 and gene ID refGene is not in the geneLenDataBase database. You will have to specify bias.data manually.

    Is it possible to add length data manually, and if so, how should I proceed? I could easily get the lengths from the reference files, but how do I import the gene lengths into GOSeq? Or is my only option to wait for an update of geneLenDataBase?

    Thanks in advance!

  • #2
    Hi,

    If you can get gene length data, you can pass it as a vector to the bias.data argument of nullp. The required format is described in the goseq manual (http://www.bioconductor.org/packages.../doc/goseq.pdf):

    5.1 Length data format
    The length data must be formatted as a numeric vector, of the same length as the main named vector specifying gene names/DE genes. Each entry should give the length of the corresponding gene in bp. If length data is unavailable for some genes, that entry should be set to NA.
    Good luck!
    Dario
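
The format quoted above can be sketched in base R as follows. This is a minimal illustration, not code from the goseq manual: the gene names, the lengths, and the `length.table` data frame are all made up. The point it shows is that the length vector is aligned to every assayed gene, in the same order as the named 0/1 vector, with NA where no length is known.

```r
# Hypothetical inputs: all assayed genes, the DE subset, and a table of lengths
assayed.genes <- c("geneA", "geneB", "geneC", "geneD", "geneE")
de.genes <- c("geneB", "geneD")
length.table <- data.frame(
  gene = c("geneA", "geneB", "geneD", "geneE"),  # geneC has no known length
  length = c(1200, 850, 3100, 640),
  stringsAsFactors = FALSE
)

# Named 0/1 vector over ALL assayed genes (1 = differentially expressed)
gene.vector <- as.integer(assayed.genes %in% de.genes)
names(gene.vector) <- assayed.genes

# One length per assayed gene, in the same order; unmatched genes become NA
bias.data <- length.table$length[match(assayed.genes, length.table$gene)]

# The two vectors must have the same number of entries
stopifnot(length(bias.data) == length(gene.vector))

# With goseq installed, the vectors would then be passed as:
# pwf <- nullp(gene.vector, bias.data = bias.data)
```

Using `match()` rather than relying on file order keeps each length attached to the right gene even if the lengths table is sorted differently from the gene list.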



    • #3
      Thanks a lot! I should have read the manual more carefully. It actually resolves my current issues clearly, but thank you for pointing that out!

      Sander



      • #4
        Dear Dario,

        I am having the same problem Sander had, and even when following the format suggested in the manual I cannot get rid of it. I really do not know what I am doing wrong.
        I created a mock set of results to test the procedure. This is the code I am using:

        > de.genes <- scan("de_genes.txt", what=character() )
        Read 27 items
        > assayed.genes <- scan("all_genes.txt", what=character() )
        Read 37 items
        > gene.length=scan("gene_lengths.txt", what=numeric() )
        Read 27 items
        > names(gene.vector) = assayed.genes
        > pwf=nullp(gene.vector,bias.data=gene.length)
        Error in nullp(gene.vector, bias.data = gene.length) :
        bias.data vector must have the same length as DEgenes vector!

        R tells me that de.genes and gene.length are the same size, but I still get the error message. I would really appreciate it if someone could help me with this problem.

        Thanks

        Jorge
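
One possible reading of the session above, offered as a sketch rather than a confirmed diagnosis: the error message compares bias.data with the DEgenes vector, which is the first argument of nullp (here gene.vector, named after the 37 assayed genes), not with de.genes. The session also does not show how gene.vector was created. The mock data below only mirrors the counts in the post (37 assayed genes, 27 lengths); the gene names and length values are invented.

```r
# Mock data mirroring the counts in the post: 37 assayed genes, 27 DE genes
assayed.genes <- paste0("gene", 1:37)
de.genes <- assayed.genes[1:27]
gene.length <- seq(500, by = 100, length.out = 27)  # only 27 lengths

# Named 0/1 vector over all assayed genes (this step is missing in the post)
gene.vector <- as.integer(assayed.genes %in% de.genes)
names(gene.vector) <- assayed.genes

# The comparison that matters is against gene.vector, not de.genes
lengths.match <- length(gene.vector) == length(gene.length)  # FALSE: 37 vs 27

# Assumed repair when lengths exist only for the DE genes: pad the rest with NA
names(gene.length) <- de.genes
bias.data <- unname(gene.length[assayed.genes])

# Now there is one entry per assayed gene
stopifnot(length(bias.data) == length(gene.vector))

# With goseq installed:
# pwf <- nullp(gene.vector, bias.data = bias.data)
```

Padding with NA only satisfies the length check. For a meaningful fit, the lengths file would need an entry for every gene in all_genes.txt, not just the DE genes.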



        • #5
          Hi guys,
          I have a similar problem with GOSeq. The mm10 genome is supported, but not the gene ID I am using (geneSymbol).
          I am trying to get the length information by following the GOSeq manual, but I still don't understand how. Could you please show me how to get the length information for the mm10 genome and the geneSymbol gene ID?

          > genes = as.integer(all.genes %in% F.genes)
          > names(genes) = all.genes
          > head(genes)
          Cryba1 Cryba4 Cryga Crygb Crygc Crygd
          1 1 1 1 1 1
          > pwf=nullp(genes,"mm10", "geneSymbol")
          Can't find mm10/geneSymbol length data in genLenDataBase... Trying to download from UCSC. This might take a couple of minutes.
          Error in value[[3L]](cond) :
          Length information for genome mm10 and gene ID geneSymbol is not available. You will have to specify bias.data manually.

          Thank you so much
