  • Gene-level analysis (lincRNA, pseudogenes)?

    Hey there,

    I used STAR and featureCounts on 20 mouse RNA-seq samples.
    With a lot of help from the forums, I now have lists of DEGs.

    I noticed that my DEG lists contain quite a few lincRNAs and pseudogenes.
    I plan to exclude them from the downstream network and enrichment analyses (using the X2K package and DAVID; any other suggestions?), handling the lincRNAs separately from the protein-coding genes and discarding the pseudogenes. What do people in the RNA-seq field usually do with these kinds of genes?
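    One way to do this split, assuming each gene carries a biotype annotation (for example Ensembl's gene_biotype, which uses labels such as protein_coding, lincRNA and processed_pseudogene; newer Ensembl releases may label these genes lncRNA instead). featureCounts itself does not report biotypes, so the column has to come from the annotation. A minimal base-R sketch on a made-up table:

```r
# Hypothetical DEG table: gene IDs, biotypes and fold changes are placeholders.
# The gene_biotype column is assumed to come from an annotation source such as
# Ensembl; it is not part of the featureCounts output.
degs <- data.frame(
  gene_id      = c("g1", "g2", "g3", "g4", "g5"),
  gene_biotype = c("protein_coding", "lincRNA", "processed_pseudogene",
                   "protein_coding", "unprocessed_pseudogene"),
  log2FC       = c(1.8, -2.1, 0.9, -1.2, 1.5),
  stringsAsFactors = FALSE
)

# Any biotype containing "pseudogene" (processed, unprocessed, ...) is flagged
is_pseudo  <- grepl("pseudogene", degs$gene_biotype, fixed = TRUE)
is_lincrna <- degs$gene_biotype == "lincRNA"

# Protein-coding genes go on to network/enrichment analysis,
# lincRNAs are set aside for separate analysis, pseudogenes are dropped
coding_degs  <- degs[degs$gene_biotype == "protein_coding", ]
lincrna_degs <- degs[is_lincrna, ]
dropped      <- degs[is_pseudo, ]
```

    Matching on the substring "pseudogene" rather than listing every pseudogene biotype keeps the filter short, at the cost of relying on the annotation's naming convention.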


    Separately, I also have lists of DEGs from a BWA/DESeq2 analysis. These lists use RefSeq IDs as the primary keys. I used biomaRt in Bioconductor to annotate them with MGI symbols and descriptions, but some RefSeq IDs still map to "NA" in the SYMBOL column. When I manually searched MGI for one of these RefSeq IDs, I did find a gene name associated with it, so it seems the Biomart/Ensembl annotation is not fully up to date. How can I fill in these missing symbols?
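    Whatever the second source is (the org.Mm.eg.db package, or a table exported from MGI), combining it with the biomaRt result comes down to filling the NAs in one column from another. A base-R sketch, assuming both symbol vectors are already aligned to the same RefSeq IDs; the symbols are placeholders, not real annotations:

```r
# RefSeq IDs from the DEG list (first three from the post above)
refseq <- c("NM_026268", "NM_025777", "NR_015524")

# Hypothetical symbol columns from two annotation sources, aligned to refseq
sym_biomart <- c("GeneA", NA, NA)        # e.g. the biomaRt result
sym_other   <- c("GeneA", "GeneB", NA)   # e.g. a second lookup

# Keep the first source where it has a value, otherwise fall back to the second
sym_combined <- ifelse(is.na(sym_biomart), sym_other, sym_biomart)

# IDs that neither source could annotate (candidates for a manual MGI search)
still_missing <- refseq[is.na(sym_combined)]
```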

    I was following the instructions at http://www.bioconductor.org/help/wor...Gene/#annotate to update the annotation in my DEG files, using the function defined there:


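    # Map a character vector of IDs from one keytype to another using an
    # AnnotationDb object (e.g. org.Mm.eg.db). IDs with several matches become
    # NA ("putNA") or take the first match ("useFirst").
    convertIDs <- function( ids, from, to, db, ifMultiple=c("putNA", "useFirst")) {
        stopifnot( inherits( db, "AnnotationDb" ) )
        ifMultiple <- match.arg( ifMultiple )
        # One row per (id, mapped value) pair, key column first
        suppressWarnings( selRes <- AnnotationDbi::select(
            db, keys=ids, keytype=from, columns=c(from,to) ) )
        if ( ifMultiple == "putNA" ) {
            # Drop every ID that maps to more than one value
            duplicatedIds <- selRes[ duplicated( selRes[,1] ), 1 ]
            selRes <- selRes[ ! selRes[,1] %in% duplicatedIds, ]
        }
        # One result per input ID, in input order (NA if unmapped)
        return( selRes[ match( ids, selRes[,1] ), 2 ] )
    }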
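    To see what the ifMultiple branches do without needing org.Mm.eg.db installed, the same steps can be replayed in base R on a mock of the table select() returns (one row per ID/symbol pair, key column first). The IDs and symbols below are placeholders, not real annotations:

```r
# Mock select() output: "NM_B" maps to two symbols, "NM_C" to none
selRes <- data.frame(
  REFSEQ = c("NM_A", "NM_B", "NM_B", "NM_C"),
  SYMBOL = c("Sym1", "Sym2", "Sym3", NA),
  stringsAsFactors = FALSE
)
ids <- c("NM_C", "NM_A", "NM_B")   # query order differs from selRes order

# ifMultiple = "putNA": remove every ID that appears more than once...
duplicatedIds <- selRes[duplicated(selRes[, 1]), 1]
putna <- selRes[!selRes[, 1] %in% duplicatedIds, ]
# ...then match() lines the answers up with the order of `ids`
putna_result <- putna[match(ids, putna[, 1]), 2]

# ifMultiple = "useFirst": keep duplicates; match() returns the first hit
usefirst_result <- selRes[match(ids, selRes[, 1]), 2]
```

    With "putNA", NM_B comes back as NA because it is ambiguous; with "useFirst" it gets whichever symbol appears first in the select() output.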

    I first loaded my DEG CSV file into deg_01. The id column contains all the RefSeq IDs.

    Then

    > deg_01$newMGI_symbol <- convertIDs("deg_01$id", "REFSEQ", "SYMBOL", org.Mm.eg.db)

    Error in .testForValidKeys(x, keys, keytype) :
    None of the keys entered are valid keys for 'REFSEQ'. Please use the keys method to see a listing of valid arguments.

    I checked, and the values in deg_01$id are all RefSeq IDs.

    > deg_01$id
    [1] NM_026268 NM_025777 NM_013478 NR_015524 NM_001099297 NM_177610 NM_001122954
    [8] NM_144834 NM_011110 NM_016668 NM_013820 NM_001145874 NM_001081060 NM_080457


    Any suggestions? Thanks.
    Last edited by neokao; 04-05-2015, 04:43 PM.

  • #2
    If you look at how convertIDs() is used in that guide, the first argument should be the actual vector of IDs.

    "deg_01$id" is a character string

    deg_01$id is a vector

    We're working on finding a way to get this function into a package somewhere...
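    The difference is easy to see in a console. A small sketch with a three-row stand-in for deg_01 (the real table has more rows):

```r
# Placeholder data frame standing in for deg_01
deg_01 <- data.frame(id = c("NM_026268", "NM_025777", "NR_015524"),
                     stringsAsFactors = FALSE)

quoted   <- "deg_01$id"   # one string containing the text deg_01$id
unquoted <- deg_01$id     # the column itself: a vector of IDs

length(quoted)    # 1
length(unquoted)  # 3
```

    Passing quoted asks select() to look up a single key literally named "deg_01$id", which is why none of the keys were valid for REFSEQ.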



    • #3
      I actually did this first:

      > deg_01$newMGI_symbol <- convertIDs(deg_01$id, "REFSEQ", "SYMBOL", org.Mm.eg.db)

      Error in .testForValidKeys(x, keys, keytype) :
      'keys' must be a character vector

      Did I misunderstand anything? Any suggestions?



      • #4
        Try investigating the problem with class():

        class(deg_01$id)
        head(deg_01$id)



        • #5
          > class(deg_01$id)
          [1] "factor"
          > head(deg_01$id)
          [1] NM_026268 NM_025777 NM_013478 NR_015524 NM_001099297 NM_177610
          142 Levels: NM_001004140 NM_001005423 NM_001014976 NM_001024139 NM_001033484 ... NR_073167
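    A factor stores integer codes plus a table of levels, so it fails an is.character() test even though it prints like text; that is presumably what the "'keys' must be a character vector" check rejects. A base-R sketch with three of the IDs above:

```r
# A factor built from IDs prints like text but is stored as integer codes
ids_factor <- factor(c("NM_026268", "NM_025777", "NR_015524"))

is.character(ids_factor)   # FALSE
typeof(ids_factor)         # "integer"

# as.character() recovers the level labels as plain strings
ids_chr <- as.character(ids_factor)
is.character(ids_chr)      # TRUE
```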

          Originally posted by Michael Love: "Try investigating the problem with class()"



          • #6
            It turned out the issue was the factor.

            I followed the suggestion from Bioconductor and ran:

            > deg_01$newMGI_symbol <- convertIDs(as.character(deg_01$id), "REFSEQ", "SYMBOL", org.Mm.eg.db)

            It worked. I did not know that factor vs. character made such a difference, because the same input works fine with biomaRt.
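            The factor most likely comes from how the CSV was read: before R 4.0.0, read.csv() converted text columns to factors by default (stringsAsFactors = TRUE). Setting stringsAsFactors = FALSE when loading keeps the IDs as character from the start. A sketch using inline text in place of the real file (the log2FoldChange column and its values are made up):

```r
# Inline text standing in for the DEG csv file (hypothetical columns/values)
csv_text <- "id,log2FoldChange
NM_026268,1.2
NM_025777,-0.8
NR_015524,2.3"

# Ask for plain strings explicitly so the result does not depend on the
# R version's default; no as.character() step is needed afterwards
deg_01 <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(deg_01$id)   # "character"
```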
