Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • annotate ID after using DESeq

    Hi everyone!

    I have my rnaseq topTable below. And I would like to add a column with an alias code. The ID code in the topTable is the ORF code and I would like to add a alias code and predicted function to my data. Does anyone know what the best way of doing this is?

    > write.csv( res.combined, file="rnaseq.topTable.csv" )
    > res.combined[1:5,]
    ID mean.d0 mean.d1 mean.d3 mean.d6 FC.d1-d0 FC.d3-d0 FC.d6-d0
    1 An01g00010 22.56114 91.74978 91.6181 101.3439 4.0667164 4.0608797 4.4919670
    2 An01g00020 59.12810 157.12747 186.8015 195.0879 2.6574079 3.1592673 3.2994108
    3 An01g00030 2496.49429 877.02756 1232.1749 1349.5057 0.3513037 0.4935621 0.5405603
    4 An01g00040 1697.34332 1329.64879 1256.3100 1336.8125 0.7833706 0.7401626 0.7875911
    5 An01g00050 8234.36186 940.63154 1322.0219 1385.3169 0.1142325 0.1605494 0.1682361
    logFC.d1-d0 logFC.d3-d0 logFC.d6-d0 pval.d1-d0 pval.d3-d0 pval.d6-d0 qval.d1-d0
    1 2.0238644 2.0217923 2.1673473 5.947051e-07 6.311527e-07 7.022009e-08 1.547561e-06
    2 1.4100197 1.6595900 1.7222084 4.491603e-06 5.141473e-08 1.880176e-08 1.079539e-05
    3 -1.5092095 -1.0186965 -0.8874725 1.772232e-15 5.512175e-08 1.864615e-06 8.577393e-15
    4 -0.3522332 -0.4340859 -0.3444813 6.577599e-02 2.044133e-02 6.936071e-02 9.494206e-02
    5 -3.1299552 -2.6389108 -2.5714408 4.471197e-59 4.230865e-44 5.274404e-42 1.569486e-57
    qval.d3-d0 qval.d6-d0
    1 1.815415e-06 2.052575e-07
    2 1.632608e-07 5.782136e-08
    3 1.746078e-07 4.783620e-06
    4 3.341204e-02 9.971601e-02
    5 9.841654e-43 1.024172e-40


    Excel file with needed info from an article:

    ORF code Alias Code Predicted function
    An01g00010 An00g03020 hypothetical protein [truncated ORF]
    An01g00020 An00g13235 weak similarity to nucleotide binding protein phnN - Escherichia coli
    An01g00030 An00g08601 strong similarity to Hgh1 - Saccharomyces cerevisiae
    An01g00040 An00g07028 strong similarity to alpha subunit of transcription initiation factor TFIIF Tfg1 - Saccharomyces cerevisiae [truncated ORF]
    An01g00050 An00g04204 similarity to fatty-acyl-CoA synthase beta chain Fas1 - Saccharomyces cerevisiae [truncated ORF]


    Thank you!

  • #2
    Presumably you have access to information relating your ORFs to their predicted functions (otherwise, you get to make such a file). You can then load that and use "match()" to match the IDs and descriptions (technically, you match the IDs in both and the result is a vector of indices that you can use to resort the descriptions to then match your ORFs).

    Comment


    • #3
      Thanks, I think I understand what you are saying, but I already have a file that relates the ORF to the function (see excel file above). I need to link that file to the ORFs in the other file...But I dont really now much about the match function. I guess I use the base package?

      Comment


      • #4
        Ah, I guess I hadn't fully realised that that was an Excel file that you already had. Assuming the excel file has been saved as a tsv (tab separated values) file called "foo.tsv":

        Code:
        anno <- read.delim("foo.tsv", header=T)
        m <- match(res.combined$ID, anno$ORF)
        res.combined$AliasCode <- anno$Alias.Code #I think it'll be called that
        res.combined$PredFunc <- anno$Predicted.Function #Or whatever it gets called
        Something like that should work. The match() function is pretty handy.

        Comment


        • #5
          I think that should work yes, thanks. But I cant get hold of that file. it is a xls file but i cant safe it as tab delimited or download it as other than a worksheet. I also tried to make my own file..

          You can find the file here. http://www.biomedcentral.com/1471-21...380/additional
          additional file 1

          Comment


          • #6
            I've attached the information from supplemental figure 1 as a tab separated value text file (it's gzipped so the forum will allow it).
            Attached Files

            Comment


            • #7
              lets assume that table with the annotation is called "annot" and it has the comumn with the An01g00010 entries also called "ID" the you want:
              Code:
              New <- merge(res.combined,annot,by="ID",all.x=T,sort=F)
              edit:
              Oh I see it is called "ORF code"
              Code:
              New <- merge(res.combined,annot,by.x="ID",by.y="ORF code",all.x=T,sort=F)
              Last edited by Jeremy; 04-02-2014, 07:33 PM.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM
              • seqadmin
                Strategies for Sequencing Challenging Samples
                by seqadmin


                Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                03-22-2024, 06:39 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              25 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              28 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 09:21 AM
              0 responses
              24 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-04-2024, 09:00 AM
              0 responses
              52 views
              0 likes
              Last Post seqadmin  
              Working...
              X