  • XLOC gene id

    Does anyone know what the "XLOC" gene IDs are, and how to convert them to actual gene names or some other useable identifier?

    The first few columns of my Cuffdiff data looks like this:

    test_id gene_id gene locus
    XLOC_000001 XLOC_000001 - chr1:162458-171994
    XLOC_000002 XLOC_000002 - chr1:860763-880142

  • #2
    Cufflinks IDs

    They are Cufflinks IDs. If you run Cufflinks with a GTF or GFF file, you will get gene names instead of XLOCs. If you have a genome without an annotation file, you could extract those sequences and BLAST them for an initial identification. Ideally, though, you would run your genome through MAKER or other gene model prediction software before running Cufflinks.



    • #3
      If you run CuffLinks with a GTF or GFF file you will get gene names instead of XLocs.


      Can you please explain this in detailed steps? Thank you in advance.



      • #4
        I ran Cufflinks with the -G flag (i.e. providing an annotation file, a GTF file from UCSC, and telling it not to perform novel transcript discovery) and I still got this XLOC ID format. I am having trouble converting them.



        • #5
          I saw this thread and would like to revive it.

          I am having similar issues. The GTF file I used was from Ensembl, where gene IDs are Ensembl IDs. The cuffdiff output file replaced the Ensembl IDs with XLOC_ IDs, although it also output gene names (e.g. BCL2). The Ensembl IDs were no longer there.

          Is there any way to convert XLOC IDs back to Ensembl IDs, or simply keep the Ensembl IDs from my GTF file? How do you guys go about this? I am trying to work out what the authors' intention was in replacing useful IDs with XLOC IDs.

          Interestingly enough, if I don't run new gene discovery (i.e. I skip the cuffmerge step), I get to keep the Ensembl IDs.

          thoughts?
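One possible way to approach the conversion asked about above is to build a lookup table from the merged GTF itself. The following is a minimal base R sketch, assuming your merged.gtf carries attributes such as gene_id, gene_name and nearest_ref in its ninth column; check your own file, since the exact attribute names may differ. The two GTF lines and all IDs in them are made up for illustration.

```r
# Hypothetical merged.gtf lines (tab-separated, 9 columns); all IDs are made up.
gtf_lines <- c(
  "chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id \"XLOC_000001\"; transcript_id \"TCONS_00000001\"; exon_number \"1\"; gene_name \"GENEA\"; oId \"REF_TX_1\"; nearest_ref \"REF_TX_1\"; class_code \"=\";",
  "chr1\tCufflinks\texon\t500\t900\t.\t-\t.\tgene_id \"XLOC_000002\"; transcript_id \"TCONS_00000002\"; exon_number \"1\"; oId \"CUFF.2.1\"; class_code \"u\";"
)

# Pull one attribute value out of each attribute string; NA when it is absent.
# Assumes the key is not a substring of another attribute name in the file.
get_attr <- function(attrs, key) {
  pattern <- paste0(key, " \"([^\"]*)\"")
  hits <- regmatches(attrs, regexec(pattern, attrs))
  vapply(hits, function(x) if (length(x) == 2) x[2] else NA_character_, character(1))
}

# The ninth tab-separated field holds the attributes.
attrs <- vapply(strsplit(gtf_lines, "\t", fixed = TRUE), function(f) f[9], character(1))

# One row per distinct XLOC / gene name / reference transcript combination.
xloc_map <- unique(data.frame(
  xloc_id     = get_attr(attrs, "gene_id"),
  gene_name   = get_attr(attrs, "gene_name"),
  nearest_ref = get_attr(attrs, "nearest_ref"),
  stringsAsFactors = FALSE
))
print(xloc_map)
```

With a real file you would replace gtf_lines with readLines() on your merged GTF. If nearest_ref holds reference transcript IDs, a second lookup from transcript ID to gene ID in the original annotation would be needed to get back to gene-level IDs.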



          • #6
            I faced a similar problem, but then I used -g with a GTF file and got the IDs in my output files from cuffdiff.



            • #7
              I used this solution (http://seqanswers.com/forums/showthread.php?t=18357):
              Thomas Doktor said:
              library(cummeRbund)
              cuff <- readCufflinks()

              # Retrieve significant gene IDs (XLOC) with a pre-specified alpha
              diffGeneIDs <- getSig(cuff, level = "genes", alpha = 0.05)

              # Use the returned identifiers to create a CuffGeneSet object with all relevant info for the given genes
              diffGenes <- getGenes(cuff, diffGeneIDs)

              # gene_short_name values (and corresponding XLOC_* values) can be retrieved from the CuffGeneSet by using:
              geneNames <- featureNames(diffGenes)
              row.names(geneNames) <- geneNames$tracking_id
              # drop the ID column but keep a data frame so the row names survive
              diffGenesNames <- geneNames[, -1, drop = FALSE]

              # get the data for the significant genes
              diffGenesData <- diffData(diffGenes)
              row.names(diffGenesData) <- diffGenesData$gene_id
              diffGenesData <- diffGenesData[, -1]

              # merge the two tables by row names
              diffGenesOutput <- merge(diffGenesNames, diffGenesData, by = "row.names")
              diffGenesOutput will then be a list of genes with the XLOC name as well as the gene name (like BATF3).
              Last edited by blakeoft; 12-08-2014, 06:43 AM.
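The merge step above can also be written against explicit key columns instead of row names. This is a minimal base R sketch on toy data, not the quoted author's method; the two data frames are hypothetical stand-ins for the featureNames() and diffData() results, and every value in them is made up.

```r
# Toy stand-in for the featureNames() table (made-up values).
gene_names <- data.frame(
  tracking_id     = c("XLOC_000001", "XLOC_000002"),
  gene_short_name = c("GENEA", "GENEB"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the diffData() table (made-up values, different row order).
diff_data <- data.frame(
  gene_id          = c("XLOC_000002", "XLOC_000001"),
  log2_fold_change = c(1.5, -2.0),
  q_value          = c(0.01, 0.03),
  stringsAsFactors = FALSE
)

# Join on the ID columns directly; the key keeps the name given in by.x.
annotated <- merge(gene_names, diff_data, by.x = "tracking_id", by.y = "gene_id")
print(annotated)
```

Merging on named columns avoids having to set and then drop row names, and the rows are matched by ID regardless of their order in either table.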



              • #8
                Hi All,

                This works great, so many thanks. One quick question: I am having a hard time inserting a column between "value_2" and "log2_fold_change". I can make the new column, but it goes to the end of the data frame. The new column (Ratio) is placed after the 'significant' column. For example:

                myGenesOutput$Ratio <- myGenesOutput$TRT_fpkm/myGenesOutput$CTR_fpkm

                Any thoughts? Thanks
                Cheers
                G



                • #9
                  Gonza,

                  Just rearrange the columns. For example, if your data frame called df has three columns, and you want the third column to come before the second column, do
                  Code:
                  df <- df[, c(1, 3, 2)]
                  If you're still having trouble, tell me what
                  Code:
                  names(myGenesOutput)
                  gives you along with the desired order of the names, and I'll be able to help you more explicitly.
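The same rearrangement can be done by column name rather than by position, which may be easier when the data frame has many columns. This is a minimal base R sketch on a toy data frame; the column names other than value_2, log2_fold_change, significant and Ratio are assumptions, and all values are made up.

```r
# Toy data frame standing in for myGenesOutput (made-up values).
df <- data.frame(
  gene_id          = "XLOC_000001",
  value_1          = 4,
  value_2          = 8,
  log2_fold_change = 1,
  significant      = "yes",
  stringsAsFactors = FALSE
)

# Adding a column this way always puts it at the end.
df$Ratio <- df$value_2 / df$value_1

# Rebuild the column order with "Ratio" placed right after "value_2".
cols <- setdiff(names(df), "Ratio")
pos  <- match("value_2", cols)
df   <- df[, append(cols, "Ratio", after = pos)]
print(names(df))
```

Because the order is built from names, the code keeps working if columns are added or removed elsewhere in the data frame.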



                  • #10
                    Thanks so much that rearrange worked fantastic!!!!!!!
                    G



                    • #11
                      Hello again,

                      I have another R question, please some advice.
                      I am plotting the FPKM expression (log data) of a certain gene using the script below, and I cannot figure out how to make the y-axis show up as "10 to the 1", "10 to the 1.5", "10 to the 2", etc.
                      Instead, the graph shows FPKM+1 values as 1, 10 and 100.

                      Any ideas?

                      Script:
                      myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100),logMode=T)
                      myGeneBHLH100_isoform_logModeT + theme_bw()



                      • #12
                        Gonza,

                        This appears to be the way to do it with ggplot2. I've tried it with the sample cummeRbund data, and the results are a little goofy. The y axis ticks are at 10^(2.6), 10^(2.8), etc. Maybe it would look better if your data had values that were spread over more powers of 10, or perhaps this is what you're looking for. Try

                        Code:
                        library(scales)
                        myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
                        myGeneBHLH100_isoform_logModeT +
                           theme_bw() +
                           scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                                             labels = trans_format("log10", math_format(10^.x)))
                        Here's a source: the R Cookbook. See the section titled "Axis transformations: log, sqrt, etc." This page has an example with axis ticks that are integer powers of 10.

                        Edit: Oh. It looks like you're ok with rational powers of 10.
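If ticks at integer powers of 10 are preferred, the breaks and their labels can be computed by hand from the data range. This is a minimal base R sketch, assuming strictly positive FPKM values; the fpkm vector is made up for illustration.

```r
# Made-up FPKM values; they must be positive for log10() to be defined.
fpkm <- c(3.2, 18, 140, 950)

# Integer exponents that bracket the data range.
lo <- floor(log10(min(fpkm)))
hi <- ceiling(log10(max(fpkm)))
exponents <- seq(lo, hi)

# Tick positions on the original scale and plotmath-style label strings.
breaks <- 10^exponents
labels <- paste0("10^", exponents)
print(data.frame(breaks, labels))
```

These vectors could then be passed to scale_y_log10() as breaks = breaks and labels = parse(text = labels), so the labels are drawn as powers of 10.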



                        • #13
                          Hey blakeoft, that worked beautifully, thanks very much once again! If you do not mind, one last question please.

                          When I type the commands below, I get two different plots (one for each isoform). Is there a way to plot those isoforms in the same plot? Somehow they do it in the cummeRbund protocol (Fig. 5a - Nature Protocols 7, 562–578 (2012) doi:10.1038/nprot.2012.016)

                          Full script :

                          myGeneId<-"XLOC_010858"
                          myGeneBHLH100<-getGene(cuff_data,myGeneId)
                          myGeneBHLH100

                          XLOC_010858 <-expressionPlot(myGeneBHLH100,logMode=T)
                          XLOC_010858 + theme_bw()

                          myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
                          myGeneBHLH100_isoform_logModeT + theme_bw() + scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                          labels = trans_format("log10", math_format(10^.x)))



                          • #14
                            Gonza,

                            It looks like expressionPlot() has been updated at some point so that the isoforms are now plotted side by side. Have you looked at the manual? It has the plots side by side in its example. It also has the FPKM values as integers in log mode, instead of the "10^x" format. I could be wrong because the paper and the manual are both dated 2012.

                            I tried to use ggplot2 to plot this for you. Anyways, this is the best that I could do.

                            Code:
                            iso_plot <- ggplot(isoforms(myGeneBHLH100)@fpkm,
                                               aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
                            iso_plot +
                               geom_line() + theme_bw() +
                               scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                                             labels = trans_format("log10", math_format(10^.x))) +
                               geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi)) # + geom_point(color = "black", shape = 19)
                            Some aesthetics aren't the same as the normal plots that cummeRbund makes, for example the colors of the lines are different. You can mess around with those colors, the line thickness, etc., but this looks pretty close to what they have in the paper. If you want black data points like in the manual, uncomment the geom_point part on the last line.

                            Edit: I think that some people frown on multiple line plots like this because they can get crowded. One way to mitigate this is to do what is called dodging. Here's how you'd do it for this plot:

                            Code:
                            iso <- isoforms(myGeneBHLH100)
                            # offset each isoform slightly along x so lines and error bars do not overlap
                            pd <- position_dodge(0.3)
                            iso_plot <- ggplot(iso@fpkm,
                                               aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
                            iso_plot +
                               geom_line(position = pd) + theme_bw() +
                               scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                                             labels = trans_format("log10", math_format(10^.x))) +
                               geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi), position = pd) # + geom_point(color = "black", shape = 19, position = pd)
                            Last edited by blakeoft; 10-06-2014, 09:10 AM. Reason: made the black data points come after error bars



                            • #15
                              Hi blakeoft, that worked well. I am so grateful for your help!
                              But you are totally right: after playing around with it, the graphs seem pretty crowded and do not look as good as I thought they would.

                              Again, many, many thanks for your help and time (and I may have more questions as I go along).

                              Best
                              G
