SEQanswers > Applications Forums > RNA Sequencing

04-09-2012, 06:41 PM #1
JMo
Junior Member
Location: Cambridge, MA
Join Date: Apr 2012
Posts: 2

XLOC gene id

Does anyone know what the "XLOC" gene IDs are, and how to convert them to actual gene names or some other usable identifier?

The first few columns of my Cuffdiff data look like this:

test_id      gene_id      gene  locus
XLOC_000001  XLOC_000001  -     chr1:162458-171994
XLOC_000002  XLOC_000002  -     chr1:860763-880142

04-10-2012, 05:58 AM #2
severin
Genome Informatics Facility
Location: Iowa @isugif
Join Date: Sep 2009
Posts: 105

Cufflink IDs

They are Cufflinks IDs. If you run Cufflinks with a GTF or GFF file, you will get gene names instead of XLOCs. If you have a genome without an annotation file, you could extract those sequences and BLAST them for an initial identification. Ideally, though, you would run your genome through MAKER or other gene-model prediction software before running Cufflinks.
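If you go the BLAST route, the locus column in Cuffdiff output like the one in the first post (e.g. chr1:162458-171994) first has to be split into chromosome, start and end. Below is a minimal base-R sketch; parse_locus() is a hypothetical helper, not part of Cufflinks or cummeRbund. It copies the coordinates as-is, so check whether your loci are 0- or 1-based before treating them as BED intervals.

```r
# Hypothetical helper: split Cuffdiff "locus" strings of the form
# "chrom:start-end" into separate columns. Coordinates are copied as-is
# (no 0-/1-based conversion is done here).
parse_locus <- function(locus) {
  pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
  ok <- grepl(pattern, locus)
  if (!all(ok)) stop("unrecognised locus: ", paste(locus[!ok], collapse = ", "))
  data.frame(
    chrom = sub(pattern, "\\1", locus),
    start = as.integer(sub(pattern, "\\2", locus)),
    end   = as.integer(sub(pattern, "\\3", locus)),
    stringsAsFactors = FALSE
  )
}

loci <- parse_locus(c("chr1:162458-171994", "chr1:860763-880142"))
loci

# Tab-separated chrom/start/end file that a sequence-extraction tool such as
# bedtools getfasta could read (adjust start first if your loci are 1-based).
out_file <- tempfile(fileext = ".bed")
write.table(loci, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```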

06-01-2012, 12:07 PM #3
ojham
Member
Location: brookings, Sd
Join Date: May 2012
Posts: 16

Quote:
If you run CuffLinks with a GTF or GFF file you will get gene names instead of XLocs.

Can you please explain this in detailed steps? Thank you in advance.

07-10-2012, 06:28 PM #4
vkartha
Member
Location: Boston
Join Date: Feb 2012
Posts: 28

I ran Cufflinks with the -G flag (i.e., providing an annotation file, a GTF file from UCSC, and telling it not to perform novel transcript discovery), and I still got this XLOC ID format. I am having trouble converting them.

10-26-2012, 05:09 PM #5
JQL
Member
Location: MO, USA
Join Date: Apr 2011
Posts: 82

I saw this thread and thought I would bring it back to life.

I am having similar issues. The GTF file I used was from Ensembl, where the gene IDs are Ensembl IDs. The Cuffdiff output file replaced the Ensembl IDs with XLOC IDs, although it also output gene names (e.g. BCL2). The Ensembl IDs were no longer there.

Is there any way to convert the XLOC IDs back to Ensembl IDs, or simply to keep the Ensembl IDs from my GTF file? How do you guys go about this? I'm trying to understand why the authors chose to replace useful IDs with XLOCs.

Interestingly enough, if I don't run new gene discovery (i.e. if I skip the cuffmerge step), I get to keep the Ensembl IDs.

thoughts?
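One way back from XLOC IDs to reference IDs is to read the attribute column of the merged.gtf written by cuffmerge. The sketch below assumes that, as is usual when cuffmerge is run with a reference annotation, each record carries gene_id "XLOC_..." together with gene_name and nearest_ref attributes. Since nearest_ref is a reference transcript ID, it is then looked up in the original Ensembl GTF to recover the Ensembl gene ID. The GTF attribute strings and IDs below are made up for illustration, and get_attr() is a hypothetical helper.

```r
# Hypothetical helper: extract the value of one attribute (e.g. gene_id,
# gene_name, nearest_ref) from GTF column 9. Returns NA when it is absent.
get_attr <- function(attrs, key) {
  padded  <- paste0("; ", attrs)   # so every key is preceded by "; "
  pattern <- paste0("; *", key, " \"([^\"]*)\"")
  hits <- regmatches(padded, regexec(pattern, padded))
  vapply(hits, function(h) if (length(h) == 2) h[2] else NA_character_,
         character(1))
}

# Made-up column-9 values. With real files, something like
# read.table("merged.gtf", sep = "\t", quote = "", header = FALSE,
#            stringsAsFactors = FALSE)[[9]] would supply them.
merged_attrs <- c(
  'gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; gene_name "GENE_A"; nearest_ref "ENST_EXAMPLE_1"; class_code "=";',
  'gene_id "XLOC_000002"; transcript_id "TCONS_00000002"; gene_name "GENE_B"; nearest_ref "ENST_EXAMPLE_2"; class_code "j";'
)
ref_attrs <- c(   # made-up records from the reference (Ensembl) GTF
  'gene_id "ENSG_EXAMPLE_A"; transcript_id "ENST_EXAMPLE_1"; gene_name "GENE_A";',
  'gene_id "ENSG_EXAMPLE_B"; transcript_id "ENST_EXAMPLE_2"; gene_name "GENE_B";'
)

xloc_map <- unique(data.frame(
  xloc        = get_attr(merged_attrs, "gene_id"),
  gene_name   = get_attr(merged_attrs, "gene_name"),
  nearest_ref = get_attr(merged_attrs, "nearest_ref"),
  stringsAsFactors = FALSE
))
ref_map <- unique(data.frame(
  nearest_ref     = get_attr(ref_attrs, "transcript_id"),
  ensembl_gene_id = get_attr(ref_attrs, "gene_id"),
  stringsAsFactors = FALSE
))

# XLOC -> gene name -> nearest reference transcript -> Ensembl gene ID
xloc_to_ensembl <- merge(xloc_map, ref_map, by = "nearest_ref", all.x = TRUE)
xloc_to_ensembl
```

Novel transcripts without a nearest reference would simply get NA in the last column, because of all.x = TRUE.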

03-31-2014, 04:32 AM #6
emp
Member
Location: india
Join Date: Jan 2014
Posts: 11

I faced a similar problem, but then I used -g with the GTF file and got the IDs in my Cuffdiff output...

04-03-2014, 08:54 AM #7
blakeoft
Member
Location: Connecticut
Join Date: Oct 2013
Posts: 79

I used this solution (http://seqanswers.com/forums/showthread.php?t=18357):
Quote:
Thomas Doktor said:
library(cummeRbund)

# Read the cuffdiff output in the current working directory
cuff <- readCufflinks()

#Retrieve significant gene IDs (XLOC) with a pre-specified alpha
diffGeneIDs <- getSig(cuff,level="genes",alpha=0.05)

#Use returned identifiers to create a CuffGeneSet object with all relevant info for given genes
diffGenes<-getGenes(cuff,diffGeneIDs)

#gene_short_name values (and corresponding XLOC_* values) can be retrieved from the CuffGeneSet by using:
names<-featureNames(diffGenes)
row.names(names)=names$tracking_id
diffGenesNames<-as.matrix(names)
diffGenesNames<-diffGenesNames[,-1]

# get the data for the significant genes
diffGenesData<-diffData(diffGenes)
row.names(diffGenesData)=diffGenesData$gene_id
diffGenesData<-diffGenesData[,-1]

# merge the two matrices by row names
diffGenesOutput<-merge(diffGenesNames,diffGenesData,by="row.names")
diffGenesOutput will then be a data frame of genes with the XLOC ID as well as the gene name (like BATF3).

Last edited by blakeoft; 12-08-2014 at 06:43 AM.
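For reference, the same join can be done without converting to matrices and juggling row names, by merging directly on the ID columns. Below is a minimal base-R sketch with made-up toy data frames shaped like the featureNames() output (tracking_id, gene_short_name) and the diffData() output (gene_id, value_1, value_2, log2_fold_change); with real data you would pass names and diffGenesData from the quoted code before their first columns are dropped.

```r
# Toy stand-ins for featureNames(diffGenes) and diffData(diffGenes)
gene_names <- data.frame(
  tracking_id     = c("XLOC_000001", "XLOC_000002"),
  gene_short_name = c("GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)
gene_diff <- data.frame(
  gene_id = c("XLOC_000002", "XLOC_000001"),
  value_1 = c(1.5, 10),
  value_2 = c(6, 2.5),
  stringsAsFactors = FALSE
)
gene_diff$log2_fold_change <- log2(gene_diff$value_2 / gene_diff$value_1)

# Join on the XLOC IDs; the key column keeps the name from x ("tracking_id")
diff_with_names <- merge(gene_names, gene_diff,
                         by.x = "tracking_id", by.y = "gene_id")
diff_with_names
```

One possible advantage: if diffData() returns more than one row per gene (e.g. one per pairwise comparison when there are more than two conditions), merging on the column should still work, whereas assigning duplicate row names fails.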

10-01-2014, 07:59 AM #8
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Hi All,

This works great, so many thanks. One quick question: I am having a hard time inserting a column between "value_2" and "log2_fold_change". I can make the new column, but it goes to the end of the data frame; the new column (Ratio) is placed after the 'significant' column. For example:

myGenesOutput$Ratio <- myGenesOutput$TRT_fpkm/myGenesOutput$CTR_fpkm

Any thoughts? Thanks
Cheers
G

10-01-2014, 08:30 AM #9
blakeoft
Member
Location: Connecticut
Join Date: Oct 2013
Posts: 79

Gonza,

Just rearrange the columns. For example, if your data frame df has three columns and you want the third column to come before the second one, do
Code:
df <- df[, c(1, 3, 2)]
If you're still having trouble, tell me what
Code:
names(myGenesOutput)
gives you along with the desired order of the names, and I'll be able to help you more explicitly.
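If you'd rather not count column positions, the new order can also be built by name. A small base-R sketch; insert_after() is a hypothetical helper, and the toy data frame just reuses the cuffdiff-style column names from Gonza's post.

```r
# Hypothetical helper: add a column and place it right after an existing one
insert_after <- function(df, new_name, values, after) {
  if (!after %in% names(df)) stop("no column named ", after)
  df[[new_name]] <- values
  others <- setdiff(names(df), new_name)   # original order, without the new column
  new_order <- append(others, new_name, after = match(after, others))
  df[, new_order, drop = FALSE]
}

out <- data.frame(
  gene_id          = c("XLOC_000001", "XLOC_000002"),
  value_1          = c(10, 1.5),
  value_2          = c(2.5, 6),
  log2_fold_change = c(-2, 2),
  significant      = c("no", "yes"),
  stringsAsFactors = FALSE
)
out <- insert_after(out, "Ratio", out$value_2 / out$value_1, after = "value_2")
names(out)
```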

10-01-2014, 08:42 AM #10
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Thanks so much, the rearranging worked fantastically!
G

10-06-2014, 07:32 AM #11
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Hello again,

I have another R question, please some advice.
I am plotting the FPKM expression (log data) of a certain gene using the script below, and I cannot figure out how to make the y-axis labels show up as "10 to the 1", "10 to the 1.5", "10 to the 2", etc.
Instead, the graph shows the FPKM+1 values as 1, 10 and 100.

Any ideas?

Script:
myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100),logMode=T)
myGeneBHLH100_isoform_logModeT + theme_bw()

10-06-2014, 08:12 AM #12
blakeoft
Member
Location: Connecticut
Join Date: Oct 2013
Posts: 79

Gonza,

This appears to be the way to do it with ggplot2. I've tried it with the sample cummeRbund data, and the results are a little goofy: the y-axis ticks are at 10^(2.6), 10^(2.8), etc. Maybe it would look better if your data spanned more powers of 10, or perhaps this is what you're looking for. Try

Code:
library(scales)
myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
myGeneBHLH100_isoform_logModeT +
   theme_bw() +
   scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                     labels = trans_format("log10", math_format(10^.x)))
Here's a source: the R Cookbook. See the section titled "Axis transformations: log, sqrt, etc." That page has an example with axis ticks at integer powers of 10.

Edit: Oh. It looks like you're ok with rational powers of 10.
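Why do the ticks land at 10^(2.6), 10^(2.8)? trans_breaks("log10", function(x) 10^x) picks "nice" break points on the log10 scale and back-transforms them, so when the data span less than one power of ten, the nice exponents come out fractional. The base-R sketch below illustrates this with pretty() as the breaks function and made-up FPKM values; it also computes integer powers of ten that you could hand to scale_y_log10(breaks = ...) yourself.

```r
fpkm <- c(420, 510, 690)   # made-up values spanning less than one decade

# "Nice" break points chosen on the log10 scale, then back-transformed
log_breaks <- pretty(log10(range(fpkm)))
breaks <- 10^log_breaks
labels <- paste0("10^", format(log_breaks))
labels

# Integer powers of ten that bracket the data instead
int_breaks <- 10^(floor(log10(min(fpkm))):ceiling(log10(max(fpkm))))
int_breaks
```

In the ggplot call this would become scale_y_log10(breaks = int_breaks, labels = trans_format("log10", math_format(10^.x))); with data this narrow, only one or two of those breaks may end up inside the plotted range.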

10-06-2014, 08:54 AM #13
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Hey blakeoft, that worked beautifully, thanks once again! If you don't mind, one last question please.....

When I type the commands below I get 2 different plots (one for each isoform). Is there a way to plot those isoforms in the same plot? They manage to do it in the cummeRbund protocol (Fig. 5a, Nature Protocols 7, 562–578 (2012), doi:10.1038/nprot.2012.016).

Full script:

myGeneId<-"XLOC_010858"
myGeneBHLH100<-getGene(cuff_data,myGeneId)
myGeneBHLH100

XLOC_010858 <-expressionPlot(myGeneBHLH100,logMode=T)
XLOC_010858 + theme_bw()

myGeneBHLH100_isoform_logModeT<-expressionPlot(isoforms(myGeneBHLH100), logMode=T)
myGeneBHLH100_isoform_logModeT + theme_bw() + scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
labels = trans_format("log10", math_format(10^.x)))

10-06-2014, 09:59 AM #14
blakeoft
Member
Location: Connecticut
Join Date: Oct 2013
Posts: 79

Gonza,

It looks like expressionPlot() was updated at some point so that the isoforms are now plotted side by side. Have you looked at the manual? Its example has the plots side by side. It also shows the FPKM values as integers in log mode, instead of in the "10^x" format. I could be wrong about the update, though, because the paper and the manual are both dated 2012.

I tried to use ggplot2 to plot this for you. Anyway, this is the best that I could do.

Code:
iso_plot <- ggplot(isoforms(myGeneBHLH100)@fpkm,
                   aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
iso_plot +
   geom_line() + theme_bw() +
   scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                 labels = trans_format("log10", math_format(10^.x))) +
   geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi)) # + geom_point(color = "black", shape = 19)
Some aesthetics aren't the same as the normal plots that cummeRbund makes, for example the colors of the lines are different. You can mess around with those colors, the line thickness, etc., but this looks pretty close to what they have in the paper. If you want black data points like in the manual, uncomment the geom_point part on the last line.

Edit: I think that some people frown on multiple line plots like this because they can get crowded. One way to mitigate this is to do what is called dodging. Here's how you'd do it for this plot:

Code:
iso <- isoforms(myGeneBHLH100)
pd <- position_dodge(0.3)
iso_plot <- ggplot(iso@fpkm,
                   aes(x = sample_name, y = fpkm, group = isoform_id, color = isoform_id))
iso_plot +
   geom_line(position = pd) + theme_bw() + 
   scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                 labels = trans_format("log10", math_format(10^.x))) +
   geom_errorbar(aes(ymin = conf_lo, ymax = conf_hi), position = pd) # + geom_point(color = "black", shape = 19, position = pd)

Last edited by blakeoft; 10-06-2014 at 10:10 AM. Reason: made the black data points come after error bars
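If ggplot2 isn't a must, a similar single-panel, dodged isoform plot can be sketched in base R with matplot(). This is a swapped-in technique, not what cummeRbund or ggplot2 does internally: the data frame below is made up but has the same columns used in the ggplot code (sample_name, isoform_id, fpkm, conf_lo, conf_hi), and the fixed offsets only roughly mimic position_dodge(0.3).

```r
# Made-up long-format table with the same columns as isoforms(gene)@fpkm
iso_fpkm <- data.frame(
  sample_name = rep(c("CTR", "TRT"), times = 2),
  isoform_id  = rep(c("TCONS_A", "TCONS_B"), each = 2),
  fpkm        = c(12, 48, 3, 5),
  conf_lo     = c(9, 40, 1, 3),
  conf_hi     = c(15, 56, 5, 7),
  stringsAsFactors = FALSE
)

# One row per sample, one column per isoform
by_cell <- list(iso_fpkm$sample_name, iso_fpkm$isoform_id)
wide <- tapply(iso_fpkm$fpkm, by_cell, sum)
lo   <- tapply(iso_fpkm$conf_lo, by_cell, sum)
hi   <- tapply(iso_fpkm$conf_hi, by_cell, sum)

# "Dodge": shift each isoform sideways by a small, distinct offset
k <- ncol(wide)
dodge_width <- 0.3
offsets <- (seq_len(k) - (k + 1) / 2) * dodge_width / k
x <- outer(seq_len(nrow(wide)), offsets, "+")

# Lines + points per isoform on a log y axis, with confidence-interval bars
matplot(x, wide, type = "b", log = "y", pch = 19, lty = 1, col = seq_len(k),
        xaxt = "n", xlab = "sample", ylab = "FPKM", ylim = range(lo, hi))
axis(1, at = seq_len(nrow(wide)), labels = rownames(wide))
arrows(x, lo, x, hi, angle = 90, code = 3, length = 0.03, col = col(wide))
legend("topleft", legend = colnames(wide), col = seq_len(k), lty = 1, pch = 19)
```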

10-06-2014, 10:22 AM #15
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Hi blakeoft, that worked well. I am so grateful for your help!
But you are totally right: after playing around with it, the graph seems pretty crowded and does not look as good as I thought.

Again, many many many thanks for your help and time (and I may have more questions as I go along....)

Best
G

11-21-2014, 05:12 PM #16
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Hey blakeoft and all,

I am building a bar plot (using ggplot) with values for 5 genes (SHP....PHB) side by side on the x axis, comparing normalized fold induction on the y axis (for example, the most induced gene is SHP with a normalized fold induction of ~30, then NGA1, etc.). I have struggled a lot to add error bars on top of the bars.
Could anyone please give me a hand with this? Many thanks.

genes<-factor(c('SHP', 'NGA1', 'PAN', 'TUB', 'PHB'))
values1<-c(29.77,4.55,3.23,1.28,0.06)
values2<-c(30.37,3.43,2.07,0.81,4.93)
df<- data.frame(genes,values1,values2)

ggplot(data=df,aes(x=genes,y=values1)) + geom_bar(stat='identity') +
scale_x_discrete(limits=df$genes[order(levels(df$genes))])

Last edited by Gonza; 11-21-2014 at 05:41 PM.

11-26-2014, 09:59 AM #17
blakeoft
Member
Location: Connecticut
Join Date: Oct 2013
Posts: 79

Gonza,

Sorry for the late reply. Can you explain your data a little more? You want to add error bars, but that usually means that you have upper and lower bounds. At first it looks like you want the upper bounds to be values2, but upon closer inspection, most of these numbers are smaller than the corresponding numbers in values1.

Let's suppose you have two vectors, error.hi and error.lo, that contain upper and lower error bounds, respectively. For the sake of running the code at the end of my post, let's define them as

Code:
error.hi <- values1 + 0.5
error.lo <- values1 - 0.5
error.lo[error.lo < 0] <- 0
Then you'd want to run

Code:
ggplot(data=df, aes(x = genes,y = values1, fill = genes)) +
    geom_bar(stat = 'identity') +
    scale_x_discrete(limits = df$genes[order(levels(df$genes))]) +
    geom_errorbar(aes(ymin = error.lo, ymax = error.hi), width = 0.5)  # width is a fixed setting, so it goes outside aes()

11-28-2014, 10:43 AM #18
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Thanks blakeoft. I have actually made a lot of progress since my previous post. I still have one quick question though...

When I build the graph I cannot arrange it so that the genes (on the x axis) are ordered from the highest to the lowest value (R automatically orders them by name). Any thoughts on how to do this?

Thanks again
-G

Please see my code:

#########

rm (list=ls())
library(plyr)
library(ggplot2)
library(bear)

# make a data frame called 'df'

SHP<- c(29.77,30.37)
NGA1 <- c(4.55,3.43)
PAN<-c(3.23,2.07)
TUB<-c(1.28,0.81)

gene<-as.factor(c("SHP","SHP","NGA1","NGA1","PAN","PAN","TUB","TUB"))
df<-data.frame(gene,c(SHP,NGA1,PAN,TUB))
df

colnames(df)[2]<- "CT_values"

# summarySE to provide the standard deviation, standard error of the mean, and a (default 95%) confidence interval
dfc <- summarySE(df, measurevar="CT_values", groupvars=c("gene"))


##Plot

p<- ggplot(dfc, aes(x=gene, y=CT_values, fill=gene)) +
geom_bar(position=position_dodge(), stat="identity",
colour="black", # Use black outlines,
size=.3) + # Thinner lines
geom_errorbar(aes(ymin=CT_values-se, ymax=CT_values+se),
size=.3, # Thinner lines
width=.2,
position=position_dodge(.9)) +
xlab("Gene Assayed") +
ylab("Normalized Fold Enrichment YFP+/YFP-") +
scale_fill_hue(name="GeneID", # Legend label, use darker colors
breaks=c("SHP","NGA1","PAN","TUB"),
labels=c("SHP","NGA1","PAN","TUB")) +
ggtitle("mRNAs enriched in the sorted YFP+ protoplasts") +
scale_y_continuous(breaks=0:20*4) +
theme_bw()
p

##add label

label.df <- data.frame(gene = c("NGA1", "SHP"),
CT_values = c(5,31))
p + geom_text(data = label.df, label = "*")
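In case the package providing summarySE() isn't installed, roughly the same per-gene summary can be computed in base R. This is a sketch modeled on what summarySE reports (N, the mean under the measurement's name, sd, se, and a t-based confidence-interval half-width), not its actual code; summarize_se() is a hypothetical helper, and the values are the ones from the post above.

```r
CT_values <- c(29.77, 30.37, 4.55, 3.43, 3.23, 2.07, 1.28, 0.81)
gene <- rep(c("SHP", "NGA1", "PAN", "TUB"), each = 2)

# Hypothetical helper: per-group N, mean, sd, standard error and
# confidence-interval half-width (t distribution with N - 1 df)
summarize_se <- function(values, groups, conf = 0.95) {
  n  <- tapply(values, groups, length)
  m  <- tapply(values, groups, mean)
  s  <- tapply(values, groups, sd)
  se <- s / sqrt(n)
  data.frame(
    gene      = names(m),
    N         = as.vector(n),
    CT_values = as.vector(m),   # the mean, named after the measured variable
    sd        = as.vector(s),
    se        = as.vector(se),
    ci        = as.vector(se * qt(conf / 2 + 0.5, df = n - 1)),
    stringsAsFactors = FALSE
  )
}

dfc <- summarize_se(CT_values, gene)
dfc
```

With only two replicates per gene, qt(0.975, 1) is about 12.7, so the ci column is much wider than se; that is one reason to draw the bars with se, as in the plot above.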

11-28-2014, 01:28 PM #19
blakeoft
Member
Location: Connecticut
Join Date: Oct 2013
Posts: 79

I'm not able to give you the best answer since I'm not at my work station.

Have a look at this link though. Please let me know if you're still having trouble after reading.

http://stackoverflow.com/questions/3...cale-in-ggplot
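For the record, ggplot2 draws a discrete x axis in the order of the factor's levels, so one common approach is to set those levels from the values before plotting. A minimal base-R sketch, using the per-gene means of the values in post #18 (typed in by hand here rather than produced by summarySE()):

```r
# Per-gene means of the values in post #18, in alphabetical order
dfc <- data.frame(
  gene      = c("NGA1", "PAN", "SHP", "TUB"),
  CT_values = c(3.99, 2.65, 30.07, 1.045),
  stringsAsFactors = FALSE
)

# Make the factor levels follow decreasing CT_values
dfc$gene <- factor(dfc$gene,
                   levels = dfc$gene[order(dfc$CT_values, decreasing = TRUE)])
levels(dfc$gene)
```

Building the same ggplot from this dfc should then put SHP first. An inline alternative is aes(x = reorder(gene, -CT_values), ...). Note that the breaks in scale_fill_hue() only control the legend order, not the axis.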

11-30-2014, 12:56 PM #20
Gonza
Member
Location: Ithaca, NY
Join Date: Mar 2013
Posts: 78

Hi blakeoft,

After a lot of pain.....it worked! I found this website more helpful.
http://rstudio-pubs-static.s3.amazon...15f513469.html

Again thank you....

All times are GMT -8.