SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
First 6 genes missing from HTSeq when read into edgeR sindrle Bioinformatics 3 01-24-2014 04:26 AM
EdgeR heatmap specific genes claire5 Bioinformatics 1 10-25-2013 05:56 AM
How to rescue multi-reads when using htseq to generate edgeR/DESeq counts? Hilary April Smith Bioinformatics 3 05-06-2013 12:07 PM
DESeq: problem in viewing heatmap.2 output coutellec RNA Sequencing 0 02-03-2013 08:53 AM
help with a heatmap with deseq - legend? vebaev Bioinformatics 3 03-03-2012 10:39 AM

Reply
 
Thread Tools
Old 05-12-2014, 10:41 AM   #101
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
Perhaps the uniprot IDs haven't been merged into biomart yet. Since you have an annotation file, the conversions are contained in it. So just load it into R with GenomicRanges and then get the conversions from the mcols() of the resulting GRanges object.

A "2'-5'-oligoadenylate synthetase" would add ATP to oligos (if you google this, you'll find that apparently this particular method ends up activating ribonucleases and leading to RNA degradation).

Thank you! But bioDBnet (http://biodbnet.abcc.ncifcrf.gov/db/db2db.php) has merged the uniprot, however it seems not working. Could you please try it? I am wondering if I selected the wrong catalog (e.g. gene symbol, gene ID...). Because if I input the gene symbol 'OAS1 ', and the Outputs select the 'UniProt Accession', I don't find the 'Q865A2' in the result list, and hence, if I input 'UniProt Accession' is 'Q865A2', outputs is null. It is strange!

BTW, how to define the "2'-5'-oligoadenylate synthetase" for 'OAS1' , or "prostaglandin D2 receptor" for 'PTGDR'? Are these long name called "Full gene name"? Could I use Biomart or bioDBnet or R packages you mentioned to transfer the Ensembl ID to this so called 'Full gene name'? Or the only way is to Google them one by one after I got the list?
Thank you!

Last edited by super0925; 05-12-2014 at 10:51 AM.
super0925 is offline   Reply With Quote
Old 05-12-2014, 10:54 AM   #102
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

"2'-5'-oligoadenylate synthetase" is an example of a gene name, yes. I also can't get bioDBnet to work with that example, but I've never used the site before. As I suggested earlier, just parse the annotation file you're using, since it apparently has both.

Last edited by dpryan; 05-13-2014 at 01:39 AM.
dpryan is offline   Reply With Quote
Old 05-13-2014, 01:10 AM   #103
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
"2'-5'-oligoadenylate synthetase" is an example of a gene name, yes. I also can't get bioDBnet to work with that example, but I've never used the site before. As I suggested earlier, just parse the annotation file you're using, since it apparently have both.
Sorry I haven't got your idea. Do you mean once I get the gene name list , like OAS1, PTGDR, etc. If I want to transfer them to their full name (or as you said the example of gene name) to "2'-5'-oligoadenylate synthetase","prostaglandin D2 receptor", I need to Google it?
Could the annotation file e.g. GTF file have these full name? I only remember it has the Ensembl ID and hence I need to transfer the ID to gene names (e.g. OAS1) by online converter tools. But for how to transfer them to the full name, I am still confused.
Thank you!

Last edited by super0925; 05-13-2014 at 01:15 AM.
super0925 is offline   Reply With Quote
Old 05-13-2014, 01:41 AM   #104
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.
dpryan is offline   Reply With Quote
Old 05-13-2014, 02:19 AM   #105
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
No, just use R/biopython/bioperl/whatever to read in the GTF/GFF file that you used with cufflinks & htseq-count and then extract the ID conversions with that. Since I assume that you used the same annotation file for both tools, it must have both IDs, which means that that file can be used to define the ID mappings. Yes, this will take a modicum of programming.
Hi Devon
I still have two questions:
Q1:
The full name "prostaglandin D2 receptor" is not from my GTF file... What is strange , in gene names from GTF file, I think some use UniProt Accession(e.g. ATPO_BOVIN) but some use gene names(e.g. ITSN1 ). It made me confused.
I have screenshot to you (My GTF file is from Ensembl Database. Bovine's genome annotation, Btau_4.0).
Hence I don't know how to uniform them? I think the reasonable method is use Ensembl ID which is uniformed (by (1)"Tophat-htseq-edgeR" or (2)"Tophat-Cuffdiff" but without Cufflinks and Cuffmerge) , after then, I translate them to gene name by R package or online tools (Biomart) .
Am I right? Or DO you have any better solution?
Q2:
The full name(e.g. "2'-5'-oligoadenylate synthetase","prostaglandin D2 receptor") is my collaborator sent to me, who is more interested with the long full name than gene name. He google the gene name or Ensembl ID and get the full name...
Hence I am think a automatically transfer method than manually Google it.
Or may I change another annotation file (GTF) e.g. from UCSC database rather than Ensembl?
Thank you!
Attached Images
File Type: png Untitled.png (99.6 KB, 2 views)

Last edited by super0925; 05-13-2014 at 04:23 AM.
super0925 is offline   Reply With Quote
Old 05-13-2014, 04:42 AM   #106
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
dpryan is offline   Reply With Quote
Old 05-13-2014, 06:22 AM   #107
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
I was talking about the ensembl ID and the uniprot ID, which both are. Ensembl IDs are easy to deal with, so just convert everything to that by using your annotation file's mappings and then use those to derive whatever other names you need. You can do that in most any language (or just load things into R as a GRanges object, in which case it'll parse things for you and you can just use unique() on a subset of the mcols()).
How about my methods in post#105?
"
Use Ensembl ID which is uniformed by the genes.gtf (1)"Tophat-htseq-edgeR" or (2)"Tophat-Cuffdiff" but without Cufflinks and Cuffmerge(because Cufflinks+Cuffmerge would generate a merged.gtf whose gene name is mixture with gene name and uniprot ID as I show to you) , after then, I translate them to gene name by R package or online tools (Biomart) .
"

And "Tophat-htseq-edgeR/DESeq", the reads counts table is given by Ensembl ID, which I have tried.
I also have run the "Texudo without Cufflinks and Cuffmerge" and the result could also give you the gene Ensembl ID.
However, I miss the Cufflinks+Cuffmerge steps.
Could you please tell me how to convert texudo output to Ensembl ID by using annotation file's mappings? using AWK?

Last edited by super0925; 05-13-2014 at 06:59 AM.
super0925 is offline   Reply With Quote
Old 05-13-2014, 07:54 AM   #108
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by super0925 View Post
How about my methods in post#105?
I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.

Quote:
Could you please tell me how to convert texudo output to Ensembl ID by using annotation file's mappings? using AWK?
Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.
dpryan is offline   Reply With Quote
Old 05-14-2014, 07:56 AM   #109
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
I suppose as long as you told cufflinks to only perform its quantitation based on the annotation and not do anything else. Otherwise you'd be comparing apples to oranges.



Awk would work if you want. I'd personally use R with GenomicRanges, but there are many ways to skin this proverbial cat.

Hi I have a small question.
If I have 3 conditions . I thought I could use egdeR (is that GLM model I read from some other posts) to analysis it. Am I right?
How about DESeq?
So far I know I could use 3 independence test (C1 vs C2, C2 vs C3, C1 vs C3) to get the DE genes. It is good but requiring for three times.

Last edited by super0925; 05-14-2014 at 07:59 AM.
super0925 is offline   Reply With Quote
Old 05-14-2014, 08:02 AM   #110
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by super0925 View Post
Hi I have a small question.
If I have 3 conditions . I thought I could use egdeR (is that GLM model I read from some other posts) to analysis it. Am I right?
DESeq (better would be DESeq2) would work fine as well.
dpryan is offline   Reply With Quote
Old 05-15-2014, 03:34 AM   #111
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
DESeq (better would be DESeq2) would work fine as well.
But Cuffdiff could only do pairwise groups, am I right?
So I need to do C1vsC2,C2vsC3,C3vsC1 in cuffdiff one by one, and compare the result with edgeR or DESeq2(What you recommend is better and advantage than DESeq).
super0925 is offline   Reply With Quote
Old 05-15-2014, 03:55 AM   #112
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.
dpryan is offline   Reply With Quote
Old 05-15-2014, 08:21 AM   #113
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
I haven't a clue how cuffdiff works internally, the documentation is simply insufficient there and I have no desire to go through its code.

I assume you mean to ask what the advantages of DESeq2 are over DESeq. Just have a look at the DESeq2 paper, which lays them out.
Hi D,
Q1:I have seen your post on bioStar. You suggest we remove the low counts by genefilter when using DESeq/edgeR/limma. But DESeq2 automatically has filter. Am I right?
Q2:
But so many function in genefilter, do I use genefilter function?
how about the parameter?


their usage:
set.seed(-1)
f1 <- kOverA(5, 10)
flist <- filterfun(f1, allNA)
exprA <- matrix(rnorm(1000, 10), ncol = 10) (change it to count matrix)
ans <- genefilter(exprA, flist)

Is that OK?

Last edited by super0925; 05-15-2014 at 08:30 AM.
super0925 is offline   Reply With Quote
Old 05-15-2014, 08:22 AM   #114
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

At least the more recent versions, yes.
dpryan is offline   Reply With Quote
Old 05-15-2014, 08:56 AM   #115
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
At least the more recent versions, yes.
But so many function in genefilter,how about the parameter seeting?
their usage in Manual:
set.seed(-1)
f1 <- kOverA(5, 10)
flist <- filterfun(f1, allNA)
exprA <- matrix(rnorm(1000, 10), ncol = 10)
ans <- genefilter(exprA, flist)

My commands:
set.seed(-1)
f1 <- kOverA(5, 10)----Q: This means an expression measure above 10 in at least 5 samples.but how to decide this parameter? And, could I use ttest instead?
flist <- filterfun(f1)
exprA <- counts
----Two conditions with 3 samples in each group
head(counts)
C1.R1 C1.R2 C1.R3 C2.R1 C2.R2 C2.R3
ENSBTAG00000000003 0 0 0 0 0 0
ENSBTAG00000000005 1 0 0 1 0 0
ENSBTAG00000000008 2 2 1 0 2 0
ans <- genefilter(exprA, flist)
counts_filter<-counts[which(ans==1), ]


Is this OK?

Last edited by super0925; 05-15-2014 at 09:00 AM.
super0925 is offline   Reply With Quote
Old 05-15-2014, 09:00 AM   #116
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
dpryan is offline   Reply With Quote
Old 05-15-2014, 10:34 AM   #117
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
You'll want to read the "Diagnostics for independent filtering" PDF, which uses an RNAseq example. I usually find filtered_R() to be the most convenient function in a script (for your first time doing things manually you should go ahead and use filtered_p() and rejection_plot() to get a feel for things.)
I have seen you PDF. It is nice. However, they use DESeq's fitNbinomGLMs function to decide the p-value, (but we could use DESeq2 to instead of DESeq and DESeq2 is atumatically filtered as you said)But how about edgeR?


And how about my command in my post #115? Is is too naive?
super0925 is offline   Reply With Quote
Old 05-15-2014, 10:56 AM   #118
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

I wouldn't normally recommend the kOverA method for RNAseq, since it's usually more meaningful to use summed counts or average counts (I think DESeq2 uses the average counts).

Using genefilter with edgeR would work the same as in the aforementioned PDF. You perform your tests as normal and then put whatever filtering metric (summed counts, average counts, etc.) and the raw p-values into filtered_p() or filtered_R() and continue in a similar manner.
dpryan is offline   Reply With Quote
Old 05-19-2014, 03:58 AM   #119
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by dpryan View Post
DESeq (better would be DESeq2) would work fine as well.
Hi D,
I have tried what you said the multiple condition comparison in DESeq2.

The input count matrix is (3 conditions with 3 samples in per condition)

C1.R1 C1.R2 C1.R3 C2.R1 C2.R2 C2.R3 C3.R1 C3.R2 C3.R3
ENSBTAG00000000003 0 0 0 0 0 0 0 0 0
ENSBTAG00000000005 1 0 0 1 0 0 2 2 2
ENSBTAG00000000008 2 2 1 0 2 0 1 1 2

Here is my script:

library("DESeq2")
colData<- data.frame(condition=factor(c(rep("C1",3),rep("C2",3),rep("C3",3))))
dds<-DESeqDataSetFromMatrix(countData= output2,colData= colData,design=~condition)
dds$condition<-factor(dds$condition,levels=c("C1","C2","C3"))
dds<-DESeq(dds)
res<-results(dds)
resOrdered<-res[order(res$padj),]
head(resOrdered)
write.csv(as.data.frame(resOrdered),file="C1_C2_C3_results.csv")

But the output is only one table with the significant DE genes list ranked by P-value (is it C1vsC2vsC3?), but how do I know the ranking of C1vs C2,C2vsC3 and C1vsC3 which is what I want ?
Sorry for this naive question

Last edited by super0925; 05-19-2014 at 04:02 AM.
super0925 is offline   Reply With Quote
Old 05-19-2014, 04:03 AM   #120
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Read help(results)
dpryan is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:44 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2022, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO