SEQanswers

Old 10-08-2010, 01:06 PM   #101
luoruicd
Junior Member
 
Location: LA

Join Date: Oct 2010
Posts: 4

Hi Xi Wang,
Thanks for your software. I am wondering what the difference is between DESeq and DEGseq?
Old 10-11-2010, 06:30 PM   #102
Xi Wang
Senior Member
Location: MDC, Berlin, Germany
Join Date: Oct 2009
Posts: 317

Quote:
Originally Posted by luoruicd
Hi Xi Wang,
Thanks for your software. I am wondering what the difference is between DESeq and DEGseq?

Please read the respective articles for the two tools.
__________________
Xi Wang
Old 10-26-2010, 11:54 AM   #103
Sol
Member
Location: Brazil
Join Date: Oct 2010
Posts: 13

Hi,
I ran the DEGexp function of DEGseq. The output file contains a table with log2 fold change, z-score, p-value, q-value and signature (p-value < 0.001) columns. How do I interpret up-regulation and down-regulation of genes? What does each column mean? The input file contained genes and their RPKM values. I have no replicates, but I am comparing two conditions.
Thanks
Old 10-26-2010, 07:37 PM   #104
Xi Wang

Quote:
Originally Posted by Sol
Hi,
I ran the DEGexp function of DEGseq. The output file contains a table with log2 fold change, z-score, p-value, q-value and signature (p-value < 0.001) columns. How do I interpret up-regulation and down-regulation of genes? What does each column mean? The input file contained genes and their RPKM values. I have no replicates, but I am comparing two conditions.
Thanks

Hi Sol, thanks for using DEGseq.

In the output file, there are two columns for fold change: "log2(Fold_change)" and "log2(Fold_change) normalized". log2(Fold_change) = log2(value1/value2), and the normalized version is computed from the normalized value1 and value2. From the sign of the fold change you can judge whether a gene is up-regulated or down-regulated. For example, if a gene has log2(Fold_change) > 0, then value1 > value2, and if its signature = TRUE, the gene is significantly down-regulated in condition 2. You can also look at the z-scores.
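
To make the sign convention concrete, here is a minimal R sketch. It is not DEGseq's own code: the gene names and the numbers in value1 and value2 are made up, and the signature flags are set by hand rather than computed by any test.

```r
# Hypothetical expression values for three genes in two conditions
value1 <- c(geneA = 200, geneB = 50,  geneC = 80)  # condition 1
value2 <- c(geneA = 50,  geneB = 200, geneC = 80)  # condition 2

# log2 fold change as defined above: log2(value1 / value2)
log2fc <- log2(value1 / value2)

# Hand-set significance flags, standing in for the "signature" column
signature <- c(geneA = TRUE, geneB = TRUE, geneC = FALSE)

# Direction is reported relative to condition 2
direction <- ifelse(!signature, "not significant",
                    ifelse(log2fc > 0, "down in condition 2", "up in condition 2"))

print(data.frame(log2fc, signature, direction))
```

A positive log2 fold change means value1 is the larger one, so the gene is lower in condition 2; a negative value means the opposite.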

Hope this helps.
Old 10-27-2010, 09:14 AM   #105
Sol

But what data is the z-score based on? And what is the difference between the q-value and the p-value? I don't understand.
Thanks
Old 10-27-2010, 09:29 AM   #106
Xi Wang

Quote:
Originally Posted by Sol
But what data is the z-score based on? And what is the difference between the q-value and the p-value? I don't understand.
Thanks

The z-score is also based on your input data. We assume that most genes are not differentially expressed. Please refer to the supplementary material of our DEGseq paper: http://bioinformatics.oxfordjournals...28-File001.pdf
Search for "Z-score" for details.

The q-value is a p-value corrected for multiple testing. Please refer to Section 2.3 of our DEGseq paper:
"2.3 Multiple testing correction
For the above methods, the P-values calculated for each gene are adjusted to Q-values for multiple testing corrections by two alternative strategies (Benjamini and Hochberg, 1995; Storey and Tibshirani, 2003). Users can set either a P-value or a false discovery rate (FDR) threshold to identify differentially expressed genes."

If it is still unclear, please let me know. Thanks.
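
For a feel of what the Benjamini-Hochberg adjustment mentioned in the quoted section does, here is a small sketch using base R's p.adjust. The p-values are made up, and this only illustrates the correction itself; it is not DEGseq's internal code.

```r
# Made-up p-values for five genes
p <- c(0.0001, 0.004, 0.02, 0.03, 0.5)

# Benjamini-Hochberg adjusted p-values (commonly reported as q-values)
q <- p.adjust(p, method = "BH")

# The same thing by hand: p * n / rank, then enforce monotonicity
# walking from the largest p-value down to the smallest
n <- length(p)
o <- order(p, decreasing = TRUE)
q_manual <- numeric(n)
q_manual[o] <- pmin(1, cummin(p[o] * n / (n:1)))

print(data.frame(p, q, q_manual))

# Genes that pass a false discovery rate threshold of 0.05
print(which(q < 0.05))
```

Each q-value is at least as large as its p-value, which is why a q-value threshold is stricter than the same threshold applied to raw p-values.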
Old 10-27-2010, 10:01 AM   #107
Sol

Thanks.

And what about RPKM? How is the data normalized?
Do I divide the number of reads by the gene length and then by the total number of reads?
How is it calculated?
Thanks
Old 10-27-2010, 10:20 AM   #108
Xi Wang

Quote:
Originally Posted by Sol
Thanks.
And what about RPKM? How is the data normalized?
Do I divide the number of reads by the gene length and then by the total number of reads?
How is it calculated?
Thanks

Actually, we recommend that users feed the raw read counts (that is, the number of reads falling in a gene's exonic regions) to DEGseq. DEGseq will normalize the data according to the sequencing depth of each sample.
Old 10-27-2010, 10:31 AM   #109
Sol

But what does RPKM mean?
thanks
Old 10-27-2010, 10:39 AM   #110
Xi Wang

Quote:
Originally Posted by Sol
But what does RPKM mean?
Thanks

It is the read count for a region (say, a gene or an exon) normalized by the region length (in kilobases) and the sequencing depth (in millions of reads). So RPKM is short for Reads Per Kilobase per Million reads.
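
The definition above can be written as a one-line R function. This is just a sketch of the arithmetic with hypothetical numbers; the function name rpkm and its argument names are made up for illustration and are not part of DEGseq.

```r
# RPKM = reads / (region length in kilobases) / (total reads in millions)
rpkm <- function(reads, length_bp, total_reads) {
  reads / (length_bp / 1e3) / (total_reads / 1e6)
}

# Hypothetical example: 500 reads on a 2,000 bp gene
# in a library of 10 million reads
print(rpkm(500, 2000, 1e7))
```

Because both divisions are linear, doubling the gene length or doubling the library size halves the RPKM for the same read count.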
Old 10-27-2010, 12:09 PM   #111
Sol

What is the function of the MA-plot and the fold change?
The graph shows the relationship in the MA-plot, no?
thanks
Old 10-27-2010, 01:20 PM   #112
Sol

Can the DEGseq results be used directly for analysis, or should I apply some further standardization?
thanks
Old 10-27-2010, 10:21 PM   #113
Xi Wang

Quote:
Originally Posted by Sol
What is the function of the MA-plot and the fold change?
The graph shows the relationship in the MA-plot, no?
Thanks

I am not sure what you want to know with these questions.
For detailed questions, I would prefer that you email me at wang-xi05@mails.tsinghua.edu.cn; I can reply more quickly about DEGseq that way. Thanks.
Old 10-27-2010, 10:29 PM   #114
Xi Wang

Quote:
Originally Posted by Sol
Can the DEGseq results be used directly for analysis, or should I apply some further standardization?
Thanks

Yes, the results can be used directly. You can apply them to functional analysis, say GO enrichment analysis. Alternatively, you can refer to GOseq, which takes gene length bias into account.
Old 11-03-2010, 12:44 PM   #115
Marisa_Miller
Member
Location: St. Paul, MN
Join Date: Aug 2010
Posts: 34

Hello,
I am new to R, and would like to use your DEGseq package to identify differentially expressed genes between my libraries. In my case I have 2 libraries to compare. I have calculated the RPKMs using a program written by a member of my lab; an example of the file is shown here:

Chr  Gene         Start  End    Gene_len  Reads    RPKM     Log2(RPKM)
1    AT1G01010.1  3631   5899   1688      45       1.58899  0.668107
1    AT1G01020.1  5928   8737   1623      104.73   3.84621  1.94344
1    AT1G01020.2  6790   8737   1085      72.2697  3.97015  1.98919
1    AT1G01030.1  11649  13714  1905      78       2.44051  1.28718
1    AT1G01040.1  23146  31227  6251      1159     11.0513  3.46615
1    AT1G01046.1  28500  28706  207       4        1.15178  0.203866
1    AT1G01050.1  31170  33153  976       2186     133.5    7.06069

I am unsure which part of your package to use (I think DEGexp?) to analyze the data. Also, could you help me choose which method to use (e.g. LRT, MATR, etc.)?

I have read through the examples on the usage of the package, but am still unsure.

Thank you in advance for your help
Old 11-03-2010, 09:59 PM   #116
Xi Wang

Hello Marisa,

Thanks for using DEGseq. You can use the function DEGexp to detect differentially expressed genes, and we recommend feeding the read counts (the 6th column of your file) to DEGexp. Details can be found in our Bioinformatics paper. There are slight differences between the LRT, FET and MARS methods; we proposed the MARS method, which is based on the MA-plot and uses a normal distribution approximation.
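
For readers unfamiliar with the MA-plot: in the usual definition, M is the log2 ratio of the two counts and A is their average log2 value. Below is a minimal sketch with made-up counts. It only computes and draws the plot coordinates and does not reproduce the MARS test itself.

```r
# Made-up read counts for four genes in two samples
c1 <- c(100, 400, 50, 1000)
c2 <- c(100, 100, 200, 1000)

M <- log2(c1) - log2(c2)        # log2 fold change between the samples
A <- (log2(c1) + log2(c2)) / 2  # average log2 expression

print(data.frame(c1, c2, M, A))

plot(A, M, main = "MA-plot (toy data)")
abline(h = 0, lty = 2)          # M = 0 means no change between samples
```

Genes far from the M = 0 line, relative to the spread expected at their A value, are the candidates for differential expression.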

hope this helps.
Old 11-08-2010, 08:00 AM   #117
Marisa_Miller

Hi and thanks for the reply! I have decided to use the MARS method after reading through your paper. The one thing I am confused about is the actual usage of R: even after reading through your manual, I am unsure of the actual commands used to run DEGexp. Below is the usage listed in the manual for DEGexp2 (which I think I need to use since I have two different input files).

geneExpFile <- system.file("extdata", "GeneExpExample5000.txt", package="DEGseq")
outputDir <- file.path(tempdir(), "DEGexpExample")
exp <- readGeneExp(file=geneExpFile, geneCol=1, valCol=c(7,9,12,15,18))
exp[30:35,]
exp <- readGeneExp(file=geneExpFile, geneCol=1, valCol=c(8,10,11,13,16))
exp[30:35,]
DEGexp2(geneExpFile1=geneExpFile, geneCol1=1, expCol1=c(7,9,12,15,18), groupLabel1="kidney", geneExpFile2=geneExpFile, geneCol2=1, expCol2=c(8,10,11,13,16), groupLabel2="liver", method="MARS", outputDir=outputDir)
cat("outputDir:", outputDir, "\n")

Questions:
1) Is each command entered on a separate line? It is unclear where the line breaks are.
2) I am unsure which parts of the example usage above I need to change for my specific case. Obviously I need to specify the correct file paths, names, etc.
3) Since I am using two separate files, where do I specify them in the commands above? I can't tell from the example.
4) Where can I enter a q-value threshold?
5) I tried to highlight in bold the parts of the example whose meaning I do not understand or could not find explained in the manual.

The example of my input file is in the previous post.
Basically, my problem is with the usage of R. If you could help me by indicating how I can apply the example usage to my files that would be great.

Thank you,
Marisa
Old 11-08-2010, 11:21 PM   #118
Xi Wang

Hi Marisa,

Thanks for your questions.

1) Please refer to this site: http://www.bioconductor.org/packages...t/doc/DEGseq.R
"exp[30:35,]" just displays rows 30-35 of the matrix "exp".

2) Yes, you need to specify your files, the column for gene names (geneCol=?), the columns for gene expression values (valCol=??), etc.

3) Please pay attention to the parts in bold
Code:
DEGexp2(geneExpFile1="your_gene_exp_file_1", geneCol1=1, expCol1=c(7,9,12,15,18), groupLabel1="kidney",
        geneExpFile2="your_gene_exp_file_2", geneCol2=1, expCol2=c(8,10,11,13,16), groupLabel2="liver",
        method="MARS", outputDir=outputDir)
4) Do it like this:
Code:
DEGexp2(geneExpFile1="your_gene_exp_file_1", geneCol1=1, expCol1=c(7,9,12,15,18),
        geneExpFile2="your_gene_exp_file_2", geneCol2=1, expCol2=c(8,10,11,13,16),
        thresholdKind=3, qValue=1e-3, method="MARS", outputDir=outputDir)
5) Ignore the bold words in the first two lines; they are just for this example.
valCol specifies which column contains the (expression) values you want to analyze. For your case, you may set "valCol=6".
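
As a sketch of how the column numbers map onto the file posted earlier in the thread: the gene identifiers sit in column 2 and the read counts in column 6. The snippet below uses only base R to check those indices on the first three rows of that example. The DEGexp2 call in the trailing comment reuses the argument names from the code above with hypothetical file names and group labels; geneCol=2 and expCol=6 there are assumptions based on the file layout, not something confirmed by the manual.

```r
# First rows of the example file from the earlier post,
# read as a whitespace-separated table with a header line
txt <- "Chr Gene Start End Gene_len Reads RPKM Log2(RPKM)
1 AT1G01010.1 3631 5899 1688 45 1.58899 0.668107
1 AT1G01020.1 5928 8737 1623 104.73 3.84621 1.94344
1 AT1G01020.2 6790 8737 1085 72.2697 3.97015 1.98919"
tab <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

gene_col  <- 2  # column holding the gene identifiers ("Gene")
count_col <- 6  # column holding the read counts ("Reads")

print(tab[, c(gene_col, count_col)])

# With two such files (hypothetical names), the call might then look like:
# DEGexp2(geneExpFile1="library1.txt", geneCol1=2, expCol1=6, groupLabel1="lib1",
#         geneExpFile2="library2.txt", geneCol2=2, expCol2=6, groupLabel2="lib2",
#         method="MARS", outputDir=outputDir)
```

Checking the column positions this way before running the analysis helps avoid silently feeding the RPKM or coordinate columns in place of the counts.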

Last edited by Xi Wang; 11-08-2010 at 11:23 PM.
Old 11-09-2010, 08:36 AM   #119
Marisa_Miller

Thank you so much! That solved my problems!
Old 11-09-2010, 02:00 PM   #120
anamaretti
Junior Member
Location: Los Angeles
Join Date: Sep 2010
Posts: 2

Hello,

I'm using samWrapper to do some statistical analysis on my samples.
I have 2 groups, each with 5 biological replicates.
However, I'm getting some weird results.
Even when none of the samples has any reads for a gene, that gene is sometimes included in the list of differentially expressed genes (Signature = TRUE).

The parameters that I used are:
Values are in RPKM
nperm= 1000
min.fold-change=2
max.qValue=1e-04
paired=FALSE

Should I include some restriction to avoid that?

By the way, is the default seed value 100?
Is there any benefit to changing it?

Thanks for the help and for this program, it is great!