RNAseq data analysis


chknbio
05-24-2012, 07:45 AM
I recently started RNA sequencing of tumor samples. My main question concerns differential gene expression between samples and controls. I have successfully run tophat, cufflinks, and cuffdiff. I am new to bioinformatics and RNAseq analysis, and I have only recently started using the Linux terminal for analysis. I have seen a few RNAseq tutorials online, but some of them stop once the cufflinks output has been produced. I am looking for what comes next: the steps and analysis after cuffdiff. Are there programs or tutorials for analyzing the data with a Python or R script? How do others go about analyzing RNAseq data for differential expression between samples?

Any advice on the downstream analysis of RNAseq data after the tuxedo suite would be greatly appreciated.

Thanks!

ETHANol
05-24-2012, 08:29 AM
cummeRbund is the next step in the tuxedo suite. After that, it all depends on what you want to do.

usad
05-24-2012, 08:30 AM
Hi

whilst you get much further with the command line, you can always try out our tool RobiNA http://mapman.gabipd.org/web/guest/robin which gives you a graphical user interface and runs all the necessary steps in the background to call differential expression. It also comes with a detailed manual.

Apart from that, you might check out the tutorials on SEQanswers for running your data through.

Cheers,
björn

chknbio
05-24-2012, 08:37 AM
Yes, I have used cummeRbund to plot some of the graphs, which look nice, but I am more interested in pulling out the differentially expressed genes. I would like to look at the data globally rather than picking and choosing genes. With cummeRbund I was able to obtain plots showing differences and similarities between samples; however, I couldn't see where those differences were located (i.e., which specific genes or novel transcripts), just dots on a graph. Maybe I am missing the feature that exports the genes or transcripts that are up- or downregulated x-fold relative to the control or the other samples.
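A minimal Python sketch of the export being asked about, assuming the input is cuffdiff's tab-separated gene_exp.diff file with its `gene` and `log2(fold_change)` columns. The function name `split_by_fold_change` and the 2-fold default are illustrative choices, and this applies a fold-change cutoff only, with no significance test:

```python
import csv
import math


def split_by_fold_change(lines, min_fold=2.0):
    """Split genes into up- and downregulated lists by fold change alone.

    `lines` is any iterable of tab-separated lines with a header row,
    e.g. an open gene_exp.diff file handle (assumed cuffdiff layout).
    """
    threshold = math.log2(min_fold)  # 2-fold -> log2 threshold of 1.0
    up, down = [], []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            lfc = float(row["log2(fold_change)"])
        except ValueError:
            continue  # skip entries that are not numbers
        if lfc >= threshold:
            up.append(row["gene"])
        elif lfc <= -threshold:
            down.append(row["gene"])
    return up, down
```

Usage: `with open("gene_exp.diff", newline="") as fh: up, down = split_by_fold_change(fh, min_fold=2.0)`. In practice you would combine this with a significance filter rather than rely on fold change alone.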

chknbio
05-24-2012, 08:50 AM
Hi Bjorn,

Thank you for the reply, and for the link to this tool. I already have the cuffdiff output and would use the command line, but I am wondering where to begin. How do I retrieve all of the differentially regulated genes or transcripts?

-Tom

chknbio
05-24-2012, 09:56 AM
This is a very good tutorial, but it still leaves some questions unanswered:
http://vallandingham.me/RNA_seq_differential_expression.html

How does cuffdiff decide which genes are noteworthy for upregulation or downregulation? I only see the FPKM for each gene and the low and high confidence limits. How can I see the genes that are flagged as noteworthy, as mentioned in this tutorial?
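As far as I know, cuffdiff's own call is stored in the `significant` column of gene_exp.diff: it is `yes` when the test's q-value falls below the FDR setting (0.05 by default). A minimal Python sketch that pulls out those genes, assuming the standard column names; `cuffdiff_significant` is a hypothetical helper name:

```python
import csv


def cuffdiff_significant(lines):
    """Return the genes that cuffdiff itself flagged as significant.

    `lines` is any iterable of tab-separated lines with a header row,
    e.g. an open gene_exp.diff file handle (assumed cuffdiff layout).
    """
    return [
        row["gene"]
        for row in csv.DictReader(lines, delimiter="\t")
        if row["significant"] == "yes"
    ]
```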

The author took the log2 of all FPKM values? Isn't FPKM a measure of expression within a sample, relative to the total number of transcripts?
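For reference, FPKM is already normalized for transcript length and library size within each sample, which is what makes values roughly comparable between samples. Taking log2 makes ratios symmetric: a doubling is +1 and a halving is -1. A minimal Python sketch of the arithmetic, mirroring the log2(value_2/value_1) that cuffdiff reports; the pseudocount is an assumption (a common convention to avoid log2(0)), not something cuffdiff or the tutorial necessarily uses:

```python
import math


def log2_fold_change(fpkm_1, fpkm_2, pseudocount=0.0):
    """log2 of the FPKM ratio sample_2 / sample_1.

    With pseudocount=0, an FPKM of 0 in sample_1 raises ZeroDivisionError,
    which is why a small pseudocount is often added (assumed convention).
    """
    return math.log2((fpkm_2 + pseudocount) / (fpkm_1 + pseudocount))


def log2_fpkm(values, pseudocount=1.0):
    """log2(FPKM + pseudocount) for each value, e.g. for plotting."""
    return [math.log2(v + pseudocount) for v in values]
```

For example, going from 5 to 20 FPKM gives +2 (4-fold up), and going from 20 to 5 gives -2.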

I would also like to know which scripts or programs he ran in R.

Any guidance, or other tutorials that pick up after the cuffdiff output, would be helpful. I have four samples and ran cuffdiff with the tophat BAM files for each, plus the chicken annotation GTF.
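With more than two samples, each row of gene_exp.diff holds one gene in one pairwise comparison, named in the `sample_1` and `sample_2` columns; as I understand cuffdiff's default (non time-series) mode, four samples give up to six comparisons. A minimal Python sketch that counts genes passing a q-value cutoff per comparison; the helper name and the 0.05 cutoff are assumptions:

```python
import csv
from collections import Counter


def significant_per_comparison(lines, q_cutoff=0.05):
    """Count genes with q_value < q_cutoff for each (sample_1, sample_2) pair.

    Only rows whose status is OK (i.e. actually tested) are considered.
    """
    counts = Counter()
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["status"] != "OK":
            continue
        if float(row["q_value"]) < q_cutoff:
            counts[(row["sample_1"], row["sample_2"])] += 1
    return counts
```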

Thanks!

ETHANol
05-24-2012, 10:40 AM
There should be a file called gene_exp.diff, or something like that, which has the results of the statistical test. Sort by q_value and choose a cutoff you like. Is that what you are asking?
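A minimal Python sketch of that sort-and-cutoff step, assuming gene_exp.diff's `gene`, `status` and `q_value` columns; only rows with status `OK` were actually tested. The helper name and the 0.05 default are illustrative:

```python
import csv


def de_genes_by_qvalue(lines, q_cutoff=0.05):
    """Return tested genes with q_value <= q_cutoff, most significant first."""
    rows = [r for r in csv.DictReader(lines, delimiter="\t") if r["status"] == "OK"]
    rows.sort(key=lambda r: float(r["q_value"]))  # ascending q-value
    return [r["gene"] for r in rows if float(r["q_value"]) <= q_cutoff]
```

Usage: `with open("gene_exp.diff", newline="") as fh: genes = de_genes_by_qvalue(fh, q_cutoff=0.05)`. With several samples, the same gene can appear once per comparison.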

You can then feed your list of significantly differentially expressed genes into goseq if you want to do gene ontology analysis.
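goseq expects a named 0/1 vector covering every tested gene, not just the significant ones. A minimal Python sketch that builds that indicator from gene_exp.diff and writes it as two tab-separated columns that can be read into R. The helper names, the use of the `gene_id` column as the identifier, and the "DE in any comparison" rule are all assumptions; goseq also needs gene lengths or a supported genome and ID type, which are not handled here:

```python
import csv


def goseq_indicator(lines, q_cutoff=0.05):
    """Map every tested gene_id to 1 if differentially expressed, else 0."""
    indicator = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        if row["status"] != "OK":
            continue  # untested genes are left out entirely
        is_de = int(float(row["q_value"]) <= q_cutoff)
        # A gene seen in several pairwise comparisons is DE if any comparison passes.
        indicator[row["gene_id"]] = max(indicator.get(row["gene_id"], 0), is_de)
    return indicator


def write_indicator(indicator, path):
    """Write gene_id<TAB>flag lines, sorted by gene_id."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        for gene_id, flag in sorted(indicator.items()):
            writer.writerow([gene_id, flag])
```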

chknbio
05-24-2012, 11:20 AM
This is helpful. Thank you!

EGrassi
05-24-2012, 11:44 PM
The tuxedo protocol article has a section on what you want: http://www.nature.com/nprot/journal/v7/n3/full/nprot.2012.016.html#/procedure Check steps 16-18; there is some simple R code there that you could follow and expand.