SEQanswers

SEQanswers > Applications Forums > RNA Sequencing




08-03-2010, 07:26 PM   #1
sdwy2008
Member
 
Location: IL

Join Date: May 2010
Posts: 10
RNA-seq, RPKM and heatmap???

I calculated RPKM values from my RNA-seq data. I am trying to cluster the data and describe how gene expression changes over a time series (my samples were taken at successive time points).

Could anybody recommend a good method for this?

I am thinking of log-transforming the RPKM data and then making heatmaps, as we usually do for microarray data. What do you think?
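A minimal Python sketch of the log-transform-then-heatmap preparation described above, assuming RPKM values are held in a dict mapping gene ID to one value per time point. The pseudocount of 1 (to avoid log2(0) for genes with zero RPKM) and the per-gene z-scoring are common choices, not something prescribed in this thread; the resulting matrix could then be passed to any heatmap tool.

```python
import math
from statistics import mean, pstdev

def log_transform(rpkm, pseudocount=1.0):
    """log2(RPKM + pseudocount) for every gene and time point."""
    return {g: [math.log2(v + pseudocount) for v in vals] for g, vals in rpkm.items()}

def row_zscore(matrix):
    """Scale each gene to mean 0, SD 1 across time points, so heatmap colours
    show the shape of each profile rather than its absolute level."""
    out = {}
    for g, vals in matrix.items():
        mu, sd = mean(vals), pstdev(vals)
        # A flat profile has no shape to show; map it to all zeros
        out[g] = [0.0 if sd == 0 else (v - mu) / sd for v in vals]
    return out

rpkm = {"geneA": [0.0, 1.0, 3.0, 7.0], "geneB": [15.0, 15.0, 15.0, 15.0]}
logged = log_transform(rpkm)   # geneA -> [0.0, 1.0, 2.0, 3.0]
scaled = row_zscore(logged)
```

Z-scoring per gene is what makes a low-expressed but strongly changing gene visible next to a highly expressed flat one.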

Thanks
08-23-2010, 07:24 AM   #2
epigen
Senior Member
 
Location: Germany

Join Date: May 2010
Posts: 101

I have a similar task and would be interested in an expert answer. As a naive first attempt, I'll use HTSeq and DESeq on raw read counts and compare my samples pairwise.
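For the count-based route, a rough Python sketch of the median-of-ratios size-factor normalization described for DESeq, followed by pairwise log2 fold changes between samples. This is an illustration only, not DESeq's API and not its negative-binomial test; the pseudocount of 0.5 and the example counts are assumptions.

```python
import math
from itertools import combinations
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.
    counts: dict gene -> list of raw read counts, one per sample."""
    n = len(next(iter(counts.values())))
    logratios = [[] for _ in range(n)]
    for vals in counts.values():
        if min(vals) == 0:
            continue  # a zero count makes the geometric mean 0; skip the gene
        log_geo_mean = sum(math.log(v) for v in vals) / n
        for j, v in enumerate(vals):
            logratios[j].append(math.log(v) - log_geo_mean)
    return [math.exp(median(r)) for r in logratios]

def pairwise_log2fc(counts, pseudocount=0.5):
    """log2 fold change of size-factor-normalized counts for every sample pair."""
    sf = size_factors(counts)
    result = {}
    for a, b in combinations(range(len(sf)), 2):
        result[(a, b)] = {
            g: math.log2((v[b] / sf[b] + pseudocount) / (v[a] / sf[a] + pseudocount))
            for g, v in counts.items()
        }
    return result

# Sample 1 was sequenced roughly twice as deep as sample 0 (illustrative)
counts = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 3]}
sf = size_factors(counts)
fc = pairwise_log2fc(counts)
```

After normalization, g1 to g3 show no change, which is the point of correcting for sequencing depth before comparing samples.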
08-23-2010, 08:08 AM   #3
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
Boxplot-dendrogram

We ran into similar problems with this kind of data. Dendrograms built from the large gene lists that come out of next-generation sequencing can be difficult to visualize. In the paper we just published (RNA-Seq atlas of Glycine max -- http://seqanswers.com/forums/showthread.php?t=6321), we used both a heatmap and a dendrogram combined with boxplots over a time series.
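The boxplots-over-time idea can be sketched in Python as follows: once genes have been assigned to clusters (for example by cutting a dendrogram), summarize each cluster at each time point by the numbers a boxplot draws. The cluster labels are a hypothetical input, and the whiskers are simplified to min/max rather than the usual 1.5 IQR rule.

```python
from collections import defaultdict
from statistics import quantiles

def cluster_boxplot_stats(profiles, labels):
    """For each cluster and time point, return (min, Q1, median, Q3, max).
    profiles: gene -> list of (log) expression values per time point
    labels:   gene -> cluster id (hypothetical, e.g. from a cut dendrogram)"""
    by_cluster = defaultdict(list)
    for gene, vals in profiles.items():
        by_cluster[labels[gene]].append(vals)
    stats = {}
    for cid, rows in by_cluster.items():
        per_time = []
        for t in range(len(rows[0])):
            column = sorted(r[t] for r in rows)
            # quantiles() needs at least two genes per cluster
            q1, med, q3 = quantiles(column, n=4, method="inclusive")
            per_time.append((column[0], q1, med, q3, column[-1]))
        stats[cid] = per_time
    return stats

profiles = {f"g{i}": [float(i), float(i + 10)] for i in range(1, 6)}
labels = {g: "rising" for g in profiles}
stats = cluster_boxplot_stats(profiles, labels)
```

One box per cluster per time point keeps the figure readable no matter how many genes each cluster contains.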
08-24-2010, 06:49 PM   #4
quix
Junior Member
 
Location: New York

Join Date: Aug 2010
Posts: 6

I have exactly the same question. Can anybody give some ideas?
08-24-2010, 09:58 PM   #5
schmima
Member
 
Location: Zürich

Join Date: Apr 2010
Posts: 56

Some non-professional answers:

Quote:
The resulting dendrograms for the large sets of gene lists that come out of the next generation sequencing data can be difficult to visualize.
Definitely, but this applies to all large data sets. A heatmap and dendrogram of 20,000 genes in 20 samples will never look very nice, and I personally think it also doesn't tell you much about what is going on biologically. So either take a subset (as severin did in the paper) or group the genes in a meaningful way prior to plotting (e.g. by GO terms, gene families, or Pfam domains). Depending on the experiment, some groups may not be of interest anyway and can be left out.
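Grouping before plotting could look like the following Python sketch, which averages gene profiles within annotation groups so the heatmap has one row per group. The gene-to-group mapping (e.g. from GO or Pfam annotations) is a hypothetical input; a gene may belong to several groups, and unannotated genes are dropped.

```python
from collections import defaultdict
from statistics import mean

def group_profiles(profiles, gene_to_groups, keep=None):
    """Average gene profiles within annotation groups.
    gene_to_groups: gene -> iterable of group names (hypothetical annotation)
    keep: optional set of groups to retain; others are left out."""
    members = defaultdict(list)
    for gene, vals in profiles.items():
        for group in gene_to_groups.get(gene, ()):
            if keep is None or group in keep:
                members[group].append(vals)
    # Column-wise mean over all member genes = one averaged profile per group
    return {grp: [mean(col) for col in zip(*rows)] for grp, rows in members.items()}

profiles = {"AT1": [1.0, 2.0], "AT2": [3.0, 4.0], "AT3": [10.0, 10.0]}
gene_to_groups = {"AT1": ["ABC transporters"], "AT2": ["ABC transporters", "kinases"]}
grouped = group_profiles(profiles, gene_to_groups)
```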

So, in my opinion, I would first think about what I want to show. If I have a time course and I'm interested in what makes the difference, I would first search for genes or gene sets (grouped in a meaningful way, e.g. by function) that show the largest differences between the samples, and plot only these. This reduces the amount of data plotted, and with groups it links bare gene names to a term one understands (e.g. 'ABC transporters' tells me personally more than 'ATXGXXXXX' or a single '.' in a picture).
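One simple way to pick the genes that show the largest differences between samples is to rank them by variance across samples and plot only the top N. A Python sketch, assuming log-scale profiles; variance is only one possible criterion, and N is arbitrary.

```python
from statistics import pvariance

def top_variable(profiles, n):
    """Return the n genes whose expression varies most across samples."""
    return sorted(profiles, key=lambda g: pvariance(profiles[g]), reverse=True)[:n]

profiles = {"flat": [5.0, 5.0, 5.0], "mild": [4.0, 5.0, 6.0], "strong": [0.0, 5.0, 10.0]}
to_plot = {g: profiles[g] for g in top_variable(profiles, 2)}
```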

However, this requires some time-course analysis, which is not entirely unproblematic (e.g. because of correlation between time points). It also depends on what is tested, i.e. what you would like to know. There is probably some helpful literature on time courses and ANOVA (not that you need to use ANOVA, but it is a good way to learn the general principles and problems of time-course studies).
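For reference, the one-way ANOVA F statistic for a single gene, with replicate measurements at each time point, can be computed as below. This Python sketch treats time points as independent, unordered groups, so it ignores exactly the between-time-point correlation mentioned above. It also assumes some within-group variation (otherwise it divides by zero), and turning F into a p-value needs the F distribution (e.g. scipy.stats.f.sf), which is not shown.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one gene.
    groups: one list of replicate values per time point."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    # Mean squares: each sum of squares divided by its degrees of freedom
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Three time points, three replicates each (illustrative values)
f = anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```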
08-25-2010, 06:28 AM   #6
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
groupings

Quote:
Originally Posted by schmima
group the genes in a meaningful way prior to plotting (e.g. by GO terms, gene families, or Pfam domains).
I agree with schmima here. One of the easiest ways to group genes is into the following groups: the most highly expressed genes (row sum across the time points), genes with time-point-specific expression, and genes expressed significantly higher at one time point than at all other time points (this is what we did for seed versus all other tissues in the paper I mentioned before).
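The three groupings described above could be sketched in Python roughly like this. Without replicates a significance test isn't possible, so "significantly higher" is replaced here by a fold cut-off; the thresholds (min_expr, fold) are illustrative assumptions, not values from the paper.

```python
def top_by_rowsum(profiles, n):
    """Most highly expressed genes: largest sum across all time points."""
    return sorted(profiles, key=lambda g: sum(profiles[g]), reverse=True)[:n]

def specific_to(profiles, min_expr=1.0):
    """Genes expressed (>= min_expr) at exactly one time point -> that index."""
    out = {}
    for g, vals in profiles.items():
        on = [t for t, v in enumerate(vals) if v >= min_expr]
        if len(on) == 1:
            out[g] = on[0]
    return out

def higher_than_all_others(profiles, fold=4.0):
    """Genes whose peak is at least `fold` times every other time point.
    A fold cut-off, not a significance test."""
    out = {}
    for g, vals in profiles.items():
        t = max(range(len(vals)), key=vals.__getitem__)  # index of the peak
        others = [v for i, v in enumerate(vals) if i != t]
        if vals[t] > 0 and all(vals[t] >= fold * v for v in others):
            out[g] = t
    return out

profiles = {"a": [100, 90, 80], "b": [0, 5, 0], "c": [1, 20, 2], "d": [3, 3, 3]}
```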

Genes with no expression at any time point can be removed from the analysis, which sometimes reduces your gene list substantially.
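Removing unexpressed genes is a one-line filter; a Python sketch assuming per-gene profiles in a dict. The min_expr threshold is an assumption (0 keeps any gene with a non-zero value at some time point).

```python
def drop_unexpressed(profiles, min_expr=0.0):
    """Remove genes whose expression never exceeds min_expr at any time point."""
    return {g: vals for g, vals in profiles.items() if max(vals) > min_expr}

profiles = {"silent": [0.0, 0.0, 0.0], "late": [0.0, 0.0, 2.5]}
kept = drop_unexpressed(profiles)
```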

I have also seen analyses that group expression profiles with k-means clustering to identify the major patterns in the expression.
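A minimal k-means (Lloyd's algorithm) sketch in Python for gene profiles. It initializes centres from the first k genes purely for reproducibility; in practice one would use random or k-means++ starts and several restarts. Clustering z-scored profiles (rather than raw RPKM) groups genes by the shape of their time course.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(profiles, k, iterations=100):
    """Lloyd's k-means on gene profiles (dict gene -> list of values)."""
    genes = list(profiles)
    centres = [list(profiles[g]) for g in genes[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each gene goes to its nearest centre
        new_labels = {g: min(range(k), key=lambda c: sq_dist(profiles[g], centres[c]))
                      for g in genes}
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centre becomes the mean of its member profiles
        for c in range(k):
            members = [profiles[g] for g in genes if labels[g] == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

profiles = {
    "up1": [-1.0, 0.0, 1.0], "down1": [1.0, 0.0, -1.0],
    "up2": [-1.2, 0.0, 1.1], "down2": [0.9, 0.0, -1.0],
}
labels, centres = kmeans(profiles, k=2)
```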

As with most data, I strongly recommend just playing with the data, seeing what jumps out at you, and then following up on it. Look closely at the subgroups I mentioned above, and also at transcription factors and tissue-related gene families in the time series.

You can also look at changes in expression rather than the expression values themselves: how does expression change between points 1 and 2, points 2 and 3, points 1 and 3, and so on?
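Looking at changes between time points is easiest on a log2 scale, where a difference of 1 means a two-fold change. A Python sketch that returns the change for every pair of time points (numbered from 1, as in the post), assuming an already log2-transformed profile.

```python
from itertools import combinations

def pairwise_changes(log_profile):
    """Change in log2 expression between every pair of time points (i < j)."""
    return {(i + 1, j + 1): log_profile[j] - log_profile[i]
            for i, j in combinations(range(len(log_profile)), 2)}

changes = pairwise_changes([2.0, 3.0, 1.0])
```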
10-27-2010, 12:00 PM   #7
Sol
Member
 
Location: Brazil

Join Date: Oct 2010
Posts: 13

I need to get the differentially expressed genes out of the MA-plot generated by DEGseq. Is there any software or script that does this?
Thanks
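An MA-plot shows, for each gene, M = log2(sample 2 / sample 1) against A = the mean of the two log2 values. The Python sketch below computes M and A from two sets of (ideally library-size-normalized) counts and flags genes by an |M| cut-off. The cut-off is a hypothetical illustration and is not DEGseq's statistical test.

```python
import math

def ma_values(counts1, counts2):
    """(M, A) per gene. Genes with a zero count in either sample are skipped,
    because log2(0) is undefined."""
    out = {}
    for gene, c1 in counts1.items():
        c2 = counts2.get(gene, 0)
        if c1 > 0 and c2 > 0:
            m = math.log2(c2) - math.log2(c1)          # log ratio
            a = (math.log2(c1) + math.log2(c2)) / 2    # average log intensity
            out[gene] = (m, a)
    return out

def flag_by_m(ma, cutoff=1.0):
    """Genes whose |M| is at least `cutoff` (1.0 = two-fold), sorted by name."""
    return sorted(g for g, (m, _) in ma.items() if abs(m) >= cutoff)

counts1 = {"g1": 10, "g2": 100, "g3": 0, "g4": 8}
counts2 = {"g1": 40, "g2": 110, "g3": 5, "g4": 2}
ma = ma_values(counts1, counts2)
flagged = flag_by_m(ma)
```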
10-27-2010, 12:57 PM   #8
severin
Genome Informatics Facility
 
Location: Iowa @isugif

Join Date: Sep 2009
Posts: 105
graphs and figures

Any command in R that produces a figure can typically be wrapped to write PDF, TIFF or JPEG output instead of drawing in an on-screen R graphics window. See the R help for each output device (?pdf, ?tiff, ?jpeg) for more information.

Here is a really simple PDF wrapper function:

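makepdf <- function(x, filename, w, h) {
  pdf(file = filename, width = w, height = h)  # open PDF device (size in inches)
  on.exit(dev.off())  # always close the device, even if plotting fails
  # Arguments are lazy: the plotting call passed as x runs here, while the
  # PDF device is open. Lattice/ggplot2 objects also need an explicit print().
  res <- x
  if (inherits(res, c("trellis", "ggplot"))) print(res)
  invisible(filename)
}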

An example of how to use it.


makepdf(plot(1:10), "plot.pdf", 5, 5)
11-15-2010, 05:31 AM   #9
Sol
Member
 
Location: Brazil

Join Date: Oct 2010
Posts: 13

Good morning. Do I need to normalize the data output by Bioscope, the SOLiD analysis software?
Thanks
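This thread doesn't say whether Bioscope's output is already normalized. If what you have is raw read counts per gene, the RPKM normalization discussed in this thread is reads × 10^9 / (gene length in bp × total mapped reads). A Python sketch follows; the lengths and the total mapped read count are assumed inputs you would take from your annotation and alignment summary.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon model per Million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

counts = {"g1": 500, "g2": 100}        # illustrative raw counts
lengths = {"g1": 2000, "g2": 500}      # gene lengths in bp
total = 10_000_000                     # total mapped reads in the library
values = {g: rpkm(c, lengths[g], total) for g, c in counts.items()}
```

Dividing by length makes genes comparable within a sample; dividing by total mapped reads makes samples of different depth comparable.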
11-30-2010, 07:14 AM   #10
sdvie
Member
 
Location: Spain

Join Date: Jul 2010
Posts: 68
time courses and heat maps

From my previous experience with time-course experiments (in proteomics rather than RNA-seq), I recommend the following:
- First decide which time point is your reference. This should already be clear when you design the experimental protocol.
- Use the data from this time point as the "background", zero, or reference (whatever you want to call it), and then calculate the ratio of every other time point relative to it.
- Once you have fold-change or log-ratio values per gene and time point, you can visualize them in a heatmap (I did this once with RPKMs using Gitools, http://www.gitools.org).
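The reference-based ratios described above could be computed as in this Python sketch, which gives log2((x_t + p) / (x_ref + p)) per gene and time point. The pseudocount p = 1 (to avoid dividing by a zero RPKM) and the reference index 0 are assumptions.

```python
import math

def log2_ratio_to_reference(profiles, ref_index=0, pseudocount=1.0):
    """log2 ratio of every time point to the reference time point, per gene.
    The reference column becomes 0, so a heatmap shows change from it."""
    return {
        g: [math.log2((v + pseudocount) / (vals[ref_index] + pseudocount)) for v in vals]
        for g, vals in profiles.items()
    }

ratios = log2_ratio_to_reference({"g": [3.0, 7.0, 1.0]})
```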
07-14-2011, 10:41 PM   #11
dphansti
Member
 
Location: Bay Area

Join Date: May 2011
Posts: 28

I would recommend clustering the time-course expression profile of each gene using fuzzy c-means clustering. I am fairly sure this can be done easily in R. Then you can look for enrichment of specific pathways or GO terms in each cluster, and perhaps see which genes are regulated early, mid, and late. Perhaps the mid or late genes are regulated by a transcription factor that you see increasing in the early group. Just an idea.
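A compact, stdlib-only Python sketch of fuzzy c-means (not any particular R package), assuming per-gene profiles that have already been log-transformed and scaled. Each gene receives a membership between 0 and 1 in every cluster, with memberships summing to 1; m is the fuzzifier (m = 2 is a common default), and starting the centres at the first c genes is a simplification for reproducibility.

```python
def fuzzy_cmeans(profiles, c, m=2.0, iterations=100, tol=1e-6):
    """Returns (memberships: gene -> list of c values summing to 1, centres)."""
    genes = list(profiles)
    X = [profiles[g] for g in genes]
    centres = [list(x) for x in X[:c]]  # simplistic, reproducible start
    U = []

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    for _ in range(iterations):
        # Membership step: u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
        new_U = []
        for x in X:
            d = [dist(x, v) for v in centres]
            if 0.0 in d:
                # Gene sits exactly on a centre: full membership there
                row = [1.0 if di == 0.0 else 0.0 for di in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                       for i in range(c)]
            new_U.append(row)
        # Centre step: mean of all profiles weighted by membership ** m
        for i in range(c):
            w = [row[i] ** m for row in new_U]
            centres[i] = [sum(wk * x[t] for wk, x in zip(w, X)) / sum(w)
                          for t in range(len(X[0]))]
        change = max((abs(a - b) for r0, r1 in zip(U, new_U) for a, b in zip(r0, r1)),
                     default=float("inf"))
        U = new_U
        if change < tol:
            break
    return dict(zip(genes, U)), centres

profiles = {
    "up1": [-1.0, 0.0, 1.0], "down1": [1.0, 0.0, -1.0],
    "up2": [-1.2, 0.0, 1.1], "down2": [0.9, 0.0, -1.0],
}
memberships, centres = fuzzy_cmeans(profiles, c=2)
```

Unlike k-means, genes with intermediate profiles get split memberships, which is what makes the "core" members of each temporal cluster easy to pick out by a membership threshold.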

But I would definitely look into fuzzy c-means clustering. See Figure 7 of this paper for the type of output you can expect:

Rigbolt KT, Prokhorova TA, Akimov V, Henningsen J, Johansen PT, Kratchmarova I, Kassem M, Mann M, Olsen JV, Blagoev B. System-wide temporal characterization of the proteome and phosphoproteome of human embryonic stem cell differentiation. Sci Signal. 2011 Mar 15;4(164):rs3. PubMed PMID: 21406692.
__________________
Doug
www.sharedproteomics.com

All times are GMT -8.