SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
RNA-Seq: GFOLD: a generalized fold change for ranking differentially expressed genes Newsbot! Literature Watch 8 03-07-2016 12:17 PM
Cut-off values for fold change in RNA seq smalan RNA Sequencing 1 07-12-2012 11:10 AM
infinite fold change RNAseq biofreak Bioinformatics 6 07-02-2012 11:02 AM
DESeq DiffExpress Vs. Fold Change KellerMac Bioinformatics 3 06-10-2011 11:49 AM
fold change value-cuffdiff madsaan Bioinformatics 4 02-10-2011 07:51 AM

Reply
 
Thread Tools
Old 11-05-2012, 11:31 AM   #1
devking
Member
 
Location: Palo Alto, California

Join Date: Nov 2012
Posts: 11
Default A fold change heatmap for RNA seq analysis using CuffDiff and cummerbund

Hi friends,

I am an undergrad and was asked by a post doc to help analyze an Illumina Hi-seq single end RNA seq dataset. Needless to say I am a complete novice to this type of analysis and a little bit overwhelmed. We are working with S. cerevisiae; one wildtype condition and five test conditions (two biological replicates each, so 12 total FASTQ files). We are not interested in novel gene/splice variants and simply want to explore gene expression profiles of known genes. So I first aligned the reads to an annotated reference genome with TopHat then used CuffDiff using a reference transcriptome along with the BAM files from TopHat.

My cuffdiff input looked like:

$ cuffdiff -p 8 -o cuffdiff_out -b genome.fa -u genes.gtf -L WT,ies2,arp8,ino80,ies6,arp5 ./BY4741_1_accepted_hits.bam,./BY4741_2_accepted_hits.bam
./ies2_1_accepted_hits.bam,./ies2_2_accepted_hits.bam
./arp8_1_accepted_hits.bam,./arp8_2_accepted_hits.bam
./ino80_1_accepted_hits.bam,./ino80_2_accepted_hits.bam
./ies6_1_accepted_hits.bam,./ies6_2_accepted_hits.bam
./arp5_1_accepted_hits.bam,./arp5_2_accepted_hits.bam

My question is: How do I generate a heatmap showing the fold change in gene expression of each of the 5 mutants compared to WT using cummeRbund?

With cummerBund I was able to generate a heatmap with
> csHeatmap(myGenes, cluster="both") where 'myGenes' was a list of the top 50 differentially expressed genes. I have attached the figure.

However, I would rather have a heatmap where there is no WT (BY4741) column, and instead just shows fold change of the five mutant samples relative to WT. Is there a way to do this? If so, is there a way to specify the color scheme so that down-regulated genes are clearly one color (like blue) and up-regulated genes are clearly another color (like red), and genes that have no change are just black?

Again, I'm new to this kind of analysis, and completely new to R, but cummeRbund seems like the easiest/most powerful way to do this. Advice?
Attached Images
File Type: png top50DEHeatmap.png (126.8 KB, 343 views)

Last edited by devking; 11-08-2012 at 10:17 AM.
devking is offline   Reply With Quote
Old 11-15-2012, 02:30 AM   #2
jordiet
Junior Member
 
Location: Spain

Join Date: Jun 2012
Posts: 2
Default

Hi devking,

sorry I can't help you with your CuffDiff issue. Have you used BY4741 as a reference genome for your alignments? I am working with the same strain and I would like to use it as a reference genome to align my evolved strains to this one. Do you know where can I find an assembled genome of BY4741 to use as a reference?
jordiet is offline   Reply With Quote
Old 11-15-2012, 03:20 AM   #3
tir_al
Member
 
Location: Croatia

Join Date: Sep 2010
Posts: 22
Default

Dear Devkin,

The heatmap you posted was done in R using the ggplot2 package.

As you are already using the tuxedo pipeline, you could take a look at the cummeRbund

which should enable you to get easy data integration.

Best
tir_al is offline   Reply With Quote
Old 11-15-2012, 05:36 AM   #4
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

Did you try DESeq ? There is an interesting clustering section in the DESeq vignette on bioconductor.

It's pretty simple to use :

1. align
2. Extract read count with htseq-count
3. Use DESeq to perform DE analysis and plot some results

Check : http://bioconductor.org/packages/rel...tml/DESeq.html
NicoBxl is offline   Reply With Quote
Old 11-15-2012, 11:22 AM   #5
devking
Member
 
Location: Palo Alto, California

Join Date: Nov 2012
Posts: 11
Default

Quote:
Originally Posted by jordiet View Post
Hi devking,

sorry I can't help you with your CuffDiff issue. Have you used BY4741 as a reference genome for your alignments? I am working with the same strain and I would like to use it as a reference genome to align my evolved strains to this one. Do you know where can I find an assembled genome of BY4741 to use as a reference?
Hi Jordiet

For my analysis I was only interested in quantifying expression according to a known and annotated genome so I used the Ensembl release 69 EF4 genome.

I figured since the transcripts in yeast are so well-annotated already it seemed like more trouble than it was worth to assemble a new transcriptome using cufflinks/cuffmerge.

I might be misunderstanding your question, but it seems like you're more interested in doing a full analysis as outlined in the nat protocol paper "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks" in which you'll have your own assembled transcriptome annotation to quantify.

check the paper: http://www.nature.com/nprot/journal/....2012.016.html

Best,
Devin
devking is offline   Reply With Quote
Old 11-15-2012, 11:36 AM   #6
devking
Member
 
Location: Palo Alto, California

Join Date: Nov 2012
Posts: 11
Default

Quote:
Originally Posted by tir_al View Post
As you are already using the tuxedo pipeline, you could take a look at the cummeRbund
Hi Tir,

Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
Attached Images
File Type: png 165OverlappingGenesCLUSTER.png (39.4 KB, 247 views)
devking is offline   Reply With Quote
Old 11-15-2012, 11:43 AM   #7
devking
Member
 
Location: Palo Alto, California

Join Date: Nov 2012
Posts: 11
Default

Quote:
Originally Posted by NicoBxl View Post
Did you try DESeq ? There is an interesting clustering section in the DESeq vignette on bioconductor.
Hi Nico,

Thanks for the tip. It seems like all of the differential gene expression statistical methods (e.g. cuffdiff, DESeq, edgeR, NOIseq etc) return slightly different sets of significantly differentially expressed genes. E.g. check out:
http://nar.oxfordjournals.org/conten...ar.gks804.full

So I was actually planning on using DESeq next and comparing the results with my cuffdiff data. I was thinking of just focusing on the genes that tested significant for DE from both methods. Not sure if this is reasonable or worth the time...

Best,
Devin
devking is offline   Reply With Quote
Old 06-17-2013, 11:39 AM   #8
emolinari
Member
 
Location: New haven

Join Date: May 2013
Posts: 47
Default

Quote:
Originally Posted by devking View Post
Hi Tir,

Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
Hi Devking,

I was looking to do the same plot as you did...is it done with DESeq?
did you use a customized R script?
Thanks!!!
Manu
emolinari is offline   Reply With Quote
Old 06-17-2013, 01:13 PM   #9
devking
Member
 
Location: Palo Alto, California

Join Date: Nov 2012
Posts: 11
Default

I quantified expression using the 'Tuxedo' protocol (http://www.nature.com/nprot/journal/....2012.016.html) and used a custom R script to generate the heatmap. Using cummeRbund, you can get a matrix of FPKM expression values, and then use heatmap.2() in the R 'gplots' package for the heatmap call. You could also import your expression matrix into Java TreeView which also plots nice heatmaps. Hope this helps!
devking is offline   Reply With Quote
Old 06-17-2013, 01:20 PM   #10
emolinari
Member
 
Location: New haven

Join Date: May 2013
Posts: 47
Default

Quote:
Originally Posted by devking View Post
I quantified expression using the 'Tuxedo' protocol (http://www.nature.com/nprot/journal/....2012.016.html) and used a custom R script to generate the heatmap. Using cummeRbund, you can get a matrix of FPKM expression values, and then use heatmap.2() in the R 'gplots' package for the heatmap call. You could also import your expression matrix into Java TreeView which also plots nice heatmaps. Hope this helps!
Thanks for the answer...I actually have followed the same path, but I have just been able to produce a differential expression heatmap.

I'll see what I can do for up and down regulation...unfortunately java is too hardcore informatics for me!!!
Thanks
Manu
emolinari is offline   Reply With Quote
Old 06-17-2013, 01:39 PM   #11
devking
Member
 
Location: Palo Alto, California

Join Date: Nov 2012
Posts: 11
Default

Maybe this can help you get started:


Code:
library(cummeRbund); library(gplots)
cuff <- readCufflinks("cuffdiff_out",rebuild=T)
db <- fpkmMatrix(genes(cuff))
sigGenes <- getSig(cuff,'BY4741','ino80',alpha=0.05,level='genes')
db <- db[sigGenes,]	


WT <- "BY4741" # Set the column name of the WT sample
mut <- c("ino80","arp5","ies6") # Set column names of test conditions you want in the analysis

logFC <- function(db,mutants,WT,logBase=2,pseudo=1) {
		if (length(WT) !=1 ) {
			stop('WT must refer to a single gene/column')
			}
		if (is.numeric(logBase)==FALSE) {
			stop('logBase must be a numeric value')
			}
		
		db <- (db[,mutants]+pseudo)/(db[,WT]+pseudo)
		db <- log(db,logBase)
		
}

db <- logFC(db,mutants=mut,WT=WT,logBase=2,pseudo=1) # This does the log transformation
db <- as.matrix(db)

heatmap.2(
		db,
		Rowv=TRUE,
		Colv=FALSE,
		dendrogram="row",
		trace="none",
		labRow="",
		density.info=c("none"),
		main="Log(Fold-Change) \nExpression Profiles"
	)
devking is offline   Reply With Quote
Old 06-18-2013, 09:11 AM   #12
emolinari
Member
 
Location: New haven

Join Date: May 2013
Posts: 47
Default

Thanks Devking!
This really helps me out!!!
emolinari is offline   Reply With Quote
Old 07-22-2013, 04:40 AM   #13
jp.
Senior Member
 
Location: NikoNarita.jp

Join Date: Jul 2013
Posts: 142
Question

Hi
Can you please give me your commands to make csHeatmap using cummeRbund ?
I tried making the using cummeRbund and succeeded, however, can not add dandogram and change color which you have done it.
May you please write back to me in little detail because I am a beginer.
Thank you in advance.
Jp.


Quote:
Originally Posted by devking View Post
Hi Tir,

Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
jp. is offline   Reply With Quote
Old 10-30-2013, 11:49 PM   #14
jp.
Senior Member
 
Location: NikoNarita.jp

Join Date: Jul 2013
Posts: 142
Question

Hi
Would you post your commands to make dendrogram in csHeatmap ?
I just can not add dendrogram using csHeatmap.
You used heatmap.2(.... can read cuffdiff out and make heatmap with dendrogram ?
Thank you

Quote:
Originally Posted by devking View Post
Hi Tir,

Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
jp. is offline   Reply With Quote
Old 11-07-2013, 06:22 AM   #15
hubery_Bio
Junior Member
 
Location: china

Join Date: Oct 2013
Posts: 5
Default

maybe the program cluster can help u. you just need to change the format of the cuffdiff result.
hubery_Bio is offline   Reply With Quote
Old 12-23-2014, 10:30 AM   #16
gareth.lim
Junior Member
 
Location: vancouver

Join Date: Feb 2014
Posts: 3
Default

My experiment is complicated (at least in my mind =) ). I have three time points (0,24,48 hrs) and two conditions (siControl and siTarget) for a total of 6 groups. I'm trying to see how knockdown of my target affects the transcriptome at each time point.

This information is the closest I can find online that describes how to make a log2-fold change heat map and a dendrogram.... but I'm having some troubles. Can anyone please help?

A few questions so far:

1. sigGenes <- getSig(cuff,'BY4741','ino80',alpha=0.05,level='genes')
-Why were only two of your groups selected here and not the whole data set? Does this impact the final result?

2. db <- logFC(db,mutants=mut,WT=WT,logBase=2,pseudo=1)
-when i enter this, I get an error message: Error in db[, mutants] : incorrect number of dimensions
gareth.lim is offline   Reply With Quote
Old 12-23-2014, 10:34 AM   #17
gareth.lim
Junior Member
 
Location: vancouver

Join Date: Feb 2014
Posts: 3
Default Sorry clicked too fast...

Sorry I should add that so far I have done this:
>WT<-"C1"
> mut<-c("C2","C3","C4","C5","C6")

Quote:
Originally Posted by gareth.lim View Post
My experiment is complicated (at least in my mind =) ). I have three time points (0,24,48 hrs) and two conditions (siControl and siTarget) for a total of 6 groups. I'm trying to see how knockdown of my target affects the transcriptome at each time point.

This information is the closest I can find online that describes how to make a log2-fold change heat map and a dendrogram.... but I'm having some troubles. Can anyone please help?

A few questions so far:

1. sigGenes <- getSig(cuff,'BY4741','ino80',alpha=0.05,level='genes')
-Why were only two of your groups selected here and not the whole data set? Does this impact the final result?

2. db <- logFC(db,mutants=mut,WT=WT,logBase=2,pseudo=1)
-when i enter this, I get an error message: Error in db[, mutants] : incorrect number of dimensions
gareth.lim is offline   Reply With Quote
Reply

Tags
cuffdiff, cummerbund, fold change, heatmap

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 04:40 AM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO