Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • A fold change heatmap for RNA seq analysis using CuffDiff and cummerbund

    Hi friends,

    I am an undergrad and was asked by a post doc to help analyze an Illumina Hi-seq single end RNA seq dataset. Needless to say I am a complete novice to this type of analysis and a little bit overwhelmed. We are working with S. cerevisiae; one wildtype condition and five test conditions (two biological replicates each, so 12 total FASTQ files). We are not interested in novel gene/splice variants and simply want to explore gene expression profiles of known genes. So I first aligned the reads to an annotated reference genome with TopHat then used CuffDiff using a reference transcriptome along with the BAM files from TopHat.

    My cuffdiff input looked like:

    $ cuffdiff -p 8 -o cuffdiff_out -b genome.fa -u genes.gtf -L WT,ies2,arp8,ino80,ies6,arp5 ./BY4741_1_accepted_hits.bam,./BY4741_2_accepted_hits.bam
    ./ies2_1_accepted_hits.bam,./ies2_2_accepted_hits.bam
    ./arp8_1_accepted_hits.bam,./arp8_2_accepted_hits.bam
    ./ino80_1_accepted_hits.bam,./ino80_2_accepted_hits.bam
    ./ies6_1_accepted_hits.bam,./ies6_2_accepted_hits.bam
    ./arp5_1_accepted_hits.bam,./arp5_2_accepted_hits.bam

    My question is: How do I generate a heatmap showing the fold change in gene expression of each of the 5 mutants compared to WT using cummeRbund?

    With cummerBund I was able to generate a heatmap with
    > csHeatmap(myGenes, cluster="both") where 'myGenes' was a list of the top 50 differentially expressed genes. I have attached the figure.

    However, I would rather have a heatmap where there is no WT (BY4741) column, and instead just shows fold change of the five mutant samples relative to WT. Is there a way to do this? If so, is there a way to specify the color scheme so that down-regulated genes are clearly one color (like blue) and up-regulated genes are clearly another color (like red), and genes that have no change are just black?

    Again, I'm new to this kind of analysis, and completely new to R, but cummeRbund seems like the easiest/most powerful way to do this. Advice?
    Attached Files
    Last edited by devking; 11-08-2012, 10:17 AM.

  • #2
    Hi devking,

    sorry I can't help you with your CuffDiff issue. Have you used BY4741 as a reference genome for your alignments? I am working with the same strain and I would like to use it as a reference genome to align my evolved strains to this one. Do you know where can I find an assembled genome of BY4741 to use as a reference?

    Comment


    • #3
      Dear Devkin,

      The heatmap you posted was done in R using the ggplot2 package.

      As you are already using the tuxedo pipeline, you could take a look at the cummeRbund

      which should enable you to get easy data integration.

      Best

      Comment


      • #4
        Did you try DESeq ? There is an interesting clustering section in the DESeq vignette on bioconductor.

        It's pretty simple to use :

        1. align
        2. Extract read count with htseq-count
        3. Use DESeq to perform DE analysis and plot some results

        Check : http://bioconductor.org/packages/rel...tml/DESeq.html

        Comment


        • #5
          Originally posted by jordiet View Post
          Hi devking,

          sorry I can't help you with your CuffDiff issue. Have you used BY4741 as a reference genome for your alignments? I am working with the same strain and I would like to use it as a reference genome to align my evolved strains to this one. Do you know where can I find an assembled genome of BY4741 to use as a reference?
          Hi Jordiet

          For my analysis I was only interested in quantifying expression according to a known and annotated genome so I used the Ensembl release 69 EF4 genome.

          I figured since the transcripts in yeast are so well-annotated already it seemed like more trouble than it was worth to assemble a new transcriptome using cufflinks/cuffmerge.

          I might be misunderstanding your question, but it seems like you're more interested in doing a full analysis as outlined in the nat protocol paper "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks" in which you'll have your own assembled transcriptome annotation to quantify.

          check the paper: http://www.nature.com/nprot/journal/....2012.016.html

          Best,
          Devin

          Comment


          • #6
            Originally posted by tir_al View Post
            As you are already using the tuxedo pipeline, you could take a look at the cummeRbund
            Hi Tir,

            Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
            Attached Files

            Comment


            • #7
              Originally posted by NicoBxl View Post
              Did you try DESeq ? There is an interesting clustering section in the DESeq vignette on bioconductor.
              Hi Nico,

              Thanks for the tip. It seems like all of the differential gene expression statistical methods (e.g. cuffdiff, DESeq, edgeR, NOIseq etc) return slightly different sets of significantly differentially expressed genes. E.g. check out:


              So I was actually planning on using DESeq next and comparing the results with my cuffdiff data. I was thinking of just focusing on the genes that tested significant for DE from both methods. Not sure if this is reasonable or worth the time...

              Best,
              Devin

              Comment


              • #8
                Originally posted by devking View Post
                Hi Tir,

                Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.
                Hi Devking,

                I was looking to do the same plot as you did...is it done with DESeq?
                did you use a customized R script?
                Thanks!!!
                Manu

                Comment


                • #9
                  I quantified expression using the 'Tuxedo' protocol (http://www.nature.com/nprot/journal/....2012.016.html) and used a custom R script to generate the heatmap. Using cummeRbund, you can get a matrix of FPKM expression values, and then use heatmap.2() in the R 'gplots' package for the heatmap call. You could also import your expression matrix into Java TreeView which also plots nice heatmaps. Hope this helps!

                  Comment


                  • #10
                    Originally posted by devking View Post
                    I quantified expression using the 'Tuxedo' protocol (http://www.nature.com/nprot/journal/....2012.016.html) and used a custom R script to generate the heatmap. Using cummeRbund, you can get a matrix of FPKM expression values, and then use heatmap.2() in the R 'gplots' package for the heatmap call. You could also import your expression matrix into Java TreeView which also plots nice heatmaps. Hope this helps!
                    Thanks for the answer...I actually have followed the same path, but I have just been able to produce a differential expression heatmap.

                    I'll see what I can do for up and down regulation...unfortunately java is too hardcore informatics for me!!!
                    Thanks
                    Manu

                    Comment


                    • #11
                      Maybe this can help you get started:


                      Code:
                      library(cummeRbund); library(gplots)
                      cuff <- readCufflinks("cuffdiff_out",rebuild=T)
                      db <- fpkmMatrix(genes(cuff))
                      sigGenes <- getSig(cuff,'BY4741','ino80',alpha=0.05,level='genes')
                      db <- db[sigGenes,]	
                      
                      
                      WT <- "BY4741" # Set the column name of the WT sample
                      mut <- c("ino80","arp5","ies6") # Set column names of test conditions you want in the analysis
                      
                      logFC <- function(db,mutants,WT,logBase=2,pseudo=1) {
                      		if (length(WT) !=1 ) {
                      			stop('WT must refer to a single gene/column')
                      			}
                      		if (is.numeric(logBase)==FALSE) {
                      			stop('logBase must be a numeric value')
                      			}
                      		
                      		db <- (db[,mutants]+pseudo)/(db[,WT]+pseudo)
                      		db <- log(db,logBase)
                      		
                      }
                      
                      db <- logFC(db,mutants=mut,WT=WT,logBase=2,pseudo=1) # This does the log transformation
                      db <- as.matrix(db)
                      
                      heatmap.2(
                      		db,
                      		Rowv=TRUE,
                      		Colv=FALSE,
                      		dendrogram="row",
                      		trace="none",
                      		labRow="",
                      		density.info=c("none"),
                      		main="Log(Fold-Change) \nExpression Profiles"
                      	)

                      Comment


                      • #12
                        Thanks Devking!
                        This really helps me out!!!

                        Comment


                        • #13
                          Hi
                          Can you please give me your commands to make csHeatmap using cummeRbund ?
                          I tried making the using cummeRbund and succeeded, however, can not add dandogram and change color which you have done it.
                          May you please write back to me in little detail because I am a beginer.
                          Thank you in advance.
                          Jp.


                          Originally posted by devking View Post
                          Hi Tir,

                          Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.

                          Comment


                          • #14
                            Hi
                            Would you post your commands to make dendrogram in csHeatmap ?
                            I just can not add dendrogram using csHeatmap.
                            You used heatmap.2(.... can read cuffdiff out and make heatmap with dendrogram ?
                            Thank you

                            Originally posted by devking View Post
                            Hi Tir,

                            Thanks for your response! I have been using cummeRbund and it has been very easy to use so far to make heatmaps of gene expression. But I didn't immediately see a way to look at *fold-change* expression between two conditions. (I've never used something like R before so things that should be simple usually aren't for me ). However, on the advice of one of the cummeRbund authors, Loyal, I was able to make log fold-change heat maps by extracting raw FPKM values from fpkmMatrix(), do the log2FC transformation in R, then use heatmap.2() from gplots to create something like the attached file.

                            Comment


                            • #15
                              maybe the program cluster can help u. you just need to change the format of the cuffdiff result.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Techniques and Challenges in Conservation Genomics
                                by seqadmin



                                The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                                Avian Conservation
                                Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                                03-08-2024, 10:41 AM
                              • seqadmin
                                The Impact of AI in Genomic Medicine
                                by seqadmin



                                Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                                02-26-2024, 02:07 PM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 03-14-2024, 06:13 AM
                              0 responses
                              33 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-08-2024, 08:03 AM
                              0 responses
                              72 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-07-2024, 08:13 AM
                              0 responses
                              81 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 03-06-2024, 09:51 AM
                              0 responses
                              68 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X