
  • Within Sample Correlation of 2 genes - RNA-seq

    Hi Guys,

    I am doing differential expression analysis of RNA-seq data using DESeq2. I have 12 samples, and I pass the raw count matrix to DESeq2 as input.

    My question: if I want to compute a within-sample correlation of Gene A and Gene B (a gene-gene correlation, not a correlation between samples, as the two genes are co-expressed), should I do this on the raw counts or on normalized counts?

    So I have 12 values for Gene A across the 12 samples
    and 12 values for Gene B across the 12 samples.

    Correlating the raw counts gives me a rho of around 0.8.
    However, normalizing with the DESeq2 method scales each sample differently by its size factor, and the rho drops to 0.5.

    So I am not sure how I should compute the correlation (raw or normalized) and, if normalized, which method is preferable for within-sample comparisons of two different genes.

    Thank you for taking the time to read this. I hope someone can give me some advice.

  • #2
    bump!!

    Just bumping the post up, as I posted it very late in the evening.



    • #3
      hi,

      Note that DESeq2 doesn't really help you out with this question, as it focuses on gene-by-gene differential expression, and the transformations are most useful for visualizing and clustering samples.

      You don't want the sequencing depth as a factor in the correlation. Consider a situation where gene A and B are not correlated, but you sequence the samples so that each sample has twice as many reads as the previous sample. Then you will get a really high correlation that has no biological significance.
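The thought experiment above can be sketched as a small base R simulation. Everything in it is made up for illustration: the depths, the expression levels, and the choice of Spearman correlation (the thread does not say which coefficient was used). Dividing by the known depth stands in for a real size-factor normalization.

```r
set.seed(1)
n_samples <- 12

# Hypothetical sequencing depths: each sample has twice the reads of the previous one
depth <- 2^(0:(n_samples - 1))

# Depth-independent expression levels for two genes, drawn independently
expr_a <- rlnorm(n_samples, meanlog = 3, sdlog = 0.2)
expr_b <- rlnorm(n_samples, meanlog = 3, sdlog = 0.2)

# Observed raw counts scale with sequencing depth
raw_a <- rpois(n_samples, lambda = expr_a * depth)
raw_b <- rpois(n_samples, lambda = expr_b * depth)

# Correlation on raw counts is driven almost entirely by depth
cor_raw <- cor(raw_a, raw_b, method = "spearman")

# Dividing out the depth removes that shared trend
cor_norm <- cor(raw_a / depth, raw_b / depth, method = "spearman")

print(c(raw = cor_raw, normalized = cor_norm))
```

With depths this different, the raw-count correlation mostly reflects how deeply each sample was sequenced, while the depth-corrected values behave like the two independent genes they were simulated from.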

      So you could* do:

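      library(DESeq2)
      nc <- counts(dds,normalized=TRUE)  # size factors must already be estimated (DESeq() or estimateSizeFactors())
      cor(t(nc[idx,]))  # cor() correlates columns, so transpose to put the genes in columns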

      where idx gives the index of genes you want to find correlations for.
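One detail worth checking when computing gene-gene correlations from a count matrix: `cor()` on a matrix correlates its columns, and a DESeq2 count matrix has genes in rows and samples in columns. A minimal sketch with a toy matrix (random values standing in for the normalized counts; the gene and sample names are hypothetical):

```r
set.seed(2)

# Toy genes-by-samples matrix standing in for normalized counts
nc <- matrix(rlnorm(5 * 12, meanlog = 5), nrow = 5,
             dimnames = list(paste0("gene", 1:5), paste0("sample", 1:12)))

idx <- c("gene1", "gene2")  # hypothetical genes of interest

# Columns are samples here, so this is a 12 x 12 sample-sample matrix
# (each entry is based on only 2 points, one per selected gene)
sample_cor <- cor(nc[idx, ])

# After transposing, columns are genes: a 2 x 2 gene-gene matrix
gene_cor <- cor(t(nc[idx, ]))

print(dim(sample_cor))
print(gene_cor)
```

The off-diagonal entry of `gene_cor` is the correlation of the two genes across the 12 samples, which is the quantity asked about in the first post.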

      *However, I would also consider batch effects if you are calculating gene-gene correlations and the samples were processed in batches. This would be another way to get spuriously large (in absolute value) correlations. You can check for batch effects using either of the DESeq2 transformations (rlog or the variance stabilizing transformation) and the plotPCA workflow in the DESeq2 vignette.

      If the samples cluster by batch, then the cqn package vignette explains how to get "normalized expression values", where the normalization takes care of sequencing depth, GC-content bias and gene length bias:

      cqn: a normalization tool for RNA-Seq data, implementing the conditional quantile normalization method.
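The batch-effect concern can be illustrated with another small base R simulation. This is only a sketch under made-up assumptions: two hypothetical batches of six samples, a batch shift of 2 on a log-like scale that hits both genes, and otherwise independent genes. Centering within batch is used here purely to show where the correlation comes from; it is not the cqn method and is not meant as a recommended correction.

```r
set.seed(3)

batch <- rep(c("batch1", "batch2"), each = 6)  # hypothetical batch labels
shift <- ifelse(batch == "batch2", 2, 0)       # batch 2 shifts both genes upward

# Two genes with independent variation plus the shared batch shift
gene_a <- rnorm(12, mean = 5, sd = 0.3) + shift
gene_b <- rnorm(12, mean = 5, sd = 0.3) + shift

# Correlation across all samples is inflated by the shared shift
cor_all <- cor(gene_a, gene_b)

# Subtract each batch's mean (ave() returns per-group means) before correlating
resid_a <- gene_a - ave(gene_a, batch)
resid_b <- gene_b - ave(gene_b, batch)
cor_within <- cor(resid_a, resid_b)

print(c(all_samples = cor_all, within_batch = cor_within))
```

A large gap between the two numbers suggests the apparent co-expression is mostly a batch difference rather than a relationship between the genes.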



      • #4
        Thank you for clearing this up

        Hi Michael,

        Thank you for clearing this up and giving a comprehensive response to this problem. I was thinking along the same lines. Someone suggested using the variance-stabilized (vsd) data from DESeq2 and then plotting these correlations.

        - That, however, gives really high correlations, almost in line with the non-normalized data. I understand that the DESeq2 transformations are useful if we want to perform clustering.

        - The method that you suggest, i.e. using the normalized data, makes sense to me. I will then look for batch effects as well.

        Thanks again
