  • DE Analysis of targeted RNA sequencing with many DE genes

    Hi Everyone,
    About a month ago I started working on a project involving targeted RNA-seq and have been rapidly learning about the techniques and analyses.
    I have, however, run into some theoretical questions specific to our set-up which, despite extensive online searches, I can’t find answers to. I hope some of you can give some advice. I apologise if any of my questions stem from gaps in my knowledge; I am trying very hard to catch up on everything there is to know about RNA-seq.

    Our assay set-up is the following:
    The assay is a targeted RNA next-generation sequencing assay to screen for differential expression of a limited number of gene targets (88) between tumor samples and adjacent ‘normal/healthy’ samples from patients. Targets are enriched by amplification with specific primers in both the tumor and the adjacent sample, and the libraries are then run on a MiSeq, similar to the TruSeq Targeted RNA Expression kits available from Illumina. Illumina offers these kits with different gene panels related to particular pathways.

    Illumina then offers the option of analysing the data with DESeq. For library size normalisation, you can either let DESeq estimate the size factors, or specify reference genes whose geometric mean is then used to estimate the size factors. DESeq is then used to perform the differential expression analysis (at least that’s how I understood it, since DESeq normally takes raw counts as input).
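To make the two normalisation options concrete, here is a minimal NumPy sketch; it is not Illumina's or DESeq's actual code. `median_of_ratios_size_factors` follows the DESeq-style median-of-ratios idea. `reference_gene_size_factors` is an assumption about what "size factor based on the geometric mean of reference genes" means: each sample's geometric mean of the reference-gene counts, rescaled so the factors have a geometric mean of 1. The toy counts are invented: sample B is sequenced twice as deeply, and 6 of the 8 genes are additionally 4-fold up in B, while the 2 reference genes only reflect depth.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """DESeq-style median-of-ratios size factors; counts is genes x samples."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have no finite geometric mean; skip them.
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    # Per-gene log geometric mean across samples acts as a pseudo-reference sample.
    log_geo_means = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of (count / geometric mean), taken on the log scale.
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

def reference_gene_size_factors(counts, ref_rows):
    """Assumed variant: geometric mean of the reference genes only, per sample."""
    counts = np.asarray(counts, dtype=float)
    log_ref = np.log(counts[ref_rows]).mean(axis=0)  # log geometric mean per sample
    # Rescale so the size factors have a geometric mean of 1 across samples.
    return np.exp(log_ref - log_ref.mean())

# Invented toy data: 8 targets, 2 samples. Rows 6 and 7 are the reference genes.
base = np.array([100, 250, 80, 400, 150, 60, 300, 120], dtype=float)
is_ref = np.array([False] * 6 + [True] * 2)
# Sample B has twice the depth; the 6 non-reference genes are also 4-fold up in B.
sample_b = np.where(is_ref, 2 * base, 8 * base)
counts = np.column_stack([base, sample_b])

sf_mor = median_of_ratios_size_factors(counts)
sf_ref = reference_gene_size_factors(counts, np.where(is_ref)[0])
print(sf_mor[1] / sf_mor[0])  # close to 8: the DE shift is absorbed into the size factor
print(sf_ref[1] / sf_ref[0])  # close to 2: only the depth difference
```

When most genes are DE, the median of ratios lands on the DE genes' ratio rather than the true 2-fold depth difference, which illustrates why the "most genes are not DE" assumption matters for a panel like this one.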

    DESeq (and many other RNA-seq packages) assumes that the majority of genes are not differentially expressed. Since my panel consists mostly of genes that are differentially expressed according to microarray data, 4 reference genes were included to get around this issue.
    The reference genes were selected from a list of 11 candidates using NormFinder on the microarray data.
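To re-check reference-gene stability on the NGS counts themselves, one option (swapped in here, and a different algorithm from NormFinder) is a geNorm-style stability measure M: for each candidate gene, take the standard deviation across samples of its log2 ratio to every other candidate, then average those standard deviations. Because the ratio of two genes within the same sample does not depend on library size, raw counts can be used. This is a minimal sketch assuming all candidate counts are > 0; `genorm_m` is a hypothetical helper name and the data are invented (genes 0 and 1 track depth exactly, gene 2 does not).

```python
import numpy as np

def genorm_m(counts):
    """geNorm-style stability measure M; counts is candidate genes x samples, all > 0.

    Lower M means the gene's expression tracks the other candidates more closely.
    """
    log_counts = np.log2(np.asarray(counts, dtype=float))
    n_genes = log_counts.shape[0]
    m = np.empty(n_genes)
    for j in range(n_genes):
        # Log-ratios of gene j to every other candidate in each sample;
        # library size cancels out within a sample.
        diffs = log_counts[j] - np.delete(log_counts, j, axis=0)
        # Standard deviation of each pairwise log-ratio across samples, then averaged.
        m[j] = diffs.std(axis=1, ddof=1).mean()
    return m

depth = np.array([1.0, 2.0, 0.5, 1.5])
candidates = np.vstack([100 * depth,                                    # stable
                        300 * depth,                                    # stable
                        50 * depth * np.array([1.0, 4.0, 0.25, 2.0])])  # unstable
m = genorm_m(candidates)
print(m)  # genes 0 and 1 share the lowest M; gene 2 has the highest
```

In practice one would rank the candidates by M and drop the least stable one iteratively; with only a handful of reference genes, a large M for any of them is a warning sign.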

    I have the following questions:
    1. Is there any way to validate the stable expression of reference genes in NGS data? How have other people selected reference genes for targeted RNA-seq and then confirmed that they are indeed stable?
    2. I would like to use a package like DESeq for the differential expression analysis. But for estimating the dispersion of the genes, it again assumes that the majority of genes are not differentially expressed. For our data, that would make it too conservative, possibly giving a lot of false negatives. Would it be a valid approach to use a number of ‘healthy’ samples to create an artificial ‘reference’ sample and estimate the dispersion from that reference? Could I then feed these parameters into DESeq so that, when I compare a tumor sample to its matched healthy sample, the dispersion of the healthy sample is based on that reference?
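As an illustration of question 2, here is a minimal sketch of estimating per-gene dispersion from healthy replicates only, using the negative binomial variance model Var = mean + alpha * mean². It is a simple method-of-moments estimate, similar in spirit to the rough moments estimate DESeq2 uses as a starting value; DESeq/DESeq2 then fit a mean-dispersion trend (and DESeq2 shrinks the estimates), which is not shown. `moment_dispersions` is a hypothetical helper, the size factors are assumed to come from whichever normalisation you trust, and the counts are invented. Note that such estimates describe healthy-tissue variability only; whether that is appropriate for tumor vs healthy tests is the open question itself.

```python
import numpy as np

def moment_dispersions(counts, size_factors):
    """Method-of-moments NB dispersion per gene from replicate samples of one condition.

    counts: genes x samples raw counts; size_factors: one value per sample.
    Model: Var(normalised count) = mean * mean(1/s) + alpha * mean^2.
    """
    counts = np.asarray(counts, dtype=float)
    s = np.asarray(size_factors, dtype=float)
    norm = counts / s  # divide each column (sample) by its size factor
    mean = norm.mean(axis=1)
    var = norm.var(axis=1, ddof=1)
    poisson_part = mean * np.mean(1.0 / s)  # variance expected from sampling noise alone
    with np.errstate(divide="ignore", invalid="ignore"):
        alpha = (var - poisson_part) / mean**2
    # Negative estimates (less variance than Poisson) and all-zero genes are set to 0.
    return np.where(np.isfinite(alpha), np.clip(alpha, 0.0, None), 0.0)

healthy = np.array([[10, 10, 10, 10],
                    [50, 150, 50, 150]])
alpha = moment_dispersions(healthy, size_factors=[1.0, 1.0, 1.0, 1.0])
print(alpha)  # gene 0 -> 0.0; gene 1 -> (3333.3 - 100) / 100**2, about 0.323
```

Per-gene estimates from only a few healthy samples are very noisy, which is why the DESeq packages pool information across genes before using them.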

    I have a lot more questions, but I guess this is a start… I hope I’m making sense.
    Looking forward to hearing from any of you…

  • #2
    No one has any ideas?

    I'm at a loss here...



    • #3
      Hi Kristen!


      I have a few comments that may help, even if they don't actually answer your 2 questions per se.

      First, if you like DESeq, then I strongly recommend "upgrading" to DESeq2, which has many improvements over its predecessor - for example, extended support for more complex experimental designs (Bioconductor page: http://www.bioconductor.org/packages...ml/DESeq2.html).

      Second, to my understanding, the base DESeq methodology relies on the assumption that most genes are not DE only when calculating its scaling factors, i.e. in the median-of-ratios approach (DESeq: http://genomebiology.com/2010/11/10/R106 ; Normalisation method review: http://bib.oxfordjournals.org/content/14/6/671.full).
      For dispersion estimation, it only assumes that most genes are not DE when there are no replicates (see the section "Working without replicates" in the DESeq paper). I'm going to assume you have several samples per condition...
      If you expect most or all of your genes to be DE, as one might when doing targeted sequencing, then, as you pointed out, this is a problem. However, DESeq (and DESeq2) let you supply your own scaling factors instead of the DESeq ones, so if you can come up with alternative scaling factors that don't rely conceptually on most genes being non-DE, you can use those instead.
      Which leads me to my third remark!

      Third, rather than using "reference genes", you could consider using ERCC RNA spike-ins as internal controls (Publications that may be of interest: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3166838/ ; http://www.plosone.org/article/info%...l.pone.0041356 ; Life Technologies page: http://www.lifetechnologies.com/orde...roduct/4456739). The general idea is to add RNA molecules with known but varying concentrations to your sample. This allows one to derive several sequencing QC checks, helps make data comparable across experiments, and more. Especially relevant here, they can be used for sample normalisation, either by normalising to one or more ERCCs, or by letting DESeq calculate its scaling factors on just the ERCCs (see the second reply in this Biostars thread: https://www.biostars.org/p/81803/#81817). Either way, it should be more robust than "housekeeping" genes.
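One of the QC checks spike-ins allow can be sketched as a dose-response fit: since the input concentrations are known, log2 counts should rise roughly linearly with log2 concentration, ideally with a slope close to 1. This is a minimal NumPy sketch; `spike_in_dose_response` is a hypothetical helper, and the concentrations and counts below are made up for illustration (real values would come from the spike-in mix's concentration table and your own data).

```python
import numpy as np

def spike_in_dose_response(known_conc, observed_counts):
    """Fit log2(count) = slope * log2(concentration) + intercept over detected spike-ins."""
    conc = np.asarray(known_conc, dtype=float)
    cnt = np.asarray(observed_counts, dtype=float)
    detected = cnt > 0  # log2(0) is undefined, so undetected spike-ins are dropped
    x, y = np.log2(conc[detected]), np.log2(cnt[detected])
    slope, intercept = np.polyfit(x, y, 1)
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    return slope, intercept, r_squared

# Hypothetical concentrations (arbitrary units) and counts for one sample.
conc = [0.5, 2, 8, 32, 128, 512]
counts = [0, 10, 40, 160, 640, 2560]
slope, intercept, r2 = spike_in_dose_response(conc, counts)
print(slope, r2)  # both close to 1, because the invented counts are exactly 5 x concentration
```

A slope well below 1 or a poor fit would point to problems such as saturation or dropout at low concentrations; the lowest-concentration spike-ins are often undetected, which is why they are filtered out before the fit.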

      Obviously I may be biased, as we're a) using ERCC spike-ins in our RNA-seq project with 3 samples per condition and b) using DESeq2 to analyse the expression estimates, and we're pretty happy with the results!


      Let me know what you think!

      Hope this helps,

      -- Alex

