  • RNA-seq Exfold Tutorial

    Hi everyone,

    This is a beefy one... sort of.

    I've set up a big RNA-seq experiment where I'm comparing pooled mouse samples. I've clipped off a bit of tissue and extracted the RNA, then for each time point I've pooled together a few individuals, since the tissue I'm using is very limited and I can't get much RNA from each one. There are 4 samples per pool. After pooling, I ran Ribo-Zero to get rid of the rRNA, and during the process I spiked the samples with the ERCC ExFold spike-in mix (0.5 µl, a dilution amount that seemed appropriate for my experiment).

    This is the set up:

    Mouse E11.5 Control (4 individuals pooled into the same tube)
    Mouse E11.5 TEST (4 individuals pooled into the same tube)
    Mouse E12.5 Control (individuals...etc)
    Mouse E12.5 TEST... etc etc
    All the way up to
    Mouse E17.5 Control (4 individuals pooled)
    Mouse E17.5 TEST (4 individuals pooled)

    Each pool was sequenced on the Illumina HiSeq using v3 chemistry and I have the data. The problem I have is analyzing the pools for differential expression and using the ERCC spike-ins for normalization.

    So just to clear a couple things up first

    --The point of this experiment is not to generate an end-all serial transcriptome data set for the tissue I'm studying. We were willing to spend the money to do this as an exploratory experiment to highlight specific genes that we would follow up later. So it's just exploratory and not necessarily for publishing.

    --We are aware of the alternatives for the approach to this experiment, but decided that based on our goals and our budget that this would be the best approach.

    Okay - so considering all of these details, I was hoping I might get some feedback on the following questions:

    1. Was it necessary for us to use the ERCC ExFold spike-ins for this experimental setup? We went back and forth about this a little bit, and decided it would be best to use them. But I wanted to get a feel from the community on this. I know the ERCC spikes are supposed to help control for platform variation, but since we multiplexed all of the pools during the run (across several lanes), does this even matter?

    2. How on earth do I actually normalize the data using the ERCC spike-ins? I mean step by step. I have run CuffDiff, and it seems to have its own normalization standard when performing the analysis, which did produce some very interesting results... but surely it doesn't take the ERCC spike-ins into account automatically? I've also come across forum threads where people reference random functions with no context, like "loess.normalization()". What on earth is that supposed to mean? Sounds like Excel! haha. I haven't been able to find a single how-to or tutorial on how to actually run the ERCC normalization. Maybe I'm not looking in the right place? I'm not hugely familiar with the bioinformatics skills necessary for doing this, and there is no guidance or expertise on this at the institution/dept. I'm in. But we also don't want to outsource. Can anyone give me a step by step or a link to a guide for normalizing my RNA-seq data using the ERCC spike-ins? I don't have an intuitive knowledge of which programs I am supposed to use, and I don't know what some random function is supposed to represent or where I'm supposed to implement it... but I do have the skills to learn how to use the tools with a little guidance.

    Thanks so much for any help and please let me know if you need any more information!

    Cheers!

    Paul

  • #2
    1. No, they weren't needed. ERCC spike-ins are mostly useful for single-cell sequencing. I wouldn't bother with them here unless the library normalization goes weird.
    2. I seriously doubt that you can use spike-ins with cuffdiff. When you see people mentioning loess normalization, they're talking about doing things in R, which is pretty much what you'll have to do as well. The general idea is to align to a reference that includes the ERCC sequences (just concatenate your genome FASTA with the ERCC sequences) and then get count information for the spike-ins as well as the real genes. You then import that into R using whatever method you prefer and use the ERCC subset of the counts for library normalization, i.e. to compute per-sample size factors. You then apply the computed size factors to the dataset (after removing the ERCC rows) and continue with the analysis. If you have no clue what that means, then either don't bother with the ERCC spike-ins (a good idea anyway, since they're likely to produce crappier results) or collaborate with a local bioinformatician.
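
    To make the "size factors from the ERCC subset" step concrete, here is a minimal sketch. It is in Python with NumPy rather than R, and it assumes a median-of-ratios estimator (the DESeq-style approach), which is one common choice and not necessarily the method intended above. The function names, the toy counts, and the convention that spike-in row IDs start with "ERCC-" are all illustrative assumptions, not output from a real run.

```python
import numpy as np


def ercc_size_factors(counts, is_ercc):
    """Median-of-ratios size factors estimated from spike-in rows only.

    counts  : 2-D array, rows = features (genes + spike-ins), columns = samples
    is_ercc : boolean array marking which rows are ERCC spike-ins
    """
    spikes = counts[is_ercc].astype(float)
    # Keep only spike-ins detected in every sample so the logs are finite.
    spikes = spikes[(spikes > 0).all(axis=1)]
    if spikes.shape[0] == 0:
        raise ValueError("no spike-in is detected in every sample")
    log_spikes = np.log(spikes)
    # Per-spike reference: log of the geometric mean across samples.
    log_ref = log_spikes.mean(axis=1, keepdims=True)
    # Per-sample size factor: median ratio of the sample to the reference.
    return np.exp(np.median(log_spikes - log_ref, axis=0))


def normalize_genes(counts, is_ercc):
    """Drop the spike-in rows and divide gene counts by the ERCC size factors."""
    factors = ercc_size_factors(counts, is_ercc)
    genes = counts[~is_ercc].astype(float)
    return genes / factors, factors


# Toy count table (made-up numbers): 3 spike-ins and 2 genes in 2 samples.
ids = np.array(["ERCC-00002", "ERCC-00004", "ERCC-00130", "GeneA", "GeneB"])
counts = np.array([
    [100, 200],
    [50, 100],
    [10, 20],
    [30, 60],
    [80, 80],
])
is_ercc = np.char.startswith(ids, "ERCC-")  # assumed naming convention
normalized, factors = normalize_genes(counts, is_ercc)
print("size factors:", factors)
print("normalized gene counts:")
print(normalized)
```

    In the toy table every spike-in has twice the counts in the second sample, so the second size factor comes out twice the first. GeneA, which doubled along with the spikes, ends up equal across samples after normalization, while GeneB, whose raw counts were identical, now looks lower in the second sample. That is exactly the assumption you buy into with spike-in normalization: the spike-ins, not the bulk of the genes, define what "unchanged" means.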

    • #3
      Hi Paul,

      You might be interested in the new erccdashboard R package for analyzing your data. The package is available on Bioconductor: http://bioconductor.jp/packages/3.0/...dashboard.html

      The publication describing the erccdashboard, "Assessing technical performance in differential gene expression experiments with external spike-in RNA control ratio mixtures", is here: http://www.nature.com/ncomms/2014/14...comms6125.html

      These resources provide details and empirical evidence that should answer your questions about the utility of the ERCC spike-ins more substantively than is reasonably possible in replies to your post. The ERCC spike-ins can be used for more than single-cell sequencing and normalization -- although these have been areas where they've seen a lot of use.

      I'd be happy to work with you on your analysis of the ERCC spike-ins in your experiments and your use of the erccdashboard -- you can feel free to contact me directly.

      Cheers,
      Sarah
