
  • clarification of rna-seq normalization

    Hi everybody,

    I read a lot in the last few days about the different opinions to rna-seq normalization methods.
    To be honest I'm quite a bit confused at the moment and so I would like to ask for your help to try and clarify me about how to use what kind of normalization method.

    I'm sure that there is no straightforward answer for such a question but I would really appreciate contradictory opinions if it will help for other users also to explain the problem.

    As far as I understand it there is no "standard" method for normalizing methods.

    We have one rna-seq experiment with each only one set for control and one set for treatment. Albeit the fact of insignificance regarding the lack of replicates, I would like to understand how to work in general with rna-seq data.

    We would like to look at both differential expression and differences in splice variants between the two conditions.
    I have read opinions about how best to normalize the data for identifying differentially expressed genes and for identifying isoforms.
    Apparently these two goals should be analyzed differently.
    The best example of this was the discussion between Simon and lpachter about when and how to normalize, here: http://seqanswers.com/forums/showthr...p?t=586&page=1

    I think it shows how controversial this can be. I was interested in this discussion, though it is quite an old one and a lot has probably changed.

    RPKM measures the relative level of gene expression between experiments, but apparently some people are against it because of certain biases that it cannot compensate for. In the posting above, Simon mentions DESeq (and edgeR), which are supposed to work better for differential expression.

    So my questions are:
    (well I will probably have a lot more, but these are to begin with)

    1. Would it be better to normalize the data twice, separately for the two goals?

    2. Does it make sense to apply one normalization after the other?

    3. Can I rely on cuffdiff/cuffcompare to give me a good estimate of the splice variants, and on DESeq/DEGSeq to give me a good estimate of the differentially expressed genes?

    I would appreciate any comment or discussion.

    Thanks

    A.

  • #2
    Clearly it is important to follow the assumptions and models within each of the tools you mention.

    If you want to compile a simple "table of expression", you can produce RPKMs, fold-changes, etc. If, however, you use a specific tool such as edgeR, which has its own methodology for normalizing and estimating differences in expression (bearing in mind that edgeR has a variety of models implemented, as explained in its manual), then you should provide it with what it expects, i.e. raw read counts.
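To illustrate the simple "table of expression" route, here is a minimal Python sketch, assuming the usual RPKM definition (reads per kilobase of gene length per million mapped reads). The gene names, lengths, counts and the pseudocount of 1 are made-up values for illustration only; this is not what edgeR or any other tool does internally.

```python
import math


def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_fold_change(treatment_value, control_value, pseudocount=1.0):
    """log2 ratio; the pseudocount (an assumption) avoids division by zero."""
    return math.log2((treatment_value + pseudocount) / (control_value + pseudocount))


# Hypothetical raw counts for two genes in one control and one treatment library.
gene_lengths = {"geneA": 2000, "geneB": 500}
control_counts = {"geneA": 400, "geneB": 100}
treatment_counts = {"geneA": 1200, "geneB": 150}
control_total = 10_000_000    # total mapped reads, control
treatment_total = 15_000_000  # total mapped reads, treatment

# gene -> (control RPKM, treatment RPKM, log2 fold change)
expression_table = {}
for gene, length in gene_lengths.items():
    c = rpkm(control_counts[gene], length, control_total)
    t = rpkm(treatment_counts[gene], length, treatment_total)
    expression_table[gene] = (c, t, log2_fold_change(t, c))

for gene, (c, t, lfc) in expression_table.items():
    print(f"{gene}\t{c:.2f}\t{t:.2f}\t{lfc:.3f}")
```

Note that geneB has more raw reads in the treatment library but the same RPKM in both, because the treatment library is 1.5 times deeper. A count-based tool would be given `control_counts` and `treatment_counts` directly, not the RPKM columns.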

    Since we are still in the early days, lab validation of results is clearly the key to understanding which tools give you the best answers in the end.
    --------------------------------------
    Elia Stupka
    Co-Director and Head of Unit
    Center for Translational Genomics and Bioinformatics
    San Raffaele Scientific Institute
    Via Olgettina 58
    20132 Milano
    Italy
    ---------------------------------------



    • #3
      Hey,

      You are more or less asking for the 'holy grail': how to normalize my data.
      In my opinion the most crucial step is to know where your data comes from. Thus, DE detection between technical replicates needs to be handled differently from DE detection between biological replicates (Poisson vs. negative binomial; see Marioni et al.). In addition, as mentioned above, every method assumes a different distribution of reads.
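To make the Poisson vs. negative binomial point concrete, here is a small Python sketch. It assumes the common mean-dispersion parametrization of the negative binomial, variance = mean + dispersion * mean^2; the dispersion value of 0.1 is a made-up number for illustration, not an estimate from any data set.

```python
def poisson_variance(mean):
    """Poisson: the variance equals the mean (technical noise only)."""
    return mean


def negative_binomial_variance(mean, dispersion):
    """Negative binomial: extra variance grows with the square of the mean."""
    return mean + dispersion * mean ** 2


hypothetical_dispersion = 0.1  # assumed value, for illustration only

for mean in (10, 100, 1000):
    p = poisson_variance(mean)
    nb = negative_binomial_variance(mean, hypothetical_dispersion)
    print(f"mean={mean}\tPoisson var={p}\tNB var={nb:.0f}\tratio={nb / p:.1f}")
```

With a dispersion of zero the two models coincide. With any positive dispersion the gap widens as expression increases, so a Poisson model applied to biological replicates would be overconfident for highly expressed genes.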
      RPKM 'just' normalizes for gene length and the total number of reads. It does not correct for biases arising from transcript abundance in the library. Thus your RPKM values should follow a normal distribution, and they should not show a linear correlation between gene length and transcript abundance. However, since housekeeping genes contribute a large share of the transcripts, one should also consider an additional normalization, for instance quantile normalization. DESeq (and similar tools) want the raw counts so that they can estimate dispersion and distribution and fit their assumptions optimally to the given data. So I would run different analyses (i.e. DESeq as well as an RPKM/FC analysis) and compare the results. From that comparison you can figure out, at least roughly, which distribution fits your data best.
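As a rough illustration of why a few very abundant transcripts can distort total-count scaling, here is a simplified Python sketch of the median-of-ratios idea for size factors that is described for DESeq. It is a sketch under stated assumptions, not the package's actual code: the function name `size_factors`, the gene names and the counts are all hypothetical, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts_by_sample):
    """Median-of-ratios size factors from raw counts (simplified sketch).

    counts_by_sample: {sample_name: {gene_name: raw_count}}
    Raises statistics.StatisticsError if no gene is non-zero in all samples.
    """
    samples = list(counts_by_sample)
    genes = list(counts_by_sample[samples[0]])

    # Log of the geometric mean across samples, per gene.
    log_geo_means = {}
    for gene in genes:
        values = [counts_by_sample[s][gene] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means[gene] = sum(math.log(v) for v in values) / len(values)

    # Size factor = median ratio of a sample's counts to the geometric means.
    return {
        s: median(
            counts_by_sample[s][g] / math.exp(log_geo_means[g])
            for g in log_geo_means
        )
        for s in samples
    }


# Hypothetical counts: only gene4 changes, and it dominates the library.
counts = {
    "control": {"gene1": 100, "gene2": 200, "gene3": 300, "gene4": 1000},
    "treatment": {"gene1": 100, "gene2": 200, "gene3": 300, "gene4": 10000},
}

totals = {s: sum(c.values()) for s, c in counts.items()}
factors = size_factors(counts)
print("library totals:", totals)
print("median-of-ratios size factors:", factors)
```

In this toy example the library totals differ more than sixfold because of gene4 alone, so scaling by total reads would make the three unchanged genes look down-regulated in the treatment. The median-based factors stay close to 1 for both samples, which is one reason such tools ask for raw counts and not pre-normalized values.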



      • #4
        Hi frymor,

        You may try different methods but ultimately you must rely on the follow-up experiment(s) to validate the results. Let's say you try 2-3 analysis methods/models, you will have DE genes identified by all methods or by some. You need to validate them by independent methods - e.g., qPCR. The field needs sufficient validation results to see which method is better suited for a certain application.
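One simple way to organize such a comparison before going to the bench is to count how many methods call each gene. The Python sketch below assumes you already have one list of DE genes per method; the method labels and gene names are placeholders, not real results.

```python
from collections import Counter


def rank_by_support(calls_by_method):
    """Return (gene, number_of_methods_calling_it), most supported first."""
    support = Counter()
    for genes in calls_by_method.values():
        support.update(set(genes))  # set() guards against duplicate entries
    return sorted(support.items(), key=lambda item: (-item[1], item[0]))


# Placeholder DE calls from three hypothetical analysis methods.
calls = {
    "method_a": {"gene1", "gene2", "gene3"},
    "method_b": {"gene2", "gene3", "gene4"},
    "method_c": {"gene3", "gene5"},
}

ranked = rank_by_support(calls)
consensus = [gene for gene, n in ranked if n == len(calls)]

for gene, n in ranked:
    print(f"{gene}\tcalled by {n} of {len(calls)} methods")
print("called by all methods:", consensus)
```

Genes called by every method are natural first candidates for qPCR, while genes called by only one method are the informative ones for judging which method suits your application.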
