  • The Best way to normalize miRNA HTS data

    Hi,

    I'm currently working on HTS miRNA data. I have 8 samples with about 20 M reads per file.
    After adapter trimming, the number of reads differs from sample to sample (each sample has a different quality, in both preparation and sequencing).

    Example :

    file 1 : 200 000 reads
    file 2 : 1 000 000 reads
    ...

    Now I want to analyze differential expression. The problem is the normalization step. What is the best way to normalize the data for comparing multiple samples?

    Thanks a lot,

    N.

  • #2
    After searching, I found some methods:

    - RPM (Reads Per Million)
    - RPKM (Reads Per Kilobase per Million mapped reads): Mortazavi et al., Nat. Methods, 2008
    - Trimmed mean of M-values: Robinson and Oshlack, Genome Biology, 2010
    - upper-quantile: http://www.ncbi.nlm.nih.gov/pubmed/20167110

    Which method is the best for HTS miRNA data?
    Last edited by NicoBxl; 08-23-2010, 12:37 AM.


    • #3
      Dividing by the number of sequenced or mapped reads is a bad idea, for the reasons explained by Robinson and Oshlack.

      I'd advise against quantile normalization (as suggested by Bullard et al., the fourth in your list). RNA-Seq is known to be linear, and quantile normalization will only distort this. This leaves you with TMM (Robinson and Oshlack) or with the method that we implemented in DESeq (preprint here), which is similar in spirit to TMM but uses slightly different math.
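
As a rough illustration of why scaling by total read count can mislead, here is a minimal sketch in plain Python of a median-of-ratios size factor, which is the idea behind DESeq's normalization. It is not DESeq's actual R code; the function name `size_factors` and the toy counts are made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per miRNA), each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Keep only rows with a non-zero count in every sample,
    # so that the geometric mean (and its log) is defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-row geometric mean.
    return [math.exp(median(r)) for r in log_ratios]

# Toy data (hypothetical): sample B is sequenced twice as deep as sample A,
# and the last miRNA is additionally strongly up-regulated in B.
counts = [
    [100, 200],
    [50, 100],
    [30, 60],
    [10, 200],
]

sf = size_factors(counts)
totals = [sum(col) for col in zip(*counts)]

print("size factor ratio B/A:", sf[1] / sf[0])
print("total count ratio B/A:", totals[1] / totals[0])
```

In this toy table the median-based ratio stays at the twofold depth difference, while the ratio of totals is pulled upwards by the single strongly changed miRNA. Dividing by totals would therefore make the three unchanged miRNAs look down-regulated in sample B.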

      This all is said assuming that you want to make comparisons between samples, i.e., see whether a given gene's expression depends on your experimental conditions. If you want to compare different genes within the same sample, you need a very different approach. In that case, look at cufflinks (Trapnell et al, 2010).

      Simon


      • #4
        Originally posted by Simon Anders:
        This all is said assuming that you want to make comparisons between samples, i.e., see whether a given gene's expression depends on your experimental conditions. If you want to compare different genes within the same sample, you need a very different approach. In that case, look at cufflinks (Trapnell et al, 2010).
        Sorry, this paragraph was nonsense; I forgot that you are dealing with miRNA. The whole point of cufflinks (and its FPKM measure) is to deal with splicing variants and differing transcript lengths. This is not an issue if the read length exceeds the transcript length, as is the case with miRNA.

        Simon


        • #5
          ok thanks Simon,

          I'll try DESeq

          Nicolas


          • #6
            Is it possible with DESeq to get the normalized count matrix? To draw a heatmap with the normalized data, for example.


            • #7
              Originally posted by NicoBxl:
              Is it possible with DESeq to get the normalized count matrix?
              Yes, just divide the counts by the size factors:

              t( t(counts(cds)) / sizeFactors(cds) )

              (The t() calls are there to make sure that R divides by column, not by row.)
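
The same column-wise division can be sketched outside R. In R, dividing a matrix by a vector recycles the vector element by element down each column, so without the double transpose the size factors would not line up with the sample columns. A minimal plain-Python sketch that makes the pairing explicit (the function name and numbers are hypothetical, not part of DESeq):

```python
def normalize_counts(counts, size_factors):
    """Divide every column (sample) of a count table by its size factor.

    counts: list of rows (one per miRNA), each a list of counts per sample.
    size_factors: one factor per sample, in the same column order.
    """
    if any(len(row) != len(size_factors) for row in counts):
        raise ValueError("each row needs one count per size factor")
    # zip pairs the j-th count in a row with the j-th sample's size factor.
    return [[c / s for c, s in zip(row, size_factors)] for row in counts]

# Toy example (hypothetical numbers): two miRNAs, two samples.
raw = [
    [100, 200],
    [10, 200],
]
normalized = normalize_counts(raw, [0.5, 2.0])
print(normalized)
```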

              to draw a heatmap with the normalized data, for example
              You might want to take the log or use DESeq's variance-stabilizing transformation for such a heatmap. (Careful with the latter; I've just found a bug in 'getVarianceStabilizedData'. It's fixed in the devel version (1.1.11) of DESeq but not yet in the release branch.)
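
For the log option, one common choice is log2 of the normalized counts plus a small pseudocount, so that zero counts stay finite; in R that arithmetic is `log2(x + 1)` applied to the normalized matrix. The pseudocount of 1 is an assumption here, not something prescribed in this thread. A minimal plain-Python sketch of the same transformation (hypothetical function name):

```python
import math

def log2_transform(normalized, pseudocount=1.0):
    """Return log2(value + pseudocount) for every entry of a count table."""
    return [[math.log2(v + pseudocount) for v in row] for row in normalized]

# Toy normalized counts (hypothetical): two miRNAs, two samples.
table = [
    [0.0, 3.0],
    [7.0, 15.0],
]
logged = log2_transform(table)
print(logged)
```

The resulting table is what would be handed to a heatmap function; without the log, a few highly expressed miRNAs tend to dominate the colour scale.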

              Simon


              • #8
                ok thanks,

                I'll try that.

                For the heatmap, is it the log of the normalized counts that you are talking about?

                For the variance-stabilizing transformation, I'll download the devel version.


                • #9
                  Originally posted by Simon Anders:
                  Yes, just divide the counts by the size factors:
                  You might want to take the log or use DESeq's variance-stabilizing transformation for such a heatmap.

                  I don't understand how to get the log. Could you explain that to me, Simon?

                  Thanks a lot

                  N.


                  • #10
                    Hi guys,

                    I'm also trying to normalize the differential expression data for my small RNAs. I'd like to try DESeq in R, but after looking at its info page:
                    http://www-huber.embl.de/users/anders/DESeq/

                    I can't understand how to do this analysis. Is there a DESeq guide for dummies? Does it take my BAM files as input?

                    Sorry to bother you guys with these silly questions.


                    • #11
                      Originally posted by cascoamarillo:
                      I'm also trying to normalize the differential expression data for my small RNAs. I'd like to try DESeq in R, but after looking at its info page:
                      http://www-huber.embl.de/users/anders/DESeq/
                      Have you even tried to read the manual ("vignette")? I ask because the URL you posted contains little useful information beyond the link to the manual, but it does not sound as if you followed it.


                      • #12
                        Originally posted by Simon Anders:
                        Have you even tried to read the manual ("vignette")? I ask because the URL you posted contains little useful information beyond the link to the manual, but it does not sound as if you followed it.
                        Thanks for the reply.

                        Yes, I've been looking at the manual. Maybe the problem is my limited experience with R. I have made some plots and done some HTS data analysis in R, but I'm not an expert. Sorry for venting my frustration in the post. I'll look at the manual further and see what happens.


                        • #13
                          Ok, after a further reading of the manual ("vignette") I have some questions. First of all, I don't work with Drosophila, so I do not need the pasilla library, right? What I did was take my SAM and GFF files and create the count table for my treated/untreated conditions (with HTSeq). But how can I load this table into DESeq? Do I need my own "pasilla" library?

                          Thanks


                          • #14
                            No, the 'pasilla' library is example data. It comes with another vignette which, among other things, tells you how to get the data into R.

                            In the long run, it may be well worth your while to read a short introductory tutorial on R. Reading in data tables and writing them out again is such a common procedure that hardly any documentation for a specific R package will explain how it is done. When I write things like the DESeq vignette, I assume that the reader is already familiar with these basics.
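
To show what "reading in a data table" amounts to, here is a minimal plain-Python sketch that reads per-sample count files and merges them into one table. In R the usual tool for this is `read.table`. The sketch assumes two-column, tab-separated files (feature ID, then count), one per sample, with a few summary rows at the end; the file layout, the summary row names, the miRNA names and the function names are all assumptions for illustration, so check them against your own files.

```python
import csv
import io

# Summary rows that a counting tool may append after the features.
# The exact names are an assumption; adjust them to your own files.
SUMMARY_ROWS = {"no_feature", "ambiguous", "too_low_aQual",
                "not_aligned", "alignment_not_unique"}

def read_count_file(handle):
    """Read a two-column, tab-separated file: feature ID, then count."""
    counts = {}
    for record in csv.reader(handle, delimiter="\t"):
        if len(record) != 2:
            continue  # skip blank or malformed lines
        feature, value = record
        if feature.lstrip("_") in SUMMARY_ROWS:
            continue  # a summary row, not a real feature
        counts[feature] = int(value)
    return counts

def merge_samples(per_sample):
    """Turn {sample: {feature: count}} into (features, samples, rows)."""
    samples = list(per_sample)
    features = sorted(set().union(*per_sample.values()))
    # A feature missing from one sample's file is counted as 0 there.
    rows = [[per_sample[s].get(f, 0) for s in samples] for f in features]
    return features, samples, rows

# In-memory stand-ins for two per-sample count files (hypothetical data).
treated = io.StringIO("mir-1\t120\nmir-2\t0\nno_feature\t5000\n")
untreated = io.StringIO("mir-1\t60\nmir-2\t15\nno_feature\t4000\n")

features, samples, rows = merge_samples({
    "treated": read_count_file(treated),
    "untreated": read_count_file(untreated),
})
for feature, row in zip(features, rows):
    print(feature, *row, sep="\t")
```

With real files, the `io.StringIO` objects would be replaced by `open(path, newline="")` handles.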

                            (I know that a lot of people here hope for single-click push-button solutions. You will get them in ten years or so, when analysis methods are no longer new and constantly changing but well-matured textbook knowledge, with fixed consensus recipes that can be followed blindly without a need to understand them. Until then, you will need some basic statistics and bioinformatics knowledge to make informed choices among the many different ways to analyse your data.)


                            • #15
                              Ok, there's another DESeq manual. I guess this is the one you are talking about:


                              The other is:
                              The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.


                              They look the same but are not; sorry for the confusion. "Analysing RNA-Seq data with the “DESeq” package" is the one that I need to start working with DESeq. Thanks a lot for this package and the complete documentation that you've provided.
