
  • Help Needed to Interpret DESeq Plots

    Hi,

    I am new to RNA-Seq analysis. I recently tried out the DESeq package and followed the protocol strictly.


    I have produced an MA plot that looks like this. Is my MA plot normal? Most MA plots I have seen are symmetrical, but mine is lopsided.



    How can I interpret this plot? Is it meaningful?

  • #2
    Yikes! I can't say I've ever seen that in an MA plot. I would be hesitant to go any further with the analysis before finding out what is causing the graph to look like that. Have you opened the alignments in IGV (or similar) to check that things look reasonable and that the samples are similar to each other? Did one of your groups have quality issues?


    • #3
      This usually happens if you have an unbalanced design (the number of treatment samples is not equal to the number of control samples), or if the sequencing depth differs strongly between treatment and control libraries.

      You can go ahead, but you have to bear in mind that lack of balance can cause artifacts in downstream analysis.
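For readers new to MA plots: each gene is drawn at A, its average log2 normalized expression, and M, its log2 fold change between the two conditions. A minimal Python sketch of these coordinates, assuming the counts have already been divided by their size factors; the function name and the 0.5 pseudocount are illustrative assumptions, not DESeq's internals (DESeq's own plotting function may scale the axes differently):

```python
import math

def ma_point(norm_a, norm_b, pseudocount=0.5):
    """Return (A, M) for one gene.

    norm_a, norm_b: normalized counts of the replicates in conditions A and B.
    A = average of the two log2 condition means, M = log2(mean_B / mean_A).
    The pseudocount (an assumption) avoids log2(0) for genes with no reads.
    """
    log_a = math.log2(sum(norm_a) / len(norm_a) + pseudocount)
    log_b = math.log2(sum(norm_b) / len(norm_b) + pseudocount)
    return (log_a + log_b) / 2, log_b - log_a

# Hypothetical gene with two replicates per condition.
a_value, m_value = ma_point([100, 120], [400, 380])
print(f"A = {a_value:.2f}, M = {m_value:.2f}")
```

A gene with equal normalized expression in both conditions lands on M = 0; the asymmetry discussed here shows up as an uneven spread of points above and below that line.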

      Also: when posting a question here, please give sufficient details! At the very least, always describe the nature of your data, the design of your experiment, and the biological question you aim to address.


      • #4
        Hi Anders,

        I believe that the sequencing depth differs between the samples (A and B):

        Library A1 : 36515699 paired-end reads
        Library A2 : 25049861 paired-end reads
        Library B1 : 40222802 paired-end reads
        Library B2 : 15446770 paired-end reads

        Libraries A1 and B1 were generated at the same time, and libraries A2 and B2 were generated at the same time.

        The samples are from Mus musculus.

        The read length is 75 bp, and the biological question is which genes are differentially expressed between the samples.

        The mapping percentage of libraries A1 and A2 is very low (<40%), but that of libraries B1 and B2 is high (>80%). I have checked the read quality, and it is OK.
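Since only bounds on the mapping rates are given, a quick back-of-the-envelope calculation shows how unequal the effective depths are. This sketch assumes the rates sit exactly at the stated bounds (40% for A, 80% for B), so it yields an upper bound on mapped reads for condition A and a lower bound for condition B; the helper name is hypothetical:

```python
# Total reads per library, as listed in the post.
total_reads = {
    "A1": 36515699,
    "A2": 25049861,
    "B1": 40222802,
    "B2": 15446770,
}

# Only bounds were reported: A libraries map at < 40 %, B libraries at > 80 %.
MAX_RATE_A = 0.40
MIN_RATE_B = 0.80

def mapped_bound(library):
    """Upper bound for A libraries, lower bound for B libraries."""
    rate = MAX_RATE_A if library.startswith("A") else MIN_RATE_B
    return total_reads[library] * rate

mapped_a = sum(mapped_bound(lib) for lib in ("A1", "A2"))  # at most this many
mapped_b = sum(mapped_bound(lib) for lib in ("B1", "B2"))  # at least this many
depth_ratio = mapped_b / mapped_a

print(f"condition A mapped: <= {mapped_a:,.0f}")
print(f"condition B mapped: >= {mapped_b:,.0f}")
print(f"B/A mapped-depth ratio: >= {depth_ratio:.2f}")
```

Under these assumptions, condition B has at least about 1.8 times as many mapped reads as condition A, even though the total read numbers of the two conditions are similar. That is the kind of depth difference that can make an MA plot asymmetric.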

        Would you suggest that I not combine both replicates for each sample in the analysis, and instead use only the library with the higher coverage? For example, I would use only libraries A1 and B1.

        Thank you so much! BTW, DESeq is well designed! Easy to learn. =)


        • #5
          1. Again: Please give details. Contrary to popular belief, biology is not at all irrelevant when discussing statistical analysis. So, what is the difference between samples A and B? And what are the samples? I guess we are not talking about whole ground-up adult mice.

          2. Are your numbers total reads or mapped reads? What are the size factors?

          3. If you have strong differences in the number of mapped reads, the asymmetry of the MA plot is expected and unavoidable. You simply have less power to see differential expression of a gene that is weak in the shallow and strong in the deep condition than vice versa.

          4. No, you cannot omit one sample, because then you would not have any replicates any more.

          5. The fact that your mapping percentage differs so much is very worrying. If you did the same experiment four times, this should not happen. If, however, you first did condition A and then, much later, condition B, perhaps performing the experiment slightly differently, you should stop right there, give up, and start over. (Confounding experimental conditions with batches is a mortal sin that completely invalidates any experiment.)
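Point 3 can be made concrete with a rough calculation. Assuming Poisson counts (which ignores the biological overdispersion that DESeq actually models), the delta method gives Var(log X) ≈ 1/μ, so a log fold change between expected counts μ1 and μ2 has a standard error of about sqrt(1/μ1 + 1/μ2). The depths, gene abundance, and function names below are hypothetical:

```python
import math

def log_fc_se(mu_1, mu_2):
    """Approximate SE (natural-log scale) of log(mu_2 / mu_1) for Poisson counts.

    Delta method: Var(log X) ~ 1 / mu for X ~ Poisson(mu).
    Overdispersion is ignored, so this understates the real SE.
    """
    return math.sqrt(1.0 / mu_1 + 1.0 / mu_2)

def z_score(depth_shallow, depth_deep, base_rate, fold, up_in_deep):
    """Approximate z-statistic for a gene changing by `fold` between conditions.

    depth_* are library sizes; base_rate is the gene's fraction of reads
    in its lower-expressed state (all hypothetical numbers).
    """
    if up_in_deep:
        mu_shallow = depth_shallow * base_rate
        mu_deep = depth_deep * base_rate * fold
    else:
        mu_shallow = depth_shallow * base_rate * fold
        mu_deep = depth_deep * base_rate
    return math.log(fold) / log_fc_se(mu_shallow, mu_deep)

# Hypothetical gene: 1 read per million in its low state, 2-fold change,
# shallow library of 10 M reads vs deep library of 40 M reads.
z_up_in_deep = z_score(10e6, 40e6, 1e-6, 2.0, up_in_deep=True)
z_up_in_shallow = z_score(10e6, 40e6, 1e-6, 2.0, up_in_deep=False)
print(f"weak in shallow, strong in deep: z = {z_up_in_deep:.2f}")
print(f"strong in shallow, weak in deep: z = {z_up_in_shallow:.2f}")
```

The same two-fold change yields a smaller z-score when the gene's weak state falls in the shallow library, because the smallest expected count, which dominates the error, is then smaller.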


          • #6
            Dear Anders,

            I did not run the experiment myself. Based on my understanding, sample A is a diseased sample (lower number of mapped reads) and sample B is a normal (reference) sample.

            The read numbers I provided are the total number of reads generated for each library.

            We are looking at heart samples from the mice, to see how expression levels across the transcriptome change when the heart is diseased (e.g., after a heart attack).

            The replicates were generated in pairs: A1 and B1 were generated together, and A2 and B2 were generated together.
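Total read numbers are not the same as DESeq's size factors, which were asked about in point 2. DESeq estimates size factors with the median-of-ratios method: each sample's gene counts are divided by the per-gene geometric mean across all samples, and the median of these ratios becomes that sample's size factor. In R, calling `estimateSizeFactors()` and then `sizeFactors()` on the count data set reports them. The Python sketch below only re-implements the idea for illustration, on toy counts with a hypothetical function name:

```python
import math
from statistics import median

def median_of_ratios(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: list of genes, each a list of raw counts per sample.
    Genes with a zero in any sample are skipped, since their
    geometric mean is zero and the ratios would be undefined.
    """
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            ratios[j].append(math.exp(math.log(c) - log_geo_mean))
    return [median(r) for r in ratios]

# Toy matrix (rows = genes, columns = samples): sample 2 was sequenced
# twice as deeply, and the last gene is skipped because of its zero.
toy_counts = [
    [10, 20],
    [30, 60],
    [5, 10],
    [0, 7],
]
print(median_of_ratios(toy_counts))
```

Because only reads that map and are counted on genes enter this calculation, a low mapping rate in one condition feeds directly into smaller size factors for that condition.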


            • #7
              Wait, you have only two samples, one from a single healthy mouse and one from a single mouse with a heart disease?


              • #8
                No, it is heart tissue: the normal RNA-Seq sample was generated from unstimulated tissue, and the diseased sample from tissue stimulated into a disease state.


                • #9
                  Sorry, I don't get it. Are you talking about a tissue _culture_ (heart muscle cells grown in a dish) or a tissue _sample_ (a piece of tissue dissected from the heart of an animal)? How can you "stimulate a tissue to a disease state"?

                  But, to cut things short: In principle, an asymmetric MA plot is fine; you just have to be extra careful in interpreting your results. However, given the information available in this thread, it sounds as if this experiment may have much more serious issues than unbalanced sequencing depth, but this is hard to say without more details. Therefore, I hesitate to suggest that you simply go ahead.

                  On the other hand, it sounds as if you are just performing a service for somebody else, and so may not have much of a say there anyway. So, you may want to just leave it at this, and simply run a standard RNA-Seq analysis.
                  Last edited by Simon Anders; 04-25-2013, 01:37 AM.


                  • #10
                    Dear Anders,

                    Thank you so much for your time and help. =)

                    I guess for the time being, I will focus on the list of reported differentially expressed genes, and discuss with the biologist whether the results make any sense. And, hopefully, plan and run the experiments again.

                    =)


                    • #11
                      I have a burning question,

                      Will trimming off the low-quality ends of the reads help to improve the mapping percentage?

                      CTGACCTCAAACTGTACTACAGAGCAATANNNACTANATNCTGCATGGTANTNNNNNNNNNNNNNNNNNNNNNNT

                      Should I remove the N-padded end and so shorten the reads?

                      Is this advisable?
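Note that this read ends in a T after the N run, so simply stripping trailing Ns would leave it unchanged. A minimal Python sketch that instead cuts the read at the first long run of Ns; the 5-N threshold and the function name are arbitrary, hypothetical choices, and whether trimming helps at all depends on the aligner:

```python
import re

READ = "CTGACCTCAAACTGTACTACAGAGCAATANNNACTANATNCTGCATGGTANTNNNNNNNNNNNNNNNNNNNNNNT"

def trim_at_n_run(seq, min_run=5):
    """Cut the read at the first run of at least `min_run` consecutive Ns.

    Shorter N runs inside otherwise usable sequence are kept and left
    for the aligner to handle as mismatches.
    """
    match = re.search(f"N{{{min_run},}}", seq)  # pattern "N{5,}" by default
    return seq[:match.start()] if match else seq

trimmed = trim_at_n_run(READ)
print(trimmed, len(trimmed))
```

The isolated Ns before the cut point remain, so some aligners may still penalize this read; a quality-aware aligner or soft clipping may make the trimming unnecessary.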


                      • #12
                        This depends on the aligner you are using. Most newer aligners take base-call qualities into account and hence disregard bad ends anyway. Just check whether all these reads with Ns at the end actually did get aligned, to see whether your aligner managed to deal with them.

                        Rather, I'd suggest taking a few of your non-aligned reads with reasonable quality (at least over part of the read) and BLASTing them against the full NCBI nucleotide collection. Once you know where they come from (often: primer dimers, human contamination), you usually also know why they did not align.
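A minimal Python sketch of that selection step, assuming you already have the unaligned reads as FASTQ (how to extract them depends on the aligner) with Phred+33 quality encoding. The window size, quality cutoff, and function names are hypothetical. It keeps reads whose best 30-base window has a mean quality of at least 30 and returns them as FASTA text that can be pasted into a BLAST search:

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(ch) - 33 for ch in qual]

def best_window_mean(quals, window):
    """Highest mean quality over any `window` consecutive bases."""
    if len(quals) < window:
        return 0.0
    total = sum(quals[:window])
    best = total
    for i in range(window, len(quals)):
        total += quals[i] - quals[i - window]  # slide the window by one base
        best = max(best, total)
    return best / window

def reads_for_blast(fastq_lines, window=30, min_mean_q=30, max_reads=10):
    """Pick up to `max_reads` reads that are good over at least one window.

    fastq_lines: lines of a FASTQ file of reads that did NOT align.
    Returns FASTA text.
    """
    lines = [ln.rstrip("\n") for ln in fastq_lines]
    fasta = []
    for i in range(0, len(lines) - 3, 4):
        header, seq, _, qual = lines[i:i + 4]
        if best_window_mean(phred33(qual), window) >= min_mean_q:
            fasta.append(f">{header[1:]}\n{seq}")
            if len(fasta) == max_reads:
                break
    return "\n".join(fasta)
```

A sliding window is used rather than the whole-read mean so that reads with a good start and a degraded end, like the one above, are still eligible.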
