  • Correlation in amplified RNA-seq

    Hi all,
    I am trying to find the correlation between biological replicates and between different conditions.
    After aligning to the genome and converting to counts, I have a count matrix.

    I am new to statistical tests, so this might sound silly.
    I tried Pearson and Spearman correlation, and the correlation wasn't very high:
    about 0.6 between biological replicates, and about the same between conditions.
    I was expecting higher correlation between biological replicates than between conditions, but I am getting the same.

    The experiment was amplification from a small amount of RNA, so some samples have 0 counts for some genes.
    I tried removing those zeros, but it didn't change the results much.
    Does anyone have any advice?
    Has anyone had experience with correlation in amplified samples?

    Thanks,
    Pap
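A minimal sketch of the comparison described above, written in Python with only the standard library (the language and the function names are assumptions, not something used in this thread). It computes Pearson on raw counts, Pearson on log2(count + 1), and Spearman, optionally dropping genes that are zero in both samples. Pearson on raw counts tends to be dominated by the few most highly expressed genes, so the log-scale and rank-based values are often more informative for count data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values get the average of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def replicate_correlations(a, b, drop_double_zeros=True):
    """Compare two raw count vectors (same gene order) in three ways."""
    pairs = list(zip(a, b))
    if drop_double_zeros:
        # genes with 0 in both samples say nothing about agreement
        pairs = [(x, y) for x, y in pairs if x > 0 or y > 0]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return {
        "pearson_raw": pearson(xs, ys),
        "pearson_log2": pearson([math.log2(v + 1) for v in xs],
                                [math.log2(v + 1) for v in ys]),
        # rank-based, so unaffected by the log transform
        "spearman": spearman(xs, ys),
    }
```

Comparing the three values for a pair of replicates shows whether a low Pearson value comes from a few very large counts (raw Pearson low, log2 and Spearman high) or from disagreement across the board.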

  • #2
    As a first step, you should plot the samples against each other. For instance, suppose you have one dataset (A) and one biological replicate (B): plot each gene's expression value in A against its value in B. In the best case, when the correlation is 1, the points fall on a straight line. There will certainly be some bias from amplification, but in my opinion it shouldn't matter that much (at least between biological replicates). Comparing conditions might give a correlation of about 0.6 if the conditions are almost completely different!

    • #3
      I am trying to find the correlation between two biological-replicate RNA-seq runs.
      I ran TopHat and Cufflinks, extracted the FPKM values, plotted them against each other, and computed the correlation in R.

      Code:
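      library(car)                  # loaded in the original post; lm() and cor() do not need it
      # FPKM_1, FPKM_2: per-gene FPKM vectors for the two replicates, in the same gene order
      reg1 <- lm(FPKM_1 ~ FPKM_2)   # linear fit of replicate 1 on replicate 2
      plot(FPKM_2, FPKM_1)          # scatter plot of the two replicates
      abline(reg1)                  # overlay the fitted line
      cor(FPKM_1, FPKM_2)           # Pearson correlation (the default method)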
      The correlation I found was 0.2, which is disappointing given that these are replicates. Has anyone else tried other ways to compute the correlation between replicates and found a good correlation?
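One plausible contributor to a low Pearson value on FPKMs is a handful of extreme values. The toy example below uses hypothetical, made-up numbers (Python, standard library) to show a single gene that is extreme in one replicate only pulling Pearson's r below zero, while Spearman's rank correlation stays clearly positive.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(v):
    """1-based ranks; assumes no tied values (true for the toy data below)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

# Hypothetical FPKMs for 20 genes: replicate 2 matches replicate 1 exactly,
# except that gene 0 has an extreme value in replicate 2 only.
rep1 = [float(i) for i in range(1, 21)]
rep2 = list(rep1)
rep2[0] = 1e4

r_pearson = pearson(rep1, rep2)
r_spearman = pearson(simple_ranks(rep1), simple_ranks(rep2))
print(f"Pearson {r_pearson:.2f}, Spearman {r_spearman:.2f}")
```

Here Pearson comes out negative although 19 of the 20 genes agree perfectly, while Spearman stays around 0.7; with real data, a scatter plot shows whether such points are present.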

      • #4
        re #1: You could try running a standard DESeq analysis on your data and see whether it finds anything. Maybe you have only very few differentially expressed genes, which a DE analysis can find but which are not enough to change the overall correlation coefficient. More likely, though, a DE analysis will confirm what your comparison of correlation coefficients suggests, namely that the effect of the differences between your conditions is weaker than the variation between your replicates, i.e., that your experiment has failed.
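The first possibility, that a few strongly changed genes need not move the overall correlation, can be checked with a hypothetical toy calculation (Python, standard library; all values are made up): two samples of 1,000 genes that agree exactly except that 20 genes are 8-fold (3 log2 units) higher in one of them.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log2 expression values for 1,000 genes, evenly spread from 0 to 9.99
sample_1 = [i / 100 for i in range(1000)]
# Identical except that every 50th gene (20 genes) is 3 log2 units (8-fold) higher
sample_2 = [v + 3 if i % 50 == 0 else v for i, v in enumerate(sample_1)]

r = pearson(sample_1, sample_2)
print(f"Pearson r with 20 of 1,000 genes changed 8-fold: {r:.3f}")
```

The coefficient stays close to 1, so on its own it says little about whether a handful of genes changed; that question is what a DE test is for.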

        re #1 and #3: It is generally more helpful to look at scatter plots than only at correlation values. You want to know whether it is only the genes with low counts or all genes that differ a lot. For this, make sure to plot raw counts, not RPKM values.
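To complement the scatter plot, a rough numeric version of the same check is to group genes by their mean raw count and see how much the log2 ratio between the two samples spreads in each group. A minimal sketch in Python (standard library only); the function name, the bin edges and the pseudocount of 1 are arbitrary assumptions.

```python
import math
from statistics import pstdev

def spread_by_expression(a, b, edges=(1, 10, 100, 1000)):
    """Group genes by mean raw count and summarize log2((b + 1) / (a + 1)).

    a, b: raw counts for two samples, in the same gene order.
    Returns {lower_bin_edge: (n_genes, sd_of_log2_ratio)}.
    """
    groups = {}
    for x, y in zip(a, b):
        m = (x + y) / 2
        if m == 0:
            continue  # zero in both samples: nothing to compare
        # first edge the mean falls below; means >= the last edge go to the top bin
        k = next((i for i, e in enumerate(edges) if m < e), len(edges))
        lower = 0 if k == 0 else edges[k - 1]
        groups.setdefault(lower, []).append(math.log2((y + 1) / (x + 1)))
    return {lo: (len(v), pstdev(v)) for lo, v in sorted(groups.items())}
```

If the spread is large only in the lowest bins, the disagreement is mostly the expected noise of low counts; a large spread in every bin suggests a broader problem with the samples.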

        • #5
          Originally posted by Simon Anders
          re #1 and #3: It is generally more helpful to look at scatter plots than only at correlation values. You want to know whether it is only the genes with low counts or all genes that differ a lot. For this, make sure to plot raw counts, not RPKM values.
          Thank you, Simon. The scatter plot of raw counts showed a tight cloud along the 45-degree line through the origin, and the correlation was 0.92. Using the count values does make sense; however, I cannot make sense of why the FPKM values do not show the same pattern.

          One possibility is that Cufflinks reports some outlier FPKM values, so I filtered out all rows containing values in exponential notation (such as 1.78989e+06, 8.1667e-05, etc.) and used the remaining values for the scatter plot.
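Exponential notation is a formatting choice rather than a magnitude cutoff: 1.78989e+06 and 8.1667e-05 sit at opposite extremes, while a value printed as 950000 is nearly as extreme and would pass a notation-based filter. A sketch (Python, standard library) that parses the values and filters on explicit numeric thresholds instead; the function name and the default thresholds are illustrative assumptions, not recommendations.

```python
def filter_fpkm_pairs(rows, low=1e-4, high=1e5):
    """Keep (fpkm_1, fpkm_2) pairs where both values lie in [low, high].

    rows: iterable of (str, str) as read from a text file. The default
    thresholds are illustrative only; note that low > 0 also drops zeros.
    """
    kept = []
    for s1, s2 in rows:
        v1, v2 = float(s1), float(s2)  # float() parses "1.78989e+06" as well as "950000"
        if low <= v1 <= high and low <= v2 <= high:
            kept.append((v1, v2))
    return kept
```

With explicit thresholds, the cutoff that decides what counts as an outlier is visible and can be varied to see how much the correlation depends on it.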

          If the FPKM values are indeed normalized, shouldn't the differences between them (FPKM_sample1 - FPKM_sample2) be close to 0?
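Raw differences are hard to judge because they scale with expression: a 10% discrepancy is 1 FPKM for a gene at 10 FPKM but 100 FPKM for a gene at 1000 FPKM. Comparing log2 ratios puts all genes on the same scale, and for well-normalized replicates one would roughly expect their median to sit near 0. A minimal sketch in Python (standard library); the function name and the pseudocount are assumptions.

```python
import math
from statistics import median

def log_ratio_summary(fpkm_1, fpkm_2, pseudocount=1.0):
    """Return per-gene log2((fpkm_2 + pc) / (fpkm_1 + pc)) and their median.

    The pseudocount (an assumed value) avoids log(0) for genes with FPKM 0.
    For well-normalized replicates the median is expected to be near 0.
    """
    ratios = [math.log2((b + pseudocount) / (a + pseudocount))
              for a, b in zip(fpkm_1, fpkm_2)]
    return ratios, median(ratios)
```

For example, log_ratio_summary([10, 1000], [11, 1100]) gives log2 ratios of about 0.13 and 0.14, even though the raw differences are 1 and 100.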

          • #6
            Originally posted by vyellapa
            Thank you, Simon. The scatter plot of raw counts showed a tight cloud along the 45-degree line through the origin, and the correlation was 0.92. Using the count values does make sense; however, I cannot make sense of why the FPKM values do not show the same pattern.
            How did you get from FPKM to counts?

            • #7
              Counts can be obtained in two ways:
              -Using htseq-count (from HTSeq):
              htseq-count -m intersection-strict -s no queryNameSorted.sam ~/GRCh37_E64_1kg.gtf > output

              -Using cuffdiff: cuffdiff now produces an additional output file with count information.

              It is possible to convert FPKMs to counts, but the above methods are more straightforward and better tested.
              Last edited by vyellapa; 07-17-2012, 09:26 AM.
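For reference, FPKM is defined as count * 10^9 / (length in bp * total mapped fragments), so an approximate count can be recovered by inverting that formula. A sketch in Python, assuming you know the transcript length and the total fragment count used for normalization (both function names are hypothetical). Cufflinks computes FPKM with an effective, fragment-length-adjusted length and its own estimate of the total, so the result is only approximate, which is one reason the count-based routes above are preferable.

```python
def approx_count_from_fpkm(fpkm, length_bp, total_mapped_fragments):
    """Invert FPKM = count * 1e9 / (length_bp * total_mapped_fragments).

    length_bp and total_mapped_fragments are assumed to be known; for Cufflinks
    output they only approximate the values it used internally.
    """
    return fpkm * length_bp * total_mapped_fragments / 1e9

def fpkm_from_count(count, length_bp, total_mapped_fragments):
    """Forward formula, useful for checking the round trip."""
    return count * 1e9 / (length_bp * total_mapped_fragments)
```

For example, a 2,000 bp transcript at 10 FPKM in a library of 20 million mapped fragments corresponds to roughly 10 * 2000 * 20e6 / 1e9 = 400 fragments.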

              • #8
                Originally posted by vyellapa
                Counts can be obtained in two ways:
                -Using htseq-count (from HTSeq):
                htseq-count -m intersection-strict -s no queryNameSorted.sam ~/GRCh37_E64_1kg.gtf > output

                -Using cuffdiff: cuffdiff now produces an additional output file with count information.

                It is possible to convert FPKMs to counts, but the above methods are more straightforward and better tested.
                Thanks!
                But when I looked at the cuffdiff output (genes.count_tracking), I found only the mean across the replicates.
                If I want to plot two replicates against each other, I need the raw counts for each one.
                Am I missing something?

                Thanks

                • #9
                  I have been using HTSeq for my counts, but the most obvious thing to do, it seems, is to rerun cuffdiff with each replicate as a separate input instead of grouping the replicates.
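If you go the HTSeq route, each htseq-count run writes one gene_id<TAB>count line per gene plus a few summary rows, so the per-replicate outputs can be merged into a count matrix. A minimal sketch in Python (standard library); the function names are hypothetical, and the set of summary-row names without the "__" prefix is an assumption meant to cover older HTSeq versions.

```python
# Summary-row names printed without the "__" prefix; assumed to match older
# HTSeq releases. Newer releases prefix them with "__" (e.g. __no_feature).
SPECIAL = {"no_feature", "ambiguous", "too_low_aQual",
           "not_aligned", "alignment_not_unique"}

def read_htseq_counts(lines):
    """Parse htseq-count output into {gene_id: count}, skipping summary rows."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        gene, value = line.split("\t")
        if gene.startswith("__") or gene in SPECIAL:
            continue
        counts[gene] = int(value)
    return counts

def merge_counts(samples):
    """samples: {sample_name: iterable of lines}. Returns (names, genes, matrix),
    where matrix[i][j] is the count of genes[i] in sample names[j]."""
    parsed = {name: read_htseq_counts(lines) for name, lines in samples.items()}
    names = list(parsed)
    genes = sorted(set().union(*parsed.values()))
    matrix = [[parsed[n].get(g, 0) for n in names] for g in genes]
    return names, genes, matrix
```

Open files can be passed directly, e.g. merge_counts({"rep1": f1, "rep2": f2}) with f1 and f2 opened on two htseq-count output files (file names of your choice).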

                  • #10
                    Yes, sure.
                    I thought maybe there was a shortcut I was missing.

                    Thanks!

                    • #11
                      In my experiment we did amplification, which causes many genes to have an expression of 0, even within the same condition.
                      For the same gene I can have:
                      condition 1 replicate A: count of 10
                      condition 1 replicate B: count of 100
                      condition 1 replicate C: count of 1000

                      or:
                      condition 1 replicate A: count of 0
                      condition 1 replicate B: count of 0
                      condition 1 replicate C: count of 150

                      That is why I get a very low correlation (0.2).
                      But when I run differential expression within replicates and between conditions (both with DESeq), I get many more DE genes between the conditions.

                      To conclude:
                      Differential expression seems to work fine.
                      Correlation does not (I also tried removing outliers: 0s or 10000+).

                      Any advice?
                      I made a scatter plot, and it does not look good.

                      Any help will be appreciated!
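The two patterns above (10/100/1000 and 0/0/150) can be flagged gene by gene to see how much of the within-condition disagreement comes from dropouts, i.e., zeros in some but not all replicates. A minimal sketch in Python (standard library); the function name is hypothetical and the SD cutoff of 1.0 on the log2(count + 1) scale is an arbitrary illustrative choice.

```python
import math
from statistics import pstdev

def flag_unstable_genes(replicates, max_log2_sd=1.0):
    """replicates: {gene_id: [raw counts in each replicate of one condition]}.

    Returns (dropouts, variable): genes that are zero in some but not all
    replicates, and, among the remaining genes, those whose log2(count + 1)
    values have a population SD above max_log2_sd.
    """
    dropouts, variable = [], []
    for gene, counts in replicates.items():
        zeros = sum(1 for c in counts if c == 0)
        if 0 < zeros < len(counts):
            dropouts.append(gene)  # dropout genes are not tested for SD
        elif pstdev([math.log2(c + 1) for c in counts]) > max_log2_sd:
            variable.append(gene)
    return dropouts, variable
```

Applied to the examples from this post, the 0/0/150 gene is flagged as a dropout and the 10/100/1000 gene as highly variable; the fraction of genes in each list gives a rough idea of how much the amplification affects the replicates.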

                      • #12
                        We have to remember that as the resolution increases, so does the variance.

                        • #13
                          I am not sure what you mean by "doing differential expression within replicates", but if you really have such strong variance, the DE analysis between conditions should not give many hits.

                          • #14
                            Originally posted by Simon Anders
                            I am not sure what you mean by "doing differential expression within replicates", but if you really have such strong variance, the DE analysis between conditions should not give many hits.
                            What I meant is running DE tests (with DESeq).
                            For example, for condition 1 I have 3 replicates. Comparing rep1 to rep2 with DESeq, I get 26 DE genes.
                            By comparison, comparing cond1 rep1 vs. cond2 rep1, I get 300 DE genes.

                            I hope this is clearer.

                            Thanks

                            • #15
                              If you compare a single sample to another single sample, you have to use DESeq's "blind" dispersion estimation method. For the comparison between conditions, you probably did not use the blind setting. Comparing the number of DE genes from two analyses performed with such completely different settings makes no sense.
