  • the correlation of RNA-seq data.

    I generated a correlation table for all biological replicates.
    Most of them have a good correlation of 0.9;
    2-3 sets are around 0.8;
    6-7 sets are < 0.8.

    Is this data acceptable? How can I improve it? Can I use all of the replicates for differential expression analysis?
    Thank you!

  • #2
    Because of how samples are taken and the techniques involved, some biological (or even technical) replicates can look dissimilar on any given metric (p-values, correlations, etc.).

    I would use any or all of them for DE analysis. If you account for the variance between replicates in your DE analysis you will reduce the number of false positives. That gets into the question of how you intend to determine differential expression. Some methods will allow you to use that replicate variance, while others won't. Either way, it shouldn't hurt you to include them all.


    • #3
      In my experience, most replicates have a correlation > 0.95, so less than 0.8 is too low. You can try quantile (qq) normalization to see if it improves the correlation.
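Assuming "qq normalization" here means quantile normalization (that reading is an interpretation, not something the thread confirms), the idea can be sketched in plain Python as follows. The function name `quantile_normalize` and the toy values are made up for illustration; in R one would more likely use an existing package than hand-roll this. Ties are broken by position, which is a simplification.

```python
def quantile_normalize(samples):
    """Force every sample to share the same distribution of values.

    samples: list of equal-length lists, one list of expression values per replicate.
    Returns a new list of lists in the same gene order.
    """
    n_genes = len(samples[0])
    n_samples = len(samples)

    # Mean of the r-th smallest value across all samples.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[r] for s in sorted_samples) / n_samples for r in range(n_genes)
    ]

    normalized = []
    for s in samples:
        # Gene indices ordered from lowest to highest value in this sample.
        order = sorted(range(n_genes), key=lambda i: s[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]  # replace value by the shared rank mean
        normalized.append(out)
    return normalized


# Hypothetical expression values for two replicates (four genes each).
rep_a = [5.0, 2.0, 3.0, 4.0]
rep_b = [4.0, 1.0, 6.0, 2.0]
normalized = quantile_normalize([rep_a, rep_b])
print(normalized)
```

After normalization each replicate contains exactly the same set of values, only assigned to genes according to that replicate's own ranking, so the step removes distribution differences but cannot fix replicates whose gene rankings disagree.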



      • #4
        Could any of you post the method/code you use to find the correlation values?
        I did a simple cor(FPKM_1, FPKM_2) on the FPKM values from Cufflinks for biological replicates and got a value of 0.2.

        It would be really useful to know how you guys are doing it.



        • #5
          A simple cor() (in R, I assume) can get very skewed by large outlier values such as those that can appear in Cufflinks output. You can try a Spearman correlation cor(FPKM_1, FPKM_2, method="spearman"), or take logs of the FPKM values, or do an arcsine transformation (http://bridgecrest.blogspot.fi/2011/...technical.html)
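To illustrate why a single extreme value can dominate a plain correlation, here is a small, self-contained Python sketch that mirrors the R calls cor(x, y), cor(x, y, method="spearman") and a correlation of log-transformed values. The FPKM numbers are invented, the helper names are hypothetical, and the pseudocount of 1 is an assumption used only to avoid log(0).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def log_pearson(x, y, pseudocount=1.0):
    """Pearson correlation of log10(value + pseudocount)."""
    lx = [math.log10(v + pseudocount) for v in x]
    ly = [math.log10(v + pseudocount) for v in y]
    return pearson(lx, ly)


# Invented FPKMs: the replicates agree except for one huge value in fpkm_2.
fpkm_1 = [0.0, 1.2, 3.5, 8.0, 15.0, 40.0, 120.0, 300.0]
fpkm_2 = [0.1, 1.0, 4.0, 60000.0, 18.0, 35.0, 100.0, 280.0]

print("Pearson :", pearson(fpkm_1, fpkm_2))
print("Spearman:", spearman(fpkm_1, fpkm_2))
print("Log10   :", log_pearson(fpkm_1, fpkm_2))
```

On this toy data the raw Pearson value is pulled below zero by the one outlier, while the rank-based Spearman value stays clearly positive. Real Cufflinks output will behave differently, so this only shows the mechanism.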



          • #6
            I tried the Spearman method, which increased my correlation value to ~0.4. I removed the outliers, i.e. all values shown in exponential notation such as 8.1667e-05, 6.162e+05, etc., before I ran the correlation test. I also tried the correlation of logs (after adding 1 to eliminate zeros), i.e. cor(log10(FPKM_1), log10(FPKM_2)), but it did not give a better correlation value. The arcsine method is something I have not tried yet, but it seems interesting.

            A scatter plot and correlation values using count data seemed good though.


            • #7
              If you are only interested in gene-level FPKMs, you could always divide the counts by the gene lengths (in kilobases) and by the number of mapped reads (in millions), which would then lead to well-correlated FPKMs (since you are only scaling the values linearly). If you are interested in isoform-level FPKMs you obviously need something like Cufflinks, but you could try alternatives such as RSEM or MISO to check how they perform.
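The gene-level calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not Cufflinks' method: the function name `fpkm_from_counts`, the gene lengths, the counts and the library size are all hypothetical, and which reads count as "mapped" is left to the caller.

```python
def fpkm_from_counts(counts, lengths_bp, total_mapped_reads):
    """FPKM = count / (gene length in kb * mapped reads in millions)."""
    millions = total_mapped_reads / 1e6
    return [
        count / ((length / 1e3) * millions)
        for count, length in zip(counts, lengths_bp)
    ]


# Hypothetical example: three genes in a library of 2 million mapped reads.
counts = [100, 500, 0]
lengths_bp = [1000, 2000, 1500]
gene_fpkm = fpkm_from_counts(counts, lengths_bp, total_mapped_reads=2_000_000)
print(gene_fpkm)
```

Because each gene's count is only divided by constants (its own length and the library size), a gene with zero counts stays at zero and the relative ordering of replicates per gene is unchanged.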
