  • correlation value in scatterplots using cummerbund??

    Hi,
    I'm running cummeRbund and I made a scatterplot of my data, but I'd like to see the correlation value for that plot. Is that possible? Is there a command that shows the r value of the correlation? Thanks!

  • #2
    You might have to use cor(), a base R function. However, I don't know exactly how to subset a CuffData object so that you get the proper input for cor().


    • #3
      That's my big problem right now. I'd like to know how I can use cor() with CuffData.


      • #4
        Originally posted by Marcos Lancia
        That's my big problem right now. I'd like to know how I can use cor() with CuffData.
        It's been a while since I've used cummeRbund, but I remember that you can use the fpkmMatrix() function (can't remember the exact usage) to get a matrix that plays nicely with the base R functions.


        • #5
          Thanks for the tip cmbetts. It looks like you want to execute
          Code:
          m <- fpkmMatrix(genes(cuffdiff_output))
          cor(m[, 1], m[, 2]) # or whatever columns you need
          Keep in mind that csScatter() might discard some of the data points. You can verify this by running
          Code:
          csScatter(genes(cuffdiff_output))
          # compare to
          plot(m[, 1], m[, 2])
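For anyone without a cummeRbund database at hand, the cor() call above can be tried on a stand-in matrix. This is only a sketch: `m` below is simulated, and the sample names `sample_A` and `sample_B` are made up for illustration. It is not the output of fpkmMatrix().

```r
# Simulated stand-in for the matrix returned by fpkmMatrix(); the values
# and column names are invented for illustration.
set.seed(1)
base <- rexp(200, rate = 0.05)
m <- cbind(
  sample_A = base * exp(rnorm(200, sd = 0.1)),
  sample_B = base * exp(rnorm(200, sd = 0.1))
)

# Pearson correlation between two columns, selected by position or by name.
r_by_index <- cor(m[, 1], m[, 2])
r_by_name  <- cor(m[, "sample_A"], m[, "sample_B"])

# Passing the whole matrix returns all pairwise correlations at once.
r_all <- cor(m)
print(r_all)
```

Selecting columns by name rather than by position makes it harder to pick the wrong pair of samples by accident.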


          • #6
            I'm progressing, now a new problem

            Thanks for writing, everyone! I can finally see the correlation value, but I've run into a new problem. One of the r values is close to 0, even though the plot looks very good, as if r were close to 1. I'm pretty sure the data plotted are the same data I analyzed. Has anybody seen something similar? What did you do? Thanks!


            • #7
              Marcos,

              Did you ever sort this out? I would double check that you're supplying the right vectors to cor(). It might also have to do with how csScatter doesn't plot all of the data points. It's hard for me to think of anything else since I don't have the data to play with myself.


              • #8
                Hi, again

                For example: I want to analyze TRAP_Sm_rep1 vs TRAP_Sm_rep2. So, I write:

                > samples(cuff)

                  sample_index     sample_name sample_name parameter value
                1            1 SN16K_mock_rep2        <NA>      <NA>  <NA>
                2            2 SN16K_mock_rep1        <NA>      <NA>  <NA>
                3            3   SN16K_Sm_rep2        <NA>      <NA>  <NA>
                4            4   SN16K_Sm_rep1        <NA>      <NA>  <NA>
                5            5  TRAP_mock_rep1        <NA>      <NA>  <NA>
                6            6  TRAP_mock_rep2        <NA>      <NA>  <NA>
                7            7    TRAP_Sm_rep1        <NA>      <NA>  <NA>
                8            8    TRAP_Sm_rep2        <NA>      <NA>  <NA>

                So, I write:

                > cor(m[, 7], m[, 8])

                Is it right?


                • #9
                  I have a couple questions for you. Are you following some sort of published analysis? Also, does each row of samples(cuff) have a column in m? In other words, what does
                  Code:
                  all(colnames(m) %in% samples(cuff)[, 2])
                  say when you enter it into your R session?

                  Edit: I realize that the code above isn't exactly what I meant to ask for. Can you paste what R prints for
                  Code:
                  colnames(m)
                  Last edited by blakeoft; 05-04-2015, 10:36 AM.


                  • #10
                    all(colnames(m) %in% samples(cuff)[, 2])
                    [1] TRUE

                    I'm working by myself. I'm not following any published analysis. Do you know of any? The R help isn't good either.


                    • #11
                      colnames(m)

                      [1] "SN16K_mock_rep2" "SN16K_mock_rep1" "SN16K_Sm_rep2" "SN16K_Sm_rep1"
                      [5] "TRAP_mock_rep1" "TRAP_mock_rep2" "TRAP_Sm_rep1" "TRAP_Sm_rep2"


                      • #12
                        I saw the prefix "TRAP", and it made me think of Trapnell, as in Cole Trapnell. This is why I was curious if you were working with some kind of sample data set or something.

                        Your code should be correct though,
                        Code:
                        cor(m[, 7], m[, 8])
                        should give you what you want. Have you tried checking the correlation between both of these columns with all of the others?

                        You might also compare the following values:

                        Code:
                        sum(m[, 7] == 0)
                        sum(m[, 8] == 0)
                        sum(m[, 7] == 0 & m[, 8] == 0)
                        to see if you have many more zeros in one of the columns or if they don't share many of the same zeros.

                        My suggestions are shots in the dark, so I apologize if nothing enlightening happens.
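The zero counts suggested above can be tried on a small made-up example first. The vectors `x` and `y` are hypothetical stand-ins for m[, 7] and m[, 8], and the cross-tabulation with table() is an extra step that was not part of the original suggestion.

```r
# Hypothetical expression values for two replicates (not real data).
x <- c(0, 0, 5.2, 0, 13.1, 0.8, 0, 42.0)
y <- c(0, 1.4, 4.9, 0, 12.7, 0, 0, 39.5)

zeros_x    <- sum(x == 0)           # zeros in the first replicate
zeros_y    <- sum(y == 0)           # zeros in the second replicate
zeros_both <- sum(x == 0 & y == 0)  # genes that are zero in both

# Cross-tabulate zero status to see where the replicates disagree.
zero_table <- table(x_zero = x == 0, y_zero = y == 0)
print(zero_table)
```

If the off-diagonal cells of the table are large compared with the TRUE/TRUE cell, the two replicates disagree about which genes are unexpressed.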


                        • #13
                          No, I'm not working with Cole Trapnell's datasets; these data are mine.
                          Question: How can I be sure that the data plotted are the same data used for the correlation? Should I make some kind of matrix, maybe?
                          Thanks so much for writing, you've been very helpful.


                          • #14
                            After reading in some of my old cufflinks data, I've realized that my second post in this discussion has some incorrect code. Please plot these two figures for comparison:

                            Code:
                            csScatter(genes(cuff), "TRAP_Sm_rep1", "TRAP_Sm_rep2")
                            # compare to
                            plot(log(m[, 7] + 1), log(m[, 8] + 1))
                            These plots should look pretty similar. You should be able to tell that you're passing the right vectors into cor().


                            • #15
                              Hi,
                              Your suggestion of plotting log(m[, ] + 1) gave me an idea. I ran cor(log(m[, ] + 1)) and it worked! All of the cor() values are up to 0.95. Thanks for your help, mission accomplished for now.
                              Do you know how I can label the genes with differential expression? I tried labels=T in the scatterplots, but it didn't work.
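One plausible reason the log transform changed the result so much: Pearson correlation on raw values can be dominated by a few genes with extreme values, while a log-scaled plot still looks tight. The sketch below shows this on simulated data. The matrix `m`, the column names `rep1` and `rep2`, and the two extreme genes are all invented for illustration, so this is not a diagnosis of the data in this thread.

```r
# Simulated stand-in for two replicate columns of an FPKM matrix
# (invented values, not real data).
set.seed(42)
n <- 1000
base <- rexp(n, rate = 0.1)
m <- cbind(
  rep1 = base * exp(rnorm(n, sd = 0.1)),
  rep2 = base * exp(rnorm(n, sd = 0.1))
)

# Two genes with an extreme value in only one replicate.
m[1, ] <- c(1e5, 50)
m[2, ] <- c(50, 1e5)

# Correlation on raw values is dominated by the two extreme genes.
r_raw <- cor(m[, "rep1"], m[, "rep2"])

# Correlation after the same log(x + 1) transform used for the plot.
r_log <- cor(log(m + 1))["rep1", "rep2"]

print(c(raw = r_raw, log = r_log))
```

Computing the correlation on the same scale as the plot keeps the number and the picture comparable.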
