
  • RNAseq vs Microarray

    Hi everyone!

    I am new to this forum; I recently started working with RNA-seq and microarray analysis.

    I have carbon-starvation data for an organism, with RNA-seq and microarray data for 4 conditions (day 0, day 1, day 3, day 6).
    I have analyzed the DEGs with affy and DESeq.

    The RNA-seq results are in a CSV file and the microarray results are in a text file.

    > colnames(RNAseq.data_df)
    [1] "ID" "mean.d0" "mean.d1" "mean.d3" "mean.d6" "FC.d1.d0" "FC.d3.d0"
    [8] "FC.d6.d0" "logFC.d1.d0" "logFC.d3.d0" "logFC.d6.d0" "pval.d1.d0" "pval.d3.d0" "pval.d6.d0"
    [15] "qval.d1.d0" "qval.d3.d0" "qval.d6.d0"

    > colnames(MA.data)
    [1] "X" "GSM542228.CEL.gz"
    [3] "GSM542335.CEL.gz" "GSM542336.CEL.gz"
    [5] "GSM971682_080805MJA_ANIGERa_100480_03.CEL.gz" "GSM971683_080805MJA_ANIGERa_100480_07.CEL.gz"
    [7] "GSM971684_080805MJA_ANIGERa_100480_04.CEL.gz" "GSM971685_080805MJA_ANIGERa_100480_08.CEL.gz"
    [9] "GSM971686_080805MJA_ANIGERa_100480_05.CEL.gz" "GSM971687_080805MJA_ANIGERa_100480_09.CEL.gz"
    [11] "mean.d0" "mean.d1"
    [13] "mean.d3" "mean.d6"
    [15] "FC.d1...d0" "FC.d3...d0"
    [17] "FC.d6...d0" "pValue.d1...d0"
    [19] "pValue.d3...d0" "pValue.d6...d0"
    [21] "qValue.d1...d0" "qValue.d3...d0"
    [23] "qValue.d6...d0"

    I would like to make a Venn diagram to compare the 2 datasets. I really don't know where to start or how to make the data comparable. Does anyone have any tips?

    Thank you!

  • #2
    The general idea would be to convert the probe IDs on the array to whatever format you used for the counting in DESeq. So if you used Ensembl gene IDs then that'd be the proper ID to convert things to. There are usually packages in Bioconductor to make this easier. One common issue is that microarrays will have multiple probes for the same gene, so it's not always clear how to summarize these in a coherent way.
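One possible way to handle several probes per gene, sketched below, is to keep only the probe with the smallest q-value for each gene. This is just one convention among several (averaging probes is another), and the file name, column layout, and values are hypothetical toy data, not the poster's files.

```shell
# Work in a scratch directory so the toy files do not clutter anything
cd "$(mktemp -d)"
export LC_ALL=C

# Toy table (hypothetical values): gene ID, probe ID, log fold change, q-value
printf 'geneA probe1 1.8 0.001\n'   > probes.txt
printf 'geneA probe2 0.4 0.200\n'  >> probes.txt
printf 'geneB probe3 -2.1 0.030\n' >> probes.txt
printf 'geneB probe4 -1.9 0.010\n' >> probes.txt

# Sort by gene, then by q-value ascending (-g also copes with values like 1e-5),
# and let awk keep only the first row it sees for each gene
sort -k1,1 -k4,4g probes.txt | awk '!seen[$1]++' > probes.per_gene.txt

cat probes.per_gene.txt
```

The result has one row per gene, which makes the later gene-level comparison with the RNA-seq table one-to-one.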



    • #3
      Originally posted by dpryan
      One common issue is that microarrays will have multiple probes for the same gene, so it's not always clear how to summarize these in a coherent way.
      Indeed. Given how much data analysis has been done in and around microarrays, I find it surprising that this issue has been left essentially unresolved (neglected?). Especially considering that probes on the same gene, if not on the same exon, might not agree much in terms of logFC. (Happy to be proved wrong...!)



      • #4
        Thank you both for your input. I did not know that I had to go back to the raw counts to make the data comparable. I'm still thinking of a good way to compare the data and of a package that I can use. Do you have any experience with this kind of experiment?



        • #5
          Originally posted by dariober
          Indeed. Given how much data analysis has been done in and around microarrays, I find it surprising that this issue has been left essentially unresolved (neglected?). Especially considering that probes on the same gene, if not on the same exon, might not agree much in terms of logFC. (Happy to be proved wrong...!)
          I think you basically said why this isn’t done much. People don’t really combine their RNA-seq and microarray data; rather, RNA-seq data replaces the microarray data. Expression arrays are a pain to deal with and just aren’t anywhere near as good as RNA-seq, so why even bother? Maybe you can see it like this: RNA-seq can validate the positives in your expression array, but I wouldn’t care about anything that was DE in your RNA-seq not “replicating” in your expression array (and I mean that in the sense that I’d still want to keep those RNA-seq-only DE genes).

          BTW, though, to willemate: once you have converted those probe IDs to the correct gene names, which you can also do in BioMart, you can just use the Linux command “join”. If you’re already in R, you can do this there too, but personally, I already know how to use join, so I’ve just done it that way.

          And you can print unpairable lines from either or both data sets, as well as deal with duplicates: for genes with multiple array probes, join prints the same gene-level RNA-seq data in every row with a probe for that gene. So if you must, just do that. But remember that you need to sort your files by the column you join on, so it might be something like this:

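          Code:
          # join requires both inputs to be sorted on the join field (column 1 here)
          sort -k1,1 array.data.txt > array.data.sort.txt
          sort -k1,1 RNAseq.data.txt > RNAseq.data.sort.txt
          # -a1 also prints unpairable array lines; replace the -o list with your own output column format
          join -j1 -a1 -o 1.1,2.2,1.2 array.data.sort.txt RNAseq.data.sort.txt > joined.txt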
          You can read up on how the -o option needs to be formatted. But basically, it's a comma-separated list of entries, each written as the file number, a dot, and the field (column) number, giving the column order for the output.

          I.e., 1.1,2.2,1.2 would print the first column of the first file, then the second column of the second file, then the second column of the first file, as columns 1-3 of the new file respectively.
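To make the -o format concrete, here is a small worked sketch on toy files. The gene IDs and values are hypothetical, and the -e NA option (which fills fields missing from unpaired lines) is an addition not mentioned in the post.

```shell
# Work in a scratch directory so the toy files do not clutter anything
cd "$(mktemp -d)"
export LC_ALL=C

# Toy inputs, already sorted on column 1 (hypothetical values):
# column 1 = gene ID, column 2 = log fold change
printf 'geneA 1.5\ngeneB -0.7\ngeneC 2.2\n' > array.data.sort.txt
printf 'geneA 1.9\ngeneC 2.6\n'             > RNAseq.data.sort.txt

# -a1 keeps array rows that have no RNA-seq match; -e NA fills their missing fields.
# Output columns: gene ID (file 1), RNA-seq value (file 2), array value (file 1)
join -j1 -a1 -e NA -o 1.1,2.2,1.2 array.data.sort.txt RNAseq.data.sort.txt > joined.txt

cat joined.txt
```

Here geneB exists only in the array file, so its RNA-seq column is written as NA rather than being dropped or left empty.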

          You can even use join to first merge a BioMart export of the conversion table from Affy (or whoever's) probe IDs to gene names.
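A minimal sketch of that first merge, assuming a two-column conversion table (probe ID, gene name) exported from BioMart. The file names, IDs, and values below are hypothetical.

```shell
# Work in a scratch directory so the toy files do not clutter anything
cd "$(mktemp -d)"
export LC_ALL=C

# Hypothetical BioMart-style export: probe ID, gene name
printf 'p1 geneA\np2 geneB\np3 geneB\n' > probe2gene.txt
# Hypothetical array results: probe ID, log fold change
printf 'p1 1.5\np2 -0.7\np3 -0.9\n' > array.data.txt

# Both inputs must be sorted on the probe ID before joining
sort -k1,1 probe2gene.txt > probe2gene.sort.txt
sort -k1,1 array.data.txt > array.probe.sort.txt

# Swap the probe ID for the gene name, then re-sort on the new first column
# so the result is ready to be joined against gene-level RNA-seq data
join -j1 -o 1.2,2.2 probe2gene.sort.txt array.probe.sort.txt |
  sort -k1,1 > array.gene.sort.txt

cat array.gene.sort.txt
```

Genes covered by several probes (geneB here) still appear on several rows after this step, so they need to be summarized or accepted as duplicates in the next join.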



          • #6
            Thank you for your input, Wallysb. I found a paper (link, fig. 3 & 5) where they carry out a comparison study similar to mine. It is just weird that I cannot find any Bioconductor case-study information about this. I guess I should compute a Spearman/Pearson correlation and correlate the FCs in some way...?
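As a rough sketch of the correlation idea, a Pearson coefficient can be computed with awk on a joined table. This assumes one row per gene and that both fold-change columns are on the same scale (for example log2); the file name and values are hypothetical. A Spearman coefficient would need the values converted to ranks first, which is not shown here.

```shell
# Work in a scratch directory so the toy files do not clutter anything
cd "$(mktemp -d)"
export LC_ALL=C

# Toy joined table (hypothetical values): gene ID, RNA-seq logFC, array logFC
printf 'geneA 1.0 2.0\ngeneB 2.0 4.0\ngeneC 3.0 6.0\ngeneD 4.0 8.5\n' > joined.txt

# Accumulate the sums needed for Pearson's r in a single pass over the file
awk '
  { n++; sx += $2; sy += $3; sxx += $2*$2; syy += $3*$3; sxy += $2*$3 }
  END {
    num = n*sxy - sx*sy
    den = sqrt(n*sxx - sx*sx) * sqrt(n*syy - sy*sy)
    printf "n=%d pearson_r=%.4f\n", n, num/den
  }' joined.txt > correlation.txt

cat correlation.txt
```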




            • #7
              "To perform correlations between the DEGs patterns from the two platforms, common sets of genes were selected, which were above the detectable threshold and common to both the platforms"

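The counts behind a two-set Venn diagram can be sketched along the same lines: restrict both tables to the genes present on both platforms, then count which genes pass a cutoff in one, the other, or both. The q < 0.05 cutoff, file names, and values are assumptions for illustration, and the quoted passage's "detectable threshold" is an expression filter that is not modeled here.

```shell
# Work in a scratch directory so the toy files do not clutter anything
cd "$(mktemp -d)"
export LC_ALL=C

# Toy tables (hypothetical values): gene ID, q-value
printf 'geneA 0.001\ngeneB 0.200\ngeneC 0.010\ngeneD 0.030\n' > rnaseq.q.txt
printf 'geneA 0.020\ngeneB 0.040\ngeneC 0.500\ngeneE 0.001\n' > array.q.txt

# Keep only genes measured on both platforms: gene ID, RNA-seq q, array q
sort -k1,1 rnaseq.q.txt > rnaseq.sort.txt
sort -k1,1 array.q.txt  > array.sort.txt
join -j1 -o 1.1,1.2,2.2 rnaseq.sort.txt array.sort.txt > common.txt

# Count the three Venn regions at an assumed cutoff of q < 0.05
awk -v q=0.05 '
  $2 <  q && $3 <  q { both++ }
  $2 <  q && $3 >= q { rna++ }
  $2 >= q && $3 <  q { arr++ }
  END { printf "RNA-seq only: %d, array only: %d, both: %d\n", rna, arr, both }
' common.txt > venn.txt

cat venn.txt
```

The three counts can then be passed to whichever plotting tool draws the diagram.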
