SEQanswers

Old 03-10-2014, 02:26 AM   #1
willemate
Junior Member
 
Location: Berlin

Join Date: Mar 2014
Posts: 9
RNAseq vs Microarray

Hi everyone!

I am new to this forum; I have just started working with RNAseq and microarray analysis.

I have carbon starvation data for an organism: RNAseq and microarray data for 4 time points (day 0, day 1, day 3, day 6).
I have analyzed the DEGs with affy and DESeq.

The RNAseq results are in a CSV file and the microarray results are in a text file.

> colnames(RNAseq.data_df)
[1] "ID" "mean.d0" "mean.d1" "mean.d3" "mean.d6" "FC.d1.d0" "FC.d3.d0"
[8] "FC.d6.d0" "logFC.d1.d0" "logFC.d3.d0" "logFC.d6.d0" "pval.d1.d0" "pval.d3.d0" "pval.d6.d0"
[15] "qval.d1.d0" "qval.d3.d0" "qval.d6.d0"

> colnames(MA.data)
[1] "X" "GSM542228.CEL.gz"
[3] "GSM542335.CEL.gz" "GSM542336.CEL.gz"
[5] "GSM971682_080805MJA_ANIGERa_100480_03.CEL.gz" "GSM971683_080805MJA_ANIGERa_100480_07.CEL.gz"
[7] "GSM971684_080805MJA_ANIGERa_100480_04.CEL.gz" "GSM971685_080805MJA_ANIGERa_100480_08.CEL.gz"
[9] "GSM971686_080805MJA_ANIGERa_100480_05.CEL.gz" "GSM971687_080805MJA_ANIGERa_100480_09.CEL.gz"
[11] "mean.d0" "mean.d1"
[13] "mean.d3" "mean.d6"
[15] "FC.d1...d0" "FC.d3...d0"
[17] "FC.d6...d0" "pValue.d1...d0"
[19] "pValue.d3...d0" "pValue.d6...d0"
[21] "qValue.d1...d0" "qValue.d3...d0"
[23] "qValue.d6...d0"

I would like to make a Venn diagram to compare the 2 datasets. I really don't know where to start or how to make the data comparable. Does anyone have any tips?

Thank you!
Old 03-10-2014, 05:33 AM   #2
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

The general idea would be to convert the probe IDs on the array to whatever format you used for the counting in DESeq. So if you used Ensembl gene IDs then that'd be the proper ID to convert things to. There are usually packages in Bioconductor to make this easier. One common issue is that microarrays will have multiple probes for the same gene, so it's not always clear how to summarize these in a coherent way.
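
The multiple-probes issue can be sketched with standard command-line tools. The example below is only an illustration under stated assumptions: the file names (probe2gene.txt, array.logfc.txt), the IDs, and the values are all made up, and averaging the probes of a gene is just one possible summary, not a method recommended anywhere in this thread.

```shell
# Use byte-order collation so that sort and join agree on ordering
export LC_ALL=C

# Toy probe-to-gene map (hypothetical IDs): probe_id gene_id
printf 'p1 geneA\np2 geneA\np3 geneB\n' > probe2gene.txt
# Toy array results (made-up values): probe_id logFC
printf 'p1 1.0\np2 2.0\np3 -0.5\n' > array.logfc.txt

# join needs both inputs sorted on the join field
sort -k1,1 probe2gene.txt > probe2gene.sort.txt
sort -k1,1 array.logfc.txt > array.logfc.sort.txt

# Attach the gene ID to each probe, then average the logFC per gene
join -j 1 probe2gene.sort.txt array.logfc.sort.txt |
  awk '{ sum[$2] += $3; n[$2]++ }
       END { for (g in sum) printf "%s %.2f\n", g, sum[g] / n[g] }' |
  sort -k1,1 > array.gene.txt

cat array.gene.txt
```

Other summaries (median, the probe with the highest variance, and so on) would only change the awk step; which one is appropriate depends on the array design.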
Old 03-10-2014, 07:36 AM   #3
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by dpryan
One common issue is that microarrays will have multiple probes for the same gene, so it's not always clear how to summarize these in a coherent way.
Indeed. Given how much data analysis has been done in and around microarrays, I find it surprising that this issue has been left essentially unresolved (neglected?), especially considering that probes on the same gene, if not on the same exon, might not agree very well in terms of logFC. (Happy to be proved wrong...!)
Old 03-18-2014, 06:46 AM   #4
willemate
Junior Member
 
Location: Berlin

Join Date: Mar 2014
Posts: 9

Thank you both for your input. I did not know that I had to go back to the raw counts to make the data comparable. I'm still thinking of a good way to compare the data and a package that I can use. Do you have any experience with this kind of experiment?
Old 03-18-2014, 07:57 AM   #5
Wallysb01
Senior Member
 
Location: San Francisco, CA

Join Date: Feb 2011
Posts: 286

Quote:
Originally Posted by dariober
Indeed. Given how much data analysis has been done in and around microarrays, I find it surprising that this issue has been left essentially unresolved (neglected?), especially considering that probes on the same gene, if not on the same exon, might not agree very well in terms of logFC. (Happy to be proved wrong...!)
I think you basically said why this isn’t done much. People don’t really combine their RNA-seq and microarray data; rather, RNA-seq data replaces the microarray data. Expression arrays are a pain to deal with and just aren’t anywhere near as good as RNA-seq, so why even bother? Maybe you can see it as RNA-seq validating the positives in your expression array, but I wouldn’t care about anything that was DE in your RNA-seq not “replicating” in your expression array (by which I mean that I’d still want to keep those RNA-seq-only DE genes).

BTW, willemate, once you have converted those probe IDs to the correct gene names (which you can also do in BioMart), you can just use the Linux command “join”. If you’re already in R, you can do this there too, but I already know how to use join, so I’ve just done it that way.

join can also print unpairable lines from either or both data sets, and it handles duplicates: where a gene has multiple array probes, the same gene-level RNAseq data is printed in every row with a probe for that gene. So if you must, just do that. But remember that you need to sort both files by the column you join on, so it might be something like this:

Code:
sort -k1,1 array.data.txt > array.data.sort.txt
sort -k1,1 RNAseq.data.txt > RNAseq.data.sort.txt
# Replace 1.1,2.2,1.2 with your own output column format
join -j1 -a1 -o 1.1,2.2,1.2 array.data.sort.txt RNAseq.data.sort.txt > joined.txt
You can read up on how the -o option needs to be formatted. Basically, it's a comma-separated list of entries of the form file number, dot, column number, which sets the column order of the output.

I.e., 1.1,2.2,1.2 would print the first column of the first file, then the second column of the second file, then the second column of the first file, as columns 1-3 of the new file respectively.

You can even use join first to merge in a BioMart export of the conversion table from Affy (or whoever) probe IDs to gene names.
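
To make the join options concrete, here is a small self-contained run on toy data. The gene IDs and logFC values are invented, and the -e NA option (which fills the empty fields of unpaired lines) is an addition that is not part of the command in this post.

```shell
# Use byte-order collation so that sort and join agree on ordering
export LC_ALL=C

# Toy gene-level tables (made-up values): gene_id logFC
printf 'geneB -0.5\ngeneA 1.5\ngeneC 0.8\n' > array.data.txt
printf 'geneA 2.1\ngeneB -1.0\n' > RNAseq.data.txt

# Both files must be sorted on the join column
sort -k1,1 array.data.txt > array.data.sort.txt
sort -k1,1 RNAseq.data.txt > RNAseq.data.sort.txt

# -a 1 keeps array genes that have no RNA-seq match
# -e NA writes NA into the fields that are missing for those genes
# -o 1.1,1.2,2.2 prints: gene ID, array logFC, RNA-seq logFC
join -j 1 -a 1 -e NA -o 1.1,1.2,2.2 \
  array.data.sort.txt RNAseq.data.sort.txt > joined.txt

cat joined.txt
```

In this toy run geneC exists only on the array, so its RNA-seq column is filled with NA instead of being silently dropped.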
Old 03-19-2014, 03:09 AM   #6
willemate
Junior Member
 
Location: Berlin

Join Date: Mar 2014
Posts: 9

Thank you for your input, Wallysb. I found a paper (linked below, Figs. 3 and 5) that performs a comparison similar to mine. It is just weird that I cannot find any Bioconductor case study on this. I guess I should compute a Spearman/Pearson correlation of the FCs in some way...?


http://pubs.acs.org/doi/pdf/10.1021/tx200103b
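
As a rough sketch of the correlation idea, a Pearson coefficient can be computed with awk once the two platforms sit side by side in one table. The file name fc.pairs.txt and all values here are made up for illustration; a Spearman correlation would additionally require converting each column to ranks first, which is not shown.

```shell
# Toy joined table (made-up values): gene_id array_logFC rnaseq_logFC
printf 'geneA 1.0 2.0\ngeneB 2.0 4.0\ngeneC 3.0 6.0\ngeneD 4.0 8.5\n' > fc.pairs.txt

# Pearson r from running sums:
# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
awk '{ n++; sx += $2; sy += $3; sxx += $2*$2; syy += $3*$3; sxy += $2*$3 }
     END {
       cov = n*sxy - sx*sy
       vx  = n*sxx - sx*sx
       vy  = n*syy - sy*sy
       printf "%.4f\n", cov / sqrt(vx * vy)
     }' fc.pairs.txt > pearson.txt

cat pearson.txt
```

The running-sums formula is fine for a quick check on a few thousand genes; for anything more careful, cor() in R with method = "pearson" or "spearman" is the more usual route.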
Old 03-19-2014, 03:29 AM   #7
willemate
Junior Member
 
Location: Berlin

Join Date: Mar 2014
Posts: 9

"To perform correlations between the DEGs patterns from the two platforms, common sets of genes were selected, which were above the detectable threshold and common to both the platforms"

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3790783/
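
The "common sets of genes" step, and the counts needed for a two-set Venn diagram, could be sketched as follows. The file names (array.deg.txt, rnaseq.deg.txt) and gene IDs are hypothetical, and how the DEG lists are selected (for example, by a q-value cutoff) is assumed to have happened already.

```shell
# Use byte-order collation so that sort and comm agree on ordering
export LC_ALL=C

# Toy lists of DEG IDs per platform (hypothetical IDs, one per line)
printf 'geneA\ngeneB\ngeneC\n' > array.deg.txt
printf 'geneB\ngeneC\ngeneD\ngeneE\n' > rnaseq.deg.txt

# comm needs sorted input; -u also drops duplicate IDs
sort -u array.deg.txt > array.deg.sort.txt
sort -u rnaseq.deg.txt > rnaseq.deg.sort.txt

# comm columns: 1 = only in first file, 2 = only in second, 3 = in both
comm -12 array.deg.sort.txt rnaseq.deg.sort.txt > both.txt
comm -23 array.deg.sort.txt rnaseq.deg.sort.txt > array.only.txt
comm -13 array.deg.sort.txt rnaseq.deg.sort.txt > rnaseq.only.txt

# The three region sizes of a two-set Venn diagram
printf 'array only: %s\nboth: %s\nRNA-seq only: %s\n' \
  $(wc -l < array.only.txt) $(wc -l < both.txt) $(wc -l < rnaseq.only.txt)
```

The file both.txt is then the common gene set on which fold changes from the two platforms can be correlated.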