**dpryan** · 03-10-2014, 05:33 AM

The general idea would be to convert the probe IDs on the array to whatever format you used for the counting in DESeq. So if you used Ensembl gene IDs then that'd be the proper ID to convert things to. There are usually packages in Bioconductor to make this easier. One common issue is that microarrays will have multiple probes for the same gene, so it's not always clear how to summarize these in a coherent way.
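One simple (and debatable) way to collapse several probes into one gene-level value is to average them. The sketch below only illustrates that idea; the three-column layout (probe ID, gene ID, logFC), the file names and the values are all made up, and averaging is just one of several possible summaries.

```shell
#!/bin/sh
# Toy probe-level table (hypothetical layout): probe_id <TAB> gene_id <TAB> logFC
workdir=$(mktemp -d)
printf 'p1\tENSG01\t1.0\np2\tENSG01\t2.0\np3\tENSG02\t-0.5\n' > "$workdir/probes.txt"

# Average the logFC of all probes mapping to the same gene, then sort by gene ID.
# Output columns: gene_id, mean logFC, number of probes.
awk -F'\t' '
    { sum[$2] += $3; n[$2]++ }
    END { for (g in sum) printf "%s\t%.3f\t%d\n", g, sum[g] / n[g], n[g] }
' "$workdir/probes.txt" | sort -k1,1 > "$workdir/gene_level.txt"

cat "$workdir/gene_level.txt"
```

Keeping the probe count in the last column makes it easy to see which genes were summarized from more than one probe.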

**dariober** · 03-10-2014, 07:36 AM

Originally posted by dpryan:

One common issue is that microarrays will have multiple probes for the same gene, so it's not always clear how to summarize these in a coherent way.

Indeed. Given how much data analysis has been done in and around microarrays, I find it surprising that this issue has been left essentially unresolved (neglected?). Especially considering that probes on the same gene, if not on the same exon, might not agree very well in terms of logFC. (Happy to be proved wrong...!)

**willemate** · 03-18-2014, 06:46 AM

Thank you both for your input. I did not know that I had to go back to the raw counts to make the data comparable. I'm still thinking of a good way to compare the data and a package that I can use. Do you have any experience with this kind of experiment?

**Wallysb01** · 03-18-2014, 07:57 AM

Originally posted by dariober:

Indeed. Given how much data analysis has been done in and around microarrays, I find it surprising that this issue has been left essentially unresolved (neglected?). Especially considering that probes on the same gene, if not on the same exon, might not agree very well in terms of logFC. (Happy to be proved wrong...!)

I think you basically said why this isn’t done much. People don’t really combine their RNA-seq and microarray data; rather, RNA-seq data replaces the microarray data. Expression arrays are a pain to deal with and just aren’t anywhere near as good as RNA-seq, so why even bother? Maybe you can see it as RNA-seq validating the positives in your expression array, but I wouldn’t care about anything that was DE in your RNA-seq not “replicating” in your expression array (and by that I mean I’d still want to keep those RNA-seq-only DE genes).

BTW, willemate: once you’ve converted those probe IDs to the correct gene names (which you can also do in BioMart), you can just use the Linux command “join”. If you’re already in R, you can do this there too, but personally, I already know how to use join, so I’ve just done it that way.

And you can print unpairable lines in either or both data sets, as well as deal with duplicates, where for multiple array probes, you just print the same gene level RNAseq data in all rows with probes for that gene. So if you must, just do that. But remember you need to sort your files by the column you join on so it might be something like this:

Code:

sort -k1,1 array.data.txt > array.data.sort.txt
sort -k1,1 RNAseq.data.txt > RNAseq.data.sort.txt
join -j 1 -a 1 -o 1.1,2.2,1.2 array.data.sort.txt RNAseq.data.sort.txt > joined.txt  # put your own output column format after -o

You can read up on how the -o option needs to be formatted. But basically, it's a comma-separated list of file number, dot, column number, given in the order the columns should appear in the output.

I.e., 1.1,2.2,1.2 would print the first column of the first file, then the second column of the second file, then the second column of the first file, as columns 1-3 of the new file respectively.
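To make the -o, -a and duplicate-key behaviour concrete, here is a small toy run. The gene names, values and file names are invented for illustration, the files are space-separated (for tab-separated files you would also need join's -t option with a tab), and -e NA is an extra I added so that unpaired rows show NA instead of an empty field.

```shell
#!/bin/sh
export LC_ALL=C   # make sort and join agree on collation order
workdir=$(mktemp -d)

# Toy inputs (made up): gene name, then a logFC value. geneA has two array probes.
printf 'geneB -0.3\ngeneA 1.2\ngeneD 2.0\ngeneA 0.8\n' | sort -k1,1 > "$workdir/array.sort.txt"
printf 'geneC 0.9\ngeneA 1.1\ngeneB -0.4\n' | sort -k1,1 > "$workdir/rnaseq.sort.txt"

# -a 1 keeps array rows with no RNA-seq partner, -e NA fills their empty fields,
# -o 1.1,2.2,1.2 prints: gene (file 1), RNA-seq logFC (file 2), array logFC (file 1).
join -j 1 -a 1 -e NA -o 1.1,2.2,1.2 \
    "$workdir/array.sort.txt" "$workdir/rnaseq.sort.txt" > "$workdir/joined.txt"

cat "$workdir/joined.txt"
```

In the result, geneA appears twice (once per probe) with the same RNA-seq value repeated, geneD gets NA in the RNA-seq column, and geneC is dropped because only unpaired lines from the first file were requested.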

You can even use join first to merge a BioMart export of the conversion table from Affy (or whoever) probe IDs to gene names.
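A rough sketch of that two-step approach, chaining two joins: first probe ID to gene name, then gene name to RNA-seq. Everything here is hypothetical (file names, IDs, values and the two-column, space-separated layout); a real BioMart export is usually tab-separated and would need join's -t option.

```shell
#!/bin/sh
export LC_ALL=C   # keep sort and join on the same collation order
workdir=$(mktemp -d)

# Toy inputs (made up). map: probe -> gene; array: probe -> logFC; rnaseq: gene -> logFC.
printf 'probe_2 geneA\nprobe_1 geneA\nprobe_3 geneB\n' | sort -k1,1 > "$workdir/map.sort.txt"
printf 'probe_9 0.1\nprobe_1 1.2\nprobe_3 -0.3\nprobe_2 0.8\n' | sort -k1,1 > "$workdir/array.sort.txt"
printf 'geneB -0.4\ngeneA 1.1\n' | sort -k1,1 > "$workdir/rnaseq.sort.txt"

# Step 1: attach gene names to the array rows, put the gene first, re-sort by gene.
# Probes without a mapping (probe_9 here) are dropped because no -a is given.
join -j 1 -o 2.2,1.1,1.2 "$workdir/array.sort.txt" "$workdir/map.sort.txt" \
    | sort -k1,1 > "$workdir/array.gene.sort.txt"

# Step 2: join on gene name; output is gene, probe, array logFC, RNA-seq logFC.
join -j 1 -o 1.1,1.2,1.3,2.2 \
    "$workdir/array.gene.sort.txt" "$workdir/rnaseq.sort.txt" > "$workdir/merged.txt"

cat "$workdir/merged.txt"
```

The re-sort between the two steps matters: the first join leaves the rows ordered by probe ID, while the second one needs them ordered by gene name.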

**willemate** · 03-19-2014, 03:09 AM

Thank you for your input, Wallysb. I found a paper (link, Figs. 3 & 5) where they carry out a comparison study similar to mine. It is just weird that I cannot find any Bioconductor case study about this. I guess I should compute a Spearman/Pearson correlation and correlate the FCs in some way...?

http://pubs.acs.org/doi/pdf/10.1021/tx200103b

**willemate** · 03-19-2014, 03:29 AM

"To perform correlations between the DEGs patterns from the two platforms, common sets of genes were selected, which were above the detectable threshold and common to both the platforms"

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3790783/
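For what it's worth, once the two platforms are joined on gene ID, a Pearson correlation of the log fold changes over the common genes can be sketched with awk. This is only an illustration and not the method of either paper: the three-column layout (gene, array logFC, RNA-seq logFC), the NA convention and the values are assumptions, and the textbook sums formula used here is not the most numerically robust choice for large tables.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Toy joined table (made up): gene, array logFC, RNA-seq logFC; NA = not on both platforms.
printf 'geneA 1 1\ngeneB 2 3\ngeneC 3 2\ngeneD 0.7 NA\n' > "$workdir/joined.txt"

# Pearson r over the genes present on both platforms (rows containing NA are skipped).
r=$(awk '
    $2 != "NA" && $3 != "NA" {
        n++; sx += $2; sy += $3
        sxx += $2 * $2; syy += $3 * $3; sxy += $2 * $3
    }
    END {
        den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
        if (n < 2 || den == 0) { print "NA"; exit }
        printf "%.3f\n", (n * sxy - sx * sy) / den
    }
' "$workdir/joined.txt")

echo "Pearson r = $r"
```

A Spearman correlation would need the values ranked first, which is easier in R, e.g. with cor(x, y, method = "spearman").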


RNAseq vs Microarray
