Hi all,
I have an experiment with 80 samples, all of them run on both microarray and RNA-seq. I want to correlate the results between the two technologies.
I received the results from the RNA-seq experiment in two forms:
a) Raw data (FASTQ files)
b) A table of counts per Ensembl ID (I don't know how this analysis was done).
I did the analysis in two ways:
1)
For the RNA-seq experiment I took the Ensembl ID counts and translated them into gene symbols (several Ensembl IDs mapped to the same gene symbol, so I kept one of them, selected at random, and discarded the others), and then normalized them with voom (log2-CPM with some modifications).
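The "keep one ID per gene symbol at random" step can be sketched as below. This is only an illustration in plain Python, not the R code actually used; the function name `collapse_to_symbols`, the toy IDs and the counts are all made up, and the fixed seed is an assumption added so that the selection is reproducible.

```python
import random


def collapse_to_symbols(counts, id_to_symbol, seed=0):
    """Keep one source ID per gene symbol, chosen at random.

    counts:       dict mapping source ID -> list of per-sample counts
    id_to_symbol: dict mapping source ID -> gene symbol
    (hypothetical helper, for illustration only)
    """
    rng = random.Random(seed)  # fixed seed so the choice is reproducible
    ids_by_symbol = {}
    for source_id in sorted(counts):  # sorted for a deterministic order
        symbol = id_to_symbol.get(source_id)
        if symbol is None:
            continue  # IDs without a gene symbol are dropped
        ids_by_symbol.setdefault(symbol, []).append(source_id)
    # For each symbol, keep the counts of one randomly chosen ID
    return {symbol: counts[rng.choice(ids)] for symbol, ids in ids_by_symbol.items()}


# Toy data (made up): two IDs map to GENEA, one ID has no symbol
counts = {
    "ENSG_A1": [10, 12],
    "ENSG_A2": [0, 1],
    "ENSG_B1": [5, 7],
    "ENSG_X": [3, 3],
}
id_to_symbol = {"ENSG_A1": "GENEA", "ENSG_A2": "GENEA", "ENSG_B1": "GENEB"}
collapsed = collapse_to_symbols(counts, id_to_symbol)
```

Which ID is kept can change a gene's value noticeably when the IDs mapping to one symbol have very different counts, as with the two GENEA entries in this toy example.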
For the microarray experiment I normalized the data with RMA using a curated CDF package (hgu133plus2hsentrezgcdf), and translated the Entrez ID probe sets into gene symbols.
I computed the correlation between microarray and RNA-seq (cor.test, two-sided, Spearman method) and obtained good results for all samples (0.7-0.9). Figure 1 (attached) shows the scatterplot for sample 1.
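For reference, the per-sample comparison amounts to a Spearman correlation over the gene symbols present on both platforms. The sketch below is a plain Python illustration and not the cor.test call itself: it returns only the coefficient (no p-value), and the function names and expression values are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_shared(array_expr, rnaseq_expr):
    """Spearman rho over gene symbols present in both dicts, plus their number."""
    shared = sorted(set(array_expr) & set(rnaseq_expr))
    x = [array_expr[g] for g in shared]
    y = [rnaseq_expr[g] for g in shared]
    # Spearman rho is the Pearson correlation of the (tie-averaged) ranks
    return pearson(average_ranks(x), average_ranks(y)), len(shared)


# Toy values for one sample (made up): symbol -> normalized expression
array_expr = {"GENEA": 8.1, "GENEB": 5.0, "GENEC": 10.2, "GENED": 6.3}
rnaseq_expr = {"GENEA": 4.0, "GENEB": 1.5, "GENEC": 7.7, "GENEE": 2.0}
rho, n_shared = spearman_shared(array_expr, rnaseq_expr)
```

Returning the number of shared symbols alongside rho makes it easy to check whether two analyses are being compared on a similar set of genes.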
2)
I took the FASTQ files and analyzed them using the hg19 (GRCh37) RefSeq annotation as the reference. I translated the RefSeq IDs into gene symbols. I randomly selected one gene symbol per RefSeq ID.
The same microarray data were used for the correlation.
I ran the same correlation as before, but the results were worse (0.3-0.48). Figure 2 shows the scatterplot for sample 1.
My question is: does anybody have a clue why starting from RefSeq IDs does not give the same good results?
Thanks in advance.