I have 2 sets of RNA-seq data.
One is for cancer patients and comes from one of our collaborators. The RNA-seq data is already processed, and all I have is normalized counts for each Ensembl ID for the cancer samples.
The other is also processed RNA-seq data, which I downloaded from the TCGA website. It provides normalized counts for each isoform (UCSC Gene) for the normal samples.
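Since the two tables are keyed differently (Ensembl IDs vs. UCSC isoforms), I assume they first have to be put on a common set of features. Below is a rough sketch of what I mean, not what I actually ran: the IDs, the isoform-to-gene mapping and the values are all made up, and summing isoform values per gene is my assumption and may not be appropriate for every normalization.

```python
from collections import defaultdict

def collapse_isoforms(isoform_values, isoform_to_gene):
    """Sum isoform-level values into gene-level values (assumed aggregation)."""
    gene_values = defaultdict(float)
    for isoform_id, value in isoform_values.items():
        gene_id = isoform_to_gene.get(isoform_id)
        if gene_id is not None:  # skip isoforms without a known gene mapping
            gene_values[gene_id] += value
    return dict(gene_values)

def shared_genes(cancer_by_gene, normal_by_gene):
    """Return the sorted gene IDs present in both tables."""
    return sorted(set(cancer_by_gene) & set(normal_by_gene))

# Hypothetical toy data for one sample from each source.
isoform_to_gene = {
    "uc001aaa.1": "ENSG_A",
    "uc001aab.1": "ENSG_A",
    "uc001aac.1": "ENSG_B",
}
normal_isoforms = {
    "uc001aaa.1": 10.0,
    "uc001aab.1": 5.0,
    "uc001aac.1": 7.0,
    "uc001aad.1": 3.0,  # no mapping available, so it is dropped
}
cancer_genes = {"ENSG_A": 20.0, "ENSG_C": 4.0}

normal_genes = collapse_isoforms(normal_isoforms, isoform_to_gene)
common = shared_genes(cancer_genes, normal_genes)
print(normal_genes)
print(common)
```

Only genes in `common` would then go into the comparison.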
I need to identify differentially expressed genes between cancer and normal.
I drew a cluster dendrogram of all samples (cancer and normal) using the original data, and the cancer and normal samples formed two large clusters. So I used ComBat to adjust for the batch effect and plotted the cluster dendrogram again. This time, the cancer and normal samples are not clearly separated into two large clusters, but each subcluster is still either all normal or all cancer.
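To make the clustering check concrete, here is a minimal sketch of the kind of test I mean, written in plain Python with toy numbers rather than my real data. It uses average-linkage clustering on correlation distance, cuts the tree at two clusters, and lists which source each cluster contains. The sample names, values, distance and linkage choice are all assumptions for illustration, and the ComBat step itself is not shown.

```python
import math

def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering with average linkage, stopped at n_clusters."""
    names = list(profiles)
    dist = {
        (a, b): correlation_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

# Hypothetical toy profiles (4 genes per sample) and their data source.
profiles = {
    "c1": [1.0, 2.0, 3.0, 4.0],
    "c2": [1.1, 2.2, 2.9, 4.1],
    "n1": [4.0, 3.0, 2.0, 1.0],
    "n2": [4.2, 2.9, 2.1, 0.9],
}
source = {"c1": "collaborator", "c2": "collaborator", "n1": "TCGA", "n2": "TCGA"}

clusters = average_linkage_clusters(profiles, n_clusters=2)
for cluster in clusters:
    print(sorted(cluster), sorted({source[s] for s in cluster}))
```

If every cluster lists only one source, as in my dendrograms, the samples are still grouping by where they came from.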
How do people usually combine their own data with publicly available data?