Hiya,
I have a BLAST database for GO terms in Blast2GO which includes IDs for all the isoforms of the genes. When I make my gene lists for GO enrichment analysis, the list compiler pulls the IDs of all the isoforms associated with the genes of interest (DEGs). My question is: should I
(a) De-duplicate the list so just one ID per gene is input into the GO enrichment analysis
or
(b) Submit the full list containing the IDs of all the isoforms for each gene of interest?
I have run both and, as I anticipated, the de-duplicated list yields fewer GO terms than the full list containing all the isoforms.
I feel like it is correct to run the full list of IDs (option b). Otherwise the enrichment test could be biased against terms that have lots of isoform IDs in the database when only one ID per gene is submitted, making it look like the GO term is less enriched than it actually is (I hope that makes sense).
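To make the worry concrete, here is a toy sketch in Python. All the counts are made up (they are not from my data), the function name `enrichment_p` is just something I wrote for illustration, and I am assuming the enrichment is a one-sided hypergeometric (Fisher-type) test against a reference set that counts every isoform ID separately:

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value: P(X >= k).

    N = IDs in the reference set, K = reference IDs annotated to the term,
    n = IDs in the submitted list, k = submitted IDs annotated to the term.
    """
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Hypothetical isoform-level reference: 1000 IDs, 40 of them annotated to the term.
N_REF, K_TERM = 1000, 40

# Option (a): one ID per gene -> 50 IDs submitted, 5 annotated to the term.
p_dedup = enrichment_p(k=5, n=50, K=K_TERM, N=N_REF)

# Option (b): all isoforms -> the same genes expand to 110 IDs, 20 annotated.
p_full = enrichment_p(k=20, n=110, K=K_TERM, N=N_REF)

print(f"de-duplicated list: p = {p_dedup:.3g}")
print(f"full isoform list:  p = {p_full:.3g}")
```

With these invented numbers the same set of genes gets a much smaller p-value under option (b) than under option (a), purely because of how the IDs are counted relative to the reference set.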
Any opinions/advice are greatly appreciated.
Best wishes