Hi,
I would like to ask for opinions and advice on the different sources of GTF files for annotated genes (mm10 in particular, but other assemblies as well).
I searched beforehand to avoid posting a duplicate (sorry if this still is one).
The topic is briefly mentioned on other forums, but I never found a thorough discussion that gave a satisfactory explanation.
I wanted to download GTF files (mm10) from the UCSC Genome Browser to get reference genes and transcript variants for differential transcript variant expression and splicing analyses.
However, no matter how I set up the Table Browser (UCSC Genes, NCBI RefSeq, etc.), the GTF files I obtained from UCSC were not suitable for such analyses.
I noticed that these UCSC GTF files treat each transcript variant as a separate gene, since the "transcript_id" is identical to the "gene_id" in every record. (Did I do something wrong?)
For these analyses I need a GTF file in which each gene ID is shared by (i.e. repeated across) all of its transcript variants, where variants exist. The only sources where I found such GTF files are GENCODE and Ensembl.
However, these files contain approx. 50,000 genes and 150,000 transcript variants, which seems like too many to me, presumably because predicted models are included. The UCSC file, by contrast, has approx. 38,000 entries, which might be less redundant and speculative? (No idea.)
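To make the problem concrete, here is a minimal sketch (Python, standard library only) of how the gene/transcript relationship in a GTF file could be checked. The function name `summarize_gtf` and the example records are made up for illustration, and the attribute parsing is deliberately simple, so please treat it as a rough check rather than a complete GTF parser.

```python
import re
from collections import defaultdict

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def summarize_gtf(lines):
    """Summarize how gene_id and transcript_id relate in GTF records.

    `lines` is any iterable of GTF lines (an open file handle works).
    """
    transcripts_by_gene = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid 9-column GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        transcript_id = attrs.get("transcript_id")
        if not gene_id or not transcript_id:
            continue  # e.g. 'gene' rows that carry no transcript_id
        transcripts_by_gene[gene_id].add(transcript_id)

    all_transcripts = set()
    for ids in transcripts_by_gene.values():
        all_transcripts.update(ids)

    return {
        "genes": len(transcripts_by_gene),
        "transcripts": len(all_transcripts),
        # genes whose only transcript has the same ID as the gene itself
        "genes_equal_to_transcript": sum(
            1 for gene, ids in transcripts_by_gene.items() if ids == {gene}
        ),
        "genes_with_multiple_transcripts": sum(
            1 for ids in transcripts_by_gene.values() if len(ids) > 1
        ),
    }


# Hypothetical records imitating the two layouts described above.
ucsc_like = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "NM_001"; transcript_id "NM_001";\n',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "NM_002"; transcript_id "NM_002";\n',
]
ensembl_like = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tsrc\texon\t100\t300\t.\t+\t.\tgene_id "G1"; transcript_id "T2";\n',
]

print(summarize_gtf(ucsc_like))
print(summarize_gtf(ensembl_like))
```

On a real file one would pass an open file handle instead of a list. If `genes_equal_to_transcript` equals `genes` and `genes_with_multiple_transcripts` is zero, the file has the one-gene-per-transcript layout I am describing for the UCSC downloads.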
I would appreciate advice on where to find, or how to make, an optimal GTF file that is suitable for differential splicing / transcript variant expression analyses.
Would you recommend avoiding UCSC GTF files for expression analyses in general?
Thank you for your help.
Best.