I have RNA-seq data and need read counts at the gene level. To do this with BEDTools I need a gene annotation file. I downloaded the hg18 annotation (group: Genes and Gene Prediction Tracks; track: UCSC Genes) from UCSC in BED format. However, it seems that each line in this file represents an isoform rather than a gene (see the two lines below, which have the same position but different IDs).
chr7 50625258 50817758 uc003tpk.1 0 - 50628142 50767513 0 19
chr7 50625258 50817758 uc010kzb.1 0 - 50628142 50709814 0 18
What I actually need is one line per gene. How can I get that? Thanks~
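To make the goal concrete, here is a rough sketch of one possible way to get "one line per gene" from an isoform-level BED file: merge records that overlap on the same chromosome and strand into a single interval. This is only a position-based approximation and an assumption on my part, not an official UCSC or BEDTools method. Overlapping transcripts are not always the same gene, so a transcript-to-gene mapping table would be more accurate if one is available. The function name `collapse_isoforms` is made up for illustration.

```python
from collections import defaultdict


def collapse_isoforms(bed_lines):
    """Merge overlapping same-strand BED records into one interval per locus.

    Assumption: overlapping isoforms on the same strand belong to one gene.
    This is a heuristic, not a true gene annotation.
    """
    by_key = defaultdict(list)
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip headers or malformed lines
        chrom, name, strand = fields[0], fields[3], fields[5]
        start, end = int(fields[1]), int(fields[2])
        by_key[(chrom, strand)].append((start, end, name))

    merged = []
    for (chrom, strand), records in sorted(by_key.items()):
        records.sort()
        cur_start, cur_end, names = records[0][0], records[0][1], [records[0][2]]
        for start, end, name in records[1:]:
            if start < cur_end:
                # BED intervals are half-open, so start < cur_end means overlap
                cur_end = max(cur_end, end)
                names.append(name)
            else:
                merged.append((chrom, cur_start, cur_end, ",".join(names), 0, strand))
                cur_start, cur_end, names = start, end, [name]
        merged.append((chrom, cur_start, cur_end, ",".join(names), 0, strand))
    return merged


# The two isoform lines from the question
example = [
    "chr7 50625258 50817758 uc003tpk.1 0 - 50628142 50767513 0 19",
    "chr7 50625258 50817758 uc010kzb.1 0 - 50628142 50709814 0 18",
]

for record in collapse_isoforms(example):
    print("\t".join(str(value) for value in record))
```

On the two lines above this produces a single chr7 record whose name field lists both isoform IDs, which could then be used as the annotation for gene-level counting.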