I used IsoSCM to identify alternative 3' UTRs of different lengths in different tissue types (the output files are GTFs containing tissue-specific 3' UTRs), and I now want to quantify the differential usage of these alternative 3' UTRs across tissues. Neither the Ensembl nor the UCSC reference GTF contains the tissue-specific 3' UTRs, so I re-annotated the IsoSCM output GTF files using the reference GTF. The idea is to get a single GTF file containing all tissue-specific 3' UTRs, which can then be used for the subsequent dexseq_count step.
But if multiple 3' UTRs/terminal exons (same start site, different end sites) are annotated with the same transcript ID, dexseq_prepare_annotation only includes the longest one in the GFF output.
Here is an example of two possible 3' UTRs of gene Fubp1 in the thymus GTF file:
Code:
chr3 sol exon 152232395 152233253 . + . gene_id "Fubp1"; gene_name "Fubp1"; p_id "P24872"; transcript_id "NM_057172"; tss_id "TSS5999";
chr3 sol exon 152232395 152232485 . + . gene_id "Fubp1"; gene_name "Fubp1"; p_id "P24872"; transcript_id "NM_057172"; tss_id "TSS5999";
Two different 3' UTRs of gene Fubp1 in the liver GTF file:
Code:
chr3 sol exon 152232395 152233965 . + . gene_id "Fubp1"; gene_name "Fubp1"; p_id "P24872"; transcript_id "NM_057172"; tss_id "TSS5999";
chr3 sol exon 152232395 152233885 . + . gene_id "Fubp1"; gene_name "Fubp1"; p_id "P24872"; transcript_id "NM_057172"; tss_id "TSS5999";
One more 3' UTR of gene Fubp1 in the spleen GTF file:
Code:
chr3 sol exon 152232395 152232532 . + . gene_id "Fubp1"; gene_name "Fubp1"; p_id "P24872"; transcript_id "NM_057172"; tss_id "TSS5999";
After merging them and running dexseq_prepare_annotation, I got only one exon representing the 3' UTR:
Code:
chr3 dexseq_prepare_annotation.py exonic_part 152232395 152233965 . + . transcripts "NM_057172"; exonic_part_number "018"; gene_id "Fubp1"
So I am wondering: how can I include all alternative 3' UTRs in the GFF file, so that I can use it to count the usage of each of them?
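A minimal sketch of why the collapse may happen, assuming the flattening step labels each stretch of sequence by the set of transcript IDs covering it and merges adjacent stretches with identical labels. This is a simplified pure-Python model, not the actual dexseq_prepare_annotation code; the functions `flatten` and `relabel` and the `_utrN` suffix are hypothetical names used only for illustration. Under that assumption, exons sharing one transcript ID fuse into a single exonic part, while giving each alternative terminal exon its own transcript ID keeps the boundaries:

```python
# Simplified model of exon flattening (an assumption, NOT DEXSeq's real code).
# Exons are (start, end, transcript_id) tuples with 1-based inclusive coordinates.

def flatten(exons):
    """Split overlapping exons into disjoint parts, each labelled with the
    set of transcript IDs covering it; adjacent parts with the same label
    are merged into one."""
    # Every exon start, and every position just past an exon end, is a breakpoint.
    points = sorted({s for s, e, t in exons} | {e + 1 for s, e, t in exons})
    parts = []
    for left, right in zip(points, points[1:]):
        covering = frozenset(t for s, e, t in exons if s <= left and e >= right - 1)
        if not covering:
            continue  # gap between exons
        if parts and parts[-1][2] == covering and parts[-1][1] + 1 == left:
            # Same transcript set as the previous adjacent part: extend it.
            parts[-1] = (parts[-1][0], right - 1, covering)
        else:
            parts.append((left, right - 1, covering))
    return parts


def relabel(exons):
    """Give each exon its own transcript ID (hypothetical '_utrN' suffix),
    numbered by increasing end position."""
    ordered = sorted(exons, key=lambda x: x[1])
    return [(s, e, f"{t}_utr{i}") for i, (s, e, t) in enumerate(ordered, start=1)]


# The five Fubp1 terminal exons from the thymus, liver and spleen GTFs.
START = 152232395
ENDS = [152233253, 152232485, 152233965, 152233885, 152232532]
shared_id = [(START, end, "NM_057172") for end in ENDS]

collapsed = flatten(shared_id)          # one part: 152232395-152233965
separated = flatten(relabel(shared_id)) # five parts, one per end position
```

With the five Fubp1 end positions above, the shared-ID input yields a single part spanning 152232395-152233965, matching the one exonic_part in the GFF output, and the relabelled input yields five parts. Whether relabelling the merged GTF this way is appropriate for the downstream counting step would still need to be checked.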
Cheers