Hi,
TopHat can accept user-specified junctions via a GFF3 file, so I'm trying to find a GFF3 file that represents the UCSC Gene model for Human (hg19).
There are a lot of posts asking for similar files, and the gist of the replies seems to be that the SONG gtf2gff3 Perl script can convert an Ensembl GTF file to valid GFF3, but that it doesn't work on the UCSC GTF files.
Does anybody know of a reliable tool for creating GFF3 from UCSC GTF?
If I need to write my own, would anyone be comfortable enough with TopHat or the GFF3 format to help answer these:
(1) Does TopHat care if each transcript is modeled independently of the other transcripts in its cluster? I suspect the proper way to create a GFF3 would be to model the UCSC clusters (from knownIsoforms) as top-level gene features, with the transcripts (from knownGene) modeled as child features. A side effect of this is that exon definitions can be shared across transcripts. If I ignore the top level and model transcripts independently, will TopHat be happy?
(2) Does TopHat need the GFF records to be sorted in some way?
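To make question (1) concrete, here is a rough Python sketch of the hierarchical layout I have in mind: one gene feature per cluster, one mRNA per transcript, and each distinct exon written once with a comma-separated Parent list (which the GFF3 spec allows). The function names, IDs and coordinates are made up for illustration, the input is assumed to be already converted to 1-based inclusive coordinates, and I don't know whether TopHat accepts multi-parent exons, which is really what I'm asking.

```python
def gff3_line(seqid, ftype, start, end, strand, attrs):
    """Format one 9-column GFF3 record; attrs is a list of (key, value) pairs."""
    attr_text = ";".join(f"{key}={value}" for key, value in attrs)
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join(
        [seqid, "UCSC", ftype, str(start), str(end), ".", strand, ".", attr_text]
    )


def cluster_to_gff3(cluster_id, chrom, strand, transcripts):
    """Build GFF3 lines for one cluster.

    transcripts maps a transcript ID to a list of (start, end) exons,
    assumed to be 1-based inclusive (UCSC tables store 0-based starts,
    so those would need +1 before being passed in).
    """
    lines = []

    # Top-level gene feature spanning every exon in the cluster.
    coords = [c for exons in transcripts.values() for span in exons for c in span]
    lines.append(
        gff3_line(chrom, "gene", min(coords), max(coords), strand, [("ID", cluster_id)])
    )

    # One mRNA per transcript; remember which transcripts use each exon.
    exon_parents = {}
    for tx_id, exons in transcripts.items():
        tx_start = min(start for start, _ in exons)
        tx_end = max(end for _, end in exons)
        lines.append(
            gff3_line(chrom, "mRNA", tx_start, tx_end, strand,
                      [("ID", tx_id), ("Parent", cluster_id)])
        )
        for span in exons:
            exon_parents.setdefault(span, []).append(tx_id)

    # Each distinct exon is written once, listing all parent transcripts.
    for (start, end), parents in sorted(exon_parents.items()):
        lines.append(
            gff3_line(chrom, "exon", start, end, strand,
                      [("Parent", ",".join(parents))])
        )
    return lines


# Hypothetical cluster with two transcripts sharing their first exon.
example = {
    "tx1": [(100, 200), (300, 400)],
    "tx2": [(100, 200), (500, 600)],
}
lines = cluster_to_gff3("cluster1", "chr1", "+", example)
print("\n".join(lines))
```

The alternative (ignoring the cluster level) would drop the gene line and the Parent attribute on each mRNA, and repeat the shared exon once per transcript with a single Parent each.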
Thanks,
Bio.X2Y