I posted this over at Biostars but it has not gotten much response so I am posting it here too.
I work on mosquitoes, whose UTR annotations are notoriously poor. I have a lot of RNA-seq data that I want to use to better define the UTRs of the genes expressed in my tissues. This is important because I want to study the putative promoter regions. As it stands, many regions immediately 5' of the annotated transcript start are actually UTR or intron, because the gene model is missing its first exon (an exon that is entirely UTR).
My plan is to run a few de novo transcript assembly programs to generate putative cDNA transcripts, map these to the genomes to 'annotate' the gene structures, and then merge those structures into the official GTF annotations. For my specific use, the most important features are the boundaries (start/stop of transcription), but I plan to publish the amended annotations to make the work more reproducible. If they are going to be published, I would like the "guts" to be as accurate with regard to splicing as is reasonably possible.
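To make the boundary-extension step concrete, here is a minimal Python sketch of what I have in mind. It assumes the assembled transcripts have already been aligned and reduced to simple genomic intervals; the `Interval` type, the function names, and the coordinates are all hypothetical and not tied to any particular mapper's output format.

```python
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """A genomic span using GTF-style 1-based, inclusive coordinates."""
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals share chromosome and strand and overlap by >= 1 bp."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def extend_boundaries(annotated: Interval, alignments: Iterable[Interval]) -> Interval:
    """Widen an annotated transcript span using overlapping aligned transcripts.

    Only alignments that overlap the ORIGINAL annotated span are considered,
    so the extension is not transitive through chains of alignments.
    """
    start, end = annotated.start, annotated.end
    for aln in alignments:
        if overlaps(annotated, aln):
            start = min(start, aln.start)
            end = max(end, aln.end)
    return annotated._replace(start=start, end=end)


# Hypothetical example: one annotated transcript and four aligned assemblies.
gene = Interval("2L", 1000, 2000, "+")
assembled = [
    Interval("2L", 700, 1500, "+"),   # extends the 5' end
    Interval("2L", 1800, 2300, "+"),  # extends the 3' end
    Interval("2L", 100, 400, "+"),    # no overlap, ignored
    Interval("2L", 500, 1200, "-"),   # opposite strand, ignored
]
extended = extend_boundaries(gene, assembled)
```

A real merge would also need to avoid fusing neighbouring genes and would have to carry over the exon/intron structure, which is exactly where the choice of mapper matters.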
I was planning to use exonerate, but in the last few weeks, I have read about other options such as using GMAP or simply using BLAT.
Is there a consensus on which mapper/method currently produces the best results?