Hi all
We are having problems predicting splice sites from our SOLiD RNA-seq data. We have a draft genome (125 Mb, a eukaryote) assembled from 454 data and are now trying to map our SOLiD reads to this genome to predict splice sites. The idea is to turn the predicted splice sites into intron hints for the gene finder Augustus, so that it produces correct gene models.
We are currently trying Bowtie/TopHat, but the results are inconsistent. For example, when working with a subset of our reads we find some splice sites, but these are no longer reported when we add more data. We have also tried Corona Lite together with SplitSeek earlier, and Bowtie/TopHat does not find sites that Corona Lite/SplitSeek found. On the other hand, Corona Lite/SplitSeek is time-consuming and awkward to run, and it often reports splice sites that are a few bp off, so it is not an ideal choice either.
This cannot be an uncommon situation, so what do the rest of you do in such cases? No closely related genomes have been sequenced.
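
For context, the junction-to-hint step we have in mind looks roughly like the sketch below. It is only an illustration under assumptions: it assumes the junctions come as TopHat-style `junctions.bed` lines (BED12, two blocks flanking the intron, read count in the score column) and that Augustus takes intron hints as GFF lines with `mult`, `pri` and `src` attributes. The function names, the `tophat2hints` source tag, and the choice of `pri=4` and `src=E` are placeholders, not something prescribed by either tool.

```python
def junction_to_hint(bed_line, source="tophat2hints"):
    """Convert one BED12 junction line into one GFF intron hint line.

    Assumes exactly two blocks per junction, with the intron between them.
    The `source` tag is a made-up label for illustration.
    """
    f = bed_line.rstrip("\n").split("\t")
    chrom, chrom_start, score, strand = f[0], int(f[1]), f[4], f[5]
    block_sizes = [int(x) for x in f[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in f[11].rstrip(",").split(",")]

    # BED is 0-based, half-open; GFF is 1-based, inclusive.
    intron_start = chrom_start + block_sizes[0] + 1  # first intron base
    intron_end = chrom_start + block_starts[1]       # last intron base

    # mult = number of supporting reads; pri and src are assumed values.
    attrs = f"mult={score};pri=4;src=E"
    return "\t".join([chrom, source, "intron", str(intron_start),
                      str(intron_end), "0", strand, ".", attrs])


def bed_to_hints(bed_lines):
    """Convert an iterable of BED lines to hint lines, skipping track headers."""
    return [junction_to_hint(line) for line in bed_lines
            if line.strip() and not line.startswith("track")]


if __name__ == "__main__":
    # Hypothetical junction: blocks of 30 and 40 bp flanking one intron.
    example = "contig1\t100\t400\tJUNC1\t12\t+\t100\t400\t255,0,0\t2\t30,40\t0,260"
    print(junction_to_hint(example))
```

Because hints of this kind are only as good as the junction coordinates, a junction that is a few bp off would give Augustus an intron hint that does not match any valid splice site, which is why the accuracy problem above matters to us.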