Hi all,
I'm wondering if anyone could offer some suggestions on how to approach an analysis. I am examining a transcriptome, focusing on some "known" elements of it, and I want to see whether the predicted transcripts for these known elements show evidence of alternative transcripts or gene duplication. This is a non-model species, and there is not enough genome sequence for me to map my predicted transcripts (i.e. ESTs, basically) effectively in order to predict splice sites. I can, however, use BLAST to identify regions of the transcripts that are strong matches for conserved protein domains.
Currently, I am using a roundabout approach: after using the BLAST results to identify sequences of interest, I go through them by hand (or at least in a very hands-on fashion), generate clusters and/or alignments for these sequences, and try to figure out where they overlap, to what degree, and whether they might represent transcript fragments, novel isoforms or possible gene duplication events. What I would like to do is align these transcript sequences to some kind of genome-like track, but for proteins, in some kind of viewer. The EST alignments, however, have proved to be a challenge because of the breaks and gaps in the alignments.
Does anyone have any thoughts on a good way to generate the kind of data I would like (basically, use EST alignments to identify exons, then align that data to a protein structure to identify the coding regions)? Or is there maybe a better tool/approach for addressing this type of question with the kind of de novo transcriptome data that I have?
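To make the target a bit more concrete, here is a minimal sketch (not a working pipeline) of the protein-coordinate "track" I have in mind. It assumes blastx hits in the default tabular format (-outfmt 6), where columns 9 and 10 are the start and end of the hit on the subject protein. The function names, the contig/protein IDs, the example hit lines and the protein length of 250 are all made up for illustration; the protein length has to be supplied separately because the default tabular output does not include it.

```python
from collections import defaultdict

# Hypothetical blastx hits in -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE_HITS = [
    "contig1\tprotA\t98.0\t100\t2\t0\t1\t300\t1\t100\t1e-50\t200",
    "contig2\tprotA\t95.0\t80\t4\t0\t1\t240\t81\t160\t1e-40\t150",
    "contig3\tprotA\t99.0\t50\t1\t0\t1\t150\t201\t250\t1e-20\t100",
]

def parse_hits(lines):
    """Yield (transcript, protein, start, end) with start <= end, in protein coordinates."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        s1, s2 = int(fields[8]), int(fields[9])
        yield fields[0], fields[1], min(s1, s2), max(s1, s2)

def protein_tracks(lines):
    """Return {protein: {transcript: sorted list of (start, end) segments}}."""
    tracks = defaultdict(lambda: defaultdict(list))
    for transcript, protein, start, end in parse_hits(lines):
        tracks[protein][transcript].append((start, end))
    return {p: {t: sorted(segs) for t, segs in ts.items()} for p, ts in tracks.items()}

def overlap(segs_a, segs_b):
    """Count protein residues covered by both transcripts."""
    cov_a = {pos for s, e in segs_a for pos in range(s, e + 1)}
    cov_b = {pos for s, e in segs_b for pos in range(s, e + 1)}
    return len(cov_a & cov_b)

def render(track, protein_length, width=50):
    """Draw one text row per transcript; '=' marks the part of the protein it covers."""
    rows = []
    for transcript, segs in sorted(track.items()):
        row = ["-"] * width
        for s, e in segs:
            lo = (s - 1) * width // protein_length
            hi = max(lo + 1, e * width // protein_length)
            for i in range(lo, min(hi, width)):
                row[i] = "="
        rows.append(f"{transcript:<12} {''.join(row)}")
    return "\n".join(rows)

tracks = protein_tracks(EXAMPLE_HITS)
print(render(tracks["protA"], protein_length=250))
```

With something like this, transcripts that cover the same stretch of the protein (overlap > 0) would be candidates for isoforms or duplicates, while non-overlapping ones might just be fragments of a single transcript. It does not, of course, distinguish between those cases on its own.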
I appreciate any suggestions.
Cheers,
Nate