Dear all,
I have assembled a transcriptome de novo (no reference genome) from > 100 million 2x105 bp reads and obtained a nice set of sequences.
Some of them are very long (> 20 kb), and a blastn/blastx search suggests that they correspond to transcripts that have retained intronic sequences. [I suspect that the velvet/oases assembly is most likely correct, as unprocessed transcripts are known contaminants of polyA-selected libraries. I don't have a reference genome to prove it, though.]
Still, I was wondering whether anybody has an idea of how to remove such primary transcript sequences from my output before translating the sequences into putative proteins?
Thanks for any input,
Thomas