Dear all,
I am currently working on 454 transcriptome data. I have assembled the reads with both the current version of newbler and MIRA, and will try Velvet/Oases later on.
However, I am still not sure how to decide which assembly to use for all further analysis. Comparing basic statistics such as contig length, reads per contig, N50 and so on only gives hints as to which may be the better assembly. Based on these stats I would have chosen the newbler assembly, but they do not really tell me how well the transcripts are assembled. Alternative splicing and gene copies, for example, can lead to poor assemblies.
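For reference, the basic statistics mentioned above are easy to compute identically for every assembler's output, which keeps the comparison fair. Below is a minimal Python sketch, assuming each assembly is available as a plain FASTA file of contigs; the function names are hypothetical and not part of newbler or MIRA. N50 here is the contig length L such that contigs of length >= L together contain at least half of all assembled bases.

```python
def contig_lengths(fasta_lines):
    """Return the length of each sequence in an iterable of FASTA lines."""
    lengths = []
    current = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header starts a new contig; store the previous one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def basic_stats(lengths):
    """Summary statistics for one assembly (hypothetical helper)."""
    n = len(lengths)
    return {
        "contigs": n,
        "total_bases": sum(lengths),
        "mean_length": sum(lengths) / n if n else 0.0,
        "max_length": max(lengths) if n else 0,
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python stats.py contigs.fasta
    with open(sys.argv[1]) as handle:
        print(basic_stats(contig_lengths(handle)))
```

As the question points out, a higher N50 does not show that transcripts were assembled correctly; a tool that merges distinct transcripts into one long contig can even raise it.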
One method I thought of that may tell me more is to look at individual genes with a known sequence and check how the transcripts for these genes are assembled. However, this has to be done manually and is only feasible for a few genes.
Maybe one could also BLAST all contigs and check which assembly gives more or better hits, or how many hits a contig has on average. Too many hits per contig could mean that several transcripts have been assembled into one contig.
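The BLAST-based comparison suggested above could be scripted roughly as follows. This is a minimal sketch that assumes the contigs were searched with BLAST+ using tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); the e-value cutoff and the helper names are hypothetical choices, not a standard procedure.

```python
import csv
from collections import defaultdict


def hits_per_contig(blast_tab_lines, max_evalue=1e-10):
    """Count distinct subject sequences hit by each contig in -outfmt 6 output.

    Only hits with an e-value <= max_evalue (an arbitrary, hypothetical
    cutoff) are counted; contigs without such hits are absent from the result.
    """
    subjects = defaultdict(set)
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            subjects[qseqid].add(sseqid)
    return {query: len(hit_set) for query, hit_set in subjects.items()}


def summarize_hits(counts, n_contigs):
    """Compare assemblies by how many contigs hit something and how often."""
    with_hits = len(counts)
    return {
        "contigs_with_hits": with_hits,
        "fraction_with_hits": with_hits / n_contigs if n_contigs else 0.0,
        "mean_subjects_per_hit_contig": (
            sum(counts.values()) / with_hits if with_hits else 0.0
        ),
    }
```

Running the same search and summary on each assembly gives numbers that can be compared side by side. Note that a high count of distinct subjects is only a rough signal: it can also reflect gene families or redundant entries in the database rather than several transcripts merged into one contig.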
How do you check assembly quality beyond the basic stats? Without a sequenced genome, of course.
Cheers,
Till