Hello,
We have sequenced an E. coli strain using 454 and now want to publish its genome.
One basic but apparently fundamental question I have is: should I prefer a de novo assembly or mapping against a reference strain?
I have enough data for de novo analysis. De novo assembly gave me about 90 contigs: some quite long and easy to work with, some medium-sized, and a few small ones.
Mapping the reads against a reference E. coli strain (K-12) shows that our data cover 94% of the reference genome, and 88% of the reads mapped successfully.
However, when I tried to map the de novo contigs (and not the reads!), only 20% of the genome was covered.
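To be clear about what I mean by "genome covered": as far as I understand it, this is the fraction of reference positions overlapped by at least one alignment, with overlapping alignments counted only once. Below is a minimal Python sketch of that calculation, assuming the alignments are available as (start, end) intervals on the reference, 0-based and half-open. The function name and the toy coordinates are hypothetical and only for illustration; they are not taken from my data or from any particular mapper's output.

```python
def covered_fraction(intervals, genome_length):
    """Return the fraction of reference bases covered by at least one alignment.

    intervals: iterable of (start, end) pairs, 0-based, half-open (assumed format).
    genome_length: length of the reference sequence in bases.
    """
    covered = 0
    current_start, current_end = None, None
    # Sort by start so overlapping or adjacent intervals can be merged in one pass.
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous merged block.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps or touches the current block: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Toy example: two overlapping alignments and one separate alignment
# on a hypothetical 1000-base reference.
toy_alignments = [(0, 100), (50, 150), (300, 400)]
print(covered_fraction(toy_alignments, 1000))
```

If the contig and read mappings were both summarized this way, the 94% and 20% figures would at least be measured on the same basis.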
So I wonder what to do. On one hand, I am looking for novel genes in this particular strain, and I have no idea (yet) which strain is phylogenetically closest to it. So the de novo approach seems the right thing to do.
But then I have trouble mapping the contigs in their predicted order.
On the other hand, reference mapping makes life much easier. The genome is almost entirely covered. 12% of the reads remain unmapped, and these apparently correspond to an insertion and a plasmid. The problem is: am I losing data here?
De novo assembly and mapping are apparently giving me different contigs (as far as I understand it), which will give me different annotations. Or not? I am a microbiologist trying to understand some bioinformatics. The people I am working with on the analysis do not have a straight answer for me on this issue. I hope someone here has experience with bacterial genome analysis and is willing to help me with this question. What is the most acceptable thing to do?
Thanks a lot!