We have several DNA samples collected from the ocean and want to make metagenomic libraries for an Illumina 150PE run. We are debating between overlapping reads and an insert size of 400-500bp.
Our end goal is to identify particular genes of interest (estimated to be present in 20-50% of the bacterial population of the ocean).
The way I see it, the pro of overlapping reads is that a 150PE run will give us reads of ~250bp after merging the left and right reads, and those are more easily BLASTable. The cons are wasted data (the overlapping bases are sequenced twice) and more difficult assembly. The pro of a reasonably sized insert is that the PE data are likely easier to assemble, but you get less BLASTable sequence per fragment (two separate 150bp reads rather than one merged read), and I am uncertain about our ability to get a meaningful assembly (i.e., will the assembly be so fragmented that our gene is unlikely to assemble, and we end up having to identify it from unassembled fragments anyway).
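To put rough numbers on the trade-off, here is a minimal Python sketch of the arithmetic. It assumes every fragment is exactly the nominal insert size and ignores adapter read-through details and quality trimming; the function name `fragment_layout` and its return keys are just made up for illustration.

```python
def fragment_layout(read_len: int, insert_size: int) -> dict:
    """Describe how one read pair covers a fragment of a given size.

    Assumption: the fragment is exactly `insert_size` bp long and both
    reads are full length (no trimming), which real libraries only
    approximate.
    """
    # A read cannot cover more of the fragment than the fragment itself.
    effective = min(read_len, insert_size)
    # Bases covered by both the left and the right read.
    overlap = max(0, 2 * effective - insert_size)
    # Unsequenced bases in the middle of the fragment.
    gap = max(0, insert_size - 2 * effective)
    # Reads can only be merged into one sequence if they overlap.
    merged_len = insert_size if overlap > 0 else None
    # Share of sequenced bases spent on re-reading the overlap.
    redundant_fraction = overlap / (2 * read_len)
    return {
        "overlap": overlap,
        "gap": gap,
        "merged_len": merged_len,
        "redundant_fraction": redundant_fraction,
    }


if __name__ == "__main__":
    for insert in (250, 400, 500):
        print(insert, fragment_layout(150, insert))
```

With 150bp reads, a ~250bp fragment leaves a 50bp overlap to merge on, while a 400-500bp insert leaves a 100-200bp unsequenced gap between the two reads and nothing to merge.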
Many thanks for everyone's insights!
noa