Hi everyone
I often map reads from shotgun data to reference databases (e.g. Illumina 150 bp reads to a virulence gene catalog or an AMR gene catalog) using software like Bowtie, KMA, etc.
It is a fast and efficient way to identify genes of interest in metagenomic reads without having to assemble them first.
However, I often see that some of the matches cover only a small portion of the target gene/template. For example, for a 3 kb gene only a 150 bp region may be covered, i.e. a template coverage of only 5%. In these cases I would be inclined to ignore the result, since mapping reads to such a small part of the gene does not mean that the gene is present.
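To make "template coverage" concrete: it is the fraction of template positions covered by at least one read (breadth), not the mean depth. Below is a minimal sketch, assuming you already have per-read alignment coordinates on the template as 1-based inclusive (start, end) pairs; the function name `template_coverage` is hypothetical. Overlapping reads are merged so stacked reads on the same 150 bp are not double-counted. (For BAM files, `samtools coverage` reports the same quantity per reference as the percentage in its `coverage` column.)

```python
from typing import Iterable, Tuple


def template_coverage(intervals: Iterable[Tuple[int, int]], template_len: int) -> float:
    """Fraction of template positions covered by at least one alignment.

    intervals: 1-based, inclusive (start, end) coordinates on the template.
    Reversed pairs (start > end) are normalised, so either strand works.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if cur_end is None or start > cur_end + 1:
            # Gap before this interval: bank the previous merged block
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent interval: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / template_len


# A 3 kb gene hit by a single 150 bp read: 5% template coverage
print(template_coverage([(1001, 1150)], 3000))  # 0.05
```

A 30% threshold would then simply be `template_coverage(...) >= 0.30` per gene.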
My question is: what would you consider a reasonable threshold for saying that the gene of interest may be present in my samples? A minimum of 30% coverage?
When using BLAST (or DIAMOND), I only consider hits true if they cover at least 70% of the template length. But then again, different strokes. The samples I am dealing with are shotgun metagenomic reads from complex (e.g. soil) samples, so 70% is way too strict.
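For comparison, the 70% template-length filter on BLAST/DIAMOND hits can be sketched as below. This assumes tabular output requested with `-outfmt "6 qseqid sseqid sstart send slen"` and that the template (reference gene) is the subject; the function name `filter_hits` and the `min_cov` parameter are hypothetical. Note it is computed per HSP, so a gene hit by several separate HSPs is not merged here.

```python
from typing import Iterable, List, Tuple


# Assumed column order: -outfmt "6 qseqid sseqid sstart send slen"
def filter_hits(lines: Iterable[str], min_cov: float = 0.70) -> List[Tuple[str, str, float]]:
    """Keep hits whose aligned span covers >= min_cov of the subject (template) length."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, sstart, send, slen = line.rstrip("\n").split("\t")
        # sstart > send for minus-strand hits, hence abs()
        cov = (abs(int(send) - int(sstart)) + 1) / int(slen)
        if cov >= min_cov:
            kept.append((qseqid, sseqid, cov))
    return kept


rows = [
    "q1\tgeneA\t1\t2400\t3000",     # 80% of a 3 kb gene: kept
    "q2\tgeneB\t1001\t1150\t3000",  # 5% of a 3 kb gene: dropped
]
print(filter_hits(rows))  # [('q1', 'geneA', 0.8)]
```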
Thank you in advance for your suggestions.
P