Here is my project:
- I have a group X of unknown strains of species A taken from, say, the oral cavities of a group of people. Each individual gives rise to a single sample. The bacterial species is cultured and then sequenced on MiSeq.
- Similarly, I have a group Y of unknown strains of the same species, taken from another body compartment - say the gut. These samples may (or may not) come from the same people as above.
The task is to find whether there is anything that systematically differs between the first and the second group.
I was first thinking of picking a reference and then aligning each sample to it. But this poses the problem of picking the right reference. As we have dozens of samples, the most suitable reference may not even be the same for every sample. Is there a way to compare the samples as groups, so that we can check whether there is any
- systematic difference in gene presence (e.g. group X has a gene that is not present in group Y)
- systematic difference in genes that are present in both groups (e.g. gene B has a particular SNP/indel in group X but not in group Y)
We are, essentially, trying to compare groups. Is there a way to answer these questions at the group level without doing a sample-level analysis?
And if we still need to do a sample-level analysis, should we go for de novo assembly or reference-based alignment? Another question: we have only MiSeq data. Do we need longer reads (e.g. PacBio)?
Essentially, what we are trying to do is some sort of GWA on bacteria without knowing the sequences of the individual strains. What is the best way to approach this task? Can you suggest a pipeline? Do you know of any papers that have done this?
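To make the gene-presence question concrete, here is a toy sketch of the kind of per-gene test I have in mind. It assumes we already had a gene presence/absence call for every sample (which is exactly the part I don't know how best to obtain). The names `presence` and `groups`, the gene names and the sample names are all made up for illustration. A real analysis would presumably also need multiple-testing correction and some way to account for relatedness between strains, which this sketch ignores.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "present" calls in the first group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def gene_presence_tests(presence, groups):
    """Test each gene for a difference in presence between groups X and Y.

    presence: {gene: {sample: bool}}  (hypothetical presence/absence calls)
    groups:   {sample: "X" or "Y"}    (hypothetical group labels)
    """
    results = {}
    for gene, calls in presence.items():
        counts = {("X", True): 0, ("X", False): 0, ("Y", True): 0, ("Y", False): 0}
        for sample, group in groups.items():
            counts[(group, bool(calls.get(sample, False)))] += 1
        results[gene] = fisher_exact_two_sided(
            counts[("X", True)], counts[("X", False)],
            counts[("Y", True)], counts[("Y", False)],
        )
    return results


# Toy data: four oral (X) and four gut (Y) samples, all names invented
groups = {f"oral{i}": "X" for i in range(4)} | {f"gut{i}": "Y" for i in range(4)}
presence = {
    "geneA": {s: True for s in groups},               # present everywhere
    "geneB": {s: g == "X" for s, g in groups.items()},  # present only in X
}
p_values = gene_presence_tests(presence, groups)
```

With only four samples per group, even a gene that is perfectly group-specific (`geneB` here) gets a p-value of 2/70, about 0.029, so I suspect the number of samples will limit what can be detected.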
P.S. I've posted a similar question on Biostars but haven't received satisfactory answers.