Hi all, I'm fairly new to sequencing analysis and would like some feedback on an experiment I'm planning. I have six metagenomic samples (gut bacteria, MiSeq, 2x150 bp reads). After filtering there are about 1,000,000 reads per sample, except for one outlier with only about 100,000 reads. I am interested in the presence/absence of a specific, small group of genes/proteins. As far as I know, comparing at the amino acid level, i.e. with blastx or a similar tool, is standard in metagenomics. Because of time constraints and limited computational resources, I cannot run blastx against a large database such as NCBI nr. Would it be OK to run blastx only against the genes/proteins I'm interested in? (As a control, I would add a housekeeping gene to the database.) And how would I decide which e-value cutoff, or perhaps better a bitscore cutoff, to use?
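A minimal sketch of the filtering/counting step, assuming blastx was run with tabular output, e.g. `blastx -query sample.fasta -db targets -outfmt 6 -out sample.tsv`, where `targets` is a hypothetical protein database built with `makeblastdb -dbtype prot` from the genes of interest plus the housekeeping gene. The 12 column names are the default `-outfmt 6` fields. The cutoffs `min_bitscore=50` and `max_evalue=1e-5` are placeholders, not recommendations, and the toy rows are made up. One point relevant to the cutoff question: an e-value depends on the size of the database searched, so against a tiny target database the same alignment gets a much smaller (better-looking) e-value than it would against nr, whereas a bitscore does not depend on database size.

```python
import csv
import io
from collections import Counter

# Default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def count_reads_per_target(handle, min_bitscore=50.0, max_evalue=1e-5):
    """Count reads assigned to each target protein.

    Each read is assigned only to its single best hit (highest bitscore)
    among hits passing both cutoffs, so a read matching several targets
    is counted once. The default cutoffs are placeholders.
    """
    best = {}  # qseqid -> (bitscore, sseqid)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank/comment lines
            continue
        rec = dict(zip(FIELDS, row))
        bits = float(rec["bitscore"])
        evalue = float(rec["evalue"])
        if bits < min_bitscore or evalue > max_evalue:
            continue
        current = best.get(rec["qseqid"])
        if current is None or bits > current[0]:
            best[rec["qseqid"]] = (bits, rec["sseqid"])
    return Counter(target for _, target in best.values())


# Made-up toy rows; "rpoB" stands in for a hypothetical housekeeping gene.
toy_rows = [
    ("read1", "geneA", "85.0", "50", "7", "0", "1", "150", "10", "59", "1e-20", "95.1"),
    ("read1", "geneB", "60.0", "50", "20", "0", "1", "150", "5", "54", "1e-8", "52.0"),
    ("read2", "geneA", "40.0", "30", "18", "0", "1", "90", "1", "30", "0.5", "25.0"),
    ("read3", "rpoB", "90.0", "50", "5", "0", "1", "150", "100", "149", "1e-25", "101.3"),
]
toy = io.StringIO("".join("\t".join(r) + "\n" for r in toy_rows))
print(dict(count_reads_per_target(toy)))  # {'geneA': 1, 'rpoB': 1}
```

Keeping only the best hit per read avoids double counting when the target genes are similar to each other; whether to count single reads or read pairs is a separate choice.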
The six samples belong to two groups of three. To determine whether the genes are significantly more present/abundant in group A than in group B, I plan to compare, for each sample, the fraction of blasted reads that map to the genes. Would that be OK? Is there a better way to do this?
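A sketch of the normalization and group comparison, assuming per-sample hit counts from a step like the blastx filtering above. Dividing by each sample's total read count keeps the 100,000-read outlier comparable with the other samples. The test here is swapped in for illustration, not a prescribed method: an exact one-sided permutation test on the difference in group means, which makes no distributional assumptions. The counts in the usage lines are invented.

```python
import math
from itertools import combinations
from statistics import mean


def relative_abundance(target_hits, total_reads):
    """Fraction of all reads in a sample that were assigned to the target genes."""
    return target_hits / total_reads


def permutation_test(group_a, group_b):
    """Exact one-sided permutation test of mean(group_a) > mean(group_b).

    Enumerates every split of the pooled values into groups of the
    original sizes and returns the fraction of splits whose difference
    in means is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(a) - mean(b)
        # isclose guards against float rounding on the observed split itself
        if diff > observed or math.isclose(diff, observed):
            extreme += 1
        total += 1
    return extreme / total


# Invented numbers: (reads assigned to target genes, total reads after filtering)
group_a = [relative_abundance(h, n) for h, n in [(300, 1_000_000), (410, 1_000_000), (52, 100_000)]]
group_b = [relative_abundance(h, n) for h, n in [(120, 1_000_000), (150, 1_000_000), (180, 1_000_000)]]
print(permutation_test(group_a, group_b))  # 0.05
```

With three samples per group there are only C(6,3) = 20 possible splits, so the smallest attainable one-sided p-value is 1/20 = 0.05, even when every group A sample exceeds every group B sample, as in the invented example.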
Has anybody done or read about anything like this before?
Thanks so much in advance.