Hi everyone, I am using QIIME to analyse my 16S rRNA gene (V3-V4) Illumina dataset.
I have 2 groups of data, and each group consists of 10 datasets.
I filtered and combined all 20 datasets, resulting in an input file with 3 million reads.
I then ran pick_open_reference_otus.py against 97_otus.fasta from Greengenes.
There is a file 'rep_set.fna' in the output otus directory.
There are over 20k sequences in rep_set.fna.
Only around 300 sequences match Greengenes.
Around 300 OTUs are New Reference OTUs.
The remaining 17k OTUs are New CleanUp Reference OTUs.
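For anyone who wants to check counts like these on their own output, here is a minimal Python sketch of one way to tally the categories from the FASTA headers. It assumes the new OTU IDs carry the default "New.ReferenceOTU" and "New.CleanUp.ReferenceOTU" prefixes and that every other ID is a Greengenes reference ID; the function names and the sample headers are hypothetical, purely for illustration.

```python
from collections import Counter


def classify_otu_id(otu_id):
    """Map an OTU ID to a category based on its prefix (assumed default prefixes)."""
    if otu_id.startswith("New.CleanUp.ReferenceOTU"):
        return "new_cleanup_reference"
    if otu_id.startswith("New.ReferenceOTU"):
        return "new_reference"
    # Assumption: anything else is an ID taken from the reference set.
    return "greengenes_reference"


def tally_rep_set(lines):
    """Count OTU categories from the header lines of a FASTA file."""
    counts = Counter()
    for line in lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # skip empty headers
        # The OTU ID is the first whitespace-delimited token of the header.
        counts[classify_otu_id(fields[0])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical headers, for illustration only.
    sample = [
        ">4479944 sampleA_12",
        "ACGT",
        ">New.ReferenceOTU5 sampleB_3",
        "ACGT",
        ">New.CleanUp.ReferenceOTU17 sampleC_9",
        "ACGT",
    ]
    for category, n in sorted(tally_rep_set(sample).items()):
        print(category, n)
```

To run it on real output, pass an open file handle instead of the sample list, for example `tally_rep_set(open("otus/rep_set.fna"))`, adjusting the path to wherever the output directory is.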
Is it normal to have such a high number of de novo OTUs?
How should I deal with these de novo OTUs? Since they cannot be assigned to any taxonomy in the Greengenes reference set, what further analyses can be done on them? Thanks!