Dear all,
I am currently comparing reference databases for assigning taxonomy to my sequences from V3-V4 Illumina sequencing.
I found a blastdbcmd command to convert the NCBI 16S database files to FASTA format. However, the output of the command contains only 18k sequences.
This surprised me because I found 90k reference sequences in the Greengenes database and 731k reference sequences in the SILVA database.
Does the NCBI 16S database cluster its sequences at a certain similarity percentage, which would give a lower number of reference sequences? Or is there something wrong with the command (below) that I used to convert the database to FASTA format?
blastdbcmd -db 16Smicrobial -out 16S_microbial.fasta -outfmt %f -entry "all"
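One way to double-check the 18k figure is to count the header lines in the FASTA output. This is only a minimal sketch: the helper name count_fasta_records is made up for illustration, and it assumes every record starts with a single ">" header line.

```shell
# Hypothetical helper: count FASTA records by counting header lines
# (lines that begin with '>').
count_fasta_records() {
    grep -c '^>' "$1"
}

# Usage, assuming the output file from the blastdbcmd command above exists:
#   count_fasta_records 16S_microbial.fasta
#
# blastdbcmd can also report the sequence count of the database itself:
#   blastdbcmd -db 16Smicrobial -info
```

If the two numbers agree, the conversion command is probably not dropping anything and the database itself is simply small.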
Thanks a lot!