Seqanswers Leaderboard Ad

**westerman** · 09-23-2015, 10:34 AM

Originally posted by Naphtap View Post

Now I want to make an entire BLAST database that contains all fecal Bacteroidales sequences. I don't want to copy and paste all the Bacteroidales FASTA sequences one by one since it would be time-consuming. Is there a way to streamline that process using the BLAST program?

Well, where are you getting your sequences from? Usually you can obtain all of them together in one FastA file which you then give to Blast.

Also, I'm working with qPCR assays that target these Bacteroidales populations. I've already done an in silico analysis of each primer pair I'm working with using Primer Blast. Is there a way to perform Primer Blast on UNIX and then paste all the matching sequences onto a fasta file?

I do not use this NCBI service but given that it is a combination of Primer3 and Blast I do not think that there is a standalone program available for it.

**GenoMax** · 09-23-2015, 11:13 AM

@Naphtap: You should be able to subset the nt/nr databases using the blastdb_alias program that is part of the Blast+ suite. That will need gi# for all Bacteriodales you are interested in.

**Naphtap** · 09-23-2015, 11:52 AM

Originally posted by GenoMax View Post

@Naphtap: You should be able to subset the nt/nr databases using the blastdb_alias program that is part of the Blast+ suite. That will need gi# for all Bacteriodales you are interested in.

I have the basic code for blasting my sequences (YourSequences.fasta will be replaced with the name of the fasta file I'm working with) which is shown below.

/usr/local/blast/2.2.29/bin/blastn -query YourSequences.fasta -db yourDatabase -out blastn.outfmt6 -evalue 1e-20 -num_threads 6 -max_target_seqs 1 -outfmt 6

I also referenced the BLAST manual and they do indeed give me the ability to subset the nt/nr databases, with the example command below:
$ blastdb_aliastool -db nematode_mrna -gilist c_elegance_mrna.gi -dbtype \
nucl -out c_elegance_mrna -title "C. elegans refseq mRNA entries"

Now, where it says c_elegance_mrna.gi and nematode_mrna, I can replace that with the Bacteroidales gis I'm working with yes? Also, do I have to integrate this command into the blast command I placed above, or does running this command already limit the blast search to the subset I set beforehand?

Originally posted by westerman View Post

Well, where are you getting your sequences from? Usually you can obtain all of them together in one FastA file which you then give to Blast.

I'm currently looking for the Bacteroidales fasta, but I can definitely look for those ones as well.

**GenoMax** · 09-23-2015, 12:02 PM

blastdb_alias program will create a subset blast database that you will then use for your subsequent blast searches. You will run this command independently.

You can grab a list of bacteriodales GI's by doing an entrez search such as this http://www.ncbi.nlm.nih.gov/nuccore/?term=Bacteroidales.

Click on "Send to" and then choose "File" --> "Format" --> "gi list". That will download a file with all gi's which you can then feed blastdb_alis tool for the -gilist option.

You can also get a fasta file for these sequences using blastdbcmd program and the gi list file.

**Naphtap** · 09-23-2015, 03:40 PM

Originally posted by GenoMax View Post

blastdb_alias program will create a subset blast database that you will then use for your subsequent blast searches. You will run this command independently.

You can grab a list of bacteriodales GI's by doing an entrez search such as this http://www.ncbi.nlm.nih.gov/nuccore/?term=Bacteroidales.

Click on "Send to" and then choose "File" --> "Format" --> "gi list". That will download a file with all gi's which you can then feed blastdb_alis tool for the -gilist option.

You can also get a fasta file for these sequences using blastdbcmd program and the gi list file.

Ok so I performed the following command for the blastdb_alias program as follows, since it can create a subset blast database I can use for future blast searches (this one's for all 16S rRNA genes in the BLAST database):

blatsdb_aliastool -db not -gilist /home/gradstd5/16s.gi.txt -dbtype nucl -out 16S_rrna -title "16S_rRNA"

After running the command, 16S_rrna.n.gil file was among the outputs. First, what is this binary format the program's telling me?

"Converted 8847324 GIs from /home/gradstd5/16S.gi.txt to binary format in 16s_rrna.n.gil"

Also, it gives me this error:

"BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing." Is it something wrong with the .gi.txt file I downloaded from BLAST after searching for all 16S_rrna genes, related to the way I wrote the command, or is it something else?

**susanklein** · 09-23-2015, 05:43 PM

"BLAST Database error: BLASTDB alias file creation failed. Some referenced files may be missing."

sounds like just what its says, those GIs were missing from the DB. Depends on the relative ages of the DBs you use. It shouldn't be very may GIs so I probably wouldn't worry about it.

Personally I don't like using the aliastool, it means you have to muck around with large DBs to get the tiny subset you actually want. I prefer to just do the enrez search and download all the fastas then make the db.

Sorry, I don't know what the binary file is, or how to get the aliastool output into a db.. but you could try doing it the way I do. You can also use esearch and efetch from NCBI (google it to install on your system). For example the following search (simply copied search text from the box in Entrez web search) gives 5337 files.

esearch -db nucleotide -query '("Bacteroides"[Organism] OR Bacteroides[All Fields]) AND (16S[All Fields] AND ribosomal[All Fields]) NOT genome[All Fields]' | efetch -format fasta > bact16s.fna

S.

**susanklein** · 09-23-2015, 05:53 PM

Just an afterthought - my experience with 16S and taxonomic identification is that within Bacteroides you are unlikely to get better resolution than the genus level. Are you sure that 16S is the best approach. If you are looking at a higher taxonomic level to profile human Vs animal Faeces with e.g. some indicator species then it might work ok.

S.

**Naphtap** · 09-23-2015, 05:59 PM

Originally posted by susanklein View Post

Just an afterthought - my experience with 16S and taxonomic identification is that within Bacteroides you are unlikely to get better resolution than the genus level. Are you sure that 16S is the best approach. If you are looking at a higher taxonomic level to profile human Vs animal Faeces with e.g. some indicator species then it might work ok.

S.

Since I'm working with 16S sequences, I also considered just obtaining all of the 16S sequences instead from the BLAST database using the commands I used today. I'm also going to get more sequencing data from the other water and fecal samples I'm working with, so I was also considering performing comparative analyses like PCA and clustering analysis with R after the sequencing annotations are complete.

The things I want to do with Bacteroidales (the order) isn't set in stone. It was just an idea I was thinking of because it's one of the fecal indicator markers I'm working with in my qPCR assays. Should I go this route, I would focus on seeing which types of Bacteroidales are in the water samples (human, chicken, pig, etc). Plus, I don't have to work with just Bacteroidales indicator. I also want to search for other indicators and perhaps waterborne pathogens if I go this way.

Regardless of the direction I take, I'd first trim the low-quality sequences from my Illumina Sequencing file for the forward and reverse reads (FASTQ format) and merge the corresponding reads together.

**Naphtap** · 09-23-2015, 06:07 PM

As an addendum, the server I'm working with already has edirect, but the unix shell is telling me they don't have esearch...

EDIT: Never mind, I solved it.

**GenoMax** · 09-24-2015, 03:10 AM

Originally posted by Naphtap View Post

Regardless of the direction I take, I'd first trim the low-quality sequences from my Illumina Sequencing file for the forward and reverse reads (FASTQ format) and merge the corresponding reads together.

It would be better to merge the data first and then trim. BBMerge (from BBMap) or FLASH are options to look at.

**GenoMax** · 09-24-2015, 03:15 AM

@Naphtap: If you are only interested in rRNA then you could download them from RNACentral: http://rnacentral.org/search?q=expert_db:%22RDP%22. Use the "Download" button on the right to get a FASTA format file.

**Naphtap** · 09-24-2015, 11:52 AM

I feel as though I'm really close to performing the blast search, but this error keeps getting in my way:

BLAST Database error: No alias or index file found for nucleotide database [16s.fasta] in search path [/home/gradstd5::]

I generated the 16s.fasta with the FASTA format file I got from rnacentral link posted by GenoMax. I then placed the following command, receiving the above error as a result:

/usr/local/blast/2.2.29/bin/blastn -query unibac.fasta -db 16s.fasta -out blastn.outfmt6 -evalue 1e-5 -num_threads 6 -max_target_seqs 1 -outfmt 6

**GenoMax** · 09-24-2015, 12:01 PM

Can you post a listing of files that start with "16s.fasta*"? If the blast index got created correctly then there should be multiple files with that basename.

**Naphtap** · 09-24-2015, 12:07 PM

Originally posted by GenoMax View Post

Can you post a listing of files that start with "16s.fasta*"? If the blast index got created correctly then there should be multiple files with that basename.

If I perform just ls 16s.fasta on its own on Unix, then I only have this one file.

However, I also tried formatdb -p F -i 16s.fasta -n 16sBLASTdb, which resulted in the creation of three files:

16sBLASTdb.nhr
16sBLASTdb.nin
16sBLASTdb.nsq

I know formatdb's used to construct a BLAST database from a fasta file, which is what I tried to do here.

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Yesterday, 11:49 AM	0 responses 15 views 0 likes	Last Post by seqadmin Yesterday, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, 04-24-2024, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin 04-24-2024, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Questions about Blast on Unix Shell

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News