**PFaucon** · 12-19-2013, 03:59 PM

Sorry, I didn't explain that: nr_humans is an alias of the nr database, created by applying the original GI list. I followed the instructions here:

blastdb_aliastool -gilist gi.txt -db nr -out nr_humans -title nr_humans

Essentially, the output of the command should be everything that the alias sees (and would likely be the same as the output of blastdbcmd -db nr_humans -out human_sequences.txt), but the OIDs are missing regardless of whether I use nr or nr_humans (which is expected).
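One way to see exactly which entries from the list never made it into the dump is to compare the GI list against the GIs found in the FASTA headers. The following is only a rough sketch: `missing_gis` is a hypothetical helper name, and it assumes the dump is FASTA with `gi|NNN|` tokens in the deflines and that the GI list has one GI per line.

```shell
# Hypothetical helper (not part of BLAST+): print every GI from the list
# that does not appear in any defline of the FASTA dump.
# Usage: missing_gis gi.txt human_sequences.txt
missing_gis() {
    gi_list=$1
    fasta=$2
    found=$(mktemp)
    # Collect every gi|NNN token from the header lines. A merged nr defline
    # can carry several GIs on one line, so grep -o is used to catch them all.
    grep '^>' "$fasta" | grep -o 'gi|[0-9][0-9]*' | cut -d'|' -f2 | sort -u > "$found"
    # comm -23 keeps only the lines unique to the first (sorted) input,
    # i.e. GIs that were requested but never seen in the dump.
    sort -u "$gi_list" | comm -23 - "$found"
    rm -f "$found"
}
```

Running `missing_gis gi.txt human_sequences.txt` would then print the GIs that were requested but are absent from the output, which can be compared against the entries that blastdbcmd complained about.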

**GenoMax** · 12-19-2013, 04:03 PM

I missed the line in your explanation before I read your post again.

Is the output file being populated irrespective of the database (or alias) being used? nr is so huge at this point in time that it may not be surprising to find errors in it.

What exactly are you interested in from the human subset from nr?

**GenoMax** · 12-19-2013, 04:07 PM

See this post and the "missing OID's": http://blastedbio.blogspot.com/2012/...cbi-blast.html

**PFaucon** · 12-19-2013, 04:09 PM

Yes, the file is being populated in either case, and the number of misses seems minute compared to the number of hits. I haven't run both to look for differences, but I don't expect to find any (as the alias is a restriction by the same list that I'm using for the dump anyway).

At this point I'm interested in doing a homology search for yeast proteins against human proteins. I'm only interested in human sequences, which is the reason for the restriction.

**GenoMax** · 12-19-2013, 04:19 PM

A quicker way to do this would be to get the human protein sequence complement from a "BioMart" search (http://useast.ensembl.org/info/data/biomart.html) on the Ensembl site. I see a total of 64,138 at this time.

**PFaucon** · 12-19-2013, 04:22 PM

Hmm, that link scared me initially, but it appears that blastdbcmd is treating the GIs as GIs rather than as OIDs (or perhaps they are the same thing in nr). I went through a few pages of the output and the entries are all [Homo sapiens] (or, for sequences with multiple species, at least include it).

Aside from that possibility, I don't see that it is directly related to the problem at hand.
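Rather than paging through the output by eye, the species check could be done over the whole dump. This is a minimal sketch under the assumption that the dump is FASTA and that the organism name appears in the defline text; `non_human_headers` is a hypothetical helper name, not a BLAST+ command.

```shell
# Hypothetical helper: print every defline in a FASTA dump that does not
# mention Homo sapiens anywhere on the line (case-insensitive).
# Usage: non_human_headers human_sequences.txt
non_human_headers() {
    # A merged defline that lists several species still passes as long as
    # Homo sapiens is one of them. "|| true" keeps the exit status zero
    # when no such headers exist (grep -v returns 1 in that case).
    grep '^>' "$1" | grep -v -i 'homo sapiens' || true
}
```

If `non_human_headers human_sequences.txt` prints nothing, every record in the dump includes a human entry, which would support the reading that the GIs are being treated as GIs.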

Missing OIDs during blast db dump?
