SEQanswers

10-25-2013, 11:25 PM   #1
yingeddi2008
Junior Member
Location: Denton, TX
Join Date: Oct 2013
Posts: 6
Problems with blastdbcmd when entry IDs contain spaces

I am still a rookie in this area, and this is the first task I have been given: extract a list of 100%-matched reads from a self-generated BLAST database. However, the read names are not formatted in the usual way, and I suspect that is the cause of the problem I am encountering.

Below is part of my -entry_batch input file, ID.txt, which lists the read names:

'M00344:4:000000000-A5RU9:1:2119:17016:21751 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2119:6591:19854 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2119:11445:14212 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2119:22676:7504 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2119:13009:4084 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2119:14454:4004 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2118:11021:19828 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2118:14025:16724 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2118:25864:15172 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2118:13018:13673 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2118:5760:11441 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2117:24461:19844 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2117:17300:18233 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2117:4137:17412 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2117:2789:15268 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2117:25164:15029 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2117:16039:7681 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2117:8713:5016 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2116:13795:20195 2:N:0:10'
'M00344:4:000000000-A5RU9:1:2116:6977:17108 2:N:0:10'

I ran the command below:
$ blastdbcmd -db seqs.fasta -dbtype nucl -entry_batch ID.txt -out miseq.read.fasta

Error messages:

Error: 'M00344:4:000000000-A5RU9:1:1104:13049:19775: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:13044:19758: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:13062:19751: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:11099:18531: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:11118:18521: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:17175:17791: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:17452:17720: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:16737:13751: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:16726:13733: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:19339:9296: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:17187:8943: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:14936:7801: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:21379:6845: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:23493:5643: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:26299:4746: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:23691:4053: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1104:15699:3766: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1103:18377:16637: OID not found
Error: 'M00344:4:000000000-A5RU9:1:1103:16030:10176: OID not found


I tried replacing the whitespace with \s and wrapping each ID in single quotes, but neither helped. blastdbcmd treats everything before the space as the ID. Does anyone have an idea how to do this? Or am I heading in the wrong direction entirely?

Eddi
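The error messages above show two separate things happening: the leading single quote is kept as part of the ID (`'M00344...`), and the ID is cut off at the first space. Below is a minimal Python sketch (Python is my choice here; the thread itself only uses shell commands, and the helper names `clean_id` and `clean_entry_batch` are hypothetical) that rewrites an -entry_batch file into the unquoted, first-word-only form that blastdbcmd appears to look up. It only removes those two visible problems; whether the lookups then succeed still depends on how the database was built and on the BLAST+ version issues raised later in the thread.

```python
from typing import Iterable, Iterator


def clean_id(line: str) -> str:
    """Return an ID in the form blastdbcmd appears to use: unquoted, first word only."""
    # Drop surrounding whitespace, then any wrapping single or double quotes,
    # since the error messages show the quote being taken literally.
    stripped = line.strip().strip("'\"")
    # Keep only the text before the first whitespace, matching how the
    # error messages show the ID being truncated.
    parts = stripped.split(None, 1)
    return parts[0] if parts else ""


def clean_entry_batch(lines: Iterable[str]) -> Iterator[str]:
    """Yield cleaned, non-empty IDs from the lines of an -entry_batch file."""
    for line in lines:
        cleaned = clean_id(line)
        if cleaned:
            yield cleaned


# Example usage (ID.clean.txt is a hypothetical output file name):
# with open("ID.txt") as src, open("ID.clean.txt", "w") as dst:
#     for read_id in clean_entry_batch(src):
#         dst.write(read_id + "\n")
```

For example, `clean_id("'M00344:4:000000000-A5RU9:1:2119:17016:21751 2:N:0:10'")` returns `M00344:4:000000000-A5RU9:1:2119:17016:21751`.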
10-26-2013, 02:13 AM   #2
gringer
David Eccles (gringer)
Location: Wellington, New Zealand
Join Date: May 2011
Posts: 836

Here's what I run to generate a BLAST database out of a FASTA file:
Code:
makeblastdb -in <input>.fasta -title 'Something Stringy' -taxid <org_taxid> -dbtype nucl -out <dbname_ID>
It looks like you might be trying to query a database that doesn't exist (or hasn't been generated yet).

However, if you have NGS-scale numbers of reads, it's probably better to use something other than BLAST for sequence matching. I'd recommend Bowtie2, but BWA also seems to be commonly used here.

Here's the command I'd run to generate a Bowtie2 index:
Code:
bowtie2-build <input>.fasta <dbname_ID>
10-26-2013, 03:41 AM   #3
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by yingeddi2008
I tried changing white space to \s, or add ' before and after each id names, but it didn't help at all. The blastdbcmd program recognizes anything before the space as the id names. Anyone has any idea how to do it? Or I am totally heading in the wrong direction?
Hi Eddi. This is partly by design: most tools consider everything up to the first space to be the ID.

However, there are also some issues with blastdbcmd itself, and the exact version of BLAST+ matters; see my blog post:
http://blastedbio.blogspot.co.uk/201...cbi-blast.html
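To illustrate the first-space convention described above, here is a minimal Python sketch (the function name `split_fasta_title` is hypothetical) that splits a FASTA header the way many tools do; Biopython's FASTA parser, for instance, takes the record ID as the first whitespace-delimited word of the header line. Applied to a header from this thread, the `1:N:0:14` part ends up outside the ID.

```python
from typing import Tuple


def split_fasta_title(title: str) -> Tuple[str, str]:
    """Split a FASTA title line into (id, rest) at the first whitespace.

    A leading '>' is removed if present. The ID is everything up to the
    first whitespace; everything after it is treated as free text.
    """
    title = title.strip()
    if title.startswith(">"):
        title = title[1:]
    parts = title.split(None, 1)
    if not parts:
        return "", ""
    return parts[0], parts[1] if len(parts) > 1 else ""
```

One consequence worth knowing: Illumina read 1 and read 2 headers differ only after the space (`1:N:...` versus `2:N:...`), so under this convention both mates of a pair get the same ID.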
10-26-2013, 09:13 AM   #4
yingeddi2008
Junior Member
Location: Denton, TX
Join Date: Oct 2013
Posts: 6
blastdbcmd sucks

Hi maubp,

Thank you very much. I read through your blog post, and I think it describes exactly the problem I have now. So is there no way to extract sequences from my own custom database?!

For example, my database contains:

Code:
>M00344:4:000000000-A5RU9:1:1101:17539:1069 1:N:0:14
AAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGATGAAACCCTTCGGGGTGGATTAGCGGCGGACGGGTGAGTAACACGTGGGCAACCTGCCTCAAAGAGGGGGATAGCCTCCCGAAAGGGAGATTAATACCGCATAATAAGTACTTCTCGCATGGGAAGAACTTTAAAGGAGCAATCCGCTTTGAGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAAAGGCTCACAAAGGCGACGATGCGTAGCCGACCTGAGAGGGTGATCGGCG
>M00344:4:000000000-A5RU9:1:1101:17556:1074 1:N:0:14
AAGAGTTTGATCATGGCTCAGGACGAACGCTGGCGGCGTGCCTAACACATGCAAGTCGAGCGATGAAACCCTTCGGGGTGGATTAGCGGCGGACGGGTGAGTAACACGTGGGCAACCTGCCTCAAAGAGGGGGATAGCCTCCCGAAAGGGAGATTAATACCGCATAATAAGTACTTCTCGCATGGGAAGAACTTTAAAGGAGCAATCCGCTTTGAGATGGGCCCGCGGCGCATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATGCGTAGCCGAACTGAGAGGGGGATCGGC
But when I run

Code:
$ blastdbcmd -db seq.fasta -entry all -outfmt "OID: %o     TITLE: %t"
I got nothing back. I don't know whether there is an internal error or whether it simply won't recognize IDs that are not in NCBI format. That is so unfortunate.


Eddi

10-26-2013, 09:19 AM   #5
yingeddi2008
Junior Member
Location: Denton, TX
Join Date: Oct 2013
Posts: 6

Hi gringer,

Thank you for your advice; I will try Bowtie2 or BWA. I have Illumina MiSeq data here, so maybe I should try something else.

Eddi
10-26-2013, 10:17 AM   #6
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by yingeddi2008
Thank you very much. I read through your blog. I think that's exactly what I have problem now.
Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
Quote:
Originally Posted by yingeddi2008
Then there is no way I can extract sequences from my own custom database?!
As long as you still have the FASTA file you made the BLAST database from, you can extract the records from the FASTA file. There are several tools for this (including support in scripting libraries like Biopython, BioPerl, BioRuby etc.).
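As a sketch of extracting records straight from the FASTA file, the Python below uses only the standard library rather than one of the libraries named above (a swapped-in approach, not maubp's). It matches on the complete header line, so the space and the `2:N:0:10` part are kept and compared. All function names are hypothetical, and the file names in the usage comment are the ones from this thread. The match is exact, so a line in ID.txt must agree with the header in every character after the space too. With Biopython, the equivalent comparison would be against `record.description`, which holds the full header line.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs; title is the header line without '>'."""
    title = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            # Sequences may be wrapped over several lines; join them.
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


def load_wanted(lines: Iterable[str]) -> Set[str]:
    """Read full header lines (spaces kept), skipping blanks and wrapping quotes."""
    return {line.strip().strip("'") for line in lines if line.strip()}


def extract_by_title(lines: Iterable[str], wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield records whose complete header line is in `wanted` (exact match)."""
    for title, seq in read_fasta(lines):
        if title.strip() in wanted:
            yield title, seq


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Format (title, sequence) pairs as unwrapped FASTA text."""
    return "".join(f">{title}\n{seq}\n" for title, seq in records)


# Example usage with the file names from this thread:
# with open("ID.txt") as ids, open("seqs.fasta") as fasta:
#     wanted = load_wanted(ids)
#     with open("miseq.read.fasta", "w") as out:
#         out.write(to_fasta(extract_by_title(fasta, wanted)))
```

Holding the wanted headers in a set keeps each lookup cheap, and streaming the FASTA file means the whole read set never has to fit in memory.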
10-26-2013, 10:35 AM   #7
yingeddi2008
Junior Member
Location: Denton, TX
Join Date: Oct 2013
Posts: 6
Thank you.

Hi maubp,

Quote:
Originally Posted by maubp
Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
Who should I email? The NCBI team?

Quote:
Originally Posted by maubp
As long as you still have the FASTA file you made the BLAST database from, you can extract the records from the FASTA file. There are several tools for this (including support in scripting libraries like Biopython, BioPerl, BioRuby etc).
I will try those then. Thank you.

Eddi
10-26-2013, 10:39 AM   #8
maubp
Peter (Biopython etc)
Location: Dundee, Scotland, UK
Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by maubp
Please email them to check (and make sure they know people are having problems with blastdbcmd to help prioritise fixing this). Thanks!
blast-help at ncbi.nlm.nih.gov as listed here:
http://blast.ncbi.nlm.nih.gov/Blast....TYPE=Blastdocs

Tags
blast, linux, perl

All times are GMT -8.