SEQanswers

Old 10-12-2015, 05:07 AM   #1
GSviral
Member
 
Location: UK

Join Date: Dec 2014
Posts: 37
"Invalid byte in GI list"

Hello everyone,

I am currently working with a local BLAST nucleotide database. After getting it set up I am able to BLAST FASTA files without any bother.

What I wanted to do was only search the nt database for viruses.

To this end I have downloaded the virus accession list from NCBI.

I try to use the following command:

$ blastn -db nt -query sequence.fasta -num_alignments 10 -num_descriptions 10 -evalue 1e-6 -gilist viruses.nbr -num_threads 4 -out sequence.tab

When I input this command I get a result saying "Invalid byte in GI list" and the command does not run. Can anyone help me out with this error message? Has there been a problem downloading the accession list file?

Thanks for the help.
Old 10-12-2015, 07:12 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

Is there a header present in your gilist file? If there is, try removing it.
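A minimal sketch of stripping such a header, assuming (this is not confirmed in the thread) that header lines start with `#` and that the identifier sits in the first tab-separated column. The file name `viruses_toy.nbr` and its contents are invented for illustration. `-gilist` expects numeric GIs, one per line, so if the first column holds accessions rather than numbers the last filter will leave nothing behind.

```shell
# Toy stand-in for the downloaded list (contents are invented for illustration).
printf '## Columns:\tRepresentative\tNeighbor\n12345\tNC_000001\n67890\tNC_000002\n' > viruses_toy.nbr

# Drop header/comment lines, keep the first tab-separated column,
# and keep only purely numeric lines (the form -gilist expects).
grep -v '^#' viruses_toy.nbr | cut -f1 | grep -E '^[0-9]+$' > viruses_toy.gi

cat viruses_toy.gi
```

The cleaned file could then be passed to `-gilist` in place of the original download.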

Last edited by GenoMax; 10-12-2015 at 07:23 AM.
Old 10-12-2015, 07:22 AM   #3
GSviral
Member
 
Location: UK

Join Date: Dec 2014
Posts: 37

Hi Genomax,

Thanks for the advice. Yeah, there were header lines indicating accession number, organism name etc.

Instead of using 'gilist' I ended up using 'seqidlist', which accepted my downloaded file. I am not sure whether the results would differ with a working 'gilist', but I will try removing the headers and re-running with 'gilist' to see if there are any differences.

Cheers!
Old 10-12-2015, 07:25 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

If you have GIs then it may be best to stick with the gilist option. I am not sure whether a GI is equivalent to a seqid.

If you expect to do this often then consider sub-setting the viruses set permanently.
Old 10-13-2015, 02:55 AM   #5
GSviral
Member
 
Location: UK

Join Date: Dec 2014
Posts: 37

I had limited success with the GI List option. I have realised that the virus taxa file I downloaded was for whole genomes and not partially sequenced genomes.

I went back and downloaded the GenBank viral database in a FASTA file.

From this I want to make a custom viral database to run my sequences against, in order to speed up processing and get the data I want without any bacterial sequences etc. However, a new problem occurred.

When I type $ makeblastdb -help, just to list the available options, I get a 'segmentation fault' error. Is this due to RAM limitations or a problem with the BLAST+ application?
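For reference, on a system where makeblastdb does run, building the custom database would look roughly like the sketch below. The file name `viral_toy.fasta`, its two records, and the database name `viral_toy_db` are placeholders invented for illustration; the real input would be the downloaded GenBank viral FASTA file.

```shell
# Toy FASTA standing in for the downloaded GenBank viral file (records are invented).
printf '>toy_virus_1\nACGTACGTACGT\n>toy_virus_2\nGGCCGGCCGGCC\n' > viral_toy.fasta

if command -v makeblastdb >/dev/null 2>&1; then
    # Build a nucleotide database named viral_toy_db from the FASTA file.
    makeblastdb -in viral_toy.fasta -dbtype nucl -out viral_toy_db -title "Toy viral database"
    # It could then be searched with: blastn -db viral_toy_db -query sequence.fasta ...
else
    echo "makeblastdb not found on PATH; database not built" >&2
fi
```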

Cheers.
Old 10-13-2015, 03:31 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

What OS are you using?
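In case it helps, the kernel and distribution release can usually be read from the command line. This is a generic sketch, not specific to any one system; which release file exists depends on the distribution.

```shell
# Kernel name and release.
uname -sr

# Distribution release files, where present: /etc/redhat-release on Red Hat-family
# systems such as CentOS, /etc/os-release on many newer distributions.
for f in /etc/redhat-release /etc/os-release; do
    if [ -r "$f" ]; then
        cat "$f"
    fi
done
```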
Old 10-13-2015, 03:36 AM   #7
GSviral
Member
 
Location: UK

Join Date: Dec 2014
Posts: 37

GenoMax,

I am using GNOME CentOS 2.16.0.

Outdated I am sure but my institution are picky about software unfortunately.
Old 10-13-2015, 03:58 AM   #8
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

Looking at your blast command line you appear to be using the latest blast+ package. Can you confirm that? If blastn from that package worked then I am not sure why you are getting a seg fault with makeblastdb. Perhaps that is using a library that is missing from your system. Going to be hard to fix.
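One way to check the missing-library idea, as a sketch: `ldd` lists the shared libraries a binary needs and marks unresolved ones as "not found". The helper name `missing_libs` is made up here, and a clean result does not rule out other causes of the seg fault (for example a library that is present but too old).

```shell
# Print any shared libraries that the dynamic loader cannot resolve for a binary.
missing_libs() {
    if ! command -v ldd >/dev/null 2>&1; then
        echo "ldd is not available on this system" >&2
        return 2
    fi
    ldd "$1" 2>/dev/null | grep 'not found'
}

# Usage: check makeblastdb if it is on PATH.
# No output means ldd reported nothing as missing.
if command -v makeblastdb >/dev/null 2>&1; then
    missing_libs "$(command -v makeblastdb)" || true
fi
```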

BTW: Are you really using a 10+ year old OS (if I googled it right)?
Old 10-13-2015, 04:02 AM   #9
GSviral
Member
 
Location: UK

Join Date: Dec 2014
Posts: 37

GenoMax,

Yep, I am using the latest BLAST+ package - 2.2.31 along with the most recent nt database.

Could it be possible an OS update could fix the problem?

And yes ha, we are using a 10 year old OS. As I mentioned my institute can be ridiculously picky when installing new software due to security measures. Even so a 10 year old OS is a bit ridiculous really.
Old 10-13-2015, 04:19 AM   #10
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

You need a complete reinstall with a newer-vintage OS.

On a serious note, if you are not able to update the OS you could try compiling BLAST from source code (I am not even sure if that will work). BLAST may expect recent libraries that are unlikely to be available in a 10-year-old OS. Even the compiler you have available will likely not work.
Old 10-13-2015, 04:26 AM   #11
GSviral
Member
 
Location: UK

Join Date: Dec 2014
Posts: 37

Thanks for the help GenoMax. I may just have to find a different computer to do this all on unfortunately.

Just to recap what I did, in case I have installed something incorrectly:

I downloaded the latest NCBI BLAST+ package which I then extracted.

I also downloaded the most recent nucleotide database, which I extracted into the .bin folder of the extracted BLAST+ package. Does this all sound correct?

I do not have much experience with the command line, so perhaps I have installed something incorrectly.
Old 10-13-2015, 04:39 AM   #12
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,049

For the purpose of what you were trying to run that all sounds right. I am surprised that even blastn worked considering makeblastdb generates a seg fault.
Old 10-13-2015, 04:42 AM   #13
GSviral
Member
 
Location: UK

Join Date: Dec 2014
Posts: 37

Thanks GenoMax.

Seems I will have to find an alternative route.

Sometimes, if I am BLASTing a particularly large FASTA file, I get a segmentation fault shortly after entering the command. The process stops then and there, leaving incomplete output.
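A possible workaround, as a sketch only: split the large query file into smaller chunks and search each one separately. Nothing in this thread confirms that this avoids the seg fault. The file name `big_toy.fasta`, its sequences, and the chunk size of 2 records are invented for illustration.

```shell
# Toy multi-FASTA standing in for a large query file (sequences are invented).
printf '>seq1\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > big_toy.fasta

# Write 2 records per chunk: chunk_1.fasta, chunk_2.fasta, ...
awk -v n=2 '
    /^>/ {
        # Start a new chunk file every n records.
        if (count % n == 0) {
            if (out) close(out)
            out = sprintf("chunk_%d.fasta", ++chunk)
        }
        count++
    }
    { print > out }
' big_toy.fasta

# Each chunk could then be searched separately, e.g. (not run here):
#   for f in chunk_*.fasta; do blastn -db nt -query "$f" -evalue 1e-6 -out "${f%.fasta}.tab"; done
ls chunk_*.fasta
```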
All times are GMT -8.

