SEQanswers

Old 10-09-2012, 10:48 AM   #1
easolvig
Junior Member
 
Location: Europe

Join Date: Oct 2012
Posts: 2
BLASTp output format problem

Hello to all the members!

This is my first post here on SEQanswers.

Today I was updating the BLASTp application on the nodes of our grid to the latest version and, after running some jobs to test it, I noticed that the output files have a different number of lines depending on the output format (CSV versus tabular).

I used the same database and query file for both runs; the only difference was the output format parameter:
blastp -evalue 0.1 -db F10DRD -out test_output_f10drd_180.txt -outfmt '10 qseqid sseqid qstart qend evalue' -query f10drd_180.fas
blastp -evalue 0.1 -db F10DRD -out test_output_f10drd_180.txt -outfmt '6 qseqid sseqid qstart qend evalue' -query f10drd_180.fas

The CSV output file contained 1288845 lines; the tabular file contained 1293150.
I replaced the \t characters with commas in the tabular file and compared the two outputs with diff. It showed that the tabular file contains all lines from the CSV, but has 4305 more.
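
The comparison described above can be sketched roughly as follows. This is only an illustration, not the exact commands used in the post: the file names and the sample rows are made up, and `comm` on sorted files is used in place of `diff` so that line order does not matter.

```shell
# Hypothetical sample files standing in for the two BLAST outputs.
printf 'q1,s1,1,50,1e-10\nq2,s2,5,80,2e-05\n' > sample.csv
printf 'q1\ts1\t1\t50\t1e-10\nq2\ts2\t5\t80\t2e-05\nq3\ts3\t2\t40\t0.01\n' > sample.tsv

# Convert tabs to commas, then sort both files so comm can compare them.
tr '\t' ',' < sample.tsv | sort > tsv_as_csv.sorted
sort sample.csv > csv.sorted

# comm -13: lines found only in the second file (the tabular output).
# comm -23: lines found only in the first file (the CSV output).
extra_in_tsv=$(comm -13 csv.sorted tsv_as_csv.sorted)
extra_in_csv=$(comm -23 csv.sorted tsv_as_csv.sorted)

echo "only in tabular: $extra_in_tsv"
echo "only in CSV: $extra_in_csv"
```

If the second list is empty and the first is not, the tabular file is a superset of the CSV file, which is the situation reported here.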

I would like to ask whether any of you have noticed the same problem before.

Thank you for your time and your answers!
Old 10-12-2012, 11:49 PM   #2
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

I tried to replicate your results with BLAST 2.2.27+ using blastn and 1000 sequences:

Code:
formatdb -i contigs.fa -p F -o T

blastn -query contigs.fa -db ./contigs.fa -evalue 0.1 -out blast.csv -outfmt '10 qseqid sseqid qstart qend evalue'
blastn -query contigs.fa -db ./contigs.fa -evalue 0.1 -out blast.tsv -outfmt '6 qseqid sseqid qstart qend evalue'

wc -l blast.*
14858 blast.csv
14858 blast.tsv
The number of lines matched for me. I know this is a one-off, but comforting for me at least.

Are you using version 2.2.27?
Did you use "-parse_seqids" for makeblastdb? (or -o T for formatdb)
Are the sequence IDs unique in your database file?
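
One quick way to check the last question is to list any sequence ID that appears more than once in the FASTA file. The sketch below assumes the ID is the text between `>` and the first whitespace on each header line; `sample.fas` and its contents are hypothetical.

```shell
# Hypothetical FASTA file in which the ID "seq1" occurs twice.
printf '>seq1 first\nMKV\n>seq2 second\nMLA\n>seq1 again\nMKT\n' > sample.fas

# Keep header lines, strip the leading '>', take the first
# whitespace-delimited field as the ID, and report repeated IDs.
dup_ids=$(grep '^>' sample.fas | sed 's/^>//' | awk '{print $1}' | sort | uniq -d)

echo "duplicate IDs: $dup_ids"
```

An empty result means every ID is unique under that assumption.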
Old 10-13-2012, 01:00 AM   #3
easolvig
Junior Member
 
Location: Europe

Join Date: Oct 2012
Posts: 2

Thank you for your reply, Torst!

Yes, -o T was used for formatdb, the IDs are unique and it is the 2.2.27 version. The problem was caused by something else.

After spending days running tests with different queries to find the source of this problem, I found that the test jobs that completed in less than ~3 hours produced the same output in both CSV and tabular format. This led me to ask the administrator for our computing grid's error logs.

I finally got the logs, and they revealed that the differing output files were the result of an incorrectly set CPU limit assigned to our account. It was a recent change that we were unaware of. Now that they have corrected it, the test runs I made gave correct and identical results.
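
For anyone hitting something similar, a rough sanity check is sketched below. It assumes the limit is a per-process CPU-time limit visible to the shell through `ulimit`; a limit enforced by the grid scheduler may not show up this way, and the thread does not say how this grid applied it. The `true` command is only a stand-in for the real blastp command line.

```shell
# Print the CPU-time limit (in seconds) for the current shell,
# or "unlimited" if none is set.
cpu_limit=$(ulimit -t)
echo "CPU time limit: $cpu_limit"

# Run the job and record how it ended, so that an interrupted run
# is not mistaken for a complete one. 'true' is a placeholder.
true
status=$?

if [ "$status" -ne 0 ]; then
    echo "job ended with exit status $status; output may be incomplete" >&2
fi
```

Checking the exit status of each job before comparing output files would help flag a run that was cut short.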


I am sorry for taking your time with this question.

Last edited by easolvig; 10-13-2012 at 03:27 AM.
Old 10-13-2012, 07:28 PM   #4
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Glad it worked out, and there wasn't a bug in BLAST+.
All times are GMT -8.