SEQanswers (http://seqanswers.com/forums/index.php)

Dave_Carlson 01-22-2015 01:34 PM

Converting Blast+ output to Fasta sequence files
 
Hi All,

I have a bioinformatics problem that I'm hoping others have encountered (and solved) already.

I have a transcriptome that I have BLASTed against itself to look for putatively paralogous genes. Right now the blastn results are in tab-separated format, with several thousand rows of results. After filtering the results to remove unhelpful stuff (e.g., very short HSPs or sequences that hit themselves), I would like to take each hit between two sequences, find the full sequences in the original transcriptome FASTA file, and output the two sequences into a new FASTA file. Ideally, I would like to be able to do this in a relatively automated manner, so that I get a separate FASTA file for each hit from my blastn results.

Is anybody aware of a script or utility that could do something like that? My coding skills are not great, and I'm hoping that I don't have to spend a lot of time inventing (reinventing?) the wheel on this. Any help or suggestions would be appreciated! Thanks.

maubp 01-22-2015 07:28 PM

You probably want to use blastdbcmd to pull the full sequences from the BLAST database. This assumes you are using NCBI BLAST+; in NCBI legacy BLAST the equivalent tool had a different name (fastacmd).
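
For anyone finding this thread later, here is a minimal sketch of that suggestion in Python. It only builds and runs a single blastdbcmd call that fetches every ID listed in a text file (one ID per line). The database name, ID file, and output file are placeholders, not names from this thread, and the sketch assumes the database was built with makeblastdb -parse_seqids so that lookups by sequence ID work.

```python
import subprocess


def blastdbcmd_batch_command(db, id_file, out_fasta):
    """Build a blastdbcmd call that writes every ID listed in id_file to out_fasta."""
    return [
        "blastdbcmd",
        "-db", db,                 # BLAST database name (placeholder)
        "-entry_batch", id_file,   # text file with one sequence ID per line
        "-out", out_fasta,         # FASTA file that receives the sequences
    ]


def fetch_sequences(db, id_file, out_fasta):
    """Run blastdbcmd; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(blastdbcmd_batch_command(db, id_file, out_fasta), check=True)


if __name__ == "__main__":
    # Show the command that would be run, without needing BLAST+ installed.
    # All three names below are hypothetical.
    print(" ".join(blastdbcmd_batch_command("transcriptome_db", "hit_ids.txt", "hits.fasta")))
```

Calling fetch_sequences() requires BLAST+ to be on the PATH; the command builder is kept separate so the arguments can be inspected first.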

Dave_Carlson 01-23-2015 08:51 AM

Thanks for the suggestion! Based on other threads I've seen, I had been considering using blastdbcmd. However, if I understand correctly, it will only output a single file containing whatever sequences correspond to the IDs in the list provided as an argument. Is there a way to make blastdbcmd output multiple files?

maubp 01-23-2015 01:22 PM

If you want one sequence per FASTA file, either call blastdbcmd many times in a loop, or divide the big FASTA file using a tool like EMBOSS seqretsplit http://emboss.open-bio.org/rel/rel6/...qretsplit.html or a simple script.
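
As a rough illustration of the "simple script" option, the sketch below splits a multi-record FASTA file into one file per record. It is not seqretsplit and makes its own assumptions: each output file is named after the first word of the header (with unusual characters replaced by underscores), and each sequence is written on a single unwrapped line. The function and file names are made up for this example.

```python
import os
import re


def read_fasta(path):
    """Yield (header, sequence) tuples; the header excludes the leading '>'."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def safe_name(seq_id):
    """Replace characters that are awkward in file names with underscores."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", seq_id)


def split_fasta(fasta_path, out_dir):
    """Write each record to its own file in out_dir; return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for header, seq in read_fasta(fasta_path):
        fields = header.split()
        seq_id = fields[0] if fields else "unnamed"
        out_path = os.path.join(out_dir, safe_name(seq_id) + ".fasta")
        with open(out_path, "w") as out:
            out.write(">" + header + "\n" + seq + "\n")
        written.append(out_path)
    return written
```

Note that records sharing the same ID would overwrite each other's files, so this assumes IDs are unique, as they normally are in a transcriptome assembly.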

Dave_Carlson 01-23-2015 09:05 PM

Thanks! With a bit of fiddling around, I was able to get blastdbcmd to output the sequences I needed (and in the order I needed them to be in), and then I used Fasta Splitter to separate the putatively-paralogous sequences into their own files. Thanks for taking the time to help me!
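
For readers who would rather do the whole thing in one script, here is a hedged sketch of the task from the first post: read the tabular blastn results, skip self-hits and short alignments, and write one two-sequence FASTA file per pair. It is not the blastdbcmd plus Fasta Splitter route used above. It assumes the default tabular column order (query ID, subject ID, percent identity, alignment length, ...), reads sequences straight from the transcriptome FASTA file, and treats A-vs-B and B-vs-A as the same pair. The 100 bp length cutoff and the pair_NNNN.fasta naming are arbitrary choices for illustration.

```python
import csv
import os


def load_fasta(path):
    """Return {sequence ID: sequence}; the ID is the first word of each header."""
    seqs, current = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                fields = line[1:].split()
                current = fields[0] if fields else None
                if current is not None:
                    seqs[current] = []
            elif line and current is not None:
                seqs[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in seqs.items()}


def write_pair_files(blast_tsv, fasta_path, out_dir, min_length=100):
    """Write one FASTA file per distinct query/subject pair; return the paths."""
    seqs = load_fasta(fasta_path)
    os.makedirs(out_dir, exist_ok=True)
    seen, written = set(), []
    with open(blast_tsv, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 4 or row[0].startswith("#"):
                continue  # skip comment lines and malformed rows
            query, subject, aln_len = row[0], row[1], int(row[3])
            pair = tuple(sorted((query, subject)))
            if query == subject or aln_len < min_length or pair in seen:
                continue  # self-hit, short alignment, or pair already written
            if query not in seqs or subject not in seqs:
                continue  # ID missing from the FASTA file
            seen.add(pair)
            out_path = os.path.join(out_dir, "pair_%04d.fasta" % len(seen))
            with open(out_path, "w") as out:
                for seq_id in (query, subject):
                    out.write(">%s\n%s\n" % (seq_id, seqs[seq_id]))
            written.append(out_path)
    return written
```

If every HSP should get its own file rather than every pair, dropping the seen check would do that, at the cost of many near-duplicate files.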

maubp 01-25-2015 03:27 AM

Good work :)

