SEQanswers

07-21-2015, 09:01 PM   #1
Blaze9
Junior Member
 
Location: NJ

Join Date: Feb 2013
Posts: 8
Convert blastx XML to Fasta w/ NTs, not Proteins?

Hey,

I have blastx output in XML format and want to convert it back into a FASTA file for downstream analysis. However, because it is blastx output, the XML contains only protein sequences. I still have the original nucleotide FASTA that I submitted to BLAST. Is there a way to "merge" these two files? I want to take the annotation from the XML and append it to the headers of the sequences in the original FASTA.

Is this possible from the command line? Right now I import the FASTA and the XML into Blast2GO and export the FASTA with the new header information, but Blast2GO gets really slow with very large datasets.
07-22-2015, 12:06 AM   #2
lindenb
Senior Member
 
Location: France

Join Date: Apr 2010
Posts: 143

An idea (not tested): you could generate an awk script from the XML using XSLT.

Code:
<?xml version='1.0'  encoding="UTF-8" ?>
<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"  version='1.0'>

<xsl:output method="text" />


<xsl:template match="/">
<xsl:apply-templates select="BlastOutput/BlastOutput_iterations/Iteration"/>
</xsl:template>


<xsl:template match="Iteration">
<xsl:text>$1 == "&gt;</xsl:text><xsl:value-of select="Iteration_query-def/text()"/> <xsl:text>"	{ </xsl:text>
<!-- awk substr() takes (string, start, length), so the length is to - from + 1 -->
<xsl:for-each select="Iteration_hits/Hit/Hit_hsps/Hsp">printf("%s [%s-%s]\n%s\n",$1,<xsl:value-of select="Hsp_query-from/text()"/>,<xsl:value-of select="Hsp_query-to/text()"/>,substr($2,<xsl:value-of select="Hsp_query-from/text()"/>,<xsl:value-of select="number(Hsp_query-to) - number(Hsp_query-from) + 1"/>));</xsl:for-each>
<xsl:text> next;}
</xsl:text>
</xsl:template>
</xsl:stylesheet>

Code:
xsltproc stylesheet.xsl input.blastx.xml  > my.awk
Then linearise the input FASTA (one tab-separated "header, sequence" line per record) and use the generated awk script to extract the substrings:


Code:
awk '/^>/ {printf("\n%s\t",$0);next;} {printf("%s",$0);} END {printf("\n");}' query.fasta  |\
  awk -F '\t' -f my.awk
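To make the two steps concrete, here is a small toy run. It is only a sketch: the FASTA records, the coordinates 4..12 and the single rule in my.awk are made up by hand to look like what the stylesheet would generate for one HSP; nothing here comes from a real BLAST run. Note that the third argument of awk's substr() is a length, so the rule passes to - from + 1.

```shell
# Toy two-record FASTA (made-up names and sequences).
tmp=$(mktemp -d)
cat > "$tmp/query.fasta" <<'EOF'
>contig_1 sample
ACGTACGT
TTGGCCAA
>contig_2
GGGGCCCC
EOF

# Hand-written rule in the style of the generated script, for a hypothetical
# HSP covering query positions 4..12 of contig_1.
cat > "$tmp/my.awk" <<'EOF'
$1 == ">contig_1 sample" { printf("%s [%s-%s]\n%s\n",$1,4,12,substr($2,4,12-4+1)); next;}
EOF

# Step 1: linearise to "header<TAB>sequence"; step 2: apply the rules.
result=$(awk '/^>/ {printf("\n%s\t",$0);next;} {printf("%s",$0);} END {printf("\n");}' "$tmp/query.fasta" |
  awk -F '\t' -f "$tmp/my.awk")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Records without a matching rule (contig_2 here) print nothing, because the script contains only pattern-action rules and no default print.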
If the input FASTA is too large, you could index query.fasta with makeblastdb and call blastdbcmd with its -range option instead of awk.
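A rough sketch of that alternative, assuming NCBI BLAST+ is installed and that the sequence IDs in query.fasta are ones makeblastdb can parse. The helper names build_query_db and extract_range are hypothetical, and contig_1 with positions 4..12 is just a placeholder; like the rest of this post, it is untested against real data.

```shell
# Sketch only: requires makeblastdb and blastdbcmd (NCBI BLAST+) on PATH.
# build_query_db and extract_range are hypothetical helper names.

build_query_db() {
    # $1 = nucleotide FASTA; -parse_seqids allows lookup of entries by ID later
    makeblastdb -in "$1" -dbtype nucl -parse_seqids
}

extract_range() {
    # $1 = database name (the FASTA path used above), $2 = sequence ID,
    # $3 = start, $4 = end (1-based, inclusive)
    blastdbcmd -db "$1" -entry "$2" -range "$3-$4"
}

# Example calls (not run here):
#   build_query_db query.fasta
#   extract_range query.fasta contig_1 4 12
```

The from/to values would come from Hsp_query-from and Hsp_query-to in the XML, so the stylesheet could emit one extract_range call per HSP instead of one awk rule.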
All times are GMT -8.