01-11-2016, 08:42 PM   #1
Location: India

Join Date: Aug 2015
Posts: 11
Parsing and extracting from 2 Fasta Files

Hello, I have 2 multi-FASTA files with different headers: a Reference file and a Test file.

Reference file example
>gi|536779208|gb|GANF01000001.1| TSA: Momordica charantia Locus_17026_Transcript_1/1_Confidence_1.000_Length_828 transcribed RNA sequence

Test file example
>gi|537289490|gb|GANG01000001.1| TSA: Momordica charantia Locus_12460_Transcript_2/3_Confidence_0.400_Length_1699 transcribed RNA sequence

These files contain ~51000 entries.

I want to separate out the entries that are similar between the Reference and Test files, preferably with an adjustable similarity threshold, such as 95% similar or so.
The output would ideally be 2 files: the similar entries and the excluded ones.

Can alignment programs like MUMmer or BLAST do that? If any similar program exists, please help me out.

Thank you.
Niranjanks
01-11-2016, 11:57 PM   #2
Senior Member
Location: Geneva

Join Date: Feb 2012
Posts: 177

The easiest and fastest approach would be to run BLAST locally, using one of your files as the database and requesting tabular output, and then parse the result yourself. You can easily do this in R, for example.
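As a rough illustration of the "parse the result yourself" step, here is a minimal sketch in Python (used here instead of R purely for illustration; it is not from the original posts). It assumes the hits were produced with BLAST+ tabular output (`-outfmt 6`), that the first column (`qseqid`) matches the first whitespace-delimited token of each FASTA header, and that "similar" means at least one hit at or above the identity threshold. The file names in the comments and the function names are hypothetical.

```python
# Hypothetical BLAST+ commands that would produce the input (file names are assumptions):
#   makeblastdb -in reference.fasta -dbtype nucl -out refdb
#   blastn -query test.fasta -db refdb -outfmt 6 -out hits.tsv
# Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#                            qstart qend sstart send evalue bitscore


def similar_query_ids(blast_lines, min_identity=95.0, min_length=0):
    """Return the query IDs with at least one hit meeting both thresholds."""
    ids = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        pident, length = float(fields[2]), int(fields[3])
        if pident >= min_identity and length >= min_length:
            ids.add(fields[0])
    return ids


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_records(fasta_lines, keep_ids):
    """Split FASTA records into (similar, excluded) lists by record ID."""
    similar, excluded = [], []
    for header, seq in read_fasta(fasta_lines):
        tokens = header.split()
        record_id = tokens[0] if tokens else ""
        target = similar if record_id in keep_ids else excluded
        target.append((header, seq))
    return similar, excluded


def write_fasta(records, path):
    """Write (header, sequence) pairs to a FASTA file, one sequence per line."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")


def split_fasta_by_blast(blast_path, fasta_path, similar_path, excluded_path,
                         min_identity=95.0, min_length=0):
    """Read the hits table and the query FASTA, then write the two output files."""
    with open(blast_path) as handle:
        keep = similar_query_ids(handle, min_identity, min_length)
    with open(fasta_path) as handle:
        similar, excluded = split_records(handle, keep)
    write_fasta(similar, similar_path)
    write_fasta(excluded, excluded_path)
```

A call such as `split_fasta_by_blast("hits.tsv", "test.fasta", "similar.fasta", "excluded.fasta", min_identity=95.0)` would then write the two files. Note that percent identity alone can be misleading for short local hits, so you may also want a minimum alignment length (`min_length`) relative to your transcript lengths. Test entries with no BLAST hit at all end up in the excluded file.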
SylvainL
