SEQanswers

Old 01-04-2015, 11:51 PM   #1
chayan
Member
 
Location: USA

Join Date: Nov 2012
Posts: 51
Extraction of a portion of a FASTA header

Hi,

I have a multi-FASTA file whose headers look like this:

">WLPTB:00031:00193;size=517;"

Now I want to extract just the "size=517" portion from every sequence header.

Any help?

Best,
Chayan
Old 01-05-2015, 12:10 AM   #2
chayan
Member
 
Location: USA

Join Date: Nov 2012
Posts: 51

If possible, I basically want to extract only the integer "517" that follows "size=".

Thanks
Old 01-05-2015, 12:49 AM   #3
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311

Quote:
Originally Posted by chayan
If possible, I basically want to extract only the integer "517" that follows "size=".

Thanks

This should do it (seq.fa is your FASTA file):

Code:
grep -P -o 'size=\d+' seq.fa | sed 's/size=//'
Old 01-05-2015, 12:49 AM   #4
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Something like this should work:
Code:
grep ">" foo.fasta | cut -d "=" -f 2 | cut -d ";" -f 1
You could also just use Biopython or BioPerl, which would make it easier to keep these values associated with their sequences if that's needed.
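
To illustrate that last point, here is a minimal sketch in plain Python (standard library only, so it is not the Biopython or BioPerl API) that keeps each size together with its record ID and sequence. The function names `read_fasta` and `sizes_by_id` are made up for this example, and it assumes every header follows the ">ID;size=N;" layout shown in the first post.

```python
import io
import re

# Assumed header layout, taken from the example in this thread:
#   >WLPTB:00031:00193;size=517;
SIZE_RE = re.compile(r"size=(\d+)")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def sizes_by_id(handle):
    """Map each record ID (text before the first ';') to (size, sequence)."""
    result = {}
    for header, seq in read_fasta(handle):
        match = SIZE_RE.search(header)
        if match is None:
            continue  # no size= field in this header; skip the record
        record_id = header.split(";", 1)[0]
        result[record_id] = (int(match.group(1)), seq)
    return result


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file such as seq.fa.
    demo = io.StringIO(">WLPTB:00031:00193;size=517;\nACGT\nTTGA\n")
    for record_id, (size, seq) in sizes_by_id(demo).items():
        print(record_id, size, len(seq))
```

For a real file, replace the `io.StringIO(...)` demo with `open("seq.fa")`. Records whose header lacks a "size=" field are silently skipped here; whether that is the right behaviour depends on your data.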

Last edited by dpryan; 01-05-2015 at 01:25 AM. Reason: Forgot "-f"!
Old 01-05-2015, 12:50 AM   #5
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480

Behold, the usefulness and flexibility of standard unix command line tools!

Tags
fasta file editing, fasta format, fasta sequence cut, fasta-reader, headers
