SEQanswers

05-09-2015, 02:18 PM   #1
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94
Adding a running Contig number at the start of FASTA headers

I am trying to add a numbered prefix to every header in a FASTA file (30K+ sequences, which I assembled de novo) in a single pass.

I have done this before for the end of the header, but I am confused about how to do it for the start of the header.

For the end of the header:
Code:
awk '/^>/{$0=$0"_Contig_"(++i)}1' input_file.fasta > output_file.fasta
This changes the FASTA file as follows.

From:
>Sequence_header
ATAGCATA
To:
>Sequence_header_Contig_1
ATAGCATA
...
>Sequence_header_Contig_n
ATAGCATA


I would like to get this instead.

From:
>Sequence_header
ATAGCATA
To:
>Contig_1_Sequence_header
ATAGCATA
...
>Contig_n_Sequence_header
ATAGCATA


Code:
awk '/^>/{"Contig_"(++i)"_"$0=$0}1' input_file.fasta > output_file.fasta
Unfortunately, this gives me a syntax error.

If someone could kindly show me what I am doing wrong, I would greatly appreciate it.

Many thanks.

Last edited by Zapages; 05-09-2015 at 05:02 PM.
05-12-2015, 06:12 AM   #2
pmiguel
Senior Member
 
Location: Purdue University, West Lafayette, Indiana

Join Date: Aug 2008
Posts: 2,317

Sorry, I don't speak awk.
Code:
perl -pe 'next unless /^>/; $i++; s/>(\S+)/>Contig_${i}_$1/' input_file.fasta > output_file.fasta
seems to work.

--
Phillip
05-12-2015, 06:50 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,054

How about this?

Code:
awk '{if (/^>/) print ">Contig_"(++i)"_" substr($0,2); else print $0;}' your_file > new_file
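
For anyone comparing this with the original attempt: that one-liner put the new string on the left of `=`, whereas in awk the thing being assigned to (`$0`) goes on the left and the new value on the right. The `>` also has to stay the first character of the header, so it is stripped with `substr($0,2)` and added back in front of the prefix. A minimal sketch of the original pattern-action style adapted that way is below. The function name `add_contig_prefix` and the sample input are made up for illustration and are not from the thread.

```shell
# Hypothetical helper (name is an assumption): prefix every FASTA header
# with a running contig number. Reads the files given as arguments,
# or standard input when no file is given.
add_contig_prefix() {
    # On header lines, rebuild $0 as ">" + "Contig_<n>_" + old header without its ">".
    # The trailing 1 is an always-true pattern, so every line is printed.
    awk '/^>/{$0=">Contig_"(++i)"_"substr($0,2)}1' "$@"
}

# Made-up two-line example piped in on standard input.
printf '>Sequence_header\nATAGCATA\n' | add_contig_prefix
# prints:
# >Contig_1_Sequence_header
# ATAGCATA
```

With real data it would be called as `add_contig_prefix input_file.fasta > output_file.fasta`, which should give the same result as the `if/else` version above.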
05-12-2015, 09:03 AM   #4
Zapages
Member
 
Location: NJ

Join Date: Oct 2012
Posts: 94

Thank you, Phillip and GenoMax. Both strategies work.

Really appreciate all the help.

Many Thanks,

Zapages
All times are GMT -8.