SEQanswers


Zapages 05-09-2015 02:18 PM

Adding counting Contig number at the start of Fasta Header
 
I am trying to add a numbered prefix to every header in a FASTA file (30K+ sequences that I assembled de novo) in a single pass.

I have managed to add the number at the end of each header before, but I am confused about how to do the same at the start of the header.

For the end of the header:
Code:

awk '/^>/{$0=$0"_Contig_"(++i)}1' input_file.fasta > output_file.fasta
This will change something like this in the fasta file:

From:
>Sequence_header
ATAGCATA
To:
>Sequence_header_Contig_1
ATAGCATA
...
>Sequence_header_Contig_n
ATAGCATA


I hope to do something like this:

From:
>Sequence_header
ATAGCATA
To:
>Contig_1_Sequence_header
ATAGCATA
...
>Contig_n_Sequence_header
ATAGCATA


Code:

awk '/^>/{"Contig_"(++i)"_"$0=$0}1' input_file.fasta > output_file.fasta
Unfortunately, I receive a syntax error.

If someone could kindly show me what I am doing wrong, I would greatly appreciate it.

Many thanks.

pmiguel 05-12-2015 06:12 AM

Sorry, I don't speak awk.
Code:

perl -pe 'next unless /^>/; $i++; s/>(\S+)/>Contig_${i}_$1/' input_file.fasta > output_file.fasta
seems to work.

--
Phillip

GenoMax 05-12-2015 06:50 AM

How about this?

Code:

$ awk '{if (/^>/) print ">Contig_"(++i)"_" substr($0,2); else print $0;}' your_file > new_file
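
For anyone who wants to stay close to the original one-liner: the attempt in the first post most likely fails because awk assignment stores the value on the right into the variable on the left, so $0 has to be the target on the left-hand side and the new string goes on the right. A minimal sketch along those lines is below; the two-record sample file and the second header name (Other_header) are made up for illustration, and this is only one possible way to write it.

```shell
# Create a tiny sample FASTA file (hypothetical data, only for trying this out).
printf '>Sequence_header\nATAGCATA\n>Other_header\nGGCC\n' > input_file.fasta

# On header lines, rebuild $0 as ">" + "Contig_<n>_" + the old header without its ">".
# substr($0,2) drops the leading ">"; the trailing 1 prints every line, changed or not.
awk '/^>/{$0=">Contig_"(++i)"_"substr($0,2)}1' input_file.fasta > output_file.fasta

cat output_file.fasta
```

Sequence lines pass through untouched because the pattern /^>/ only matches header lines, and the counter i is incremented once per header.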

Zapages 05-12-2015 09:03 AM

Thank you Phillip and Genomax. Both of the strategies work. :)

Really appreciate all the help.

Many Thanks,

Zapages

