Hi all, I'm trying to extract nucleotide sequences from a FASTA file using the following script, which allows multiple extractions:
perl -ne 'if(/^>(\S+)/){$c=grep{/^$1$/}qw()}print if $c' file.fasta
My FASTA file "file.fasta" contains contigs from two different assemblers, which I've merged, so it has two different header styles. The script works fine for headers from the first assembly but doesn't seem to extract sequences with headers from the second. Here's an example of each header:
Header 1 (working) - comp89447_c0_seq4
Header 2 (won't extract) - BN2_l1_1_(paired)_merged_contig_14067
I can manually search and find the "BN2_l1_1_(paired)_merged_contig_14067" but I cannot extract it using the script. Any help would be greatly appreciated.
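A likely cause, offered as a guess rather than a confirmed diagnosis: in /^$1$/ the header text is interpolated as a regular expression, so the parentheses in "(paired)" are treated as a capture group instead of literal characters, and the pattern no longer matches the ID as written. Below is a minimal sketch that compares the header to each wanted ID as a plain string instead. The file names demo.fasta and extracted.fasta, the sequences, and the "other_contig" record are made up for illustration.

```shell
# Hypothetical three-record FASTA file, for demonstration only.
printf '>comp89447_c0_seq4\nACGT\n>BN2_l1_1_(paired)_merged_contig_14067\nGGCC\n>other_contig\nTTAA\n' > demo.fasta

# Copy the captured header into $id, then compare it to each wanted ID
# with eq (string equality), so parentheses are never read as regex syntax.
perl -ne 'if(/^>(\S+)/){$id=$1;$c=grep{$_ eq $id}qw(comp89447_c0_seq4 BN2_l1_1_(paired)_merged_contig_14067)}print if $c' demo.fasta > extracted.fasta
```

If you would rather keep the regex form, quoting the header with \Q$1\E inside the pattern should have the same effect, since \Q...\E escapes regex metacharacters in the interpolated text.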