

04-27-2015, 12:58 PM   #1
awk script to print a set of target sequences to same file

I have a folder with sample files (81 total .fasta) from a barcoded MiSeq run.

Each sample file contains consensus sequences for up to 53 targets.

Each .fasta is organized so that the name line (">") gives the locus (AT#G######), followed by the consensus sequence. I need to search all sample files (81 taxa in total) and create a new .fasta file for each locus that lists the name of each taxon, followed by that taxon's consensus sequence for the locus.

With some help from Stack Exchange, I have a script that does most of this beautifully, with one hang-up: the new locus .fasta files are not merged across taxa. I get a .fasta for locus ATXGXXXXXX for Sample_1 only, a separate .fasta for Sample_2 for the same locus, and so on for all samples. I can't seem to find a command to merge all sample sequences for locus ATXGXXXXXX into the same .fasta.

Here is the script:
awk '
FNR==1 { sample = FILENAME ; sub(/\.fasta/, "", sample )}
/^>/ { target = substr($0,2)".fasta" ; next }
{ print items ">" sample > target ; print > target; close(target) }
' C_*.fasta
Does anyone have any thoughts?
MoGo
04-27-2015, 01:56 PM   #2

Got it sorted out. It's a nifty little script if anyone needs to batch sort multilocus target consensus files from Geneious Export to a new .fasta for per-locus alignment. Huge thanks to Janis at Stackexchange for that one!
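The thread does not record the final script, so the following is only a sketch of one plausible fix, not necessarily what was actually used. It assumes the problem is the `>` redirection: in awk, reopening a file with `>` after `close()` truncates it, so each sample overwrites what the previous one wrote, whereas `>>` appends. It also drops the unassigned `items` variable (which expands to an empty string) and writes the ">sample" header once per locus rather than once per sequence line. The two tiny input files and their contents are made up for the demo.

```shell
#!/bin/sh
# Work in a scratch directory; the sample files below are invented test data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '>AT1G00001\nACGT\n>AT2G00002\nGGCC\n' > C_1.fasta
printf '>AT1G00001\nACGA\n' > C_2.fasta

# Because >> appends, clear any per-locus output left over from earlier runs.
rm -f AT*.fasta

awk '
# First line of each input file: derive the taxon name from the file name.
FNR == 1 { sample = FILENAME; sub(/\.fasta$/, "", sample) }
# Locus header: pick the per-locus output file and write ">taxon" to it.
/^>/     { target = substr($0, 2) ".fasta"; print ">" sample >> target; next }
# Sequence line: append it, then close to keep the open-file count low.
         { print >> target; close(target) }
' C_*.fasta

cat AT1G00001.fasta
# prints:
# >C_1
# ACGT
# >C_2
# ACGA
```

Since the output is appended, running the script twice without the `rm -f` step would duplicate every entry, which is why the cleanup comes first.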
MoGo

All times are GMT -8.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO