SEQanswers

Old 09-16-2013, 05:44 AM   #1
nouse
Member
 
Location: Germany

Join Date: Sep 2013
Posts: 11
Renaming of FASTA headers according to their container name

Hi there,
I am somewhat of a beginner in bioinformatics, so please forgive me if I ask any silly questions.

I have several dozen files containing millions of processed Illumina reads. The sequences have already been converted to FASTA format.
Now I want to pool all my files into a single file in order to get a uniform OTU nomenclature (I want to feed my OTU picker a single file).
I will use a plain cat command to merge the files.
However, I can't think of commands that
a) add an additional identifier (the sample number) to every FASTA header in a file
b) later on, after OTU picking, sort all sequences carrying the same identifier into a new file.

I can't use the barcode information in the header, because several barcodes have been used multiple times.

Any ideas? Thank you very much!
Old 09-16-2013, 06:01 AM   #2
robinweide
Junior Member
 
Location: Utrecht

Join Date: Sep 2013
Posts: 9

I would check the FASTX-Toolkit, which includes a renamer tool: http://hannonlab.cshl.edu/fastx_tool...mmandline.html.

Edit: http://www.biostars.org/p/68477/ has more options for renaming headers.
Old 09-16-2013, 06:04 AM   #3
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162

If all of your samples are in separate files, then you can use a simple sed command to add sample info to the fasta header for each sequence in a file as follows:

Code:
 sed 's/^>/>SampleA_/' INFILE.fasta > OUTFILE.fasta
That will put SampleA_ at the start of the header of every sequence in the file (the underscore keeps the sample name from running into the original read ID), and you can just repeat that command for as many files/samples as you have.
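
With several dozen files, a loop saves retyping the command. Here is a rough sketch, assuming each file is named after its sample (SampleA.fasta, SampleB.fasta, ...) and that the sample names contain no characters special to sed such as / or &. The file names, the scratch directory and the demo reads are made up for illustration:

```shell
# Demo data: two small FASTA files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
printf '>read1\nACGT\n>read2\nGGCC\n' > "$workdir/SampleA.fasta"
printf '>read1\nTTAA\n' > "$workdir/SampleB.fasta"

# The merged file gets a different extension so the *.fasta glob never picks it up.
merged="$workdir/merged.fa"
: > "$merged"

# Prefix every header with the file's base name, then append to the merged file.
for f in "$workdir"/*.fasta; do
    sample=$(basename "$f" .fasta)
    sed "s/^>/>${sample}_/" "$f" >> "$merged"
done

cat "$merged"
```

This does the renaming and the cat step in one go, so no intermediate renamed copies are left on disk.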

For your second task, you can use a grep command as follows:

Code:
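 grep -A1 --no-group-separator '^>SampleA' INFILE.fasta > OUTFILE.fasta
That will find all lines in your concatenated file that begin with >SampleA and print each one together with the following line (the sequence) into a new file; this assumes every sequence sits on a single line. Note that the option is a capital -A1 (one line of trailing context); a lowercase -a means something else. --no-group-separator is specific to GNU grep and suppresses the -- lines that grep otherwise prints between non-adjacent matches. The pattern is a prefix match, so >Sample1 would also match >Sample10; if your sample IDs end in a separator, include it in the pattern. Again, you would have to execute the grep command for each different sample to produce a fasta file for each.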
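
If you would rather split the merged file in one pass instead of running grep once per sample, awk can do it. This is only a sketch under a few assumptions: the sample ID is everything between > and the first underscore in the header, sample IDs themselves contain no underscore, and the output naming (SAMPLE.split.fasta) is made up for the example. Unlike the grep approach, it also copes with sequences wrapped over several lines:

```shell
# Demo data: a small merged file with prefixed headers (hypothetical content).
workdir=$(mktemp -d)
printf '>SampleA_read1\nACGT\n>SampleB_read1\nTTAA\n>SampleA_read2\nGG\nCC\n' > "$workdir/merged.fa"

# On each header line, take the text between '>' and the first '_' as the
# sample ID and switch the output file; sequence lines go to the same file
# as the header that precedes them.
awk -v outdir="$workdir" '
    /^>/ {
        split(substr($0, 2), parts, "_")
        out = outdir "/" parts[1] ".split.fasta"
    }
    { print > out }
' "$workdir/merged.fa"

ls "$workdir"
```

awk keeps one output file open per sample, which should be fine for a few dozen samples; with very many samples some awk implementations may hit an open-file limit.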

All of that is quite a lot of work, even if you shell script it, so you might want to take advantage of some of the scripts that are a part of the QIIME package, most importantly add_qiime_labels.py which takes a folder of fasta files and a sample mapping file and adds sample IDs to the fasta headers and concatenates the output into one file.