03-28-2012, 12:24 PM   #1
Junior Member
Location: Switzerland

Join Date: Sep 2011
Posts: 2
Rename fastq seq ID with unique identifier

Hi everyone,
I have a problem: my SOLiD data do not have unique identifiers, and I'm afraid this could cause trouble at some point in the analysis.

I would like to rename every fastq sequence with a unique identifier:

@Indx_1 ... @Indx_10000000 etc.

Right now they are labelled: @1_5_1645_F3

I'm a newbie in bioinformatics and wanted to try with awk, but I wasn't sure how to deal with the @ that can also appear inside the qualities (in ASCII). And the fastx tool just renumbers the reads; it does not give the possibility to add an identifier...

Any ideas?


Have a nice evening
454rocks
03-28-2012, 01:24 PM   #2
Alex Renwick
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44

While this can certainly be done with standard awk, bioawk makes it a bit easier:

bioawk -c fastx '{printf("@Indx-%d\n%s\n+\n%s\n",cnt++,$seq,$qual)}' old.fastq > new.fastq
...on reflection, this isn't too hard using standard awk...

awk '{if( (NR-1)%4 ) print; else printf("@Indx-%d\n",cnt++)}' old.fastq > new.fastq
...this depends on the first entry starting on the first line and every following entry occupying exactly four lines. Because the header is picked by line position rather than by its leading @, an @ at the start of a quality line does no harm. Note that cnt starts at 0, so the first read becomes @Indx-0.
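For what it's worth, here is a minimal sketch of the same idea wrapped in a shell function, numbering from 1 and using an underscore as in the original question. The function name rename_fastq and the prefix argument are made up for illustration, and it still assumes strict four-line records (no wrapped sequence or quality lines). It also stops with an error if a line in the header position does not start with @, which would suggest that the four-line assumption is broken.

```shell
# Hypothetical helper: reads FASTQ on stdin, writes renamed FASTQ on stdout.
# Assumption: every record is exactly four lines, starting at line 1.
rename_fastq() {
    awk -v prefix="$1" '
        NR % 4 == 1 {
            # Header line, selected by position and not by its leading "@",
            # so a quality line that begins with "@" is never rewritten.
            if (substr($0, 1, 1) != "@") {
                printf("line %d: expected a header starting with @\n", NR) > "/dev/stderr"
                exit 1
            }
            printf("@%s_%d\n", prefix, ++n)   # ++n makes the first read number 1
            next
        }
        { print }   # sequence, "+" and quality lines pass through unchanged
    '
}
```

Usage would be something like: rename_fastq Indx < old.fastq > new.fastq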
Alex Renwick is offline   Reply With Quote
03-28-2012, 01:29 PM   #3
Richard Finney
Senior Member
Location: bethesda

Join Date: Feb 2009
Posts: 700

Try something like this ...
cat file.fq | awk '{if ((p%4)==0) print "@whatev_"p;else print $0;p++}'

If you're repairing paired data, you will sadly lose the pairing, since the original IDs that tie the mates together are overwritten.
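One possible way to soften that, sketched below under assumptions: keep the old ID as a description after the new one, separated by a space, so the pairing information can still be recovered later. This assumes strict four-line records and that your downstream tools tolerate (or simply ignore) text after the first whitespace in the header, which is not guaranteed. The function name rename_keep_original is made up for illustration.

```shell
# Hypothetical helper: reads FASTQ on stdin, writes renamed FASTQ on stdout.
# Assumption: four-line records; header text after the first space is allowed.
rename_keep_original() {
    awk -v prefix="$1" '
        NR % 4 == 1 {
            # New unique ID first, then the old ID (without its "@") as a comment.
            printf("@%s_%d %s\n", prefix, ++n, substr($0, 2))
            next
        }
        { print }   # sequence, "+" and quality lines are left untouched
    '
}
```

With this, @1_5_1645_F3 would become @Indx_1 1_5_1645_F3 when called as: rename_keep_original Indx < old.fastq > new.fastq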

(edit: alex beat me to it)
All times are GMT -8.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO