SEQanswers




Similar Threads
Thread Thread Starter Forum Replies Last Post
batch rename multiple fasta Pedro Bioinformatics 3 03-19-2012 12:35 PM
Obtaining unique sequence tag file from fastQ format ramadatta.88 Introductions 0 09-26-2011 01:25 AM
human genome build identifier tool rworthi Bioinformatics 0 06-18-2011 08:44 AM
Converting Solexa FASTQ file to unique sequence tags DrD2009 Bioinformatics 16 08-08-2010 11:30 PM
Solexa - same sequence but unique identifier Layla Bioinformatics 5 11-27-2009 05:08 AM

03-28-2012, 11:24 AM   #1
454rocks
Junior Member
 
Location: Switzerland

Join Date: Sep 2011
Posts: 2
Rename fastq seq ID with unique identifier

Hi everyone,
I have a problem: my SOLiD data do not have unique identifiers, and I'm afraid that could cause trouble at some point in the analysis.

I would like to rename every fastq sequence with a unique identifier

@Indx_1 - @Indx-10000000 etc

Right now they are labelled like this: @1_5_1645_F3

I'm a newbie in bioinformatics and wanted to try with awk, but I wasn't sure how to deal with the @ also appearing inside the qualities (in ASCII). And the fastx tool just renumbers the reads; it doesn't give the possibility to add an identifier...

Any ideas?

Thanks

Have a nice evening
03-28-2012, 12:24 PM   #2
Alex Renwick
Member
 
Location: Houston, Texas

Join Date: Jul 2011
Posts: 44

While this can certainly be done with standard awk, bioawk makes it a bit easier:

Code:
bioawk -c fastx '{printf("@Indx-%d\n%s\n+\n%s\n",cnt++,$seq,$qual)}' old.fastq > new.fastq
...on reflection, this isn't too hard using standard awk...

Code:
awk '{if( (NR-1)%4 ) print; else printf("@Indx-%d\n",cnt++)}' old.fastq > new.fastq
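...this depends on the first entry starting on the first line, and on every following entry starting exactly four lines after the previous one. Only those lines are rewritten, so an @ at the start of a quality line is left alone.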
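
Since the one-liner above relies on strict four-line records, it may be worth checking that layout before renaming. Here is a minimal sketch in plain awk, assuming sequences and qualities are not wrapped over several lines. The function name check_fastq_layout is made up for illustration, and the check is deliberately loose (it does not compare sequence and quality lengths, which can legitimately differ in colorspace files).

```shell
# check_fastq_layout FILE
# Hypothetical helper: exits with status 0 only if FILE looks like strict
# four-line FASTQ (header, sequence, "+" line, qualities), which is what
# the NR-based renaming assumes.
check_fastq_layout() {
    awk '
        # 1st line of each record must be a header starting with "@"
        NR % 4 == 1 && substr($0, 1, 1) != "@" {
            print "line " NR ": expected a header starting with @" > "/dev/stderr"
            bad = 1
            exit
        }
        # 3rd line of each record must be the "+" separator
        NR % 4 == 3 && substr($0, 1, 1) != "+" {
            print "line " NR ": expected a separator starting with +" > "/dev/stderr"
            bad = 1
            exit
        }
        END {
            # a trailing partial record means the file is truncated or wrapped
            if (!bad && NR % 4 != 0) {
                print "line count " NR " is not a multiple of 4" > "/dev/stderr"
                bad = 1
            }
            exit bad + 0
        }
    ' "$1"
}
```

Something like `check_fastq_layout old.fastq && echo "layout looks fine"` could then be run before the renaming step.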
03-28-2012, 12:29 PM   #3
Richard Finney
Senior Member
 
Location: bethesda

Join Date: Feb 2009
Posts: 688

Try something like this ...
cat file.fq | awk '{if ((p%4)==0) print "@whatev_"p;else print $0;p++}'

If you're repairing paired data this way, you will lose the pairs, sadly: the original IDs that link the mates get replaced.

(edit: alex beat me to it)
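
One possible way to soften the pairing problem is to write a lookup table from new names to old ones while renaming, so mates can still be matched up later through their original IDs. Here is a rough sketch, assuming strict four-line records. The function rename_with_map, its argument order and the tab-separated map layout are all hypothetical, not part of any existing tool.

```shell
# rename_with_map IN.fastq PREFIX MAP.tsv > OUT.fastq
# Hypothetical helper: renames records to @PREFIX1, @PREFIX2, ... and writes
# "new_id<TAB>old_id" lines to MAP.tsv so the original names stay recoverable.
# Assumes every record is exactly four lines long.
rename_with_map() {
    awk -v prefix="$2" -v map="$3" '
        NR % 4 == 1 {
            n++                                       # record counter, starts at 1
            print "@" prefix n                        # new header line
            print prefix n "\t" substr($0, 2) > map   # new ID, old ID without the leading @
            next
        }
        { print }                                     # sequence, + and quality lines unchanged
    ' "$1"
}
```

Called as `rename_with_map old.fastq Indx_ id_map.tsv > new.fastq`, this numbers the reads from 1 and only touches the first line of each record, so quality strings beginning with @ pass through unchanged.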



All times are GMT -8. The time now is 06:19 AM.

