#1
Junior Member
Location: Switzerland
Join Date: Sep 2011
Posts: 2
Hi everyone,
I have a problem: my SOLiD data do not have unique identifiers, and I'm afraid this could cause trouble at some point in the analysis. I would like to rename every FASTQ read with a unique identifier (@Indx_1 to @Indx_10000000, etc.); right now they are labelled like @1_5_1645_F3.
I'm a newbie in bioinformatics and wanted to try with awk, but wasn't sure how to deal with the @ also appearing inside the quality strings (in ASCII). And the fastx tool just renumbers the reads; it does not give the possibility to add an identifier...
Any ideas?
Thanks, and have a nice evening
#2
Member
Location: Houston, Texas
Join Date: Jul 2011
Posts: 44
While this can certainly be done with standard awk, bioawk makes it a bit easier:
Code:
bioawk -c fastx '{printf("@Indx-%d\n%s\n+\n%s\n",cnt++,$seq,$qual)}' old.fastq > new.fastq
With standard awk:
Code:
awk '{if( (NR-1)%4 ) print; else printf("@Indx-%d\n",cnt++)}' old.fastq > new.fastq
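The positional test is what sidesteps the worry about @ in the qualities: in a 4-line FASTQ record the header is always the first line of each group of four, so a quality line that happens to start with @ is never treated as a header. Below is a minimal sketch of the same idea wrapped in a shell function. It assumes strictly 4-line records (no wrapped sequences); the function name `rename_fastq` and its prefix argument are made up for illustration, and numbering starts at 1 with an underscore, as asked in the first post.

```shell
# Hypothetical helper: rename every read in a 4-line FASTQ stream.
# Usage: rename_fastq PREFIX < old.fastq > new.fastq
# Assumes each record is exactly 4 lines: header, sequence, "+" line, quality.
rename_fastq() {
    awk -v prefix="$1" '
        NR % 4 == 1 { printf("@%s_%d\n", prefix, ++n); next }  # header line: replace with new name
        NR % 4 == 3 { print "+"; next }                         # separator: drop any repeated old name
        { print }                                               # sequence and quality: passed through untouched
    '
}
```

Called as `rename_fastq Indx < old.fastq > new.fastq`, this should give @Indx_1, @Indx_2, and so on. It never inspects the first character of a line, only its position.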
#3
Senior Member
Location: bethesda
Join Date: Feb 2009
Posts: 700
Try something like this ...
cat file.fq | awk '{if ((p%4)==0) print "@whatev_"p;else print $0;p++}'
If you're repairing paired data, you will lose the pairing, sadly. (edit: alex beat me to it)
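On the pairing point: one possible way to keep mates matched is to number both files with the same running counter and tag each name with the file it came from. This is only a rough sketch. It assumes both files list the mates in the same order with none missing, which is worth checking first (for example, that both files have the same number of lines). The name `rename_mate` and the /1, /2 suffixes are illustrative choices, not something from this thread.

```shell
# Hypothetical helper: rename reads so that mates in two files get matching names.
# Usage: rename_mate PREFIX MATE < in.fastq > out.fastq   (MATE is 1 or 2)
# Assumes 4-line FASTQ records, and that read i in one file pairs with read i in the other.
rename_mate() {
    awk -v p="$1" -v m="$2" '
        NR % 4 == 1 { printf("@%s_%d/%d\n", p, ++n, m); next }  # header: same counter, mate suffix
        { print }                                               # all other lines unchanged
    '
}

# Example (file names are placeholders):
#   rename_mate Indx 1 < reads_F3.fastq > reads_F3.renamed.fastq
#   rename_mate Indx 2 < reads_R3.fastq > reads_R3.renamed.fastq
```

If the two files are not in the same order, the names will match the wrong reads, so this does not repair pairing; it only preserves it.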