
**dpryan** · 08-11-2014, 12:13 PM

Code:

cat foo | sed 's/>//' | awk '{idx+=1;printf(">%i\n%s\n",idx,$0)}'

or

Code:

cat foo | awk '{idx+=1;$1=substr($1,2,length($1));printf(">%i\n%s\n",idx,$1)}'

or

Code:

cat foo | awk '{idx+=1;sub(/>/,sprintf(">%i\n",idx),$1);print $1}'

among many other possibilities. You'll find that familiarizing yourself with the command line will come in handy.

**bckirkup** · 08-11-2014, 12:19 PM

Also try jEdit.

Regex and BeanShell can sort your problem out.

**GenoMax** · 08-11-2014, 12:21 PM

This should work

Code:

$ perl -p -i.bak -e '$c+=1; s/>/>$c\n/g' your_file

**satishg** · 08-11-2014, 01:16 PM

Thanks GenoMax. The output is as follows:

>1

>2
>APEGDARPRQSGHPACHELDAADRRQGEIPGVPERRLCDASL
>3

>4
>ADSGGRGGCRRRCGDLPAAALIRGRGDDTDRPVPARRRPGRVRRGAGGPATAAGRARGVDRRAGLRGRA
>5

The order of the sequences is right, but it's introducing blank sequences at >1, >3 and >5.

Could you please look into it?

**GenoMax** · 08-11-2014, 02:27 PM

What OS are you doing this on? Did you edit/open this file on a PC/Mac?

NOTE: Before you edit or change a file, it is important to make a backup copy (especially if you spent a day or two getting it). I have added a cp command below that preserves an original copy should you need to go back to it.

Try the following before you rerun the perl command. It converts the file from Windows to Unix format, in case that is the issue (though I am not certain it is). The perl command saved a backup of the original with a .bak extension and changed the original in place, so you will first need to copy the .bak file back to the original name:

Code:

$ cp your_file.bak your_file.ORIG
$ cp your_file.bak your_file
$ awk '{ sub(/\r$/,""); print }' your_file
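One thing worth knowing about the awk line above: it prints the converted text to the terminal and leaves your_file itself untouched. A minimal sketch that saves the result to a new file instead, assuming the file really does have Windows line endings (the function name strip_cr and the output file name are made up for illustration):

```shell
# strip_cr: hypothetical helper, not part of any tool mentioned in this thread.
# Reads the file given as $1, removes a trailing carriage return from every
# line, and writes the result to the file given as $2.
strip_cr() {
    awk '{ sub(/\r$/, ""); print }' "$1" > "$2"
}
```

It could be called as `strip_cr your_file your_file.unix`, after which the numbering command would be run on your_file.unix.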

**rnaeye** · 08-11-2014, 07:07 PM

Code:

sed 's/>//' inputFile | awk '{print ">"NR"\n"$0}'

**satishg** · 08-12-2014, 06:42 AM

GenoMax - that didn't do anything. The .bak file has no numbers assigned, and when I ran the suggested awk command it didn't make any changes or add numbers to the output file.

Thanks rnaeye. The original file has a sequence (#5) that spans two lines. The code treats the second line of that sequence as sequence #6 in the output. I probably need to change the number of characters per line in the original file. Please advise.

The following are the input and output files:

INPUT-
>APEGDARPRQSGHPACHELDAADRRQGEIPGVPERRLCDASL
>ADSGGRGGCRRRCGDLPAAALIRGRGDDTDRPVPARRRPGRVRRGAGGPATAAGRARGVDRRAGLRGRA
>NSVNPDVSQHSPERHFHTSEGTLC
>AARHRAGQGARPPGLPPEHQPARRRDRAGAGLGGPASAGAAGRGAGGAATGRAVGAVRADGGR
>VRRLTWHGGGGDIRAFVFFLAKNVKNLDLFGASLFQVASFHPTASLGVSKLVIRSSIFNLLHCNFKKMRLAFFNLLHY
KEIRFAMITLIRSTATSGGYGICGFNLLHCHFGEIRFTMITSIRSTATLGGDKIHHGRFDPTYCNFRGIGFMVSLIVTPFSREHDL
>MNGAKAMEGMVCDARGEGDGGDVLQCTGRFGGKLTDLGNLGISEFREIGISESGQTRGKG

OUTPUT-
>1
APEGDARPRQSGHPACHELDAADRRQGEIPGVPERRLCDASL
>2
ADSGGRGGCRRRCGDLPAAALIRGRGDDTDRPVPARRRPGRVRRGAGGPATAAGRARGVDRRAGLRGRA
>3
NSVNPDVSQHSPERHFHTSEGTLC
>4
AARHRAGQGARPPGLPPEHQPARRRDRAGAGLGGPASAGAAGRGAGGAATGRAVGAVRADGGR
>5
VRRLTWHGGGGDIRAFVFFLAKNVKNLDLFGASLFQVASFHPTASLGVSKLVIRSSIFNLLHCNFKKMRLAFFNLLHY
>6
KEIRFAMITLIRSTATSGGYGICGFNLLHCHFGEIRFTMITSIRSTATLGGDKIHHGRFDPTYCNFRGIGFMVSLIVTPFSREHDL
>7
MNGAKAMEGMVCDARGEGDGGDVLQCTGRFGGKLTDLGNLGISEFREIGISESGQTRGKG
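
The extra >6 in the output above comes from NR, which counts every input line rather than every line that starts with ">". A quick way to see whether a file has wrapped or stray lines is to compare the two counts. This is only a sketch, and the function name is hypothetical:

```shell
# count_lines_and_headers: hypothetical helper for illustration.
# Prints the total number of lines and the number of lines starting with ">".
# If the two numbers differ, some line is a continuation (or blank) line,
# and numbering by NR will drift.
count_lines_and_headers() {
    awk '/^>/ { h++ } END { printf("%d lines, %d headers\n", NR, h) }' "$1"
}
```

For the input file shown above, which has seven lines of which six start with ">", the two counts would differ by one.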

**satishg** · 08-12-2014, 07:23 AM

Thanks dpryan - the third command works, but it skips a number after any sequence that spans two lines; say sequence #5 has two lines, then the output has >5 followed by >7, skipping >6. This shows it better:

>4
AARHRAGQGARPPGLPPEHQPARRRDRAGAGLGGPASAGAAGRGAGGAATGRAVGAVRADGGR
>5
VRRLTWHGGGGDIRAFVFFLAKNVKNLDLFGASLFQVASFHPTASLGVSKLVIRSSIFNLLHCNFKKMRLAFFNLLHY
KEIRFAMITLIRSTATSGGYGICGFNLLHCHFGEIRFTMITSIRSTATLGGDKIHHGRFDPTYCNFRGIGFMVSLIVTPFSREHDL
>7
MNGAKAMEGMVCDARGEGDGGDVLQCTGRFGGKLTDLGNLGISEFREIGISESGQTRGKG

I can live with it for now. I'll follow your advice and try to familiarize myself with the command line. Could you please fix the bug in the third command and let me know?

**satishg** · 08-12-2014, 09:58 AM

Thanks all - I still have the issue with numbering sequences in order. I removed the line delimiter and now have the output file as:

>1
APEGDARPRQSGHPACHELDAADRRQGEIPGVPERRLCDASL
>2
ADSGGRGGCRRRCGDLPAAALIRGRGDDTDRPVPARRRPGRVRRGAGGPATAAGRARGVDRRAGLRGRA
>3
NSVNPDVSQHSPERHFHTSEGTLC
>4
AARHRAGQGARPPGLPPEHQPARRRDRAGAGLGGPASAGAAGRGAGGAATGRAVGAVRADGGR
>5
VRRLTWHGGGGDIRAFVFFLAKNVKNLDLFGASLFQVASFHPTASLGVSKLVIRSSIFNLLHCNFKKMRLAFFNLLHYKEIRFAMITLIRSTATSGGYGICGFNLLHCHFGEIRFTMITSIRSTATLGGDKIHHGRFDPTYCNFRGIGFMVSLIVTPFSREHDL
>7
MNGAKAMEGMVCDARGEGDGGDVLQCTGRFGGKLTDLGNLGISEFREIGISESGQTRGKG

Please help me fix the issue of numbering sequences in order.

**dpryan** · 08-12-2014, 10:43 AM

That's less a bug than a feature request, but in any case it's pretty trivial to add support for multi-line entries:

Code:

cat foo | awk '{if(substr($1,1,1)==">"){idx+=1;sub(/>/,sprintf(">%i\n",idx),$1);}print $1}'
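
If you would also like wrapped sequences joined onto one line (as was done by hand earlier in the thread), both steps can be done in a single pass. The following is a sketch only, assuming the file format shown in this thread (the sequence starts right after ">" on the same line) and Unix line endings; the function name is hypothetical:

```shell
# renumber_and_unwrap: hypothetical helper for illustration.
# Lines starting with ">" begin a new record and get a running number;
# any other line is treated as a continuation and appended to the
# current sequence without a line break.
renumber_and_unwrap() {
    awk '
        /^>/ {
            if (n) printf("\n")                    # close the previous sequence line
            printf(">%d\n%s", ++n, substr($0, 2))  # numbered header, then the sequence
            next
        }
        { printf("%s", $0) }                       # continuation line: append as-is
        END { if (n) printf("\n") }                # close the last sequence line
    ' "$1"
}
```

It could be run as `renumber_and_unwrap foo > foo.numbered`, which leaves the input file untouched.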

**satishg** · 08-12-2014, 11:29 AM

Finally, it all looks good!

>1
APEGDARPRQSGHPACHELDAADRRQGEIPGVPERRLCDASL
>2
ADSGGRGGCRRRCGDLPAAALIRGRGDDTDRPVPARRRPGRVRRGAGGPATAAGRARGVDRRAGLRGRA
>3
NSVNPDVSQHSPERHFHTSEGTLC
>4
AARHRAGQGARPPGLPPEHQPARRRDRAGAGLGGPASAGAAGRGAGGAATGRAVGAVRADGGR
>5
VRRLTWHGGGGDIRAFVFFLAKNVKNLDLFGASLFQVASFHPTASLGVSKLVIRSSIFNLLHCNFKKMRLAFFNLLHYKEIRFAMITLIRSTATSGGYGICGFNLLHCHFGEIRFTMITSIRSTATLGGDKIHHGRFDPTYCNFRGIGFMVSLIVTPFSREHDL
>6
MNGAKAMEGMVCDARGEGDGGDVLQCTGRFGGKLTDLGNLGISEFREIGISESGQTRGKG
>7
MADPDEVIPTVRDVSDAPFVGSDGSNVILNEDSFGGGDNGLEEFRGEGSMGK

Thank you all for your time!

**syfo** · 08-13-2014, 08:54 AM

concise mode:

Code:

cat input |  awk '/^>/{$1=">"++n"\n"substr($1,2)}1'
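
For anyone puzzling over the concise version: the pattern /^>/ restricts the action to lines that start with ">", ++n bumps the counter before it is used, and the trailing 1 is an always-true pattern whose default action is to print the line. An expanded sketch of the same idea follows; the function name is hypothetical, and it assigns to $0 rather than $1, which only matters if a line contains whitespace:

```shell
# renumber_fasta: hypothetical helper for illustration.
# Header lines (starting with ">") are rewritten as a numbered header
# followed by the rest of the line on its own line; all other lines
# are printed unchanged.
renumber_fasta() {
    awk '
        /^>/ { $0 = ">" (++n) "\n" substr($0, 2) }  # rewrite header lines only
        { print }                                   # print every line
    ' "$1"
}
```

Usage would be along the lines of `renumber_fasta input > output`.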


Fasta File Editing
