SEQanswers


garethboy 09-26-2013 09:29 PM

problem with adding numerical sequence at the end of line
 
Hi,

Does anyone have any idea how to get this:

>no_name
TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
>no_name
GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
>no_name
GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
>no_name
GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
>no_name
GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

to become this:

>no_name_1
TATGCATCGATGCACATATGCTAGTGCGCTAGTGTCGAGGCTAGCTACG
>no_name_2
GACGTACGTAGCATGCATGCATGCGTAGCTGTAGCTAGC
>no_name_3
GCTAGCTAGGTAGGTCATGTAGTAGGTGCACTGAGCTAGCTAGCTAGCTAGCAGC
>no_name_4
GCTAGCATGCTAGCTAGCTAGCACTAGCTAGCTAGCTAGCTAATGCATCATC
>no_name_5
GCTACGTAGCATGCTAGCGGATCATGCATGCATGCTAGCATCGATGCTAGCATGCAT

atcghelix 09-26-2013 09:47 PM

Here's one way using Perl. Save the text in a file named numbers.pl (or whatever). Usage would be:

perl numbers.pl --in file_to_change.fasta --out revised_file.fasta


Code:

#!/usr/bin/perl

use strict;
use warnings;
use Getopt::Long;

my $inFile;
my $outFile;

GetOptions  ("in=s"      => \$inFile,
            "out=s"      => \$outFile);

if (!$inFile or !$outFile) {
    die "Must supply both infile and outfile as command line arguments.\n";
}

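# Open the input first, then refuse to clobber an existing output file.
open(my $inFH, "<", $inFile) or die "Couldn't open $inFile for reading: $!\n";
if (-e $outFile) {
    die "Output file $outFile already exists--aborting so you don't overwrite.\n";
}
open(my $outFH, ">", $outFile) or die "Couldn't open $outFile for writing: $!\n";

# Append a running number to every header line; copy sequence lines unchanged.
my $counter = 1;
while (my $line = <$inFH>) {
    chomp $line;
    if ($line =~ /^(>.*)/) {
        print $outFH $1 . "_$counter\n";
        $counter++;
    } else {
        print $outFH "$line\n";
    }
}

close($inFH);
close($outFH) or die "Couldn't close $outFile: $!\n";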


Jeremy 09-26-2013 10:06 PM

Here's another way, using R:

Code:

library(seqinr)
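# forceDNAtolower=FALSE keeps the sequences in upper case, as in the input
fa <- read.fasta("fastafile.fa", forceDNAtolower=FALSE)
write.fasta(fa, names=paste(getName(fa), 1:5, sep="_"), file.out="fa_new_name.fa")

Replace '1:5' with '1:n', where n is the number of sequences you have.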
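
If you don't know n offhand, one way to get it is to count the header lines in the file. This is only a sketch: count_sequences is a made-up helper name, and it assumes every record starts with a '>' line.

```shell
# count_sequences is a hypothetical helper name, not part of any package.
# It prints the number of FASTA records by counting lines that start with '>'.
count_sequences() {
    grep -c '^>' "$1"
}

# Usage (file name is a placeholder):
#   count_sequences fastafile.fa
```

The number it prints is the n to use in '1:n'.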

garethboy 09-26-2013 10:31 PM

Does anyone know how to use AWK to do this task?

garethboy 09-26-2013 10:32 PM

Thanks. I am pretty weak in Perl. Do you have any idea how to do this using AWK?

atcghelix 09-26-2013 10:49 PM

What version of Awk are you running, and on what operating system?

garethboy 09-26-2013 11:03 PM

I'm running UNIX.

atcghelix 09-26-2013 11:26 PM

Does this work? (It assumes each sequence string is on a single line.)

Code:

awk '{if($0 ~ /^>/){print $0"_"(NR+1)/2}else{print $0}}' input.fasta > changed.fasta
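
If some sequences are wrapped over several lines, the NR arithmetic above no longer lines up with the header count. A variant that keeps its own counter of header lines might look like the sketch below. The function name number_headers and the file names are placeholders for illustration only.

```shell
# number_headers is a hypothetical helper name.
# It appends _1, _2, ... to each header line of the FASTA file given as $1.
number_headers() {
    # n is incremented only on header lines, so wrapped sequence
    # lines do not disturb the numbering; all other lines pass through.
    awk '/^>/ { n++; print $0 "_" n; next } { print }' "$1"
}

# Usage (file names are placeholders):
#   number_headers input.fasta > changed.fasta
```

Because the counter depends only on lines starting with '>', this should give the same result as the one-liner above on single-line FASTA files.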

Kennels 09-26-2013 11:33 PM

Try this:

Code:

paste - - < input.fa | awk ' { print $1"_"NR"\n"$2 } ' > output.fa

Make sure to have a space between the two hyphens for 'paste'.
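
For anyone wondering how this works: 'paste - -' reads two lines at a time from standard input and joins them with a tab, so each record becomes one line that awk can number with NR. Since awk splits on any whitespace by default, a header containing spaces would be cut short. The sketch below sets the field separator to a tab to avoid that. It still assumes exactly two lines per record, and number_headers_paste is a made-up name.

```shell
# number_headers_paste is a hypothetical helper name.
# Assumes each record is exactly two lines: one header, one sequence.
number_headers_paste() {
    # paste joins header and sequence with a tab; -F'\t' makes awk split
    # only on that tab, so spaces inside the header are preserved.
    paste - - < "$1" | awk -F'\t' '{ print $1 "_" NR "\n" $2 }'
}

# Usage (file names are placeholders):
#   number_headers_paste input.fa > output.fa
```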

garethboy 09-27-2013 01:27 AM

Thank you everybody. I have done my task. =)


All times are GMT -8.
