
**Thomas Doktor** · 02-24-2011, 04:09 AM

Can you give an example of the intended output?

I'm thinking you would like this:
File 1:
E.coli ---MGKALVIVESPAKAKTINKYLGSD-----------YVVKSSVGHIRDLPTSGSAAKK
E.coli ---ALISLILIASLILILAISLA

File 2:
E.coli SADSTSTKTAKKPKKDERGALVNRMGVDPWHNWEAHYEVLPGKEKVVSELKQLAEKADHI

Output:
E.coli ---MGKALVIVESPAKAKTINKYLGSD-----------YVVKSSVGHIRDLPTSGSAAKK---ALISLILIASLILILAISLASADSTSTKTAKKPKKDERGALVNRMGVDPWHNWEAHYEVLPGKEKVVSELKQLAEKADHI

Is that correct?

EDIT
Ignore the whitespace in the sequences.

**semna** · 02-24-2011, 04:36 AM

Hi Thomas,
Yes, you are almost right, but what I showed is just one file. I want to do the same thing for the other files, then concatenate the sequences for each species and generate just one final output.
For example, for E.coli we have:
AAAAAAAAAAAA

and again within this file for E.coli
BBBBB----BBB
so for this file we will have
AAAAAAAAAAAABBBBBBBB
and the same for the other species, and so on.
Thanks

**Thorondor** · 02-24-2011, 06:06 AM

well i won't write the script for you, but it is quite easy. ;-)

get the files you want (@files = <*.txt>)

go over all these files (foreach my $file (@files) { open(my $fh, '<', $file) or die; while (<$fh>) { chomp; ....
use pattern matching: $_ =~ /^(\w+(?:\.\w+)?)\s+(.*)/
copy $2 into a variable and remove the --- from it (s/-//g)
...
store it in a hash: $sequences{$1} .= $seq

and in the end print the hash like you want it in the output file.

something like that, of course there will be some problems, but try it on your own first, then show what you got.
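
The steps above can be sketched roughly as follows. This is only a minimal sketch under a few assumptions: every line looks like `name<whitespace>sequence`, the name contains no spaces (as in `E.coli`), and gap characters are dropped as in semna's example. The sub name `concat_by_species` is made up for illustration, and the file names are taken from the command line rather than from `<*.txt>`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a list of file names and returns
# (hash ref of name => concatenated gap-free sequence,
#  array ref of names in order of first appearance).
sub concat_by_species {
    my @files = @_;
    my (%sequences, @order);
    foreach my $file (@files) {
        open(my $fh, '<', $file) or die "cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            # Assumed layout: name, whitespace, rest of line is sequence
            next unless $line =~ /^(\S+)\s+(.*)$/;
            my ($name, $seq) = ($1, $2);
            $seq =~ s/[-\s]//g;    # drop gap characters and stray whitespace
            push @order, $name unless exists $sequences{$name};
            $sequences{$name} .= $seq;
        }
        close $fh;
    }
    return (\%sequences, \@order);
}

# When run as a script, print one line per species.
unless (caller) {
    my ($seqs, $order) = concat_by_species(@ARGV);
    print "$_ $seqs->{$_}\n" for @$order;
}
```

It could be run as `perl concat.pl *.txt > combined.txt`. Keeping a separate list of names preserves the order in which species first appear, since a Perl hash does not remember insertion order. If the gaps should stay in the output, as in Thomas's example, remove the `s/[-\s]//g` line.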

**JohnK** · 02-24-2011, 11:49 AM

in a file prog_name.pl:

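#!/usr/bin/perl
use strict;
use warnings;

my $in  = $ARGV[0];   # input file name
my $out = $ARGV[1];   # output file name
my %hash;             # species key => all matching lines

open(my $fh,  '<', $in)  || die "cannot open $in: $!";
open(my $ofh, '>', $out) || die "cannot open $out: $!";

while (<$fh>) {
    # escape the dot and allow an optional space, so both "E.coli" and "E. coli" match
    if ($_ =~ /E\.\s*coli/) {
        $hash{"e_coli"} .= $_;   # appends the whole line, label included
    } elsif ($_ =~ /A\.\s*aeolicus/) {
        $hash{"a_aeol"} .= $_;
    }
    # keep adding elsif branches like these for as many species as you have
}
foreach my $key (keys %hash) {
    print $ofh $hash{$key}, "\n";
}
close $fh;
close $ofh;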


concatenation by perl
