
**mastal** · 04-07-2013, 05:36 AM

[PERL] Text files manipulation/reorganisation

Hi,

I would use a hash of arrays.

$results{$line}->[0] would be 1 for sequences found in file 1, and $results{$line}->[1] would be 1 for sequences found in file 2.

When you read through the first file, set the flag for that sequence in the second file to 0 ($results{$line}->[1] = 0).

Then, when you read through the second file, check whether that sequence already exists in the hash; if it doesn't, set the flag for file 1 to 0 ($results{$line}->[0] = 0).

When you print the contents of the hash, if $results{$line}->[0] is 0, the sequence was found only in file 2, so print that line to output file 2 only, etc.
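A minimal sketch of the presence-flag idea above, assuming each input file holds one sequence per line. The subroutine name `presence_flags` and the sample sequences are hypothetical; in-memory filehandles stand in for the two real files so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read one sequence per line from each filehandle and
# record presence flags in a hash of arrays:
#   $results{$seq}->[0] is 1 if the sequence is in file 1, else 0
#   $results{$seq}->[1] is 1 if the sequence is in file 2, else 0
sub presence_flags {
    my ($fh1, $fh2) = @_;
    my %results;

    while (my $line = <$fh1>) {
        chomp $line;
        next if $line eq '';
        $results{$line} = [1, 0];    # seen in file 1; assume absent from file 2 for now
    }
    while (my $line = <$fh2>) {
        chomp $line;
        next if $line eq '';
        if (exists $results{$line}) {
            $results{$line}->[1] = 1;    # present in both files
        }
        else {
            $results{$line} = [0, 1];    # only in file 2
        }
    }
    return \%results;
}

# Illustrative data only: in-memory strings opened as filehandles.
my $file1 = "ACGT\nTTGA\n";
my $file2 = "TTGA\nGGCC\n";
open my $in1, '<', \$file1 or die "cannot open file 1: $!";
open my $in2, '<', \$file2 or die "cannot open file 2: $!";

my $results = presence_flags($in1, $in2);
for my $seq (sort keys %$results) {
    my ($in_f1, $in_f2) = @{ $results->{$seq} };
    my $where = $in_f1 && $in_f2 ? 'both' : $in_f1 ? 'file1 only' : 'file2 only';
    print "$seq\t$where\n";
}
```

With the sample data this prints `ACGT` as file1 only, `GGCC` as file2 only and `TTGA` as both.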

Hope this helps,
maria

**Kawaccino** · 04-07-2013, 05:48 AM

Thanks for your answer, but I have some questions about it: the way I understand it, I will only have the unique sequences in my output file, no? Sorry, I'm a newbie in Perl, still learning, and sometimes even obvious things are not so obvious to me...
In fact, I don't understand how the hash of arrays will let me save the count value and keep it associated with the corresponding sequence.

**mastal** · 04-07-2013, 07:27 AM


You're right, I forgot that you need to have the counts associated with your
sequence.

But the hash of arrays will still work:

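When you read through file 1 (where $name holds the count read from that line):

    $results{$line}->[0] = $name;    # count in file 1
    $results{$line}->[1] = 0;        # assume absent from file 2 until seen there

When you read through file 2:

    if (exists $results{$line}) {
        $results{$line}->[1] = $name;    # already seen in file 1: just store the file 2 count
    }
    else {
        $results{$line}->[0] = 0;        # not present in file 1
        $results{$line}->[1] = $name;
    }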

When you go through the hash to print the two output files (OUT1 and OUT2 already opened for writing), do something like:

    foreach my $line (keys %results) {
        print OUT1 "$line\t$results{$line}->[0]\n";
        print OUT2 "$line\t$results{$line}->[1]\n";
    }

The hash keys are unique, but the value associated with each key is a reference to an (anonymous) array instead of a single (scalar) value.

So you store the count for file 1 at position [0] of each array, and the count for file 2 at position [1].
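Putting the pieces of this reply together, a self-contained sketch might look like the following. It assumes, hypothetically, that each input line is `sequence<TAB>count` (the exact input format is not shown in this part of the thread); `merge_counts`, `write_counts` and the sample data are illustrative names, not code from the thread. Missing counts are filled in as 0, as described above.

```perl
use strict;
use warnings;

# Hypothetical input format: one "sequence<TAB>count" pair per line.
# Returns a hash of arrays: $results{$seq} = [count_in_file1, count_in_file2],
# with 0 filled in wherever a sequence is missing from one of the files.
sub merge_counts {
    my ($fh1, $fh2) = @_;
    my %results;

    while (my $line = <$fh1>) {
        chomp $line;
        next unless length $line;
        my ($seq, $count) = split /\t/, $line;
        $results{$seq} = [$count, 0];    # file 2 count defaults to 0
    }
    while (my $line = <$fh2>) {
        chomp $line;
        next unless length $line;
        my ($seq, $count) = split /\t/, $line;
        if (exists $results{$seq}) {
            $results{$seq}->[1] = $count;    # seen in both files
        }
        else {
            $results{$seq} = [0, $count];    # only in file 2
        }
    }
    return \%results;
}

# Write every sequence from either file with its count at position $index
# (0 = file 1, 1 = file 2). Sorting keeps both outputs in the same order.
sub write_counts {
    my ($results, $index, $out) = @_;
    for my $seq (sort keys %$results) {
        print {$out} "$seq\t$results->{$seq}->[$index]\n";
    }
}

# Illustrative data only: in-memory strings opened as filehandles.
my $data1 = "ACGT\t5\nTTGA\t2\n";
my $data2 = "TTGA\t7\nGGCC\t3\n";
open my $in1, '<', \$data1 or die "cannot open file 1: $!";
open my $in2, '<', \$data2 or die "cannot open file 2: $!";

my $results = merge_counts($in1, $in2);
write_counts($results, 0, \*STDOUT);    # counts for file 1
write_counts($results, 1, \*STDOUT);    # counts for file 2
```

To use it on real files, open them by name instead (for example `open my $in1, '<', $ARGV[0] or die $!;`) and pass output filehandles opened with `'>'` to `write_counts`, one per output file.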

Hopefully this should work.

**Kawaccino** · 04-07-2013, 07:44 AM

Oh OK! Thank you so much, I understand how it works now. I will try to write the script, thank you!

Here I am again: I did as you told me and it works perfectly! So once again, thank you Maria, and thank you for the explanations, because now I know how it works and I will be able to use it for something else if I need to.

