Hello everyone,
I have a question. Here it is:
I have two files organised a bit like FASTA files (but they are .txt files) that look like this:
>1 count:272019
TACCTGGTTGATCCTGCCAG
>2 count:48613
TTTGGATTGAAGGGAGCTCTA
>3 count:15422
TTTGGATTGAAGGGAGCTCT
>4 count:9818
TTGGACTGAAGGGAGCT
>5 count:8783
TTGGACTGAAGGGAGCTCCCT
The two files share most of their sequences, but some sequences appear in only one of them. The sequences are not in the same order in the two files, and for an identical sequence the count value is not the same in the two files. What I want to do is reorder the two files so that the sequences are in the same order in each of them. I also want to change the name line so that it consists only of the count value. In other words, what I want to obtain is, for example:
272019
TACCTGGTTGATCCTGCCAG
48613
TTTGGATTGAAGGGAGCTCTA
15422
TTTGGATTGAAGGGAGCTCT
9818
TTGGACTGAAGGGAGCT
8783
TTGGACTGAAGGGAGCTCCCT
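To make the renaming concrete, here is a minimal sketch of turning a `>N count:X` header into just the count. The sub name `count_from_header` is hypothetical, and the regex assumes every header looks like the ones in the sample above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the number after "count:" in a header line,
# or undef if the header does not match the assumed format.
sub count_from_header {
    my ($header) = @_;
    return $header =~ /count:(\d+)/ ? $1 : undef;
}

# Two records copied from the sample above: [ header, sequence ].
my @records = (
    [ '>1 count:272019', 'TACCTGGTTGATCCTGCCAG' ],
    [ '>2 count:48613',  'TTTGGATTGAAGGGAGCTCTA' ],
);

# Print each record in the wanted layout: count line, then sequence line.
for my $rec (@records) {
    my ($header, $seq) = @$rec;
    my $count = count_from_header($header);
    print "$count\n$seq\n";
}
```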
I also want a sequence that is only in file1, for example, to appear in file2 as well, but with a count value of 0. Likewise, if a sequence is only in file2, I want it to appear in file1 with a count value of 0.
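The zero-filling described here could be sketched roughly as follows, assuming the counts from each file have already been read into two hashes keyed by sequence. The hash names and the toy data are made up for illustration, and the `//` (defined-or) operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Toy data standing in for the two parsed files (sequence => count).
my %count1 = ( TACCTGGTTGATCCTGCCAG => 272019, TTGGACTGAAGGGAGCT    => 9818 );
my %count2 = ( TACCTGGTTGATCCTGCCAG => 1500,   TTTGGATTGAAGGGAGCTCT => 15422 );

# Union of the sequences from both files, in one fixed (sorted) order.
my %seen;
for my $seq (keys %count1, keys %count2) {
    $seen{$seq} = 1;
}
my @all = sort keys %seen;

# Build the lines for each output file: a sequence that is missing
# from one file gets a count of 0 there.
my (@out1, @out2);
for my $seq (@all) {
    push @out1, ($count1{$seq} // 0), $seq;
    push @out2, ($count2{$seq} // 0), $seq;
}

print "file1:\n";
print "$_\n" for @out1;
print "file2:\n";
print "$_\n" for @out2;
```

Because both outputs are built from the same `@all` list, they end up with the same sequences in the same order; sorting alphabetically is just one arbitrary choice of order.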
So I wrote a script that seems to work for sorting my sequences into the same order in the two files; I say "seems to work" because the two output files now have the same number of sequence lines and they are in the same order.
But I have a problem: I don't know how to extract the count value and keep it associated with its sequence, or how to say that I want this value to be 0 if my sequence is only in one file. If I have understood the principle of hashes correctly, since my sequences have a different count value in each file, I can't do a
Code:
if (exists $results{$count}) { $results{$count}++; }
It only works for identical values, no?
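One possible way around this, sketched under assumptions, is to use the sequence as the hash key and store the count as the value, so the lookup no longer depends on the counts being identical. The name `%count_for` is hypothetical, the `split` on `:` follows the idea in the commented lines of the script, and an in-memory filehandle stands in for a real file so the sketch is self-contained.

```perl
use strict;
use warnings;

# Stand-in for one input file; the real script would open the file on disk.
my $text = ">1 count:272019\nTACCTGGTTGATCCTGCCAG\n"
         . ">2 count:48613\nTTTGGATTGAAGGGAGCTCTA\n";
open(my $in, '<', \$text) or die "Cannot open in-memory file: $!";

my %count_for;    # sequence => count (hypothetical name)
while (my $name = <$in>) {
    my $seq = <$in>;
    last unless defined $seq;    # stop on a header with no sequence line
    chomp($name, $seq);
    my (undef, $count) = split /:/, $name;    # keep the text after "count:"
    $count_for{$seq} = $count;
}
close($in);

# The existence test is on the sequence; the count comes along as the value.
for my $seq ('TACCTGGTTGATCCTGCCAG', 'AAAA') {
    if (exists $count_for{$seq}) {
        print "$seq seen with count $count_for{$seq}\n";
    }
    else {
        print "$seq not seen\n";
    }
}
```

With one such hash per file, the same sequence can carry a different count in each, and a missing key simply means the sequence is absent from that file.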
All that to say I'm lost and stuck on this part of my script, so if anyone can help me, I would really appreciate it. Thank you very much.
By the way, here is the code I wrote, which works for sorting my sequences (if anybody thinks this code is not very good and could be optimised, don't hesitate to say so!):
Code:
use warnings;
use strict;

my $fast1="C:/Users/Moi/fichier1.fasta";
open (my $IN1, "<", $fast1) or die "Impossible d'ouvrir le fichier $fast1 $!";
my $fast2="C:/Users/Moi/fichier2.fasta";
open (my $IN2, "<", $fast2) or die "Impossible d'ouvrir le fichier $fast2 $!";
my $trie1="C:/Users/Moi/fichier1bis.fasta";
open (my $OUT1, ">", $trie1) or die "Impossible d'ouvrir le fichier $trie1 $!";
my $trie2="C:/Users/Moi/fichier2bis.fasta";
open (my $OUT2, ">", $trie2) or die "Impossible d'ouvrir le fichier $trie2 $!";

my %results;
#my @tab;
my ($name, $line);

while($name = <$IN1> ) {
    $line=<$IN1>;
    chomp $name;
    chomp $line;
    #@tab = split (/:/, $name);
    #$count=$tab[1];
    $results{$line}=1;
}

while($name = <$IN2>) {
    $line=<$IN2>;
    chomp $name;
    chomp $line;
    #@tab = split (/:/, $name);
    #$count=$tab[1];
    if (exists $results{$line}) {
        $results{$line}++;
    }
}

foreach $line (keys %results) {
    if ($results{$line} == 1) {
        print $OUT1 "$line\n";
    }
    if ($results{$line}==2) {
        print $OUT1 "$line\n";
        print $OUT2 "$line\n";
    }
    else {
        print $OUT2 "$line\n";
    }
}

close ($IN1);
close ($IN2);
close ($OUT1);
close ($OUT2);
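The commented lines are an idea I had for extracting my count values, but I didn't see how to continue with it, so...
Once again, thanks for any help you can give me!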
Have a nice day!