I am wondering if anyone has come across a script that would do something like this:
I have a file containing paired-end reads, but the order is mixed up and some reads are missing their mate (it's from traditional Sanger sequencing). The names look like the ones below, with .r. for reverse and .f. for forward. Basically, I am looking for something that will go through this file and generate two files, one with all the forward reads and one with all the reverse reads. The order in both files must be identical, so reads without a mate have to be excluded from the new files. Is anyone aware of a script that does something like this?
I can already create two files, one with forward and one with reverse reads, and sort each alphanumerically. So an alternative would be something that compares both files and removes the reads whose name before the first '.' has no match in the other file.
LPP_Ba-LPP_Ba0277N19.r.scf
LPP_Ba-LPP_Ba0277N19.f.scf
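
For illustration, here is a minimal Python sketch of the pairing step described above. It assumes the reads are in a FASTA file whose header lines start with names like the two shown (the post does not state the file format, so this is a guess), and that the text before the first '.' identifies the pair. The function names read_fasta, split_pairs and write_fasta are made up for this example. If a name occurs more than once, the last record wins.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None and line.strip():
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_pairs(records):
    """Return (forward, reverse) lists with paired reads only, in matching order."""
    forward, reverse = {}, {}
    for header, seq in records:
        name = header.split()[0]       # read name = first word of the header
        key = name.split(".", 1)[0]    # pair ID = text before the first '.'
        if ".f." in name:
            forward[key] = (header, seq)
        elif ".r." in name:
            reverse[key] = (header, seq)
    # Keep only IDs seen in both directions, sorted so both outputs line up.
    paired = sorted(forward.keys() & reverse.keys())
    return [forward[k] for k in paired], [reverse[k] for k in paired]


def write_fasta(path, records):
    """Write (header, sequence) tuples to path, one sequence line per record."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")


# Assumed usage: python split_pairs.py mixed.fasta forward.fasta reverse.fasta
if __name__ == "__main__" and len(sys.argv) == 4:
    with open(sys.argv[1]) as handle:
        fwd, rev = split_pairs(read_fasta(handle))
    write_fasta(sys.argv[2], fwd)
    write_fasta(sys.argv[3], rev)
```

Everything is held in memory, which should be fine for a Sanger-sized data set. The same split_pairs logic also covers the alternative of comparing two already separated files: read both and pass the combined records in.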