I have two datasets (Dataset A and Dataset B) that I have searched against one another reciprocally with BLASTx. However, I am having difficulty identifying the contigs that hit one another from both datasets. I have been able to export the BLAST results into sequence tables, but I'm not sure how to identify the reciprocal top hits for a large number of contigs (160,000). Here is an example of how it looks in an Excel spreadsheet:
Dataset A             Dataset B
Contig123=Contig789_1 Contig789=111Contig123_1
Contig456=72Contig221 Contig221=Contig456_3
Contig777=43Contig954 Contig954=3Contig1561_1
In the example above, you can see that the hits from each file have extra characters at the beginning and sometimes at the end of the contig name, which makes them hard to compare using Excel formulas. The first two rows are the ones I'm interested in extracting, as the contigs hit each other in both datasets, unlike row 3, where they do not match.
Any help would be greatly appreciated!
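For illustration, here is a minimal Python sketch of one way the matching described above could be done outside Excel. It assumes every row has the form `query=hit`, that the real contig name always matches the pattern `Contig` followed by digits, and that each table lists the best hit first when a query appears more than once. The names `parse_hits` and `reciprocal_hits` are hypothetical helpers written for this example, not part of BLAST or any library.

```python
import re

# Assumption: a contig name is "Contig" followed by digits; anything before
# or after it in the hit column (e.g. "111" or "_1") is ignored.
CONTIG_ID = re.compile(r"Contig\d+")


def parse_hits(rows):
    """Map each query contig to its hit contig, stripping extra characters."""
    hits = {}
    for row in rows:
        query, sep, hit = row.strip().partition("=")
        if not sep:
            continue  # skip rows without a "query=hit" pair
        q = CONTIG_ID.search(query)
        h = CONTIG_ID.search(hit)
        if q and h:
            # Keep only the first row per query (assumed to be the top hit).
            hits.setdefault(q.group(), h.group())
    return hits


def reciprocal_hits(a_rows, b_rows):
    """Return (A contig, B contig) pairs that hit each other in both tables."""
    a_hits = parse_hits(a_rows)
    b_hits = parse_hits(b_rows)
    return [(a, b) for a, b in a_hits.items() if b_hits.get(b) == a]


# The three example rows from the spreadsheet above.
dataset_a = ["Contig123=Contig789_1", "Contig456=72Contig221", "Contig777=43Contig954"]
dataset_b = ["Contig789=111Contig123_1", "Contig221=Contig456_3", "Contig954=3Contig1561_1"]

pairs = reciprocal_hits(dataset_a, dataset_b)
print(pairs)
```

On the three example rows this keeps the first two pairs and drops the third. Because each table is turned into a dictionary, the lookup is a single pass over each file, which should scale to 160,000 contigs; with real data the rows would be read from the exported files rather than typed in.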