  • Shorash
    Member
    • Sep 2012
    • 18

    Reciprocal blast help

    I have two datasets (Dataset A and Dataset B) that I have BLASTed against one another reciprocally with BLASTx. However, I am having difficulty identifying the contigs that are each other's hits in both datasets. I have been able to export the BLAST results into sequence tables, but I'm not sure how to identify the reciprocal top hits for a large number of contigs (160,000). Here is an example of how it looks in an Excel spreadsheet:

    Dataset A               Dataset B
    Contig123=Contig789_1   Contig789=111Contig123_1
    Contig456=72Contig221   Contig221=Contig456_3
    Contig777=43Contig954   Contig954=3Contig1561_1


    In the example above you can see that the hits from each file have extra characters at the beginning, and sometimes at the end, of each corresponding hit, which makes them hard to compare using Excel formulas. The first two rows are the ones I'm interested in extracting, as they hit the same contig in both datasets, unlike row 3, which does not match.

    Any help would be greatly appreciated!
    Last edited by Shorash; 07-15-2014, 08:40 PM.
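For the spreadsheet-style cells shown above, a minimal Python sketch of the comparison might look like this. It assumes every identifier contains a core of the form "Contig" followed by digits, that anything before or after that core is decoration, and that each query has a single hit; the function names are hypothetical and the cell values are just the three example rows from the post.

```python
import re

# Assumption: the real ID is always "Contig<digits>"; leading numbers and
# trailing "_N" suffixes are treated as decoration and dropped.
CONTIG_RE = re.compile(r"Contig\d+")


def parse_cell(cell):
    """Split a 'query=hit' cell and return (query_id, hit_id), decorations stripped."""
    query, hit = cell.split("=", 1)
    q = CONTIG_RE.search(query)
    h = CONTIG_RE.search(hit)
    if q is None or h is None:
        raise ValueError(f"no contig ID found in {cell!r}")
    return q.group(0), h.group(0)


def reciprocal_pairs(cells_a, cells_b):
    """Return (A contig, B contig) pairs where each is the other's hit."""
    a_to_b = dict(parse_cell(c) for c in cells_a)
    b_to_a = dict(parse_cell(c) for c in cells_b)
    return [(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a]


# The three example rows from the post.
cells_a = ["Contig123=Contig789_1", "Contig456=72Contig221", "Contig777=43Contig954"]
cells_b = ["Contig789=111Contig123_1", "Contig221=Contig456_3", "Contig954=3Contig1561_1"]

print(reciprocal_pairs(cells_a, cells_b))
# [('Contig123', 'Contig789'), ('Contig456', 'Contig221')]
```

Dictionary lookups keep this linear in the number of cells, so 160,000 contigs should not be a problem.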
  • bio_boris
    Member
    • Mar 2013
    • 14

    #2
    What have you come up with so far? Do you have any scripts to use or code that you have tried to create?

    Some resources


    http://seqanswers.com/forums/showthread.php?t=20652


    • someperson
      Junior Member
      • Jul 2013
      • 9

      #3
      Not really a direct answer to your question, but a tip for a tool that probably already does what you want:
      I mostly use the tool proteinortho (current version: proteinortho5) for reciprocal BLAST analyses.
      A strong advantage of this tool is that it not only detects direct orthologs via reciprocal BLAST, but can also list the respective paralogs and group them into "orthologous groups".
      links:
      http://www.biomedcentral.com/1471-2105/12/124


      • Shorash
        Member
        • Sep 2012
        • 18

        #4
        Originally posted by bio_boris
        What have you come up with so far? Do you have any scripts to use or code that you have tried to create?

        Some resources


        http://seqanswers.com/forums/showthread.php?t=20652
        I haven't managed to create any scripts or code. I've been manually looking at specific genes of interest, but it would be great to be able to do all of them at once.


        • Shorash
          Member
          • Sep 2012
          • 18

          #5
          Originally posted by someperson
          Not really a direct answer to your question, but a tip for a tool that probably already does what you want:
          I mostly use the tool proteinortho (current version: proteinortho5) for reciprocal BLAST analyses.
          A strong advantage of this tool is that it not only detects direct orthologs via reciprocal BLAST, but can also list the respective paralogs and group them into "orthologous groups".
          links:

          http://www.biomedcentral.com/1471-2105/12/124
          Great, thanks for that. I'll give this a try.


          • rhinoceros
            Senior Member
            • Apr 2013
            • 372

            #6
            Originally posted by Shorash
            how I can identify the reciprocal top hits of one another for a large number of contigs (160,000).
            I would go for tabular BLAST output and first sort for best hits. Then, depending on how you ran your BLASTs, you would have e.g. two best-hit-sorted output files with the query in the first column and the subject in the second. One option would be to cut columns 1-2, switch the column order in one file, and then cat it with the other file. Then you'd sort based on column 1 and only output the lines where the uniq -c count is 2. I'm sure there's an awk one-liner for this too.
            savetherhino.org
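The swap, concatenate and count idea above can also be sketched in Python rather than with cut/sort/uniq. This is only an illustration under stated assumptions: both files are tabular BLAST output with the default 12 columns (query in column 1, subject in column 2, bit score in column 12), "best hit" means highest bit score per query, and the function names and demo rows are made up for the example.

```python
from collections import Counter


def best_hits(lines):
    """Map each query to its highest-bit-score subject.

    `lines` is any iterable of tab-separated BLAST rows, e.g. an open file.
    Assumes the default tabular layout: query, subject, ..., bit score (col 12).
    """
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or fields[0].startswith("#"):
            continue  # skip blank, comment, or truncated lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b_lines, b_vs_a_lines):
    """Return sorted (A contig, B contig) pairs that are each other's best hit."""
    a_to_b = best_hits(a_vs_b_lines)
    b_to_a = best_hits(b_vs_a_lines)
    pairs = Counter(a_to_b.items())
    # Swap the columns of the B-vs-A hits so both sets read (A, B).
    pairs.update((subject, query) for query, subject in b_to_a.items())
    # A pair seen twice is reciprocal, like a uniq -c count of 2.
    return sorted(pair for pair, count in pairs.items() if count == 2)


# Made-up demo rows (not real BLAST results).
demo_a_vs_b = [
    "A1\tB1\t90.0\t100\t0\t0\t1\t100\t1\t100\t1e-50\t200",
    "A1\tB2\t95.0\t100\t0\t0\t1\t100\t1\t100\t1e-60\t250",
    "A2\tB3\t80.0\t100\t0\t0\t1\t100\t1\t100\t1e-20\t100",
]
demo_b_vs_a = [
    "B2\tA1\t95.0\t100\t0\t0\t1\t100\t1\t100\t1e-60\t240",
    "B3\tA9\t70.0\t100\t0\t0\t1\t100\t1\t100\t1e-10\t90",
]
print(reciprocal_best_hits(demo_a_vs_b, demo_b_vs_a))
# [('A1', 'B2')]
```

With real data you would pass open file handles, e.g. `reciprocal_best_hits(open("a_vs_b.tsv"), open("b_vs_a.tsv"))`, where the file names are placeholders.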
