  • Blast multifasta protein to genome and get protein which match to genome.

    Hello,

    I want to BLAST a multi-FASTA protein file against a genome and get a multi-FASTA protein file containing all the proteins that aligned to the genome. Is there any way to do this?

    It would even be fine if I could just get the headers of the proteins that aligned to the genome.

    Thanks in advance
    Sandesh

  • #2
    If this multifasta file comes from the same (or a closely related) genome, then blat may be the fastest option to get the alignments you need. blat lets you choose among several output formats.

    You will need to do some additional parsing to pare down your original multifasta file to only retain sequences that had a "hit" in the genome.
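The parsing step above could look something like the sketch below. It assumes you have already collected the IDs of the queries that had a hit into a set, and that each ID is the first whitespace-delimited word of the FASTA header (which is how aligners usually report query names, but check your own output). The function names `read_fasta` and `keep_hits` are made up for this illustration.

```python
from typing import Iterable, Iterator, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def keep_hits(lines: Iterable[str], hit_ids: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep only records whose ID (first word of the header) is in hit_ids.

    Assumption: the aligner reported the first word of each header as the
    query name.
    """
    for header, seq in read_fasta(lines):
        words = header.split()
        if words and words[0] in hit_ids:
            yield header, seq


def write_fasta(records: Iterable[Tuple[str, str]], handle) -> None:
    """Write records back out as FASTA, one sequence line per record."""
    for header, seq in records:
        handle.write(f">{header}\n{seq}\n")
```

Because the file is streamed record by record, memory use depends on the size of the ID set and not on the size of the protein file. Typical use would be to open the protein file, pass the handle to `keep_hits` together with the ID set, and hand the result to `write_fasta` with an output handle.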
    Last edited by GenoMax; 01-25-2015, 11:20 AM.

    • #3
      Use makeblastdb to turn your genome into a nucleotide BLAST database. Then use tblastn, since you have protein queries and a nucleotide database (with a suitable score threshold). Finally, filter your protein file to keep only the queries for which hits were found. Done?
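A rough sketch of that recipe, in case it helps. The file names, the database name and the 1e-5 E-value cutoff are placeholders and not recommendations. The helper names `blast_commands` and `query_ids_with_hits` are invented for this example. It assumes tabular output (`-outfmt 6`) with the default column order, where the query ID is column 1 and the E-value is column 11.

```python
import shlex
from typing import Iterable, List, Set


def blast_commands(genome: str, proteins: str, db: str, out: str,
                   evalue: float = 1e-5) -> List[List[str]]:
    """Build the two BLAST+ command lines (they are not run here)."""
    return [
        # Turn the genome FASTA into a nucleotide BLAST database.
        ["makeblastdb", "-in", genome, "-dbtype", "nucl", "-out", db],
        # Protein queries against the translated nucleotide database.
        ["tblastn", "-query", proteins, "-db", db,
         "-evalue", str(evalue), "-outfmt", "6", "-out", out],
    ]


def query_ids_with_hits(tabular_lines: Iterable[str],
                        max_evalue: float = 1e-5) -> Set[str]:
    """Collect query IDs that have at least one hit at or below max_evalue.

    Assumes default -outfmt 6 columns: qseqid is field 0, evalue is field 10.
    """
    ids: Set[str] = set()
    for line in tabular_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[10]) <= max_evalue:
            ids.add(fields[0])
    return ids


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    for cmd in blast_commands("genome.fasta", "proteins.fasta",
                              "genome_db", "hits.tsv"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting set of query IDs is what you would then use to filter the protein file. The commands can be run by hand or passed to `subprocess.run(cmd, check=True)`.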

      • #4
        Thanks for the responses. I am trying to align the UniRef90 proteins to the genome to find all the proteins with hits, and then run exonerate with only those hits, because running exonerate with all of UniRef90 is really slow. So I want to filter the proteins before running exonerate.

        There are around 22 million proteins in uniref90.fasta.

        I will try both of your suggestions and see how it goes.

        • #5
          I was assuming you had a few thousand proteins (or a few tens of thousands of proteins), for example the predicted protein set of one organism. Not 22 million proteins (!).

          How big is your genome (base pairs)? And is it nicely assembled into a few chromosomes, or in many contigs? How many contigs?

          The relative sizes of the protein set and the genome will strongly influence the best approach (e.g. which to use as the query and which as the database), and whether BLAST is even suitable.

          • #6
            @Peter: Sandesh probably has a few thousand proteins. He has been trying to align them to the UniProt reference cluster (UniRef) data.

            @Sandesh: This is probably not the most efficient way to try and annotate a new genome/proteome. As Peter asked above, can you tell us how you arrived at this set of proteins? Is there a related genome available?

            • #7
              Actually, I am trying to annotate a genome of about 65 Mb. It has 18 linkage groups (LGs) plus the remaining scaffolds (around 800 small scaffolds). Most of the sequence is in the LGs.

              I got the protein data from http://www.ebi.ac.uk/uniprot/database/download.html.
              Yes, it is the UniProt reference clusters. I was aligning these proteins with exonerate, but it took too long.

              Maybe I am doing this wrong.
              There are other related species that have been sequenced and annotated.

              Please correct me if I am doing this wrong.
              Last edited by sandesh; 01-26-2015, 07:16 AM.
