
  • Align two sets of amino acid sequences

    Hi all, I am a microbiology student with little knowledge of bioinformatics and programming beyond Illumina read alignment and de novo assembly. Recently I was tasked with comparing two closely related strains of the same species and identifying the unique pathways that allow each to specialize in its niche. This is to be based on the complete sequences available on GenBank, before any wet lab work.

    To approach this, I plan to extract all open reading frames (ORFs) from both strains, separate the shared/highly similar ORFs from the unique ones, and then find the pathways the unique ORFs are involved in to draw conclusions. I've extracted all open reading frames from both organisms using Prodigal, based solely on the in-frame, uninterrupted sequence between start and stop codons. Each strain has about 2000 real and hypothetical amino acid sequences.
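
For readers unfamiliar with the ORF-extraction step described above, here is a minimal sketch of a naive six-frame scan that reports ATG-to-stop stretches and translates them. It is a stand-in for illustration only and is not Prodigal's algorithm (Prodigal scores candidate genes with a trained model); the function names and the 100-codon default cutoff are hypothetical choices.

```python
# Naive six-frame ORF scan, for illustration only. This is NOT Prodigal's
# method: Prodigal scores candidate genes with a trained coding model,
# while this just reports ATG...stop stretches above a length cutoff.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Standard genetic code (NCBI table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(orf: str) -> str:
    """Translate a DNA ORF; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))


def find_orfs(seq: str, min_codons: int = 100):
    """Yield (strand, start, end, orf) for ATG-initiated ORFs.

    Coordinates are 0-based and half-open on the scanned strand (for '-'
    they index into the reverse complement). The stop codon is included.
    ORFs that run off the end of the sequence without a stop are skipped.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first ATG in this frame
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, start, i + 3, s[start:i + 3]
                    start = None  # close the ORF and look for the next ATG


if __name__ == "__main__":
    for strand, start, end, orf in find_orfs("CCATGAAATAGGG", min_codons=3):
        print(strand, start, end, translate(orf))
```

Real gene callers also handle alternative start codons (GTG, TTG) and overlapping genes, which this sketch ignores.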

    Now I'm stuck trying to separate the shared and unique sequences of the two organisms. Are there any programs suitable for this task? All replies are appreciated!

  • #2
    If these two strains are relatively closely related then you can identify the similarities using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html). Post-alignment processing will have to be done to extract the information you need from the results.

    You could learn to do some of this, but if you are working against a deadline it may be better to find a programmer friend or your local bioinformatics support facility. They should be able to do this for you.
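
The post-alignment processing mentioned above could look roughly like the sketch below. It assumes the hits are in the common 12-column BLAST-style tabular layout (BLAT can, as far as I know, write this via its blast8 output type), and it calls a query protein "shared" if any hit passes a percent-identity and query-coverage cutoff. The function name and the 90% / 0.8 thresholds are hypothetical choices, not established values.

```python
# Post-processing sketch for BLAST-style 12-column tabular hits.
# Thresholds below are illustrative assumptions, not recommended values.


def classify_queries(query_lengths, tabular_text, min_identity=90.0, min_coverage=0.8):
    """Split query proteins into (shared, unique) sets.

    query_lengths: dict mapping query ID -> protein length (residues).
    tabular_text: tab-separated hits with the columns
        qseqid sseqid pident length mismatch gapopen
        qstart qend sstart send evalue bitscore
    A query is 'shared' if at least one hit meets both the percent-identity
    cutoff and the query-coverage cutoff; every other query is 'unique'.
    """
    shared = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        pident = float(fields[2])
        qstart, qend = int(fields[6]), int(fields[7])
        qlen = query_lengths.get(qseqid)
        if not qlen:
            continue  # hit for a query we have no length for
        coverage = (abs(qend - qstart) + 1) / qlen
        if pident >= min_identity and coverage >= min_coverage:
            shared.add(qseqid)
    unique = set(query_lengths) - shared
    return shared, unique
```

Run it once with strain A as the query set and once with strain B, since "unique to A" and "unique to B" are separate questions. Coverage is judged per hit line, so a protein matched only by several short, separate segments would be counted as unique here.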



    • #3
      CD-HIT-2D may be useful: http://weizhong-lab.ucsd.edu/cdhit_s...?cmd=cd-hit-2d

      Best of all you can try it yourself without waiting for someone's help. You may still need to do some parsing afterwards.
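
As I understand the CD-HIT documentation, cd-hit-2d compares a second database (db2) against a first (db1) and writes to its output FASTA the db2 sequences that have no match in db1 above the identity cutoff. If that holds, the "parsing afterwards" can be a simple set difference on sequence IDs, as in this sketch (the function names are hypothetical). Swapping the two inputs gives the unique set for the other strain.

```python
def fasta_ids(fasta_text):
    """Return the set of record IDs (first word after '>') in FASTA text."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def split_by_cdhit2d(db2_fasta, cdhit2d_output_fasta):
    """Return (shared, unique) ID sets for the sequences in db2.

    Assumes the cd-hit-2d output FASTA lists the db2 sequences that had
    no match in db1 above the identity cutoff; everything else in db2
    is treated as shared.
    """
    all_db2 = fasta_ids(db2_fasta)
    unique = fasta_ids(cdhit2d_output_fasta) & all_db2
    shared = all_db2 - unique
    return shared, unique
```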



      • #4
        Thank you! I checked out the programs you suggested, but I ended up generating a fake set of Illumina reads from both sequences using Simseq https://github.com/jstjohn/SimSe,
        then I used bowtie2 to align them to each other, pulled out the reads that didn't align, de novo assembled them into short contigs, and extracted their ORFs, which code for the unique proteins.
        I'm bookmarking BLAT as it seems like a fairly useful program.

        Edit: bolded my procedure to make it easier to read.
        Last edited by zerhacker; 12-02-2014, 04:49 PM.
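
The "pull out reads that didn't align" step in the procedure above amounts to keeping SAM records whose FLAG has the 0x4 (unmapped) bit set. bowtie2's `--un` option or `samtools view -f 4` can do this directly; the sketch below shows the same filter in plain Python, writing FASTQ that an assembler could take as input. The function names are hypothetical.

```python
def unmapped_reads(sam_text):
    """Yield (name, sequence, quality) for SAM records flagged unmapped (0x4)."""
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]  # QUAL may be '*' if absent
        if flag & 0x4:
            yield qname, seq, qual


def to_fastq(name, seq, qual):
    """Format one read as a FASTQ record for a de novo assembler."""
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

For paired-end data the 0x4 bit marks only that mate as unmapped, so you may also want to check 0x8 (mate unmapped) depending on how pairs should be handled.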



        • #5
          Long as you were able to get what you needed :-)

          What program did you use to generate the "Illumina" reads? Just for the record, for anyone who runs across this thread later via a search.



          • #6

            I think SimSeq works great, but I used a Python script written by the department's programmer that works similarly.
            Last edited by zerhacker; 12-02-2014, 04:50 PM.
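
The department's script is not shown in the thread, so the following is only a hypothetical sketch of the simplest version of the idea: tiling error-free, overlapping reads along one genome and writing them as FASTQ with a constant, made-up quality score. Unlike SimSeq, it models no sequencing errors, and if the genome length minus the read length is not a multiple of `step`, the last few bases are not covered.

```python
def tile_reads(genome, read_len=100, step=10):
    """Yield (name, read) pairs: error-free reads tiled along one strand.

    Consecutive reads start `step` bases apart, so neighbours overlap by
    read_len - step bases. A genome shorter than read_len yields nothing.
    """
    genome = genome.upper()
    starts = range(0, len(genome) - read_len + 1, step)
    for i, start in enumerate(starts):
        yield f"read_{i}_pos{start}", genome[start:start + read_len]


def reads_to_fastq(reads, qual_char="I"):
    """Format reads as FASTQ with a constant (made-up) quality score."""
    return "".join(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n" for name, seq in reads)
```

Reads from one strand are enough here because aligners such as bowtie2 search both strands of the reference by default.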
