  • zerhacker
    Junior Member
    • Nov 2014
    • 3

    Align two sets of amino acid sequences

    Hi all, I am a microbiology student with little knowledge of bioinformatics and programming beyond Illumina read alignment and de novo assembly. Recently I was tasked with comparing two closely related strains of the same species, using the complete sequences available on GenBank, to identify the unique pathways that allow each strain to specialize in its niche. This comes before any wet lab procedures.

    To approach this, I plan to extract all open reading frames (ORFs) from both strains, separate the shared or highly similar ORFs from the unique ones, and then find the pathways the unique ORFs are involved in to draw conclusions. I've extracted all ORFs from both organisms using Prodigal, based solely on the in-frame, uninterrupted sequence between start and stop codons. Each strain has about 2000 real and hypothetical amino acid sequences.

    Now I'm stuck trying to extract the shared and unique sequences from both organisms. Are there any programs suitable for this task? All replies are appreciated!
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    If these two strains are relatively closely related then you can identify the similarities using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html). Post-alignment processing will have to be done to extract the information you need from the results.

    You could learn to do some of this, but if you are working against a deadline then it may be better to find a programmer friend or your local bioinformatics support facility. They should be able to do this for you.
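As a rough illustration of the post-alignment processing mentioned above, here is a minimal Python sketch that reads BLAT's PSL output and splits the query ORFs into "shared" and "unique". It assumes the standard 21-column PSL layout; the function names and the identity and coverage thresholds are hypothetical and would need tuning for real data.

```python
def queries_with_hits(psl_lines, min_identity=0.9, min_coverage=0.8):
    """Return the query names that have at least one hit above both thresholds."""
    shared = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip PSL header lines and anything that is not a 21-column record.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        mismatches = int(fields[1])
        rep_matches = int(fields[2])
        q_name, q_size = fields[9], int(fields[10])
        q_start, q_end = int(fields[11]), int(fields[12])
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        # Identity over aligned positions; coverage as the fraction of the query spanned.
        identity = (matches + rep_matches) / aligned
        coverage = (q_end - q_start) / q_size
        if identity >= min_identity and coverage >= min_coverage:
            shared.add(q_name)
    return shared


def split_shared_unique(all_query_names, psl_lines, **thresholds):
    """Partition query names into (shared, unique) sets based on PSL hits."""
    names = set(all_query_names)
    shared = queries_with_hits(psl_lines, **thresholds)
    return names & shared, names - shared
```

The list of all query names would come from the FASTA headers of one strain's ORFs. Running the comparison in both directions gives the unique set for each strain.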

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      #3
      CD-HIT-2D may be useful: http://weizhong-lab.ucsd.edu/cdhit_s...?cmd=cd-hit-2d

      Best of all you can try it yourself without waiting for someone's help. You may still need to do some parsing afterwards.
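For the parsing step, a minimal Python sketch along these lines may help. It assumes the usual CD-HIT `.clstr` layout, in which each cluster starts with a `>Cluster N` line, the representative's line ends in `*`, and the other members end in `at NN.NN%`. The function names are hypothetical, and note that sequence names in `.clstr` files may be truncated by CD-HIT.

```python
import re

# A member line looks like: "1<TAB>295aa, >B_0007... at 98.00%"
_MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Return a list of clusters, each with a representative and its other members."""
    clusters = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = {"rep": None, "members": []}
            clusters.append(current)
            continue
        match = _MEMBER.search(line)
        if match is None or current is None:
            continue
        if line.endswith("*"):
            current["rep"] = match.group(1)
        else:
            current["members"].append(match.group(1))
    return clusters


def shared_and_unmatched(clusters):
    """Split clusters into representatives with matches and those without any."""
    shared = {c["rep"]: c["members"]
              for c in clusters if c["rep"] is not None and c["members"]}
    unmatched = [c["rep"]
                 for c in clusters if c["rep"] is not None and not c["members"]]
    return shared, unmatched
```

Representatives that have members are the sequences with a similar counterpart in the other strain; representatives with no members had no match at the chosen identity cutoff.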

      • zerhacker
        Junior Member
        • Nov 2014
        • 3

        #4
        Originally posted by GenoMax
        If these two strains are relatively closely related then you can identify the similarities using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html). Post-alignment processing will have to be done to extract the information you need from the results.

        You could learn to do some of this, but if you are working against a deadline then it may be better to find a programmer friend or your local bioinformatics support facility. They should be able to do this for you.
        Thank you! I checked out the programs that you suggested, but I ended up generating simulated sets of Illumina reads from both sequences using SimSeq (https://github.com/jstjohn/SimSe),
        then I used bowtie2 to align the reads from each strain against the other, pulled out the reads that did not align, de novo assembled them into short contigs, and extracted their ORFs, which code for the unique proteins.
        I'm bookmarking BLAT, as it seems like a fairly useful program.

        Edited: bolded out my procedure to make it easier to read
        Last edited by zerhacker; 12-02-2014, 04:49 PM.
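The step of pulling out reads that do not align can be sketched in Python as follows. This is not the script used in the post; it is a minimal example that relies only on the SAM convention that bit 0x4 of the FLAG field marks an unmapped read. The function name is hypothetical. bowtie2 can also write unaligned unpaired reads directly with its `--un` option, which may make this step unnecessary.

```python
def unmapped_read_names(sam_lines):
    """Return the names of reads whose SAM FLAG has the 'unmapped' bit (0x4) set."""
    names = []
    for line in sam_lines:
        # Header lines start with '@' and carry no alignment records.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment record has at least 11 mandatory fields.
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & 0x4:
            names.append(fields[0])
    return names
```

The returned names can then be used to select the matching records from the original FASTQ file before assembly.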

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          As long as you were able to get what you needed :-)

          What program did you use to generate the "Illumina" reads? Just for the record, for anyone who runs across this thread later via a search.

          • zerhacker
            Junior Member
            • Nov 2014
            • 3

            #6
            Originally posted by GenoMax
            As long as you were able to get what you needed :-)

            What program did you use to generate the "Illumina" reads? Just for the record, for anyone who runs across this thread later via a search.
            I think SimSeq works great, but I used a Python script written by the department's programmer that works similarly.
            Last edited by zerhacker; 12-02-2014, 04:50 PM.
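For anyone finding this thread later: the department's script is not shown here, so the following is only a hypothetical, minimal Python sketch of the general idea. It tiles a sequence into fixed-length, error-free reads and writes them as FASTQ with a constant placeholder quality. Unlike SimSeq, it models no sequencing errors; the function names, read length, and step size are all assumptions.

```python
def tile_reads(sequence, read_length=100, step=10):
    """Cut a sequence into overlapping, error-free reads as (start, read) pairs.

    If the sequence is shorter than read_length, one shorter read is returned.
    """
    last_start = max(len(sequence) - read_length, 0)
    return [(start, sequence[start:start + read_length])
            for start in range(0, last_start + 1, step)]


def to_fastq(name, reads, quality_char="I"):
    """Format (start, read) pairs as FASTQ text with a constant quality string."""
    records = []
    for start, read in reads:
        records.append(f"@{name}_{start}\n{read}\n+\n{quality_char * len(read)}\n")
    return "".join(records)
```

For example, `to_fastq("strainA", tile_reads(genome_sequence))` would produce FASTQ text that an aligner such as bowtie2 can take as unpaired input, where `genome_sequence` is a string you have loaded yourself.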
