  • Align two sets of amino acid sequences

    Hi all, I am a microbiology student with little knowledge of bioinformatics and programming beyond Illumina read alignment and de novo assembly. Recently I was tasked with comparing two closely related strains of the same species and identifying the unique pathways that allow each to specialize in its niche, based on the complete sequences available on GenBank, prior to any wet-lab procedures.

    To approach this, I plan to extract all open reading frames from both strains, pull out the shared/highly similar ORFs and the unique ORFs, then find the pathways the unique ORFs are involved in to draw conclusions. I've extracted all open reading frames from both organisms using Prodigal, based solely on the in-frame, uninterrupted sequence between start and stop codons. Each strain has about 2000 real and hypothetical AA sequences.

    Now I'm stuck trying to extract the shared and unique sequences from both organisms. Are there any programs suitable for this task? All replies are appreciated!

  • #2
    If these two strains are relatively closely related then you can identify the similarities using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html). Post-alignment processing will have to be done to extract the information you need from the results.

    You could learn to do some of this, but if you are working against a deadline then it may be better to find a programmer friend or your local bioinformatics support facility. They should be able to do this for you.
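    For anyone wondering what the post-alignment processing might look like, here is a minimal sketch in Python. It assumes BLAT was run with its default tab-separated PSL output, with one strain's ORFs as the query. The function name `split_by_psl` and the identity/coverage cutoffs are illustrative choices, not anything BLAT prescribes, and the identity here is a simple ratio computed from the match columns rather than BLAT's own score.

```python
def split_by_psl(psl_lines, query_ids, min_identity=0.9, min_coverage=0.8):
    """Split query IDs into (shared, unique) sets based on PSL hits.

    psl_lines: iterable of lines from a PSL file (header lines are skipped).
    query_ids: every query sequence ID, including those with no hit at all.
    The two thresholds are arbitrary illustrative defaults.
    """
    shared = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # PSL data rows have 21 columns and start with an integer match count;
        # anything else (header, blank lines) is skipped.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        mismatches = int(fields[1])
        rep_matches = int(fields[2])
        q_name = fields[9]
        q_size = int(fields[10])
        aligned = matches + mismatches + rep_matches
        if aligned == 0 or q_size == 0:
            continue
        identity = (matches + rep_matches) / aligned
        coverage = aligned / q_size
        if identity >= min_identity and coverage >= min_coverage:
            shared.add(q_name)
    all_ids = set(query_ids)
    return shared & all_ids, all_ids - shared
```

    Any query ID that never passes both cutoffs ends up in the "unique" set, so the result depends heavily on the thresholds you pick. It is worth trying a few values and spot-checking the borderline cases by hand.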


    • #3
      CD-HIT-2D may be useful: http://weizhong-lab.ucsd.edu/cdhit_s...?cmd=cd-hit-2d

      Best of all you can try it yourself without waiting for someone's help. You may still need to do some parsing afterwards.
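      As a rough idea of the parsing step, the sketch below reads a CD-HIT-style `.clstr` file into lists of sequence IDs. It assumes the usual layout, where each cluster starts with a `>Cluster N` line and each member line contains `>name...`. The function names are hypothetical, and how singletons are reported can differ between CD-HIT modes, so check the result against your own output files.

```python
import re

# Member lines look like: "0<TAB>350aa, >seq_name... *"
_MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Return a list of clusters, each a list of member sequence IDs."""
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])  # start a new cluster
            continue
        match = _MEMBER.search(line)
        if match and clusters:
            clusters[-1].append(match.group(1))
    return clusters


def split_clusters(clusters):
    """Separate multi-member clusters from single-member ones."""
    multi = [c for c in clusters if len(c) > 1]
    single = [c[0] for c in clusters if len(c) == 1]
    return multi, single
```

      Multi-member clusters are the groups the tool judged similar at the chosen threshold; single-member clusters are candidates for strain-specific sequences.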


      • #4
        Originally posted by GenoMax
        If these two strains are relatively closely related then you can identify the similarities using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html). Post-alignment processing will have to be done to extract the information you need from the results.

        You could learn to do some of this, but if you are working against a deadline then it may be better to find a programmer friend or your local bioinformatics support facility. They should be able to do this for you.
        Thank you! I checked out the programs that you suggested, but I ended up generating fake sets of Illumina reads from both sequences using SimSeq https://github.com/jstjohn/SimSe,
        then I used bowtie2 to align them to each other and pulled out the reads that don't align, then de novo assembled them into short contigs and extracted their ORFs, which code for unique proteins.
        I'm bookmarking BLAT as it seems like a fairly useful program.
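        For later readers, the "pull out the reads that don't align" step can be sketched as below, assuming the aligner wrote uncompressed SAM. In SAM, bit 0x4 of the FLAG field marks a read as unmapped. The function name `unaligned_read_names` is made up for this example and is not the script used in this thread.

```python
def unaligned_read_names(sam_lines):
    """Return the names of reads whose SAM FLAG has the 0x4 (unmapped) bit set."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            names.append(fields[0])
    return names
```

        In practice you may not need a script at all: bowtie2 can write unaligned reads directly with its `--un` option, and `samtools view -f 4` selects the same records from an existing alignment file.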

        Edited: bolded my procedure to make it easier to read
        Last edited by zerhacker; 12-02-2014, 04:49 PM.


        • #5
          As long as you were able to get what you needed :-)

          What program did you use to generate the "Illumina" reads? Just for the record, for someone running across this thread later on via a search.


          • #6
            Originally posted by GenoMax
            As long as you were able to get what you needed :-)

            What program did you use to generate the "Illumina" reads? Just for the record, for someone running across this thread later on via a search.
            I think SimSeq works great, but I used a Python script written by the department's programmer that works similarly.
            Last edited by zerhacker; 12-02-2014, 04:50 PM.
