SEQanswers

Old 11-25-2014, 12:31 PM   #1
zerhacker
Junior Member
 
Location: Edmonton

Join Date: Nov 2014
Posts: 3
Align two sets of amino acid sequences

Hi all, I am a microbiology student with little knowledge of bioinformatics and programming beyond Illumina read alignment and de novo assembly. I was recently tasked with comparing two closely related strains of the same species and identifying the unique pathways that allow each to specialize in its niche, based on the complete sequences available in GenBank, before any wet-lab work.

To approach this, I plan to extract all open reading frames (ORFs) from both strains, separate the shared/highly similar ORFs from the unique ones, and then find which pathways the unique ORFs are involved in to draw conclusions. I've extracted all open reading frames from both organisms using Prodigal, based solely on the in-frame, uninterrupted sequence between start and stop codons. Each strain has about 2000 real and hypothetical AA sequences.

Now I'm stuck trying to extract the shared and unique sequences from both organisms. Are there any programs suitable for this task? All replies are appreciated!
Old 11-25-2014, 03:00 PM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060

If these two strains are relatively closely related then you can identify the similarities using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html). Post-alignment processing will have to be done to extract the information you need from the results.

You could learn to do some of this, but if you are working against a deadline it may be better to find a programmer friend or your local bioinformatics support facility. They should be able to do this for you.
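For readers wondering what the post-alignment processing might look like, here is a minimal Python sketch. It assumes BLAT was run with one strain's proteins as the query and the other's as the target, and that it wrote its default tab-separated PSL output (21 columns, with `matches` in column 1, `qName` in column 10 and `qSize` in column 11). The function name `classify_by_psl` and the 0.9 coverage cutoff are illustrative assumptions, not something recommended in this thread.

```python
def classify_by_psl(psl_lines, all_query_ids, min_coverage=0.9):
    """Split query IDs into (shared, unique) sets based on BLAT PSL hits.

    A query counts as shared if any hit's `matches` column covers at least
    `min_coverage` of the query length (`qSize`). The cutoff is an
    illustrative assumption; tune it for your own data.
    """
    shared = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the PSL header and anything that is not a 21-column data row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])
        q_name = fields[9]
        q_size = int(fields[10])
        if q_size > 0 and matches / q_size >= min_coverage:
            shared.add(q_name)
    # Anything in the query set without a qualifying hit is treated as unique.
    unique = set(all_query_ids) - shared
    return shared, unique
```

In practice `psl_lines` would be an open file handle on the PSL output and `all_query_ids` the sequence IDs from the query FASTA. Running it a second time with query and target swapped would give the unique set for the other strain.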
Old 11-25-2014, 03:14 PM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060

CD-HIT-2D may be useful: http://weizhong-lab.ucsd.edu/cdhit_s...?cmd=cd-hit-2d

Best of all, you can try it yourself without waiting for someone's help. You may still need to do some parsing afterwards.
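As a rough idea of the parsing involved, the sketch below reads a CD-HIT style `.clstr` file. It assumes the usual layout: a `>Cluster N` header line followed by member lines such as `0	100aa, >orfA... *`, where a trailing `*` marks the representative. The function names are hypothetical, and how CD-HIT-2D fills these clusters from the two input files should be checked against its own documentation.

```python
import re

# Member lines carry the sequence ID between '>' and the literal '...'.
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Return a list of clusters, each a list of (sequence_id, is_representative)."""
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])
            continue
        match = MEMBER_RE.search(line)
        if match and clusters:
            clusters[-1].append((match.group(1), line.endswith("*")))
    return clusters


def split_shared_unique(clusters):
    """Clusters with several members are 'shared'; single-member ones are 'unique'."""
    shared = [c for c in clusters if len(c) > 1]
    singletons = [c[0][0] for c in clusters if len(c) == 1]
    return shared, singletons
```

Treating single-member clusters as "unique" is a simplification: it only means no partner was found at the identity threshold the clustering was run with.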
Old 12-02-2014, 02:26 PM   #4
zerhacker
Junior Member
 
Location: Edmonton

Join Date: Nov 2014
Posts: 3

Quote:
Originally Posted by GenoMax
If these two strains are relatively closely related then you can identify the similarities using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html). Post-alignment processing will have to be done to extract the information you need from the results.

You could learn to do some of this, but if you are working against a deadline it may be better to find a programmer friend or your local bioinformatics support facility. They should be able to do this for you.
Thank you! I checked out the programs you suggested, but I ended up generating fake sets of Illumina reads from both sequences using SimSeq (https://github.com/jstjohn/SimSe).
I then used bowtie2 to align them to each other, pulled out the reads that don't align, de novo assembled them into short contigs, and extracted their ORFs, which code for the unique proteins.
I'm bookmarking BLAT, as it seems like a fairly useful program.
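For anyone repeating the "pull out reads that don't align" step, here is a minimal Python sketch that assumes the aligner's output is plain-text SAM. It relies only on the SAM convention that bit 0x4 of the FLAG column (column 2) marks a read as unmapped. The function name `unmapped_read_names` is hypothetical, and this is not the script used in the post above.

```python
def unmapped_read_names(sam_lines):
    """Return the names of reads whose SAM FLAG has the 0x4 (unmapped) bit set."""
    names = []
    for line in sam_lines:
        # Header lines start with '@' and carry no alignment records.
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A SAM alignment record has at least 11 mandatory columns.
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & 0x4:
            names.append(fields[0])
    return names
```

The returned names could then be used to pull the matching reads out of the original FASTQ before assembly. For paired-end data, whether to keep a pair when only one mate is unmapped is a separate decision that this sketch does not make.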

Edited: bolded my procedure to make it easier to read.

Last edited by zerhacker; 12-02-2014 at 03:49 PM.
Old 12-02-2014, 02:58 PM   #5
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,060

As long as you were able to get what you needed :-)

What program did you use to generate the "Illumina" reads? Just for the record, for someone running across this thread later on via a search.
Old 12-02-2014, 03:37 PM   #6
zerhacker
Junior Member
 
Location: Edmonton

Join Date: Nov 2014
Posts: 3

Quote:
Originally Posted by GenoMax
As long as you were able to get what you needed :-)

What program did you use to generate the "Illumina" reads? Just for the record, for someone running across this thread later on via a search.
https://github.com/jstjohn/SimSe
I think SimSeq works great, but I used a Python script written by the department's programmer that works similarly.

Last edited by zerhacker; 12-02-2014 at 03:50 PM.