SEQanswers





02-27-2009, 12:56 PM   #1
found
Junior Member
 
Location: Miami

Join Date: Feb 2009
Posts: 8
So many alignment tools!!!

Hi,

I am new to mRNA-Seq data analysis, and fairly new to bioinformatics in general.
Recently I read a paper that mapped mRNA-Seq data to the human genome with ELAND. I am very interested in the mapping results and want to do some data mining for my own purposes, but the author only released the original mRNA-Seq data.
I have browsed this forum for days and learned a lot. The first thing I learned is that ELAND is not available to me. I also found that someone has listed a great many tools for short-read alignment, and I am really confused about which one is suitable for me. That's why I am here to get some suggestions from experienced experts.

My objective is to map all of these mRNA-Seq reads to a database that includes the human genome plus my own set of sequences (56 bp). At most two mismatches and no gaps are allowed. Since a small fraction of the reads may contain the symbol '.', such as "ATCTAT.CGTACG.GCTAGTGGTGAAGG", the software should not filter out these reads but should count each '.' as a mismatch during alignment.
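To pin down that matching rule, here is a minimal brute-force sketch in Python. `ungapped_hits` is a hypothetical helper written for illustration, not part of any aligner, and it assumes uppercase A/C/G/T sequences. It slides the read along the reference without gaps, counts mismatches, and treats every '.' in the read as a mismatch instead of discarding the read. Real aligners use an index rather than scanning every offset, so this only illustrates the scoring criterion, not how to make it fast.

```python
def ungapped_hits(read, ref, max_mismatches=2):
    """Return (offset, mismatches) for every ungapped placement of read in ref
    with at most max_mismatches mismatches. A '.' (uncalled base) in the read
    is never filtered out; it always counts as one mismatch."""
    hits = []
    for offset in range(len(ref) - len(read) + 1):
        mismatches = 0
        for i, base in enumerate(read):
            if base == "." or base != ref[offset + i]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this offset can no longer qualify; stop early
        if mismatches <= max_mismatches:
            hits.append((offset, mismatches))
    return hits


if __name__ == "__main__":
    # The '.' costs one mismatch at each of the two exact ACGT sites.
    print(ungapped_hits("AC.T", "TTACGTACGT"))
```

Running it prints `[(2, 1), (6, 1)]`: both copies of ACGT in the reference are reported, each with one mismatch coming from the '.'.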

Can you recommend software for this purpose? Of course, both accuracy and speed are important; if they cannot both be satisfied, I think accuracy matters more.

Last edited by found; 02-27-2009 at 01:03 PM.
03-01-2009, 03:58 AM   #2
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96

IMHO, methods based on the Burrows-Wheeler transform are currently the best option.

Bowtie is quick and easy...
bwa in the MAQ package is more flexible and extensive.

Hope this helps.
03-01-2009, 03:59 AM   #3
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96

Oh, by the way, the previous recommendations are for Illumina and 454 data. If you want to analyze SOLiD data, you need different software that works in colorspace. See the SOLiD thread for more info.
03-01-2009, 04:41 PM   #4
found
Junior Member
 
Location: Miami

Join Date: Feb 2009
Posts: 8

Thanks a lot. It is Illumina data.
When you mentioned the "Burrows-Wheeler transform", is it a tool or an algorithm? Or is it just the full name of "bwa in the MAQ package"?

Last edited by found; 03-01-2009 at 04:45 PM.
03-03-2009, 12:20 AM   #5
jkbonfield
Senior Member
 
Location: Cambridge, UK

Join Date: Jul 2008
Posts: 146

The Burrows-Wheeler transform is an algorithm, originally used for data compression in tools like bzip. Its interest for bioinformatics is that, with more recent refinements (the so-called "FM index"), it can serve both as a compression method and as an index.
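For anyone wondering what the transform actually does, here is a minimal textbook sketch in Python. It is the naive version that sorts every rotation, so it is fine for toy strings but not for a genome, and `bwt`/`inverse_bwt` are illustrative names, not functions from bzip, bwa or Bowtie. A `$` sentinel is appended so that the transform can be inverted; it sorts before the letters in ASCII. The output (the last column of the sorted rotations) tends to group identical characters into runs, which is what makes it useful for compression, and the original text can be recovered from it exactly.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (O(n^2 log n), demo only)."""
    s = text + "$"  # unique sentinel, sorts before A, C, G, T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)  # last column of the sorted matrix


def inverse_bwt(last):
    """Rebuild the sorted rotation matrix by repeatedly prepending the last column
    and re-sorting; the row ending in '$' is the original text."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith("$"))
    return row[:-1]


if __name__ == "__main__":
    print(bwt("banana"))               # equal letters cluster into runs
    print(bwt("GATTACA"))
    print(inverse_bwt(bwt("GATTACA")))  # round-trips to the input
```

This prints `annb$aa`, `ACTGA$TA` and `GATTACA`.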

This indexing capability is what makes it well suited to short-read aligners, bwa and Bowtie being two such examples. Whether this method is optimal for you depends on the size of the reference genome you're comparing against.
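Roughly, the index works through "backward search". Using the BWT string plus two small lookup structures, C[c] (how many characters in the text sort before c) and Occ(c, i) (how many times c appears in the first i characters of the BWT), you can find every exact occurrence of a read with one step per character of the read. Each step does not depend on the genome's length. Below is a naive Python sketch of that idea. The names are illustrative and the suffix array is built by plain sorting. Occ is computed here by rescanning the string, whereas real aligners such as bwa and Bowtie store sampled counts. They also add search strategies that tolerate mismatches, which this sketch does not attempt.

```python
def fm_index(text):
    """Build a toy FM index: suffix array, BWT string and the C table."""
    s = text + "$"
    sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
    last = "".join(s[i - 1] for i in sa)             # BWT: char preceding each sorted suffix
    counts = {}
    for ch in s:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):                        # C[ch] = number of chars smaller than ch
        C[ch] = total
        total += counts[ch]
    return sa, last, C


def occ(last, ch, i):
    """Occurrences of ch in last[:i] (real FM indexes sample these counts)."""
    return last[:i].count(ch)


def backward_search(pattern, sa, last, C):
    """Return the sorted text positions where pattern occurs exactly."""
    lo, hi = 0, len(last)  # half-open range of suffix-array rows matching the suffix so far
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ(last, ch, lo)
        hi = C[ch] + occ(last, ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    sa, last, C = fm_index("GATTACA")
    print(backward_search("A", sa, last, C))
    print(backward_search("TA", sa, last, C))
```

This prints `[1, 4, 6]` and `[3]`, the 0-based positions of `A` and `TA` in `GATTACA`.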

As for why there are so many tools: it's harder than you'd think. Sure, it's easy to simply align data, but doing it for read-pair data with short indels, while taking into account multiple matches and computing the probability that a read has been misaligned (etc.), all adds up to a complex task. This has led to a lot of competition between groups. I expect the number will dwindle over time as winners and losers emerge.
03-03-2009, 06:35 AM   #6
doxologist
Member
 
Location: USA

Join Date: Jan 2009
Posts: 96

I agree... there will eventually be a few dominant tools for each type of information space, which will adopt the best algorithms and be the most data-friendly. The initiatives toward a common format should make this process much more efficient.

I think the future added value will no longer be just alignment, but what's downstream: SNP detection, paired ends, indel detection, handling large numbers of samples, etc.
All times are GMT -8.

