Old 05-28-2012, 03:44 AM   #1
Senior Member
Location: germany

Join Date: Oct 2009
Posts: 140
do we need alignment?

I recently started comparing sequences by their matching subsequences,
say of length 12 for nucleotides.
This has the advantage that you needn't align the sequences,
which may take quite some time (using MAFFT) for large sets
and can't be done from batch.
You can also store many subsequences in an array and quickly scan
whole databases for matches.
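
For illustration, the comparison described above might look something like this in Python. It is only a minimal sketch under assumptions: the names `kmers` and `kmer_similarity` are made up here, and scoring the overlap as a Jaccard index (shared 12-mers divided by all distinct 12-mers) is one possible choice, not necessarily the score used in this post.

```python
def kmers(seq, k=12):
    """Return the set of all length-k subsequences (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(a, b, k=12):
    """Jaccard index of the two k-mer sets (an assumed scoring choice).

    1.0 means identical k-mer content, 0.0 means no shared k-mer.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        # At least one sequence is shorter than k: nothing to compare.
        return 0.0
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    # Tiny toy sequences, so a small k is used instead of 12.
    print(kmer_similarity("ACGTAC", "ACGTTC", k=3))
```

One caveat with any score of this kind: a single substitution changes up to k overlapping k-mers, so the score falls faster than percent identity does and is not a direct count of nucleotide differences.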

Now I wonder how reliable that method is as a measure of
genetic similarity. Is it generally used in place of alignment?

I'm mainly doing it for influenza
gsgs is offline   Reply With Quote
Old 05-28-2012, 06:46 AM   #2
Senior Member
Location: Boston area

Join Date: Nov 2007
Posts: 747

Whether you need any analysis is defined by what question you are asking. Since you haven't stated a hypothesis, it's rather challenging to answer whether a given approach would answer that hypothesis.

In any case, many sequence database search tools employ some variant on what you are describing. The Burrows-Wheeler transform that lies at the heart of Bowtie, BWA and many other short read aligners is a scheme for efficiently searching a very large table of subsequences -- simply storing subsequences in an array breaks down very quickly.
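
To give a rough idea of what such an index does, here is a toy Python sketch of a Burrows-Wheeler transform plus backward search for counting exact occurrences of a pattern. It is not how Bowtie or BWA are implemented (real aligners use far more compact data structures); the function names `bwt` and `count_occurrences` are invented for illustration, and the naive construction via sorted rotations only suits short strings.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy construction)."""
    text += "$"  # sentinel, sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count exact matches of pattern using backward search on the BWT."""
    counts = Counter(bwt_str)
    # first[c]: number of characters in the text that sort before c
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    # occ[c][i]: number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in first:
            return 0
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("ACGTACGTAC")
    print(count_occurrences(index, "ACG"))
```

The point of the sketch is that the search cost depends on the pattern length rather than on scanning the whole text, which is what a plain array of subsequences cannot offer at scale.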
krobison is offline   Reply With Quote
Old 05-28-2012, 08:34 AM   #3
Senior Member
Location: germany

Join Date: Oct 2009
Posts: 140

Multiple purposes. Currently I am trying to compute the distance graph
(edge if <x% nucleotide differences?) and then to find a small dominating
set, such that every other sequence differs by less than x% from (i.e. is
more than (100-x)% identical to) one in the dominating set. Hopefully that
set is easier to handle than the big databases
while still representing essentially all sequences.
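
As a sketch of the dominating-set idea above: finding a minimum dominating set is NP-hard in general, so the Python below uses the usual greedy heuristic, which gives a small set but not necessarily the smallest. The function name `greedy_dominating_set` and its parameters are hypothetical, and the distance function is left as an argument so that either an alignment-based or a subsequence-based distance could be plugged in.

```python
def greedy_dominating_set(items, distance, max_dist):
    """Greedily pick indices so every item is within max_dist of a pick.

    items    -- list of sequences (or any objects)
    distance -- function(a, b) -> number (assumed symmetric)
    max_dist -- two items are neighbours if distance <= max_dist
    """
    n = len(items)
    # Each item covers itself and all of its neighbours in the distance graph.
    neighbours = [
        {j for j in range(n)
         if i == j or distance(items[i], items[j]) <= max_dist}
        for i in range(n)
    ]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Pick the item covering the most uncovered items; ties go to the
        # lowest index so the result is deterministic.
        best = max(range(n),
                   key=lambda i: (len(neighbours[i] & uncovered), -i))
        chosen.append(best)
        uncovered -= neighbours[best]
    return chosen


if __name__ == "__main__":
    # Toy data: numbers stand in for sequences, |a - b| for their distance.
    values = [0, 1, 2, 10, 11, 20]
    print(greedy_dominating_set(values, lambda a, b: abs(a - b), 1))
```

Building the full neighbour lists takes a number of distance evaluations quadratic in the number of sequences, so for large databases the pairwise distance step is likely to dominate the running time.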

Other applications/problems/questions: searching for recombinations,
for periods of low mutation (graph: differences vs. time difference),
or nucleotide vs. amino acid mutations.

Tracking mutations and where they first appear,
mutation pictures.

For many of these purposes, e.g. the distance graph, maybe the
alignment isn't needed? But I'm not sure how "good" the
subsequence distances are.


the pic below shows that it is a bit less sensitive,
but acceptable

avian influenza segment1 pairs
amino-acid differences vs. nucleotide-differences
Attached Images
File Type: gif bits1c.gif (41.6 KB, 0 views)

Last edited by gsgs; 06-04-2012 at 08:24 AM.
gsgs is offline   Reply With Quote
