#1 |
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
|
Hi,
I've been working on a short read aligner and would like to find some beta testers. The suite includes single-end and paired-end read aligners. Some features are:
If anyone is interested in getting a copy for testing, you can contact me at novoalign <at> gmail .... The beta version is for x86-64 Linux.
Cheers, Colin
Last edited by sparks; 06-17-2008 at 12:51 AM.
#2 |
Member
Location: Vancouver, Canada Join Date: Feb 2008
Posts: 44
|
Hi Colin.
This sounds cool. Can you just confirm for us whether or not you plan to make this aligner open source?
#3 |
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
|
Hi myrna,
At the moment it's not open source but it will be free for open projects and non-profit organisations. I might make it open source if I had some funding.
Colin
Last edited by sparks; 06-16-2008 at 08:13 AM.
#4 |
Junior Member
Location: AZ Join Date: May 2008
Posts: 1
|
I have done some testing of sparks' program Novoalign.
This program seems to be incredibly fast. It requires only about 6 GB of physical RAM for aligning to the human genome. Using simulated reads with no mismatches, the program gives the same results as SOAP; however, Novoalign is more than 100x faster (half a million reads took only a little over half a minute).
I have tested some real SOLiD reads translated to base space as well. Novoalign was again very fast relative to SOAP: 50,000 reads in 3 min. I used the trimming feature to help with alignment of reads that were mistranslated due to read errors. The uniquely mapped SOLiD reads from Novoalign and SOAP were 99.96% identical.
I would like to know whether ELAND, which is supposed to be the fastest aligner, would beat Novoalign.
Last edited by tree; 07-07-2008 at 09:40 PM.
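For anyone wanting to repeat the comparison of uniquely mapped reads, here is a minimal sketch of one way such an agreement figure could be computed. It assumes each aligner's unique hits have already been parsed into a dict keyed by read ID; the function name, the tuple layout and the toy data are all hypothetical and say nothing about either program's output format.

```python
def unique_concordance(hits_a, hits_b):
    """Fraction of reads uniquely mapped by BOTH aligners that land on the same locus.

    hits_a, hits_b: dict mapping read_id -> (chrom, position, strand).
    Reads mapped by only one of the two aligners are ignored.
    """
    shared = hits_a.keys() & hits_b.keys()
    if not shared:
        return 0.0
    same = sum(1 for read_id in shared if hits_a[read_id] == hits_b[read_id])
    return same / len(shared)


# Toy data (made up): r1 agrees, r2 differs by one base, r3/r4 are not shared.
novo = {"r1": ("chrX", 100, "+"), "r2": ("chrX", 250, "-"), "r3": ("chr2", 40, "+")}
soap = {"r1": ("chrX", 100, "+"), "r2": ("chrX", 251, "-"), "r4": ("chr1", 7, "+")}

print(f"{unique_concordance(novo, soap):.2%}")
```

Whether a one-base offset should count as a disagreement is a design choice; an exact-match rule like this one is the strictest option.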
#5 |
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
|
I've been using novoalign as well, and my bet is that ELAND should be faster than novoalign at default settings because novoalign will spend a little more time looking for those extra mismatches and gaps. At a threshold of 60, novoalign should be as fast as ELAND or perhaps a bit faster. ELAND achieves better performance because it indexes reads and does a fast scan of the genome.
Perhaps somebody would be willing to try it out: take a few million paired-end/single-end reads and see how novoalign at threshold 60 does in comparison to ELAND on the same server specification.
#6 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
|
I have just tried novo*. A wonderful piece of software. As before, I only tried it on human chrX. It is as fast as eland. I rather believe novo* should be faster on the whole human genome, as indexing will be more efficient than on chrX.
(Sorry, I was wrong previously and so removed the paragraph. Quite amazing to me. And as I was wrong, novo* looks even better.)
I think it is very important for novo* to support multithreading; otherwise parallelization would be a big problem.
Novopair does work for me and it improves overall alignment accuracy. However, novopair is overoptimistic about the alignment accuracy. The error rate of Q150 alignments is 0.05%. This error rate is good enough, but it would be better to improve it somewhat. This may be more of a theoretical concern.
In all, novo* is really a good set of programs. It is fast and integrates the advantages of most existing programs. I just hope the author can get funding and make it an open source project.
PS: So far as I know, only SOLiD's own software and shrimp fully support color alignment. Maq does partially. Neither novo* nor soap supports color alignments. Note that it is not right to do SOLiD alignment in the nucleotide space.
Last edited by lh3; 07-15-2008 at 03:41 AM.
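To put the "overoptimistic" remark in numbers, here is a small worked conversion. It assumes the reported alignment qualities are Phred-scaled, i.e. Q = -10 * log10(P(alignment is wrong)); that scaling is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def phred_to_error(q):
    """Error probability implied by a Phred-scaled quality q."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred-scaled quality corresponding to an error probability p."""
    return -10 * math.log10(p)


observed_error = 0.0005  # the 0.05% error rate quoted for Q150 alignments

# Error rate that a quality of 150 would claim under the Phred assumption.
print(phred_to_error(150))

# Quality that the observed 0.05% error rate actually corresponds to.
print(round(error_to_phred(observed_error), 1))
```

Under that assumption, Q150 would claim an error probability of 1e-15, while an observed rate of 0.05% corresponds to a quality of roughly 33, which is the gap being described.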
#7 |
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
|
see next...
Last edited by zee; 07-15-2008 at 03:24 AM.
#8 |
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
|
Thanks for the comments, lh3. We're working on improving accuracy. Something to be aware of with novo is the alignment threshold, the "-t" parameter. Setting this very high, e.g. -140, for a single-end alignment will report more false positives (FP). It's always tricky working out the right default threshold. Setting it too high will escalate FP, and if it's too low, e.g. > -60, then you don't pick up enough.
I think the author will be aware of these technicalities and this sort of feedback will help to improve the software. The foreseeable plans are to keep it open for just about everybody in the research community.
#9 |
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
|
Hi Li Heng,
Thanks for your kind comments. Performance slows on larger genomes as more possible alignment locations are evaluated for each read. Additional memory helps here as it makes the index more specific, and while it can be run on an 8GB RAM server (full human), a 16GB or 32GB server is going to be 4 or 5 times faster.
With regard to multithreading, the index is memory mapped and it's quite possible to run multiple copies of novoalign (same target genome) without any increase in memory. That said, multithreading wouldn't be too difficult as the search classes are all designed to handle it. I need to see if there is real demand.
The quality calculation is similar in principle to maq's: it is the Bayesian posterior probability that the alignment is wrong. Some factors are estimated, and one possible problem is that I rate the reference genome at 2 bits of entropy per base; this may be the cause of the high qualities.
I deliberately haven't done SOLiD as I'd like to do it properly or not at all. That said, if someone wants to try, I suggest converting the reference genome to colour space rather than the reads to nucleotide space.
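As a rough illustration of the colour-space suggestion, the sketch below encodes a nucleotide sequence with the standard SOLiD two-base code (A=0, C=1, G=2, T=3; each colour is the XOR of the codes of two adjacent bases). The function names are hypothetical, and ambiguous bases are simply written as "."; this is not taken from any aligner.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def to_colour_space(seq):
    """Encode a nucleotide sequence as colours, one per adjacent base pair."""
    seq = seq.upper()
    colours = []
    for a, b in zip(seq, seq[1:]):
        if a in CODE and b in CODE:
            colours.append(str(CODE[a] ^ CODE[b]))
        else:
            colours.append(".")  # transition involving an ambiguous base (e.g. N)
    return "".join(colours)


def decode(first_base, colours):
    """Translate colours back to bases, starting from a known first base."""
    out = [first_base]
    for c in colours:
        out.append(BASES[CODE[out[-1]] ^ int(c)])
    return "".join(out)


reference = "ATGGCA"
colours = to_colour_space(reference)
print(colours)                # colour-space form of the reference
print(decode("A", colours))   # round trip back to the original bases
print(decode("A", "30031"))   # same colours with one colour error at position 2
```

The last line shows why translating reads to bases is fragile: a single wrong colour changes every base decoded after it, whereas in colour space it stays a single mismatch.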
#10 |
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
|
Just one more point: even though novoalign uses a k-mer index of the genome, it is not a seeded alignment à la Blast/Blat/Shrimp. It's an iterative alignment that will match the read against k-mers in the index using a combinatorial approach (with gaps).
#11 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
|
see below...
Last edited by lh3; 07-15-2008 at 06:11 AM.
#12 |
Senior Member
Location: Boston Join Date: Feb 2008
Posts: 693
|
I can now vaguely see how this could be done, but I am still keen to see the details if you publish the algorithm some day. Nice work!
#13 |
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
|
Think blastp-type seeding, with qualities replacing the BLOSUM matrix, and add gaps.
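One possible reading of that hint, offered only as a toy sketch and not as Novoalign's actual algorithm: for each read k-mer, enumerate the "neighbourhood" of words whose mismatch penalty stays under a cutoff, where the penalty for a mismatch is the base quality at that position, and look each word up in a k-mer index of the reference. All names, the penalty rule and the cutoff are assumptions, and gaps are left out entirely.

```python
from collections import defaultdict
from itertools import product


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def neighbourhood(kmer, quals, max_penalty):
    """Yield (word, penalty) for each word within max_penalty of kmer.

    A mismatch at position i costs quals[i], so mismatches at low-quality
    bases are cheap and mismatches at high-quality bases are expensive.
    Brute force over 4**k words, which is only sensible for small k.
    """
    for letters in product("ACGT", repeat=len(kmer)):
        penalty = sum(q for a, b, q in zip(kmer, letters, quals) if a != b)
        if penalty <= max_penalty:
            yield "".join(letters), penalty


def lookup(kmer, quals, index, max_penalty):
    """Return sorted (position, word, penalty) hits for the k-mer's neighbourhood."""
    hits = []
    for word, penalty in neighbourhood(kmer, quals, max_penalty):
        for pos in index.get(word, ()):
            hits.append((pos, word, penalty))
    return sorted(hits)


reference = "ACGTACGGTCAGTTACG"
index = build_kmer_index(reference, 4)

# Read k-mer with a low-quality last base: only that base may cheaply mismatch.
print(lookup("GGTA", [30, 30, 30, 5], index, max_penalty=10))
```

With these toy values the exact word GGTA is absent from the reference, but the neighbour GGTC is found at position 6 with a penalty of 5.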
#14 |
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
|
I've gone back and looked at our error rate on simulated reads and it's typically around 0.005% without selecting for quality. We've used maq simulate, modified to insert longer indels, and paf_utils (great tools), but we also had to modify this to allow a few extra bases of uncertainty in alignment location, as novo aligners are much more likely to add a few gaps into an alignment than perhaps maq is.
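The idea of allowing a few bases of slack when scoring simulated reads can be sketched as follows. This is a minimal illustration, not the modified paf_utils code: the function names, the dict layout and the default slack of 5 bases are all hypothetical.

```python
def is_correct(true_chrom, true_pos, aln_chrom, aln_pos, slack=5):
    """An alignment counts as correct if it is on the right sequence and
    within `slack` bases of the simulated origin (gaps can shift the start)."""
    return true_chrom == aln_chrom and abs(true_pos - aln_pos) <= slack


def error_rate(truth, alignments, slack=5):
    """Fraction of aligned reads placed at the wrong location.

    truth, alignments: dict mapping read_id -> (chrom, position).
    Reads with no reported alignment are not counted.
    """
    aligned = [read_id for read_id in alignments if read_id in truth]
    if not aligned:
        return 0.0
    wrong = sum(
        1
        for read_id in aligned
        if not is_correct(*truth[read_id], *alignments[read_id], slack=slack)
    )
    return wrong / len(aligned)


# Toy data (made up): r1 is shifted by 3 bases, r2 is 100 bases off, r3 is exact.
truth = {"r1": ("chrX", 1000), "r2": ("chrX", 5000), "r3": ("chr2", 42)}
alignments = {"r1": ("chrX", 1003), "r2": ("chrX", 5100), "r3": ("chr2", 42)}

print(error_rate(truth, alignments))           # r1 accepted thanks to the slack
print(error_rate(truth, alignments, slack=0))  # strict matching also rejects r1
```

Comparing the two calls shows how a strict position check would overstate the error rate for an aligner that opens gaps near the read ends.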
#15 |
Senior Member
Location: Kuala Lumpur, Malaysia Join Date: Mar 2008
Posts: 126
|
Hi all,
I've just put up an update to novoalign & novopaired. This update improves quality scores for novopaired and also fixes an illegal instruction fault reported by one user. You can download it at www.novocraft.com
I've also changed the license terms so it's free for any non-profit, even if you don't publish in open journals.
Colin
#16 |
Member
Location: Vancouver, Canada Join Date: Feb 2008
Posts: 44
|
Hi Colin.
I have been working with Novoalign a bit and am finding it useful in picking up indels and SNPs missed by other aligners. I am wondering if it can also pick up structural aberrations that I have missed using other approaches. Is there an update on the timelines for the following features, mentioned in the documentation:
"novostruct Uses paired end alignments to identify locations where the individual being sequenced is structurally different to the reference sequences. This could be inter sequence variations such as large insertions, deletions and inversions or inter sequence variations. Jul '08
novoasm Using results from novoalign and novopair calls SNPs and short indels. ACE format output is provided for viewing of alignments. Aug '08
novodensity Read density analysis for copy number, expression level and, peak detection. Aug '08"
?
Thanks, Ryan
Last edited by myrna; 08-11-2008 at 01:39 PM.
#17 |
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
|
Hey Myrna,
If you're interested in knowing more about what we're doing with SNP/Assembly, see http://www.novocraft.com/wiki/tiki-v...desc&forumId=1
#18 |
Member
Location: Vancouver, Canada Join Date: Feb 2008
Posts: 44
|
Thanks for the link, this was just what I needed. I will give the Novoalign->Eland->Maq conversion a try. What do you see as the largest problem/concern caused by the loss of mapping scores in doing this conversion? Do you think there would be some way to scale the Novoalign scores to Maq's mapping quality scale such that you could include them?
#19 |
NGS specialist
Location: Malaysia Join Date: Apr 2008
Posts: 249
|
This is an area we're trying to perfect at the moment. Basically you gotta know that novoalign mapping quality scores are meant to be as close to maq mapping qualities as we can hope to get. Therefore scaling may not be necessary if we can show that low novoalign mapping qualities correspond to low maq mapping qualities, and vice versa.
The .map file is the key here because it contains this information, and we're neglecting it by using eland format.
The good news is that because we're mapping more with novoalign you have more SNPs being called.
We hope to have this format conversion with quality scores ready by next week. Perhaps you can send me a private msg and I can provide you with some charts showing how these mapping qualities compare between novoalign and maq?
#20 |
Member
Location: Vancouver, Canada Join Date: Feb 2008
Posts: 44
|
I would think that using the export file format as an intermediate (instead of the eland format) would allow you to get around the base (and mapping) quality issue. Heng Li, have you (or anyone else) attempted to convert novo* outputs into native Maq alignment files?