SEQanswers

Old 08-18-2008, 11:16 PM   #21
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Hey Myrna,
It's ready to try out. Please see

http://seqanswers.com/forums/showthr...=1326#post1326
Old 08-27-2008, 10:49 PM   #22
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi Myrna,
We've added a function to maq that converts native novo... report formats into maq map format. The source code is available in our forum. This conversion maintains the quality values and also converts gapped alignments, which is not possible if the conversion is done from the Eland report format.
With this conversion you can use maq to do the assemblies and call SNPs and Indels. You can even use maq indelpe on single end reads aligned by novoalign and then converted to maq.
Our plans for our own assembly, SNP caller etc are running a bit behind.
Cheers, Colin
Old 09-03-2008, 12:31 AM   #23
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Thumbs up Multithreading now supported in NovoCraft Aligners

Multithreading has been added to novoalign and novopaired. The results look really good.

We ran some tests on our new multithreaded version to evaluate alignment performance on a small set of 200K Illumina reads versus the Human Genome NCBI36. The 200x36x37-071207_EAS51_0064-s_2_1.fastq and 200x36x37-071207_EAS51_0064-s_2_2.fastq FASTQ-formatted files were downloaded from the ftp://ftp.ncbi.nih.gov/pub/TraceDB/S...A000271/fastq/ FTP site. The first 200,000 reads in these files were used.
A Linux server with eight 2.33 GHz CPU cores and 32 GB RAM was used. Time was taken from the elapsed time figure in the novopaired/novoalign output reports using UNIX tail.


CPU usage was monitored and it was found that using 8/8 cores didn't improve performance much over using 7/8 cores.

There appears to be a significant gain in performance with the multithreaded versions of novopaired and novoalign (figure 1).



Table 1: Performance of multithreaded novoalign and novopaired on 200,000 Illumina reads searched against the NCBI36 Human Genome





Columns 4 and 5 give the run time as a percentage of the time taken with 1 CPU. So 4 cores take 1/4 of the single-CPU time, and 7 cores take 14.8% (table 1). Each alignment process consumed at most 16.1Gb (52% RAM).
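To put those percentages in more familiar terms, here is a minimal sketch that converts "% of single-CPU time" into speedup and parallel efficiency. The function name is made up for illustration, and the 25% figure for 4 cores is only my reading of "1/4 time" above, since the table itself did not survive.

```python
def speedup_and_efficiency(percent_of_single_cpu_time, cores):
    """Convert run time (as % of the 1-CPU run time) into speedup and efficiency."""
    speedup = 100.0 / percent_of_single_cpu_time   # how many times faster than 1 CPU
    efficiency = speedup / cores                   # 1.0 would be perfect linear scaling
    return speedup, efficiency


# (cores, % of single-CPU time); 25.0 is assumed from "1/4 time", 14.8 is quoted above
for cores, pct in [(4, 25.0), (7, 14.8)]:
    s, e = speedup_and_efficiency(pct, cores)
    print(f"{cores} cores: {s:.2f}x speedup, {e:.0%} parallel efficiency")
```

On these figures 7 cores come out at roughly a 6.8x speedup, which is close to linear scaling.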
Old 09-10-2008, 11:42 AM   #24
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

I finally got to use novoalign and novo2maq to make SNP calls. It seems the depth of coverage I see on SNP calls from novo-aligned data is much lower than that from MAQ, almost 1/3.

Why would that be?

Of the 4 million reads in the lane, novoalign mapped only 1.6 million (all default params)
Old 09-10-2008, 01:46 PM   #25
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Bioinfosm, that's interesting. I'd expect you to find more high mapping quality reads with novoalign, and that should improve the depth. However, if it's doing the opposite then it is something we'll need to look at.

If you've run the same data with MAQ then I assume you're using fastq-formatted reads.
I'm interested to see what the `maq mapstat` output is for the novoalign and maq .map files.
Something else to look at is whether novo2maq converted the headers correctly. This is easily checked with maq mapview.

Could you perhaps send me a tail of the novoalign output and version as well?


Quote:
Originally Posted by bioinfosm View Post
I finally got to use novoalign and use novo2maq to make SNP calls. It seems the depth of coverage I see on SNP calls from novo aligned data is much lesser than that from MAQ.. almost 1/3.

Why would that be?

Of the 4 million reads in the lane, novoalign mapped only 1.6 million (all default params)
Old 09-10-2008, 05:09 PM   #26
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi Bioinfosm,
Further to Zee's request, could you include a head of the novoalign output as well as the tail?

Can you email directly to support at novocraft dot com

Thanks, Colin
Old 09-11-2008, 11:29 AM   #27
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Thanks for the response. There was something with the headers which I noticed and correcting that gave me a lot more reads mapped by novoalign compared to maq. However, the qualities of some of them are pretty low, along with lots of flags when looking at the mapstat output.

I will email that data to support for further analysis...
BTW, what's your homopolymer filter?
Old 09-11-2008, 03:37 PM   #28
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

The homopolymer filter picks up reads that are all A's or all C's etc., i.e. the same base called in every position in the read. Some Illumina read files have a significant percentage of these. They can be caused by dust on the slide or by the camera picking up the edge of a lane.
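As a rough illustration of the idea only (this is not novoalign's actual code, and the function names are made up for the example), a check for "same base called in every position" could be sketched like this:

```python
def is_homopolymer(read):
    """True if the read is non-empty and every position holds the same base."""
    return len(read) > 0 and len(set(read.upper())) == 1


def split_homopolymers(reads):
    """Separate reads into (kept, filtered) using the check above."""
    kept, filtered = [], []
    for read in reads:
        (filtered if is_homopolymer(read) else kept).append(read)
    return kept, filtered


# Toy reads, invented for the example
kept, filtered = split_homopolymers(["AAAAAAAA", "ACGTACGT", "cccccccc", "AAAAAAAT"])
print("kept:", kept)
print("filtered:", filtered)
```

Note that this strict version keeps a read like AAAAAAAT. Whether a real filter tolerates a few off bases is not stated in the post.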
Old 09-11-2008, 05:51 PM   #29
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

With regard to the flag values, the novo2maq module was incorrectly setting paired end flags on single end reads. I've posted an updated source file in our support forum at www.novocraft.com
Old 09-13-2008, 06:37 AM   #30
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
Default Flags

Oh no! I was just reveling in the fact that novo2maq did set flags as paired in single end data. This has allowed me to run indelpe and find some very convincing indels. Not sure how many of them are real, but looking at the coverage a lot are convincing by eye. Without the ability to run indelpe, many of these sites are mistakenly called SNPs. Is there still an option to pull the indels from a novoalign output? I suppose as long as flag 130 is still set it should work fine. I understand the rationale that Maq only trusts indels from paired data (and only does gapped alignment when reads are anchored by a mate), but I would like to get Colin's opinion about whether we can trust indels from single end reads (and if so, what mapping quality thresholds?)

Thanks,

Ryan

Last edited by myrna; 09-13-2008 at 07:02 AM.
Old 09-13-2008, 06:44 AM   #31
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I haven't tried this, but have you attempted to run indelpe on single-end mapping results from novoalign converted to .map format?
Novoalign's mapping qualities are not recalculated for paired-end and you should see this from the `maq mapstat` output.
I think Colin will be able to shed more light on this.

Quote:
Originally Posted by myrna View Post
Oh no! I was just reveling in the fact that novo2maq did set flags as paired in single end data. This glitch allowed me to run indelpe and find some very convincing indels. Not sure how many of them are real, but looking at the coverage a lot are convincing by eye. Without the ability to run indelpe, many of these sites are mistakenly called SNPs. Is there another option to pull the indels from a novoalign output? I understand the rationale that Maq only trusts indels from paired data, but I would like to get Colin's opinion about whether we can trust indels from single end reads (and if so, what mapping quality thresholds?)

Thanks,

Ryan
Old 09-13-2008, 09:57 AM   #32
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
Default indelpe on single end data

Quote:
Originally Posted by zee View Post
I havent tried this, but have you attempted to run indelpe on single-end mapping results from novoalign converted to .map format?
Novoalign's mapping quality's are not recalculated for paired-end and you should see this from the `map mapstat` output.
I think Colin will be able to shed more light on this.
I have done this and it seemed to work well (which was quite satisfying). I just want to be sure I can trust them or if I should pre-filter the alignments at some mapping quality threshold before converting them to .map format. Do you have any sense of the sensitivity and specificity at different coverages?

Thanks.
Old 09-13-2008, 10:02 AM   #33
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

I have seen some papers use MAQ and filter out anything below mapping quality of 10 and then they do further analysis. With novoalign you should have good quality matches using this sort of filter.
For Assembly and SNP calling it's better to use a high quality threshold, again anything over 10 should suffice, but I'm sure other users could add more insight.
If many of your indels are in this high quality range then it should be reliable. You could always confirm by doing other things like multiple sequence alignment of those regions, pileup, etc.
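A minimal sketch of that kind of pre-filter, applied to the text output of maq mapview. It assumes mapview lines are tab-separated with the mapping quality in the 7th column, which is how I read the maq documentation, so please check it against your maq version. The function names and the toy records are made up for the example.

```python
MAPQ_COLUMN = 6  # assumption: 0-based index of the mapping quality in mapview text


def filter_by_mapq(lines, min_mapq=10):
    """Yield only the mapview lines whose mapping quality is >= min_mapq."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= MAPQ_COLUMN:
            continue  # skip lines that do not look like alignment records
        if int(fields[MAPQ_COLUMN]) >= min_mapq:
            yield line


def make_record(name, mapq):
    """Build a toy 16-column record for the demo; all values are invented."""
    fields = [name, "chr1", "1000", "+", "0", "0", str(mapq), str(mapq),
              "0", "0", "0", "1", "0", "36", "A" * 36, "I" * 36]
    return "\t".join(fields)


records = [make_record("r1", 5), make_record("r2", 10), make_record("r3", 60)]
for line in filter_by_mapq(records, min_mapq=10):
    print(line.split("\t")[0], line.split("\t")[MAPQ_COLUMN])
```

The threshold is a parameter because the right cut-off depends on coverage and sample; 10 is just the figure mentioned above.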
Old 09-13-2008, 10:16 AM   #34
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
Default Pileup

On a separate yet related note, does anyone know what is done with flag-130 reads (gapped alignments) when a pileup file is made? It looks as if they are being included without being gapped (which makes sense since the pileup format does not have a way of representing gaps, though maybe it should?). However with the much larger number of gapped alignments in the novoalign output, this seems to be giving me problems when trying to identify SNPs from the pileup file. Has anyone else observed this?

Thanks
Old 09-14-2008, 03:35 PM   #35
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

Hi myrna,
I think you can trust indel calls on single end reads and here's why...
I think we have to allow for them in alignment as they are real. Looking at Craig Venter's genome we can see that short indels are fairly frequent. See Table 6. http://biology.plosjournals.org/perl...50254&id=12379

Also, from an information content point of view, a single base indel isn't much harder to align than a single base mismatch. Consider a 32bp read with one mismatch. The mismatch can be at any of 32 positions in the read and take any of the other 3 bases, so there are 3*32 = 96 (6.6 bits of information consumed) possible sequences that match with one mismatch. Now consider an insert of one base. It could be at any of 32 positions and take any of 4 bases, so there are 4*32 = 128 (7 bits of information consumed) possible sequences that match with a one base insert. Not much difference.
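The arithmetic can be checked with a few lines of Python. This just reproduces the simple counting argument above (it makes no attempt to remove duplicate sequences, e.g. inserting a base identical to its neighbour), and the function names are only for illustration.

```python
import math


def one_mismatch_variants(read_length):
    """Any position can be changed to one of the 3 other bases."""
    return 3 * read_length


def one_base_insert_variants(read_length):
    """An insert at any position can be any of the 4 bases."""
    return 4 * read_length


read_length = 32
for label, count in [("one mismatch", one_mismatch_variants(read_length)),
                     ("one base insert", one_base_insert_variants(read_length))]:
    # log2(count) is the information used up by allowing that many variants
    print(f"{label}: {count} sequences, {math.log2(count):.1f} bits")
```

The difference is under half a bit, which is the point of the comparison.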
With short reads on a human-size genome you should be able to detect indels and SNPs, at least in high complexity sequence (and easily on smaller genomes). Obviously it won't work in repeats, but the alignment quality in maq and novo... should cover that.
The novo2maq conversion will extract gapped alignments (status 130) from single end reads and you can run indelpe against a converted file.
With regard to quality, it depends on cover and sample. If cover is fairly high (>10) and the sample is from one diploid individual, then I'd only accept reads with quality > 10 and then I'd also apply a quality filter to SNPs and indels based on Bayesian posterior probability.

Last edited by sparks; 09-14-2008 at 03:41 PM. Reason: Added bit about quality
Old 09-14-2008, 08:00 PM   #36
lh3
Senior Member
 
Location: Boston

Join Date: Feb 2008
Posts: 693
Default

I also think that, when properly handled, it is possible to find reliable indels from single-ended reads. However, you need to carefully postprocess the indelpe results. Here are the reasons:

Firstly, my experience is that with short reads you will miss about half of the indels that are close to short tandem repeats, while with long reads you will have little problem detecting most of them. So we should probably expect a 1:10 indel-to-substitution ratio from short-read alignment at high depth rather than 1:5, and this is what I have seen on real data with PE reads.

Secondly, I know a group who tried to find indels from single-ended reads with soap, but in the end they decided to drop all such indels when they did experimental validation. Probably they could improve their method, but this also shows that you should be careful when finding indels with single-ended reads.

Thirdly, even if you simulate reads without any indels, you will find a lot of alignments with indels, especially >3bp indels, while you will find far fewer from paired-end alignment. You need to filter the results properly to get accurate calls.

Fourthly, Phil Green comments in his new cross_match documentation that finding indels longer than 2bp needs particular care. Although this is partly due to a limitation of the new algorithm in cross_match, he would not make such a comment unless he thought there was some truth in it.
Old 09-15-2008, 01:21 AM   #37
sparks
Senior Member
 
Location: Kuala Lumpur, Malaysia

Join Date: Mar 2008
Posts: 126
Default

I agree with Heng Li that indel calling is prone to problems but I think it can be done with appropriate care.
I have 1 lane (single end) of data from a 1Mbp region of human (pooled from multiple individuals). Just using indelpe on the novo2maq file and then selecting indels with high cover on both strands, we get ~100 indels. It remains to be seen if these validate but they look pretty convincing.

Here's one example (best viewed in a fixed-pitch font)
AACTCCTAGAGTGTGCTGTACCCAGAAGAAGACAGAATGGCAGGGTATCC (reference)
AaCTCCTAGAGTGTGCTGTACCCGGAAGA   CA
AACtCCTAGAGtGTGCTGTACCAAGAAGA   CA
 ACTCCTAGaGTGtGCTGTACccaGaaGa   cAgaat
   TCCTaGAGtGTGCTGTACCcaGaaGA   cAGaatggc
...
                     ccAGaAGa   CAGAAtGGCAGGGTATCCTTTGGTCT
                          AGA   CAGAatGGCAGGGTATCCTTTGGT
                          AGA   CAGAATGGCAGGGtATcCTTTggtcTGtaaTt

Quite a few of the indels are in short 3-6bp homopolymers; PCR will tell if they are valid.

Last edited by sparks; 09-15-2008 at 02:01 AM. Reason: Added example
Old 09-19-2008, 05:49 AM   #38
rs705
Junior Member
 
Location: USA

Join Date: Sep 2008
Posts: 6
Default

You mention that novoalign is free to non-profits. Do you intend to sell it to commercial companies and if so can you give an estimate of the cost?
Old 09-19-2008, 06:53 AM   #39
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Commercial licenses are available for a small fee. We offer single server and site wide licenses and these are quite competitive.
Anybody is free to mail sales - at - novocraft - dot - com for a pricing quote and a list of the extra features available.

Last edited by zee; 09-19-2008 at 11:01 AM.
Old 11-03-2008, 08:05 AM   #40
valeu
Member
 
Location: Paris

Join Date: Sep 2008
Posts: 69
Default

Hi Colin!

I ran Novoalign with the "-r None" option, then with "-r Random". I got the same alignment in both cases. Could you please tell me what I did wrong?

Thanks in advance,
Valentina