SEQanswers

10-20-2008, 07:58 PM   #81
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Just saw this paper about SOCS (Short Oligonucleotides in Color Space); I'm looking forward to trying it against Corona Lite and MAQ.

The documentation says it's multithreaded and that the amount of RAM it uses can be set by the user. It's great to see tools that deal with color space directly...
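SOLiD reads come as colors, one per overlapping pair of bases, which is why tools that work in color space directly are attractive. Here is a minimal Python sketch of the 2-base encoding, assuming the usual codes A=0, C=1, G=2, T=3 (under which each color is the XOR of the two base codes). The function names are illustrative only, and this is not code from SOCS or Corona Lite. It also shows why a true SNP changes two adjacent colors, whereas a single color-call error changes just one.

```python
# 2-bit code per base; the color of a dinucleotide is the XOR of the two codes.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def to_colorspace(seq):
    """Encode a base-space read as its first base followed by color digits."""
    seq = seq.upper()
    colors = [str(CODE[a] ^ CODE[b]) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)

def from_colorspace(cs):
    """Decode color space back to bases; requires the known leading base."""
    bases = [cs[0]]
    for c in cs[1:]:
        bases.append(BASE[CODE[bases[-1]] ^ int(c)])
    return "".join(bases)

if __name__ == "__main__":
    ref = "ATGGCA"
    snp = "ATGTCA"  # one substitution (G -> T) at position 3
    print(to_colorspace(ref))  # A31031
    print(to_colorspace(snp))  # A31121: two adjacent colors differ
```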
10-21-2008, 04:05 AM   #82
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249

I like Bowtie; it has great performance thanks to its Burrows-Wheeler transform (BWT) search routine, but it is still not as mature in features (indels, paired-end reads) as Novoalign.
I think both packages will progress quite nicely, gaining features as this field moves.
Good job by the Bowtie developers.
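Bowtie's speed comes from searching a Burrows-Wheeler (FM) index of the reference. The sketch below shows only the core idea, exact-match backward search, in plain Python. It is a toy: it sorts suffixes naively, keeps uncompressed occurrence tables, and leaves out everything that makes a real aligner useful, such as mismatch backtracking, quality awareness and index compression. The names `bwt_index` and `backward_search` are made up for illustration.

```python
def bwt_index(text):
    """Build a toy FM index: suffix array, C table and occurrence counts."""
    text += "$"  # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # C[c] = number of characters in text that sort before c
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += text.count(ch)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, C, occ

def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

if __name__ == "__main__":
    index = bwt_index("ACGTACGTA")
    print(backward_search("ACG", *index))  # [0, 4]
    print(backward_search("GTA", *index))  # [2, 6]
```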
11-02-2008, 06:38 AM   #83
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Just added Slider to the first post.
11-04-2008, 10:31 AM   #84
mendelism
Junior Member
 
Location: Durham

Join Date: Nov 2008
Posts: 1

Thanks for doing this, ECO! This is a huge help to those of us just getting started with next-gen sequencing.

I see that most of the discussion is focused on genomic alignment and variant discovery, but I'm interested in methods for analyzing transcriptome sequence data, e.g. for quantitation and/or identification of alternative transcripts. There's a tool called QPALMA that's designed specifically for aligning spliced short reads, and it uses sequence quality information when generating the alignments. Does anyone here have any experience with QPALMA?
11-05-2008, 12:33 PM   #85
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

BFAST?
https://secure.genome.ucla.edu/index.php/BFAST

Does anyone have experience with it?
11-06-2008, 09:02 PM   #86
nilshomer
Nils Homer
 
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285

Quote:
Originally Posted by bioinfosm View Post
BFAST?
https://secure.genome.ucla.edu/index.php/BFAST

Does anyone have experience with it?
I am the author of BFAST. Let me know if you have any questions. Please see the site for how to obtain the source code (available for academic use). Also, I will be giving a talk about BFAST on the morning of Friday, November 14, at the Annual Meeting of the American Society of Human Genetics. I will be in Philadelphia that week, so if you are interested in meeting to discuss sequence alignment, let me know.

Nils Homer
11-20-2008, 06:54 AM   #87
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28

I'm not sure if we want to add base calling algorithms/software to the list, but I just came across an interesting one: Rolexa

How many people have experimented with alternate base-calling software? Or are people generally content with the quality of the sequences with the manufacturer supplied software (we are using Illumina in particular)?
11-20-2008, 07:10 AM   #88
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,358

Quote:
Originally Posted by lparsons View Post
I'm not sure if we want to add base calling algorithms/software to the list, but I just came across an interesting one: Rolexa
Sure, why not. Rolexa, Alta-cyclic...any others out there?
11-20-2008, 07:16 AM   #89
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

Quote:
Originally Posted by nilshomer View Post
I am the author of BFAST. Let me know if you have any questions. Please see the site for how to obtain the source code (available for academic use). Also, I will be giving a talk about BFAST on the morning of Friday, November 14, at the Annual Meeting of the American Society of Human Genetics. I will be in Philadelphia that week, so if you are interested in meeting to discuss sequence alignment, let me know.

Nils Homer
Thanks Nils. Could you possibly post your presentation somewhere?
I probably won't get a chance to look at the tool for a while...

thanks.
11-20-2008, 01:08 PM   #90
rs705
Junior Member
 
Location: USA

Join Date: Sep 2008
Posts: 6

DNASTAR has changed the name of its integrated tool from SeqMan Genome Assembler to SeqMan NGen. It also works on Windows Vista, since that is what we are running it on.
11-20-2008, 04:59 PM   #91
JKing
Junior Member
 
Location: Wisco

Join Date: May 2008
Posts: 9
SeqMan NGen

Update: I visit this site quite a bit to understand the tools available and where this technology is taking us, but I haven't actually posted in some time. Here's an update if you are interested in DNAStar development.

SeqMan Genome Assembler was an in-house name used during development of the assembly program. SeqMan NGen is the name going forward, as it really is an engine for producing these assemblies for end users, who then need no special computer specs (like 64-bit operating systems and lots of RAM) for subsequent assembly analysis; ordinary computers handle the end user's part of the job. Assemblies include siRNA targeting, ChIP-Seq, mRNA alignment to genomic templates, etc., so "Genome Assembler" was a limiting name for the program.

The last few posts dealt with strategies for sequence aligners to compensate for underreporting SNPs in regions heavy with mutations. This gets to the heart of the difference between aligners like MAQ and ELAND and actual contig assemblers that produce .ace files or their equivalent. An aligner program throws one read against the template, records where it sticks, and then proceeds to the next read. The big problem is that if there are more than two differences between any read and the reference, the read is thrown out. The output is a big text file.

NGen performs several passes during the assembly process (and quickly). The first pass does something resembling what aligners do, in that it takes care of the easy reads. In subsequent passes the assembly is completely de novo. All reads are incorporated in the context of the existing reads of the experimental strain. There is no limit to the number of differences between the reference sequence and any given read of the experimental strain, as it is a de novo assembly that disregards the template entirely. No reads are thrown out. There could be eight true SNP differences between your strain and your reference strain within a 35 bp span, for instance, and those SNPs will be reported and can be visually confirmed in the alignment view.

The end user can also filter out false SNPs based on quality score, percent of SNPs in reads at each locus, depth of coverage, and known vs. novel SNPs, using the normal SeqMan interface. SNP reporting also includes subsequent silent or non-silent amino acid mutations at specific aa positions at the protein level. The end user actually has a fairly easy job of discerning those SNPs that matter. The strategy for dealing with large indels or transpositions is exquisite, and you will just have to contact us for that, as it is beyond the scope of a board post.
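The filtering described above (quality score, fraction of reads carrying the variant, depth of coverage, known vs. novel) boils down to a few thresholds per candidate site. Below is a minimal Python sketch of that idea; the `CandidateSNP` record, its field names and the default cutoffs are all hypothetical, and none of them are SeqMan's actual interface or defaults.

```python
from dataclasses import dataclass

@dataclass
class CandidateSNP:
    pos: int
    ref: str
    alt: str
    depth: int        # reads covering the locus
    alt_reads: int    # reads carrying the alternative base
    mean_qual: float  # mean Phred quality of the alternative base calls
    known: bool       # already present in a known-SNP list

def filter_snps(cands, min_qual=20.0, min_frac=0.8, min_depth=5, novel_only=False):
    """Keep candidates that pass depth, allele-fraction and quality cutoffs."""
    kept = []
    for s in cands:
        if s.depth < min_depth:
            continue  # too little coverage to trust the call
        if s.alt_reads / s.depth < min_frac:
            continue  # too few reads support the variant
        if s.mean_qual < min_qual:
            continue  # supporting base calls are low quality
        if novel_only and s.known:
            continue  # user asked for novel SNPs only
        kept.append(s)
    return kept

if __name__ == "__main__":
    cands = [
        CandidateSNP(100, "A", "G", 20, 19, 35.0, False),
        CandidateSNP(200, "C", "T", 3, 3, 38.0, False),
        CandidateSNP(300, "G", "A", 30, 10, 30.0, False),
    ]
    print([s.pos for s in filter_snps(cands)])  # [100]
```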

Aligners like MAQ are actually very effective if one uses a reference sequence that is "the answer", but that is not necessarily the case in many projects. We are actually introducing a MAQ-like aligner in a couple of weeks for next-gen RNA-Seq comparative gene expression analysis, and the results feed directly into the tools traditionally used for microarray analysis, like scatter plots and heat maps. Of course, RNA-Seq is orders of magnitude more sensitive and accurate than microarrays.

For sequence assembly, nothing beats an actual assembly rather than a read-by-read alignment text file. Due to computer limitations, aligners that throw reads one at a time at a reference sequence are a necessary evil right now for higher level eukaryotes, but that will soon change and end users will soon be able to visualize actual assemblies at any position along the genome.
11-20-2008, 10:46 PM   #92
cgb
Member
 
Location: Cambridge

Join Date: May 2008
Posts: 50

A few brief comments on this post:

The short-read aligners like MAQ and ELAND don't "throw reads one at a time"; in fact they do very efficient batch-based inexact matching and some fancy maths to determine the best match where there is an ambiguity, running at around, say, 5,000 reads per second per CPU (with <1 GB RAM/CPU) against a human genome. This was a very significant computational challenge that many people said was impossible, and it is now entirely feasible on small computers using these algorithms.
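To illustrate the "batch" point: instead of aligning reads one by one, you can index the whole batch of reads and scan the reference once. The sketch below uses a simple pigeonhole scheme (split each read into max_mm + 1 chunks; any placement with at most max_mm mismatches must match at least one chunk exactly). MAQ and ELAND use their own seed templates, so treat this as an illustration of the idea rather than their actual algorithm. The function names are hypothetical, and reads are assumed to be longer than max_mm + 1 bases.

```python
from collections import defaultdict

def build_seed_index(reads, max_mm=2):
    """Split each read into max_mm + 1 disjoint chunks and index every chunk.

    Pigeonhole: a placement with <= max_mm mismatches leaves at least one
    chunk matching the reference exactly, so chunk lookups find every candidate.
    """
    index = defaultdict(list)  # chunk string -> [(read_id, offset_in_read)]
    lengths = set()
    n_chunks = max_mm + 1
    for rid, read in enumerate(reads):
        size = len(read) // n_chunks
        for k in range(n_chunks):
            start = k * size
            end = len(read) if k == n_chunks - 1 else start + size
            index[read[start:end]].append((rid, start))
            lengths.add(end - start)
    return index, lengths

def map_reads(reference, reads, max_mm=2):
    """Scan the reference once for the whole batch; return (read, start, mismatches)."""
    index, lengths = build_seed_index(reads, max_mm)
    hits = set()
    for pos in range(len(reference)):
        for length in lengths:
            for rid, offset in index.get(reference[pos:pos + length], ()):
                start = pos - offset
                read = reads[rid]
                if start < 0 or start + len(read) > len(reference):
                    continue  # candidate would hang off the reference
                window = reference[start:start + len(read)]
                mm = sum(a != b for a, b in zip(read, window))
                if mm <= max_mm:
                    hits.add((rid, start, mm))
    return sorted(hits)

if __name__ == "__main__":
    ref = "AAAACCCCGGGGTTTT"
    for hit in map_reads(ref, ["CCCCGGGG", "CCACGGTG"]):
        print(hit)
```

Each candidate is verified with a full mismatch count, so the chunk lookup only has to be sensitive, not exact.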

More than two mismatches isn't a problem. If you had more than two on the majority of your reads, then your sequencer isn't working and you should send it back, because that would equate to something like an 8-10% error rate on average. Actual runs (generally) have sub-1% error rates, and thus very few of the 25-35-mers have more than 0-2 errors. In fact, very few reads have more than 2 mismatches, and in the case of MAQ they aren't thrown away. The number of poor matches discarded by ELAND on a normal run is in the <1-4% range, and many of these reads arise from data collection/imaging artifacts (or are contaminants), i.e. they aren't from the sample; hence discarding high-error reads also has some benefit in terms of false positives.
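The argument above is essentially binomial arithmetic. Assuming independent errors at a uniform per-base rate (a simplification: real error rates tend to climb toward the end of the read), the chance that a read carries more than two errors can be computed directly. The helper name is made up for this sketch.

```python
from math import comb

def prob_more_than(k, n, p):
    """P(X > k) for X ~ Binomial(n, p): chance a read of length n has > k errors."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

if __name__ == "__main__":
    for n in (25, 35):
        # chance of >2 errors at a 1% per-base error rate, and the per-base
        # error rate implied by exactly 3 mismatches in a read of length n
        print(n, prob_more_than(2, n, 0.01), 3 / n)
```

At a 1% error rate a 35-mer has roughly a 0.5% chance of more than two errors, while three mismatches correspond to a per-base rate of about 8.6% in a 35-mer and 12% in a 25-mer, the ballpark quoted above.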

"Even 8 SNPs in a single 25mer." Do we know how often that occurs in the human genome? It must be under 0.001%. There are more errors, and missing sections, in the reference itself to worry about.
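The same arithmetic puts a number on "8 SNPs in one read". Assuming roughly one SNP per 1,000 bp between two human haplotypes, scattered independently (both simplifying assumptions, not figures from this thread), the probability of at least 8 SNPs in the 35 bp window from the earlier post is:

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assumption: ~1 SNP per 1,000 bp, positions independent of each other.
p_snp = 1e-3
print(prob_at_least(8, 35, p_snp))  # on the order of 1e-17
```

That is many orders of magnitude below the 0.001% (1e-5) guess, although real variation is clustered, so the independence assumption understates it somewhat.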

If you are saying that a short-read assembly on a big genome is going to give better coverage, better consensus and better mutation detection than a resequencing run, I think that has yet to be shown, and the assembly problems of short-read sequencing dwarf the minor side effects of read mapping. I'm a tad skeptical that this can be done on a "smaller" computer, and more quickly than re-aligning the same coverage of short-read data with the modern algorithms.

For sure: on genomes without a reference you will ideally assemble.

Last edited by cgb; 11-20-2008 at 10:51 PM.
11-21-2008, 06:25 AM   #93
JKing
Junior Member
 
Location: Wisco

Join Date: May 2008
Posts: 9
You're right...

Agreed, I was too simplistic in my critique of MAQ for a next-generation sequencing board. Aligners can exceed 2 mismatches in certain situations, but in general the rule is that >2 mismatches lead to a statistically insignificant match.

For human genotyping purposes, where the reference sequence is essentially the answer give or take a few SNPs, alignment algorithms are a very efficient approach. I didn't mean to come off negatively in any way regarding them.

However, every genome will eventually be sequenced, and there are references for only a small fraction of the world's species. Heck, there are references for only a small fraction of E. coli strains, and that is the most studied bacterium. There are obvious limitations to using short reads in a de novo fashion to tackle these genomes. This approach allows one to use the best available reference, and the end result of the assembly takes one further than an alignment algorithm could.

Plus, the ability to actually visualize the assembly at every locus provides a certain level of confidence.
12-04-2008, 12:44 AM   #94
dan
wiki wiki
 
Location: Cambridge, England

Join Date: Jul 2008
Posts: 266
Put the info in this thread into a wiki?

Hi,

Can we put the info in this thread into a Wiki page to allow better structuring of the data?

It's a bit of a monster thread, and it isn't clear where in the thread important info will come up...

Is there a SEQanswers Wiki?

If not I can suggest:

* Somewhere on http://wiki.bioinformatics.org/Wiki_Main_Page
* http://bioinformatist.org/index.php/Main_Page

Or just Wikipedia (I'm sure there is a suitable location).

Dan.
__________________
Homepage: Dan Bolser
MetaBase the database of biological databases.
12-17-2008, 11:34 AM   #95
Wolfgang Gerlach
Junior Member
 
Location: Germany

Join Date: Dec 2008
Posts: 1
more software

Hi all,

Here are two programs that might fit into the software list. Maybe somebody can add them to the list?


-----------------
The SWIFT suite is a software collection for fast index-based sequence comparison. It contains the following programs: SWIFT, a fast local alignment search that is guaranteed to find all epsilon-matches between two sequences; and SWIFT BALSAM, a very fast program that finds semiglobal non-gapped alignments based on k-mer seeds.
----------
Link:
http://bibiserv.techfak.uni-bielefeld.de/swift/
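Filters of this kind rest on the q-gram lemma: a string of length n that matches another with at most e errors shares at least n + 1 - (e + 1)q of its q-grams with it, so regions with too few shared q-grams can be skipped without losing any epsilon-match. The Python sketch below only illustrates that counting argument. SWIFT's actual filter is more refined (it counts q-gram hits in parallelograms of the dot plot), and the function names here are made up.

```python
from collections import Counter

def qgram_threshold(n, q, e):
    """q-gram lemma: a length-n match with <= e errors shares at least this
    many q-grams (a value <= 0 means the filter gives no guarantee)."""
    return n + 1 - (e + 1) * q

def shared_qgrams(a, b, q):
    """Count q-grams common to a and b (multiset intersection)."""
    ca = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    cb = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum((ca & cb).values())

if __name__ == "__main__":
    a = "ACGTTGCATGCAAGTCCGTA"
    b = a[:10] + "T" + a[11:]  # one substitution
    print(shared_qgrams(a, b, 4), ">=", qgram_threshold(len(a), 4, 1))
```

A window that falls below the threshold cannot contain a match with e errors, which is what lets the filter discard most of the search space before any alignment is attempted.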


best
Wolfgang
12-18-2008, 07:43 AM   #96
joa_ds
Member
 
Location: belgium

Join Date: Dec 2008
Posts: 52

Hi, has anyone used Pyrobayes?

It just seems a bit weird that it only needs SFF files to improve quality.


I can't get hold of the article (no Nature license here). What is the program based on?
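On how a base caller can improve quality from the SFF file alone: a 454 flowgram records one signal per nucleotide flow, roughly proportional to the homopolymer length, so both the called bases and their uncertainty come from those values. The deliberately naive Python caller below just rounds each flow value; it assumes the standard TACG flow order and ignores any noise model. It is not how Pyrobayes works, but it shows where a probabilistic model of flow signals could add information, for example for an ambiguous flow value near 2.5.

```python
FLOW_ORDER = "TACG"  # 454 cycles through nucleotides in this order

def naive_call(flows, flow_order=FLOW_ORDER):
    """Round each flow signal to a homopolymer length and emit that many bases."""
    seq = []
    for i, signal in enumerate(flows):
        base = flow_order[i % len(flow_order)]
        seq.append(base * max(0, round(signal)))
    return "".join(seq)

if __name__ == "__main__":
    flows = [1.02, 0.04, 1.96, 0.97, 0.10, 3.05, 0.02, 1.10]
    print(naive_call(flows))  # TCCGAAAG
```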
01-14-2009, 08:02 AM   #97
xuer
Member
 
Location: germany

Join Date: Sep 2008
Posts: 17

Vmatch is also good!
01-22-2009, 06:13 AM   #98
francesco.vezzi
Member
 
Location: Udine (Italy)

Join Date: Jan 2009
Posts: 50

Hi everybody,
I have been reading this interesting discussion on next-generation sequencing technology for a long time, but until now I have never posted...
I have just started my PhD, and this morning I was reading an article that compares the performance of short-read assemblers (in particular Edena and Velvet); in it I also found a reference to another tool, ALLPATHS. For about a year my interest has been mainly focused on de novo assembly with short reads, and the first article I read on this topic was the ALLPATHS paper. The problem is that I can't find a site where it is possible to download this tool; everybody says it is the best tool, but it seems impossible to find...
Can somebody help me?
01-22-2009, 09:46 AM   #99
Stegger
Member
 
Location: Copenhagen

Join Date: Nov 2008
Posts: 21

Quote:
Originally Posted by francesco.vezzi View Post
Can somebody help me?
Hi,
I'm not sure if you have seen the following link, but I found it through a Google search:
http://www.broad.mit.edu/events/reco...rJ-RECOMB2.pdf
There are two email addresses in that publication that you could try.
Sorry if you have already tried that.

Stegger
01-22-2009, 12:13 PM   #100
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177

Francesco,

There is a source code download available through the supplementary materials page for the publication (http://genome.cshlp.org/content/18/5/810/suppl/DC1); however, this is an old file and likely out of date.

This site (http://www.broad.mit.edu/crd/wiki/index.php/Main_Page) serves as the main portal for the Broad Institute's software projects, but there is no download link for ALLPATHS there.

All times are GMT -8.