#1 |
Member
Location: Philadelphia Join Date: Feb 2009
Posts: 10
|
Hello,
I'm fairly new here and have been trying to get our systems configured properly for NGS analysis. I'm primarily concerned with ABI CS data, but will also be quite heavily involved with Solexa. Corona has its own built-in tools for configuring its applications to run on top of Torque PBS for processing on a cluster, and this seems to work quite well. I've been searching for other options and am not finding very much. Solexa's GAPipeline appears to have some basic tools for parallelization, but we're not big fans of ELAND and would prefer to use MAQ or Bowtie for alignments. Neither tool seems to come with much information on batch job submission. I'm hoping to get some feedback from anyone with more experience on ways either to parallelize MAQ, Bowtie, etc., or at least to break up the jobs so that they can be submitted in a naively parallel fashion.
Thanks in advance!
#2 |
Senior Member
Location: Vancouver, Canada Join Date: Feb 2008
Posts: 236
|
I'm probably the wrong person to attempt to answer your question, but as far as I know, we just run each lane through maq one at a time, then use mapmerge to assemble libraries back together. Thus, we often have eight maq jobs running at a time on the cluster, for each machine in operation. Again, I'm not the person who submits the jobs, so other people can probably provide more information than I can.
Sequence alignment theoretically belongs to the class of algorithms known as embarrassingly parallel... each sequence could theoretically be aligned by a separate computer and the results then recombined. The question should just be what the optimal number of reads to align per instance is... and that I don't know. (-:
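To make the "split the reads, align each piece separately, recombine" idea concrete, here is a minimal Python sketch that cuts a FASTQ file into chunks of a fixed number of reads. The function name `split_fastq`, the chunk file naming, and the assumption of plain four-line FASTQ records are all illustrative choices, not part of MAQ or Bowtie.

```python
import os
from itertools import islice


def split_fastq(fastq_path, reads_per_chunk, out_dir):
    """Write consecutive chunks of `reads_per_chunk` reads; return the chunk paths.

    Assumes the simple 4-lines-per-record FASTQ layout (no wrapped sequences).
    """
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(fastq_path))[0]
    chunk_paths = []
    with open(fastq_path) as handle:
        while True:
            # Pull the next block of records: 4 lines per read.
            lines = list(islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            if len(lines) % 4 != 0:
                raise ValueError("truncated FASTQ record in " + fastq_path)
            # Hypothetical naming scheme: <input name>.chunkNNNN.fastq
            chunk_path = os.path.join(
                out_dir, f"{base}.chunk{len(chunk_paths):04d}.fastq"
            )
            with open(chunk_path, "w") as out:
                out.writelines(lines)
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk can then be handed to a separate aligner process. The best chunk size depends on the cluster and the aligner, so `reads_per_chunk` is left as a parameter to tune.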
__________________
The more you know, the more you know you don't know. —Aristotle |
#3 |
Member
Location: Philadelphia Join Date: Feb 2009
Posts: 10
|
Hm. The idea of separating lanes is good. I am familiar with most embarrassingly parallel methods for sequence analysis, but was hoping there might be some established methods specifically for NGS that have been developed. I am particularly interested in setting up a few processing pipelines that can be triggered (relatively automatically) and then run across our cluster system, then packaged up for post processing and results delivery.
Tools like the Corona pipeline are ideal because they are pre-configured to do this right off the bat. MAQ would require some initial configuration and some scripts here and there to accomplish the same thing. I guess a generic tool for parallelizing things may be too much to ask for, but aside from splitting up lanes, or splitting up each individual alignment task, I'm wondering what else might work. Bowtie can split work across multiple cores using the '-p' option, and I would hope that this can somehow be leveraged to span multiple systems as well. But that's where I start to get lost, and find myself trying to figure out the code at a much lower level, which is going to take me a very long time to solve...
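As a rough sketch of the "some scripts here and there" for MAQ on a Torque/PBS cluster, the Python below generates one job script per lane and the `maq mapmerge` command that combines the per-lane maps. The file names (`laneN.bfq`, `laneN.map`, `all_lanes.map`), the function names, and the resource request are assumptions made for illustration; only `maq map`, `maq mapmerge`, the `#PBS` directives, and `$PBS_O_WORKDIR` are taken from the tools themselves.

```python
def maq_lane_job(lane, ref_bfa, reads_bfq, out_map):
    """Return the text of a PBS script that aligns one lane with `maq map`."""
    return "\n".join([
        "#!/bin/sh",
        f"#PBS -N maq_lane{lane}",      # job name shown by the scheduler
        "#PBS -l nodes=1:ppn=1",        # assumed request: one core on one node
        "cd $PBS_O_WORKDIR",            # run from the directory qsub was called in
        f"maq map {out_map} {ref_bfa} {reads_bfq}",
        "",
    ])


def build_pipeline(lanes, ref_bfa, merged_map="all_lanes.map"):
    """Return ({script name: script text}, merge command) for the given lanes."""
    jobs = {}
    lane_maps = []
    for lane in lanes:
        out_map = f"lane{lane}.map"     # hypothetical per-lane output name
        jobs[f"lane{lane}.pbs"] = maq_lane_job(
            lane, ref_bfa, f"lane{lane}.bfq", out_map
        )
        lane_maps.append(out_map)
    # mapmerge takes the merged output first, then the per-lane inputs.
    merge_cmd = "maq mapmerge " + merged_map + " " + " ".join(lane_maps)
    return jobs, merge_cmd
```

The scripts would be written to disk and submitted with `qsub`. The merge step must wait until every lane job has finished, for example by submitting it as a job that depends on the lane jobs.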
#4 |
Senior Member
Location: Baltimore, MD Join Date: Sep 2008
Posts: 200
|
Hi jperin,
With respect to Bowtie, the -p option allows you to parallelize Bowtie in the sense of using multiple threads (which are hopefully mapped to multiple processor cores) on a single machine. For parallelizing across machines, I do not really have a pre-fab set of scripts for that.
As an aside, I'm currently doing some work on getting Bowtie to work in a Cloud Computing framework, specifically using Hadoop. This would allow Bowtie to be parallelized across any cluster that has Hadoop installed, including Amazon's EC2 service. That's not ready for prime time yet, though.
Thanks,
Ben
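One way to combine the two levels, sketched here under assumptions rather than as an established recipe: spread pre-split read files over the available machines, and let each Bowtie process use `-p` for the cores on its own machine. The host names, the round-robin policy, and the output file naming are hypothetical; `-p` and `-q` are Bowtie options, and the positional order is index, reads, output.

```python
def assign_chunks(chunks, hosts):
    """Round-robin chunk files over hosts; returns {host: [chunk, ...]}."""
    plan = {host: [] for host in hosts}
    for i, chunk in enumerate(chunks):
        plan[hosts[i % len(hosts)]].append(chunk)
    return plan


def bowtie_command(index, chunk, threads):
    """Build a Bowtie command line for one chunk of FASTQ reads."""
    # -p: number of threads on this one machine; -q: input is FASTQ.
    # The ".bowtie.txt" output suffix is an illustrative choice.
    return ["bowtie", "-p", str(threads), "-q", index, chunk,
            chunk + ".bowtie.txt"]
```

Round-robin assumes the chunks are about the same size and the machines are about equally fast. A job scheduler that hands out the next chunk to whichever node is free would balance uneven loads better.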
#5 |
Member
Location: US Join Date: Feb 2008
Posts: 13
|
A few comments here.
Here is a nice trick posted by Quang:
Hi Victor,
We use "maq fastq2bfq -n 1000000 ..." to split the reads.
....
Q
More here: http://groups.google.com/group/sge-l...f3a6f6b501240c
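For planning purposes, the split can be sketched in Python as a command builder plus a count of how many pieces to expect. The function names and the default of 1,000,000 reads per file (taken from the quoted command) are illustrative. The names of the files that `maq fastq2bfq -n` writes are not given in the post, so they are left out here.

```python
def fastq2bfq_split_command(fastq, out_prefix, reads_per_file=1000000):
    """Build the `maq fastq2bfq -n` command that splits reads while converting."""
    return ["maq", "fastq2bfq", "-n", str(reads_per_file), fastq, out_prefix]


def expected_files(total_reads, reads_per_file=1000000):
    """Number of pieces a split of `total_reads` should produce (ceiling division)."""
    return -(-total_reads // reads_per_file)
```

With 7.5 million reads in a lane, for instance, this gives eight pieces, which is also the number of alignment jobs to schedule for that lane.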
#6 | |
Rick Westerman
Location: Purdue University, Indiana, USA Join Date: Jun 2008
Posts: 1,104
|
Quote:
I could be running Corona lite improperly in which case let me know! But my experience is that Corona does not employ anything more than the same-old-same-old embarrassingly parallel methods. |
|