SEQanswers

03-25-2011, 07:46 AM   #1
quantrix
Member
Location: Pennsylvania
Join Date: Jan 2011
Posts: 21
Bowtie and Clustering question.

Hi Group,
I am a relative newbie trying to come up to speed. I managed to assemble a 20-core cluster and am just beginning to figure out how to work the bioinformatics assembly algorithms. My scenario is this:

1) I currently have a WES raw data file measuring 5 GB. I have a quality score file which is approximately 12 GB.

2) I have a four node AMD cluster with 32 GB RAM. I installed and configured Rocks software on the same.

3) I have been looking into Bowtie to do the analysis on this cluster.

Some questions which come to my mind are as follows

1) How and where do I start?

2) Is it possible to install bowtie on the ROCKS cluster such that I can use the 4 nodes to run the analysis in parallel?

3) For this single massive file of 5 GB raw reads, how do I go about doing the assembly?

4) With bowtie, am I restricted to running the analysis on only ONE node?

5) Or can I split my raw reads file four ways, farm out each piece to one of the nodes, do the assembly there, and then do a final assembly of the four assembled files?

6) Has anyone installed Galaxy tools on a ROCKS cluster? Could you share your experiences of the same?

I realize these are very basic and fundamental questions. But I would highly appreciate an answer. Hopefully I will be able to answer these questions on the forum in the near future.
Regards
Quantrix

03-25-2011, 11:40 AM   #2
hpcguy
Junior Member
Location: Ohio, USA
Join Date: Mar 2011
Posts: 4

Howdy, I'm new here, but I do parallel for a living. (hpc type)

I can't speak to some of the things you've asked, but I have installed pMap and bowtie for customers for things such as this. I'd recommend pMap for simplicity. Either way you should be able to get every core on every node working with bowtie in parallel. IO will more than likely be your limiting factor then.

http://bmi.osu.edu/hpc/software/pmap/pmap.html

http://bowtie-bio.sourceforge.net/crossbow/index.shtml

pmap is MPI based, so if you have an interconnect (eth, ib, quadrics,myri, etc) and some type of MPI installed you should be good. pMap supports BWA, SOAP, Bowtie, GSNAP, MAQ and RMAP.

crossbow is Hadoop based. I can't say I've seen hadoop on rocks (not a fan of rocks myself, but it is an excellent way to start with clusters) but it is possible. I'd be REALLY surprised if no one has ever done it as there are some rather decent sized clusters out there (TACC, PNNL) using rocks. I'd search for a hadoop roll. I'd be willing to bet it's out there.

hpc

03-25-2011, 11:40 AM   #3
tnabtaf
Member
Location: Oregon
Join Date: Jan 2011
Posts: 53

There is some discussion of running Galaxy on ROCKS in this Galaxy-dev thread from this January.

03-25-2011, 01:16 PM   #4
quantrix
Member
Location: Pennsylvania
Join Date: Jan 2011
Posts: 21

Hi hpcguy and Tnab,
Thanks for the replies. I shall look into pMap right away. It sounds like one possible solution for me to start exploring.

@hpcguy,
You say you are not a fan of Rocks. I have had to wrestle with quite a few issues in getting it up to speed due to a combination of factors. However, it is running smoothly now. I was wondering whether I should instead use something like plain CentOS and install the other pieces separately. What is your take on this? Do you have a favorite, and why? I was also looking into Ubuntu with Kerrighed as one option. (Ubuntu enterprise maybe?)
The problem is that there are very few leads out there, if any, on how to go about clustering.

09-10-2013, 01:53 PM   #5
taber13
Junior Member
Location: Houston, Tx.
Join Date: Aug 2013
Posts: 1

The following is an example of how to run bowtie on multiple nodes. It requires splitting the reads in the .fastq file between jobs, then reassembling the .sam output at the end.
First, see how many reads you have (a FASTQ record is four lines):

cat yourfile.fastq | echo $((`wc -l`/4))

In my case the result was 14901431, so I created two jobs to run on two different nodes of the Rocks cluster. I created a few .sh scripts and just keep editing them for each different job. Run "nano bowtie_script_1.sh", then edit as follows:

#!/bin/bash
#
#$ -S /bin/bash
bowtie -m 1 -S -p 4 -s 0 --qupto 7450715 /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq

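The second job has a different start and finish. Split as many times as the number of nodes you want to run on; this example uses 2 nodes. Note that the bowtie manual describes --qupto as counting reads after the -s skip, so with more than two jobs it should be set to the chunk size rather than to the end position.
Second script: run "nano bowtie_script_2.sh", then edit as follows: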
#!/bin/bash
#
#$ -S /bin/bash
bowtie -m 1 -S -p 4 -s 7450715 --qupto 14901431 /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq
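
For more than two nodes, the skip and count values can be worked out with a little arithmetic. The following is only a sketch: chunk_args is a made-up helper name, and it assumes that -u (the short form of --qupto) counts reads after the -s skip, as the bowtie manual describes.

```shell
#!/bin/sh
# Print the bowtie -s/-u arguments for one chunk of an evenly split read set.
# Usage: chunk_args TOTAL_READS N_CHUNKS CHUNK_INDEX   (CHUNK_INDEX starts at 0)
# chunk_args is a hypothetical helper, not part of bowtie.
chunk_args() (
    total=$1; chunks=$2; idx=$3
    base=$((total / chunks))      # reads that every chunk gets
    extra=$((total % chunks))     # leftover reads, one each to the first chunks
    if [ "$idx" -lt "$extra" ]; then
        skip=$((idx * (base + 1)))
        count=$((base + 1))
    else
        skip=$((extra * (base + 1) + (idx - extra) * base))
        count=$base
    fi
    echo "-s $skip -u $count"
)

# Example with the read count from this post, split across 2 nodes.
i=0
while [ "$i" -lt 2 ]; do
    echo "job $((i + 1)): bowtie -m 1 -S -p 4 $(chunk_args 14901431 2 "$i") /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq"
    i=$((i + 1))
done
```

Each printed line can be pasted into its own job script. The chunks never overlap and together cover every read, whatever the number of nodes.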

If you have bowtie installed correctly, you can then run the following:

qsub bowtie_script_1.sh
qsub bowtie_script_2.sh

This will result in two output files in SAM format (## stands for the job ID):

bowtie_script_1.sh.o##
bowtie_script_2.sh.o##
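
The .o## names come from the scheduler's defaults. As a rough sketch, a job script can name its own output instead, assuming the standard SGE directives -cwd, -N, -o and -e behave on your Rocks install as they do elsewhere. The make_job_script helper and the part names are made up for illustration.

```shell
#!/bin/sh
# Emit an SGE job script for one bowtie chunk on stdout.
# Usage: make_job_script PART SKIP COUNT
# make_job_script is a hypothetical helper; part names are illustrative.
make_job_script() {
    cat <<EOF
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -N bowtie_part$1
#$ -o part$1.sam
#$ -e part$1.err
bowtie -m 1 -S -p 4 -s $2 -u $3 /share/apps/bowtie-1.0.0/indexes/hg19 yourfile.fastq
EOF
}

# Show the script for the first job of this example.
make_job_script 1 0 7450715
# To use it: make_job_script 1 0 7450715 > bowtie_part1.sh && qsub bowtie_part1.sh
```

With -o and -e set, the SAM records land in part1.sam and any bowtie messages in part1.err, both in the directory the job was submitted from.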

You would then need to join the two outputs into one .SAM file, keeping the header lines (those starting with @) from the first file only:

cat bowtie_script_1.sh.o## <(grep -v '^@' bowtie_script_2.sh.o##) > merged_sam.sam

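With more than two output files the same idea can be wrapped in a small function. This is a sketch only: merge_sam is a made-up name, and it assumes every input is plain SAM text with identical header lines.

```shell
#!/bin/sh
# Merge SAM files, keeping the header (@ lines) of the first file only.
# Usage: merge_sam OUTPUT FIRST.sam [MORE.sam ...]
# merge_sam is a hypothetical helper, not part of bowtie or samtools.
merge_sam() {
    out=$1; shift
    cat "$1" > "$out"; shift
    for f in "$@"; do
        # grep exits non-zero when a file has no alignment lines; ignore that
        grep -v '^@' "$f" >> "$out" || true
    done
}

# Tiny self-contained demo with made-up SAM fragments.
tmp=$(mktemp -d)
printf '@HD\tVN:1.0\nread1\t0\tchr1\n' > "$tmp/a.sam"
printf '@HD\tVN:1.0\nread2\t16\tchr2\n' > "$tmp/b.sam"
merge_sam "$tmp/merged.sam" "$tmp/a.sam" "$tmp/b.sam"
cat "$tmp/merged.sam"
```

The merged file keeps the reads in job order, so it matches the read order of the original .fastq as long as the files are listed in the order the chunks were cut.
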
Installing bowtie:

To make it available to all of your compute nodes, install it into the /export/apps/ folder on the frontend; Rocks shares that folder with the nodes as /share/apps/.

then edit the "/etc/skel/.bash_profile" PATH to include ":/share/apps/bowtie-1.0.0"
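
A minimal sketch of that PATH change, assuming bowtie was unpacked to /share/apps/bowtie-1.0.0 as above. Note that /etc/skel is only copied when a new account is created, so existing users would add the same line to their own ~/.bash_profile.

```shell
# Append the shared bowtie directory to the search path
export PATH="$PATH:/share/apps/bowtie-1.0.0"
```

After logging in again, "command -v bowtie" should print the full path if the install is visible on that node.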

If a job run through qsub errors out, it creates an error file in your home directory, which will point you in the right direction.

good luck.

09-13-2013, 06:59 AM   #6
gmarco
Member
Location: Spain
Join Date: Oct 2012
Posts: 36

I suppose pMap will work flawlessly on a Rocks cluster based on SGE, right?
It supports bowtie; does it also support bowtie2?

Thanks.

10-08-2013, 05:36 AM   #7
hpcguy
Junior Member
Location: Ohio, USA
Join Date: Mar 2011
Posts: 4

Howdy. To all the folks that have sent me Private Messages about this: please set up your mailbox such that I can reply. I cannot answer your questions without a way to reach you. thanks.

H

10-08-2013, 05:50 AM   #8
hpcguy
Junior Member
Location: Ohio, USA
Join Date: Mar 2011
Posts: 4

Rocks is fantastic when a group/person/dept is starting out. No bones about it. Fantastic. Roll it out on a single rack in 10 min if you just give it a go. Be up and running apps in 15 min (with data being available). Not much beats this. Even AWS takes more work to configure. I've personally installed it and had a 2 rack cluster up and running from turn on in under 30 minutes and was running batch jobs. But the cluster was NEVER supposed to run another application ever again.

The problem starts as soon as there is a move into a more intermediate need/area. Rocks does not lend itself to the flexibility needed to keep advanced work simple. Moving to stock CentOS, Scientific Linux, RHEL, Ubuntu LTS, etc. is a large step that can be intimidating, but long term most folks that I've spoken or worked with look back and say they were glad they made the move.

I would recommend making the change to something else when you feel Rocks just is too restrictive or you need more than you can find in the normal Rolls, etc.

Tags
bowtie, galaxy, rocks cluster
