**malachig** · 11-17-2010, 03:15 PM

Yes, and it's not the only problematic name I have seen in use in the forums. I guess we should avoid such usernames and/or avoid using them in the text of post replies so that they do not pollute search results...

**bioinfosm** · 11-17-2010, 03:18 PM

makes sense

**Lee Sam** · 11-17-2010, 08:18 PM

Originally posted by malachig View Post

Our cluster uses Sun Grid Engine (sge). Submitting jobs to the cluster is accomplished using a wrapper for the 'qsub' utility of sge. Basically the submission command is just pointing to a batch file containing bash commands (one job per line). I assume this is a somewhat common theme in cluster job submission. If this is the case for you, it shouldn't be too hard to modify the 'createAnalysisCommands' step. You would just need to modify all the lines containing 'mqsub' to match the submission style of your cluster and then when you run createAnalysisCommands use the option '--cluster_commands=1'

Originally posted by obig View Post

I guess there are too many different cluster configurations for alexa-seq to anticipate. So, simple bash files are produced which can be run serially (for very small libraries) or submitted to your cluster according to its protocols. You will probably have to work with your cluster administrator to get things running optimally.

Our cluster here (lawrencium) uses the PBS Torque resource manager and the Moab job scheduler. With some work, I have been able to submit Alexa-seq jobs to it. I have processed four projects with over 100 libraries to date, so it is doable. Instead of trying to edit all those parts of the alexa-seq pipeline code that produce job batch files and submission commands, I created a simple perl script which takes an alexa-seq job batch file (essentially just an sh file with one "task/command" per line) and produces the submission files compatible with our scheduler. I strongly recommend this strategy. Changing the alexa-seq code will be a lot more work.

What I do is run the alexa-seq pipeline as instructed for steps 0 to 5B. Step 5C (submitMapBatch.sh) is the first step that requires submitting to a cluster. That sh file contains a whole bunch of bash commands for additional sh files (e.g., blast_vs_intergenics.sh). It is those files which should be submitted to a cluster, not the parent submitMapBatch.sh file. You can do them individually or cat them into combined files. I create one combined batch file for all libraries, separated only by feature type (repeats, transcripts, etc.), because the feature types have different memory and runtime requirements. I can thus optimize cluster submission parameters for each of the 6 feature types. This is necessary because our cluster uses wallclock estimates and task number to determine job priority in the queue. Maybe your cluster has a simpler setup and this step will be unnecessary for you.

Once I have combined the bash files, I run my submitjobs.pl script on them and wait for it to finish. In later steps, whenever alexa says to submit some jobs to a cluster, the bash file typically contains the tasks/commands themselves (instead of additional bash commands as above). I just run my submitjobs.pl script on each of those bash files. Check the .output and .error files for problems and then proceed to the next step.
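The submitjobs.pl script itself is not shown in this thread, so the following is only a rough Python sketch of the same idea: read a batch file with one command per line and write one PBS job script per command. The function name `write_pbs_scripts`, the default walltime and memory values, and the file naming scheme are all assumptions made for illustration; the `#PBS -l`, `-o` and `-e` directives are standard Torque options, but the resource names your scheduler accepts may differ.

```python
from pathlib import Path


def write_pbs_scripts(batch_file, out_dir, walltime="04:00:00", mem="4gb"):
    """Write one PBS job script per command line found in batch_file.

    Blank lines and lines starting with '#' (comments, shebang) are skipped.
    Returns the list of script paths, in the order of the commands.
    The walltime and mem defaults are placeholders, not recommended values.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(batch_file).stem

    with open(batch_file) as handle:
        commands = [
            line.strip()
            for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        ]

    scripts = []
    for index, command in enumerate(commands, start=1):
        name = f"{stem}_{index:04d}"
        script = out_dir / f"{name}.pbs"
        script.write_text(
            "#!/bin/bash\n"
            f"#PBS -l walltime={walltime}\n"
            f"#PBS -l mem={mem}\n"
            f"#PBS -o {out_dir / (name + '.output')}\n"
            f"#PBS -e {out_dir / (name + '.error')}\n"
            f"{command}\n"
        )
        scripts.append(script)
    return scripts
```

Each generated file could then be handed to `qsub` on the cluster. Running the function once per combined feature-type batch file, with different `walltime` and `mem` values each time, would mirror the per-feature-type tuning described above.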

For each project, once the alexa-seq .commands file is produced, I make a new copy of this file and edit it to add my own commands that are necessary for job submission. This file can then be used as a template for running future projects.

Thanks for the replies. It sounds like Lawrencium is set up almost exactly like our flux cluster here at UMich. The only issue we have is that our head nodes don't have our data drives mounted (for a variety of reasons) and have a 15-minute max job limit, so we would be somewhat constrained: we would have to do the preprocessing on our main analysis server, scp the data over to the cluster for the alignments, and then scp it all back. I appreciate the insights on all this.

**obig** · 11-17-2010, 08:27 PM

I have the same issue with my data and analysis servers not being mounted on the head nodes. I've found rsync useful for this. I was given a decent size data folder accessible by the cluster. I do some serial steps, rsync to the cluster-mounted server, run parallel jobs and then rsync back. But, a 15-minute max job limit will be a problem. Some jobs probably take longer than that.
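As a rough illustration of that round trip, here is a small Python sketch that builds and runs the rsync calls. The helper names and any paths or hosts you pass in are hypothetical; only the `rsync` flags (`-a` for archive mode, `-z` for compression, `--dry-run`) are real options.

```python
import subprocess


def rsync_command(source, destination, dry_run=False):
    """Build an rsync argument list that copies the contents of source.

    A trailing slash is forced on the source so rsync copies what is
    inside the directory rather than the directory itself.
    """
    args = ["rsync", "-az"]
    if dry_run:
        args.append("--dry-run")
    args += [source.rstrip("/") + "/", destination]
    return args


def sync(source, destination, dry_run=False):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(rsync_command(source, destination, dry_run), check=True)
```

The workflow would then be one `sync(...)` call to push the output of the serial steps to the cluster-mounted folder, the parallel jobs on the cluster, and a second `sync(...)` call with source and destination swapped to pull the results back.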

**Lee Sam** · 11-17-2010, 08:30 PM

Originally posted by obig View Post

I have the same issue with my data and analysis servers not being mounted on the head nodes. I've found rsync useful for this. I was given a decent size data folder accessible by the cluster. I do some serial steps, rsync to the cluster-mounted server, run parallel jobs and then rsync back. But, a 15-minute max job limit will be a problem. Some jobs probably take longer than that.

Sorry, I meant that programs running on the head nodes themselves (only small 8p servers) can't exceed 15min. Thanks for the advice on rsync.

**obig** · 11-17-2010, 08:34 PM

Oh. Sorry, I see what you were saying now. This is indeed the exact situation I have, then. I found that rsyncing back and forth was sometimes pretty slow (many files). So I found myself submitting even many of the individual serial jobs to the cluster and then just rsyncing back at the end. If I could go back and do it over, though, I probably would have pushed harder to get a decent box installed and mounted on the cluster for serial processing steps. It's just much easier that way, and such boxes are not that expensive these days.

**malachig** · 11-18-2010, 12:13 AM

Yeah, I would agree with that. It is really invaluable to have a few decent boxes (high-memory ones even better) that have access to all the same storage mounts as the cluster nodes themselves...

**mp_bio** · 06-28-2011, 07:32 AM

database issue

Hi! We are currently experimenting with the alexa-seq VM image and trying to run a demo analysis. The first attempt failed with this persistent error message: 'DBD::mysql::st execute failed: Table 'ALEXA_hs_53_36o.Gene' doesn't exist at /home/alexa-seq/ALEXA/alexa_seq/utilities/ALEXA_DB.pm line 196.'
We followed all the steps exactly as described in the DEMO.txt file, except for the '#4.) Import annotation database' step. We downloaded hs_53_36o.tar.gz manually and moved it to /home/alexa-seq/ALEXA/sequence_databases/
After that, the step 4 command was executed: '/home/alexa-seq/ALEXA/alexa_seq/alternativeExpressionDatabase/installAnnotationDb.pl
--annotation_dir=/home/alexa-seq/ALEXA/sequence_databases/ --db_build=hs_53_36o --server=localhost --user=alexa-seq --password=alexa-seq'
and so on.
Do you have any idea what could have gone wrong and caused the apparent absence of ALEXA_hs_53_36o.Gene from the right place?
Thanks!
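One way to narrow this down might be to list the tables that actually exist in the ALEXA_hs_53_36o database after the import step. The sketch below is only a diagnostic idea, not part of alexa-seq: the helper names are made up, and it assumes the standard `mysql` command-line client is installed on the VM. The credentials are the ones from the install command above.

```python
import subprocess


def show_tables_command(database, user, password, host="localhost"):
    """Build a mysql client call that prints one table name per line."""
    return [
        "mysql",
        f"--host={host}",
        f"--user={user}",
        f"--password={password}",
        "--batch",              # tab-separated output, no ASCII table borders
        "--skip-column-names",  # omit the header row
        "--execute=SHOW TABLES",
        database,
    ]


def table_exists(table, listing):
    """Return True if table appears in the output of SHOW TABLES."""
    return table in listing.split()


def list_tables(database, user, password, host="localhost"):
    """Run the mysql client and return its raw table listing."""
    result = subprocess.run(
        show_tables_command(database, user, password, host),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `table_exists("Gene", list_tables("ALEXA_hs_53_36o", "alexa-seq", "alexa-seq"))` would report whether the Gene table was created. An empty listing would suggest the import step did not load anything into the database.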

**demis001** · 04-22-2012, 04:28 AM

parseRepeats.sh terminates on the first blast.gz file. Would you please help?

Begin parsing 137 blast results files

Parsing ..../final/blast_results/A/694_Lane1/repeats/blast_0000.gz for blast results
Multiple Paired Reads - Unambiguous. Subject ID: AluSx|SINE1/7SL|Primates READ1: 694_1_1101_4534_2459_R1 READ2: 694_1_1101_4534_2459_R2
$VAR1 = {};
$VAR1 = undef;
