
  • #16
    Yes, and it's not the only problematic name I have seen in use in the forums. I guess we should avoid such usernames, and/or avoid using them in the text of post replies, so that they do not pollute search results...

    Comment


    • #17
      makes sense
      --
      bioinfosm

      Comment


      • #18
        Originally posted by malachig View Post
        Our cluster uses Sun Grid Engine (sge). Submitting jobs to the cluster is accomplished using a wrapper for the 'qsub' utility of sge. Basically the submission command is just pointing to a batch file containing bash commands (one job per line). I assume this is a somewhat common theme in cluster job submission. If this is the case for you, it shouldn't be too hard to modify the 'createAnalysisCommands' step. You would just need to modify all the lines containing 'mqsub' to match the submission style of your cluster and then when you run createAnalysisCommands use the option '--cluster_commands=1'
        Originally posted by obig View Post
        I guess there are too many different cluster configurations for alexa-seq to anticipate. So, simple bash files are produced which can be run serially (for very small libraries) or submitted to your cluster according to its protocols. You will probably have to work with your cluster administrator to get things running optimally.

        Our cluster here (lawrencium) uses PBS Torque Resource manager and Moab job scheduler. And, with some work, I have been able to submit Alexa-seq jobs to it. I have processed four projects with over 100 libraries to date. So, it is doable. Instead of trying to edit all those parts of the alexa-seq pipeline code that produce job batch files and submission commands, I created a simple Perl script which takes an alexa-seq job batch file (essentially just an sh file with one "task/command" per line) and produces the submission files compatible with our scheduler. I strongly recommend this strategy. Changing the alexa-seq code will be a lot more work.

        What I do is run the alexa-seq pipeline as instructed for steps 0 to 5B. Step 5C (submitMapBatch.sh) is the first step that requires submitting to a cluster. That sh file contains a whole bunch of bash commands for additional sh files (e.g., blast_vs_intergenics.sh). It is those files which should be submitted to a cluster, not the parent submitMapBatch.sh file. You can do them individually or cat them into combined files.

        I create one combined batch file for all libraries, separated only by feature type (repeats, transcripts, etc.), because they have different memory and runtime requirements. I can thus optimize cluster submission parameters for each of the 6 feature types. This is necessary because our cluster uses wallclock estimates and task number to determine job priority in the queue. Maybe your cluster has a simpler setup and this step will be unnecessary for you.

        Once I have combined the bash files, I run my submitjobs.pl script on the combined file and wait for it to finish. In later steps, whenever alexa says to submit some jobs to a cluster, the bash file typically contains the tasks/commands themselves (instead of additional bash commands as above). I just run my submitjobs.pl script on each of those bash files. Check the .output and .error files for problems and then proceed to the next step.
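The batch-file conversion described above might look roughly like the following. This is only a sketch in Python, not the submitjobs.pl script mentioned in the post (which is not shown in this thread). The function names, the resource values in `RESOURCES`, and the file naming scheme are all assumptions for illustration; the `#PBS` directives are standard Torque ones, but your cluster may require different or additional options.

```python
from pathlib import Path

# Hypothetical per-feature-type resource requests; tune these for your cluster.
RESOURCES = {
    "repeats": {"walltime": "02:00:00", "mem": "4gb"},
    "transcripts": {"walltime": "08:00:00", "mem": "8gb"},
}
# Fallback used for feature types not listed above (also an assumption).
DEFAULT = {"walltime": "04:00:00", "mem": "4gb"}


def read_tasks(batch_file):
    """Return the non-empty, non-comment lines of a batch file (one task per line)."""
    tasks = []
    for line in Path(batch_file).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tasks.append(line)
    return tasks


def pbs_script(task, name, feature_type):
    """Wrap a single task/command in a PBS submission script."""
    res = RESOURCES.get(feature_type, DEFAULT)
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -N {name}",
        f"#PBS -l walltime={res['walltime']}",
        f"#PBS -l mem={res['mem']}",
        f"#PBS -o {name}.output",
        f"#PBS -e {name}.error",
        task,
        "",
    ])


def write_pbs_scripts(batch_file, out_dir, feature_type):
    """Write one .pbs file per task in batch_file and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, task in enumerate(read_tasks(batch_file)):
        name = f"{feature_type}_{i:04d}"
        path = out_dir / f"{name}.pbs"
        path.write_text(pbs_script(task, name, feature_type))
        paths.append(path)
    return paths
```

Each generated file could then be handed to `qsub` by hand or in a loop. Writing the scripts to disk first, rather than submitting directly, makes it easy to inspect them before anything reaches the queue.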

        For each project, once the alexa-seq .commands file is produced, I make a new copy of this file and edit it to add my own commands that are necessary for job submission. This file can then be used as a template for running future projects.
        Thanks for the replies. It sounds like Lawrencium is set up almost exactly like our flux cluster here at UMich. The only issue we have is that our head nodes don't have our data drives mounted (for a variety of reasons) and have a 15-minute max job limit, so we would be somewhat constrained: we would have to do the preprocessing on our main analysis server, scp the data over to the cluster for the alignments, and then scp it all back. I appreciate the insights on all this.

        Comment


        • #19
          I have the same issue with my data and analysis servers not being mounted on the head nodes. I've found rsync useful for this. I was given a decent size data folder accessible by the cluster. I do some serial steps, rsync to the cluster-mounted server, run parallel jobs and then rsync back. But, a 15-minute max job limit will be a problem. Some jobs probably take longer than that.
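The rsync round trip described above could be scripted along these lines. This is a minimal sketch: the helper names are made up for illustration, and any hosts and paths you pass in are your own. It uses only the standard `-a` (archive) and `-z` (compress) rsync options plus `--dry-run`.

```python
import subprocess


def rsync_command(src, dest, dry_run=False):
    """Build an rsync command that copies the contents of src into dest."""
    cmd = ["rsync", "-az"]
    if dry_run:
        # Report what would be transferred without copying anything.
        cmd.append("--dry-run")
    # A trailing slash on the source copies the directory's contents
    # rather than the directory itself.
    cmd += [src.rstrip("/") + "/", dest]
    return cmd


def sync(src, dest, dry_run=False):
    """Run rsync and raise CalledProcessError if it exits non-zero."""
    subprocess.run(rsync_command(src, dest, dry_run), check=True)
```

A typical cycle would call `sync` once to push the serial-step output to the cluster-mounted folder, and once more with the arguments swapped to pull the results back after the parallel jobs finish.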

          Comment


          • #20
            Originally posted by obig View Post
            I have the same issue with my data and analysis servers not being mounted on the head nodes. I've found rsync useful for this. I was given a decent size data folder accessible by the cluster. I do some serial steps, rsync to the cluster-mounted server, run parallel jobs and then rsync back. But, a 15-minute max job limit will be a problem. Some jobs probably take longer than that.
            Sorry, I meant that programs running on the head nodes themselves (only small 8p servers) can't exceed 15min. Thanks for the advice on rsync.

            Comment


            • #21
              Oh, sorry, I see what you were saying now. This is indeed the exact situation I have then. I found that rsyncing back and forth was sometimes pretty slow (many files). So, I found myself even submitting many of the individual serial jobs to the cluster and then just rsyncing back at the end. If I could go back and do it over though, I probably would have pushed harder to get a decent box installed and mounted on the cluster for serial processing steps. It's just much easier this way and such boxes are not that expensive these days.

              Comment


              • #22
                Yeah, I would agree with that. It is really invaluable to have a few decent boxes (high-memory ones even better) that have access to all the same storage mounts as the cluster nodes themselves...

                Comment


                • #23
                  database issue

                  Hi! We are currently trying out the alexa-seq VM image and trying to run a demo analysis. The first attempt failed with this persistent error message: 'DBD::mysql::st execute failed: Table 'ALEXA_hs_53_36o.Gene' doesn't exist at /home/alexa-seq/ALEXA/alexa_seq/utilities/ALEXA_DB.pm line 196.'
                  We did all the steps exactly as described in the DEMO.txt file, except for the '#4.) Import annotation database' step. We downloaded hs_53_36o.tar.gz manually and moved it to /home/alexa-seq/ALEXA/sequence_databases/
                  After that, step 4's command was executed: '/home/alexa-seq/ALEXA/alexa_seq/alternativeExpressionDatabase/installAnnotationDb.pl --annotation_dir=/home/alexa-seq/ALEXA/sequence_databases/ --db_build=hs_53_36o --server=localhost --user=alexa-seq --password=alexa-seq'
                  and so on.
                  Do you have any idea what could have gone wrong and caused the apparent absence of the ALEXA_hs_53_36o.Gene table?
                  Thanks!

                  Comment


                  • #24
                    parseRepeats.sh terminates on the first blast.gz file. Would you please help?

                    Begin parsing 137 blast results files

                    Parsing ..../final/blast_results/A/694_Lane1/repeats/blast_0000.gz for blast results
                    Multiple Paired Reads - Unambiguous. Subject ID: AluSx|SINE1/7SL|Primates READ1: 694_1_1101_4534_2459_R1 READ2: 694_1_1101_4534_2459_R2$VAR1 = {};
                    $VAR1 = undef;

                    Comment
