  • Myrna 1.2.1 on Hadoop problems

    Please help.

    I have installed Myrna 1.2.1 on a Hadoop cluster (0.20) with Bowtie 1.0.0, R/Bioconductor and the SRA Toolkit. All required environment variables are set, and the install tests pass. When I run the Yeast (small) job, it ends in an error at Step 2 (Align). I have tried the same test with Bowtie 0.12.8, with the same result.

    I have seen similar errors before with Crossbow when I was not using the right version of Bowtie (0.12.8), but the release notes for Myrna 1.2.1 mention Bowtie 1.0.0.


    Excerpt from the logs:


    lrwxrwxrwx 1 mapred mapred 84 Aug 20 11:08 job.jar -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0023/jars/job.jar
    drwxr-xr-x 2 mapred mapred 4096 Aug 20 11:08 tmp
    Align.pl: Read first line of stdin:
    FN:SRR014339.fastq;LB:rrp6_lsm1_pat1-1;RN:@SRR014339.4500001 CGCAAGTCATCAGCTTGCGTTGATTACGTCCCAGAT `````_`T```Z`[P`OQ`PQZPMNPKDDDKC@A>F
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:575)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper_aroundBody2(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask$AjcClosure3.run(MapTask.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.doPhaseCall(HadoopTaskAspect.java:166)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.ajc$inlineAccessMethod$com_intel_bigdata_management_agent_HadoopTaskAspect$com_intel_bigdata_management_agent_HadoopTaskAspect$doPhaseCall(HadoopTaskAspect.java:1)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.aroundMap(HadoopTaskAspect.java:38)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run_aroundBody0(MapTask.java:377)
    at org.apache.hadoop.mapred.MapTask$AjcClosure1.run(MapTask.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.aroundTaskRun(HadoopTaskAspect.java:95)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:351)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
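
A note on reading "subprocess failed with code 143" in the trace above: by the common POSIX shell convention, an exit status above 128 means the process was terminated by signal (status - 128), so 143 corresponds to signal 15 (SIGTERM). That suggests the Align.pl subprocess was killed from outside rather than exiting on its own error, although the log does not say what sent the signal. A minimal Python sketch of that decoding follows; the helper name `describe_exit_code` is made up for illustration and is not part of Myrna or Hadoop.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a subprocess exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"terminated by {name} (signal {signum})"
    return f"exited with error status {code}"


# The code reported by PipeMapRed.waitOutputThreads() in the log above.
print(describe_exit_code(143))
```

If the subprocess is being terminated, the TaskTracker and JobTracker logs for the same task attempt are the usual place to look for the reason.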

  • #2
    More details from stderr, in case it helps.

    Align.pl: s3cmd: found: , given:
    Align.pl: jar: found: /usr/java/latest/bin/jar, given:
    Align.pl: hadoop: found: /usr/lib/hadoop/libexec/../bin/hadoop, given:
    Align.pl: wget: found: /usr/bin/wget, given:
    Align.pl: s3cfg:
    Align.pl: bowtie: found: , given: /usr/local/bin/bowtie-1.0.0/bowtie
    Align.pl: partition len: 1000000
    Align.pl: ref: HDFS:///myrna-refs/yeast_ensembl_67.jar
    Align.pl: quality: solexa64
    Align.pl: truncate at: 0
    Align.pl: discard mate: 0
    Align.pl: discard reads < truncate len: 0
    Align.pl: SAM passthrough: 0
    Align.pl: Straight through: 0
    Align.pl: globals directory: HDFS:///myrna/intermediate/8776/globals
    Align.pl: pool replicates?: 0
    Align.pl: pool trim length: 0
    Align.pl: pool technical replicates?: 0
    Align.pl: local index path:
    Align.pl: counters:
    Align.pl: dest dir: /tmp/myrna-1O7L22aMMT
    Align.pl: bowtie args: --partition -1000000 --mm -t --hadoopout --startverbose -m 1
    Align.pl: ls -al
    Align.pl: total 40
    drwxr-xr-x 3 mapred mapred 4096 Aug 22 11:04 .
    drwxr-xr-x 3 mapred mapred 4096 Aug 22 11:04 ..
    lrwxrwxrwx 1 mapred mapred 89 Aug 22 11:04 .job.jar.crc -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0029/jars/.job.jar.crc
    lrwxrwxrwx 1 mapred mapred 83 Aug 22 11:04 AWS.pm -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0029/jars/AWS.pm
    lrwxrwxrwx 1 mapred mapred 88 Aug 22 11:04 Counters.pm -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0029/jars/Counters.pm
    lrwxrwxrwx 1 mapred mapred 83 Aug 22 11:04 Get.pm -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0029/jars/Get.pm
    lrwxrwxrwx 1 mapred mapred 85 Aug 22 11:04 Tools.pm -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0029/jars/Tools.pm
    lrwxrwxrwx 1 mapred mapred 84 Aug 22 11:04 Util.pm -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0029/jars/Util.pm
    lrwxrwxrwx 1 mapred mapred 84 Aug 22 11:04 job.jar -> /hdfs/hd1/hadoop/mapred/taskTracker/root/jobcache/job_201307291643_0029/jars/job.jar
    drwxr-xr-x 2 mapred mapred 4096 Aug 22 11:04 tmp
    Align.pl: Read first line of stdin:
    FN:SRR014339.fastq;LB:rrp6_lsm1_pat1-1;RN:@SRR014339.4500001 CGCAAGTCATCAGCTTGCGTTGATTACGTCCCAGAT `````_`T```Z`[P`OQ`PQZPMNPKDDDKC@A>F
    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 143
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:575)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:135)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper_aroundBody2(MapTask.java:444)
    at org.apache.hadoop.mapred.MapTask$AjcClosure3.run(MapTask.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.doPhaseCall(HadoopTaskAspect.java:166)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.ajc$inlineAccessMethod$com_intel_bigdata_management_agent_HadoopTaskAspect$com_intel_bigdata_management_agent_HadoopTaskAspect$doPhaseCall(HadoopTaskAspect.java:1)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.aroundMap(HadoopTaskAspect.java:38)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run_aroundBody0(MapTask.java:377)
    at org.apache.hadoop.mapred.MapTask$AjcClosure1.run(MapTask.java:1)
    at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149)
    at com.intel.bigdata.management.agent.HadoopTaskAspect.aroundTaskRun(HadoopTaskAspect.java:95)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:351)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)



    • #3
      This is what I see when I run the install test:

      [root@idh-dev03 myrna-1.2.1]# ./myrna_hadoop --test
      Searching for 'bowtie' binary...
      Specified via --bowtie?....no
      $MYRNA_BOWTIE_HOME specified?....YES (/usr/local/bin/bowtie-1.0.0)
      Runnable?....YES
      Searching for 'Rscript' binary...
      Specified via --Rhome?....no
      $MYRNA_RHOME specified?....YES (/usr/local/bin/myrna-1.2.1/R/bin/)
      Runnable?....no
      Checking /usr/local/bin/myrna-1.2.1/bin...
      Scanning directory: /usr/local/bin/myrna-1.2.1/bin/linux32
      Scanning directory: /usr/local/bin/myrna-1.2.1/bin/linux64
      Scanning directory: /usr/local/bin/myrna-1.2.1/bin/mac32
      Scanning directory: /usr/local/bin/myrna-1.2.1/bin/mac64
      I'm searching for R or Rscript, so scanning directory: /usr/local/bin/myrna-1.2.1/R/bin/Rscript
      Checking whether R has appropriate R/Bioconductor packages...
      [1] "Found required package lmtest"
      [1] "Found required package multicore"
      [1] "Found required package IRanges"
      [1] "Found required package geneplotter"
      [1] "All packages found"
      Settling on /usr/local/bin/myrna-1.2.1/R/bin/Rscript
      Searching for 'fastq-dump' binary...
      Specified via --fastq-dump?....no
      $MYRNA_SRATOOLKIT_HOME specified?....YES (/usr/local/bin/sratoolkit.2.3.2-5-centos_linux64/bin)
      Runnable?....YES
      Summary:
      bowtie: INSTALLED at /usr/local/bin/bowtie-1.0.0/bowtie
      R: INSTALLED with RHOME at /usr/local/bin/myrna-1.2.1/R/bin/Rscript
      Hadoop note: executables must be runnable via the SAME PATH on all nodes.
      PASSED install test
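
The "SAME PATH on all nodes" note in the output above can be checked by hand on each node. The following is a minimal sketch, assuming it is copied to and run on every node separately; it only tests whether each path is an executable regular file on the local machine, and the function name `check_runnable` is hypothetical.

```python
import os


def check_runnable(paths):
    """Return {path: bool}: True if the path is an executable regular file on this node."""
    return {p: os.path.isfile(p) and os.access(p, os.X_OK) for p in paths}


# Paths taken from the install-test summary above; adjust for your own cluster.
tools = [
    "/usr/local/bin/bowtie-1.0.0/bowtie",
    "/usr/local/bin/myrna-1.2.1/R/bin/Rscript",
]

for path, ok in check_runnable(tools).items():
    print(f"{'OK     ' if ok else 'MISSING'} {path}")
```

Note that the Hadoop tasks in this thread run as the mapred user (see the ls -al output earlier), so the check is most meaningful when run as that user rather than as root.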



      • #4
        An update to my thread, since there has not been much help here.

        My problem was solved by running the job on a 3-node cluster instead of the 1-node cluster I was initially testing on (all nodes have the same specs). Maybe the workload is not suitable for a single-node cluster; I am not sure.
