
  • Crossbow 1.0.0 help please

    I am very new to Crossbow and all its tools.

    Following the Crossbow 1.0.0 manual instructions, I installed it and all the required tools. I am running Ubuntu on a laptop with 4 GB of RAM.

    For the moment I would like to run it on a single node, without Hadoop.

    Per the manual, the following are the commands I am using and the error I received.


    michael@michael-laptop:~/crossbow_1/crossbow-1.0.0-beta4/example/e_coli$
    perl $CROSSBOW_HOME/cb_local.pl -input=small.manifest -preprocess
    -pre-output=preproc_small -reference=$CROSSBOW_REFS/e_coli
    -output=output_small -cpus=1
    Died at /home/michael/crossbow_1/crossbow-1.0.0-beta4/cb_emr.pl line 1290.

    Any help will be appreciated.

    Michael

  • #2
    Hi Michael,

    Hmmm... Where did you get that version of Crossbow? I didn't release any versions between 0.1.3 and 1.0.4.

    At any rate, please try the latest version (1.0.4) available from the Crossbow page, and let me know if there's still a problem,
    Ben



    • #3
      Thanks very much for your help.

      I downloaded version 1.0.4, installed it and all the corresponding programs, ran it on a single computer using the e_coli example, and everything worked fine. Then I created a virtual machine (Ubuntu) and repeated the same steps with the same results.

      Now I am trying to run the same job using Hadoop (cb_hadoop), but I think I am missing at least one step.

      Following the Crossbow manual, I ran cb_hadoop and got:

      michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop
      Must specify -reference

      Then I ran:

      cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar

      which is the location of the jar file for e_coli, and then I got this error:

      -------------------
      michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar
      Crossbow expects 'bowtie' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/bowtie on the workers
      Crossbow expects 'soapsnp' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/soapsnp on the workers

      Crossbow job
      ------------
      Hadoop streaming commands in: /tmp/crossbow/invoke.scripts/cb.22704.hadoop.sh
      Running...
      ==========================
      Stage 1 of 3. Align
      ==========================
      Sun Aug 15 17:54:31 EDT 2010
      packageJobJar: [/home/michael/crossbow_1.0.4/crossbow-1.0.4/Get.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/Util.pm, /home/michael/crossbow_1.0.4/
      crossbow-1.0.4/Tools.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/AWS.pm] [] /tmp/streamjob3580240183983830958.jar tmpDir=null
      10/08/15 17:54:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
      10/08/15 17:54:32 ERROR streaming.StreamJob: Error Launching job : Incomplete HDFS URI, no host: hdfs:/crossbow/intermediate/22704/align
      Streaming Job Failed!
      Non-zero exitlevel from Align streaming job
      michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$
      -------------------

      Could you please tell me where I can find documentation about the step(s) I am missing?

      My goal is to run Crossbow on multiple virtual machines using Hadoop.

      Thank you

      Michael



      • #4
        Hi Michael,

        Originally posted by Michael Robinson:

        cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar
        [...]
        10/08/15 17:54:32 ERROR streaming.StreamJob: Error Launching job : Incomplete HDFS URI, no host: hdfs:/crossbow/intermediate/22704/align
        Streaming Job Failed!
        Non-zero exitlevel from Align streaming job
        You'll have to specify input and output directories using --input and --output as well. Depending on your version of Hadoop and how it's set up, you may need to specify HDFS URLs that include your namenode's address and port, e.g. --input=hdfs://localhost:9000/my/input.
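
        For example, a full invocation might look like this (an illustrative sketch, not a tested command; the namenode address, port, and HDFS paths are placeholders that depend on your Hadoop setup):

        cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar \
            --input=hdfs://localhost:9000/crossbow/reads \
            --output=hdfs://localhost:9000/crossbow/output_small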

        Hope this helps,
        Ben



        • #5
          Crossbow 1.1.0 with Hadoop 0.20.2 Help

          Hi,

          I am a newbie.

          I have Hadoop 0.20.2 running on a multi-node cluster: one server and two nodes.

          Following the Crossbow 1.1.0 installation instructions in the manual, I installed it on the server and tested it; no problems.
          Now I want to install it (Bowtie and SOAPsnp) on the nodes, following the same instructions:

          "If you plan to run on a Hadoop cluster, you may need to manually copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes. You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share). You can also skip this step if Hadoop is installed in pseudo distributed mode, meaning that the cluster really consists of one node whose CPUs are treated as distinct slaves."

          Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes," how does that relate to the server install? Do you mean the exact same path as the Crossbow path on the server?

          Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)"?

          Also, when testing previous Crossbow versions I needed to install other programs such as R, Bioconductor, samtools, etc. Are those programs no longer needed?


          Thank you

          Michael



          • #6
            Hi Michael,

            Originally posted by Michael Robinson:
            Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes," how does that relate to the server install? Do you mean the exact same path as the Crossbow path on the server?
            Yes, it's best to install 'bowtie' and 'soapsnp' at the same path on all nodes, including the server. It's not strictly necessary to install those tools on the server at all, but if you don't the "cb_hadoop --test" command will fail when run from the server.
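
            If you do copy them to each node rather than sharing, something along these lines would do it (an illustrative sketch; node1 and node2 are placeholder hostnames, and it assumes $CROSSBOW_HOME is identical on every machine):

            for h in node1 node2; do
                rsync -a $CROSSBOW_HOME/bin/linux64/ $h:$CROSSBOW_HOME/bin/linux64/
            done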

            Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)"?
            All I really mean is that you can set up an NFS share so that all computers in the cluster "see" the same files in certain directories. E.g. you might set up your cluster so that the '/share/crossbow' directory contains a Crossbow install and is NFS-shared across all nodes in the cluster. If you do so, the path '/share/crossbow/bin/linux64/bowtie', for example, will be present on all nodes and you can specify that path using the --bowtie option.
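
            As a concrete sketch (hostnames, paths, and NFS export options here are illustrative, not a tested recipe):

            # on the file server, export the shared install (line in /etc/exports):
            /share/crossbow *(ro,sync,no_subtree_check)
            exportfs -ra

            # on each Hadoop node, mount it at the same path:
            mount -t nfs fileserver:/share/crossbow /share/crossbow

            # every node now sees the same binaries, which you can point Crossbow at:
            cb_hadoop ... --bowtie=/share/crossbow/bin/linux64/bowtie --soapsnp=/share/crossbow/bin/linux64/soapsnp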

            Also, when testing previous Crossbow versions I needed to install other programs such as R, Bioconductor, samtools, etc. Are those programs no longer needed?
            You don't need samtools, no. You never needed R/Bioconductor for Crossbow - just for Myrna (a different though similar tool).

            Hope this helps,
            Ben



            • #7
              Crossbow 1.1.0 with Hadoop 0.20.2 Help

              Hi Ben,

              I am impressed by how fast you replied.

              Thanks very much

              Michael



              • #8
                Hi Ben,

                I went the NFS route; I think it is best because I will only need to modify the server for future updates of Crossbow. I can see the Crossbow folders from the client. Thanks.

                I also added this to my .profile on the server and the nodes:
                export CROSSBOW_HOME=<location where I installed Crossbow>

                Now I have a new challenge: when I run cb_hadoop --test I get "program not found".

                I can see cb_hadoop, and I can also cat it and read the code.

                hadoop@Hadoop-Server:~/crossbow/crossbow$ ls
                ?? contrib ??H@@ ReduceWrap.pl
                Align.pl Copy.pl LICENSE reftools
                AWS.pm Counters.pl LICENSE_APACHE2 soapsnp
                bin Counters.pm LICENSE_ARTISTIC Soapsnp.pl
                BinSort.pl crossbow-1.1.0.zip LICENSE_GPL2 Tools.pm
                cb_emr CrossbowIface.pm LICENSE_GPL3 TUTORIAL
                CBFinish.pl crossbow-manual-v1-1-0.odt LICENSES Util.pm
                cb_hadoop doc MANUAL VERSION
                cb_local example MapWrap.pl Wrap.pm
                CheckDirs.pl Get.pm NEWS
                hadoop@Hadoop-Server:~/crossbow/crossbow$



                Could you please tell me what I am doing wrong?

                Thanks

                Michael



                • #9
                  I found the solution to the cb_hadoop error:
                  I needed to add the location where I installed Hadoop to my PATH.
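
                  For anyone else who hits this, the fix amounts to something like the following in ~/.profile (an illustrative sketch; the Hadoop location is a placeholder):

                  export HADOOP_HOME=/usr/local/hadoop   # wherever Hadoop is installed
                  export PATH=$PATH:$HADOOP_HOME/bin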

                  I am now running Crossbow on the e_coli sample data.

                  Thanks



                  • #10
                    Hi Ben,

                    Sorry to hijack this thread, but since you are already answering questions here, I was wondering whether it is possible to get Bowtie to produce SAM output within the Crossbow pipeline. Whenever I pass the '--sam' flag to Bowtie via '--bowtie-args', I get a segmentation fault during the Align step.

                    Thanks!



                    • #11
                      Hi Ben,
                      I've installed Crossbow on a Sun 64-bit server running Fedora 11 and I'm getting the error below; no shell script was produced.
                      Any idea what I've done wrong?

                      Rob
                      [rtgood1@imokurok CROSSBOW_HOME]$ cb_local --input=RAL306.fq --preprocess --reference=$CROSSBOW_REFS/d_mel --output=testcb --all-haploids --cpus=2
                      print() on closed filehandle JSON at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1329.
                      print() on closed filehandle SH at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1331.
                      print() on closed filehandle HADOOP at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1333.

                      Crossbow job
                      ------------
                      Local commands in: /tmp/crossbow/invoke.scripts/cb.28975.sh
                      Running...
                      sh: /tmp/crossbow/invoke.scripts/cb.28975.sh: No such file or directory

                      [rtgood1@imokurok tmp]$ cd crossbow/
                      [rtgood1@imokurok crossbow]$ ls
                      invoke.scripts
                      [rtgood1@imokurok crossbow]$ cd invoke.scripts/
                      [rtgood1@imokurok invoke.scripts]$ ls
                      [rtgood1@imokurok invoke.scripts]$



                      • #12
                        Crossbow error

                        I got some errors while running Crossbow.
                        I've tried both cb_local and cb_hadoop with the example E. coli dataset provided with Crossbow.

                        Command and parameters:

                        "cb_local --input=reads --output=out_small --reference=e_coli --all-haploid"

                        It gives the following error:


                        Align.pl: Retrived 0 counters from previous stages
                        * Align.pl: Read first line of stdin:
                        * @SRR014475.1 :1:1:108:111
                        * Bad number of read tokens ; expected 3 or 5:
                        * @SRR014475.1 :1:1:108:111
                        ******
                        Fatal error 1.1.0:M140: Aborting because child with PID 15271 exited abnormally



                        Any suggestions?



                        • #13
                          Similar error in Hadoop - managed to make it work there

                          Well, another newbie here, to this stuff at least, but not to IT, so take my suggestions FWIW. On the other hand, I have gotten it to work all the way through the four stages, so..

                          I'm using Crossbow 1.1.1, btw.

                          I tried preprocessing in both single-machine and Hadoop modes and got this

                          Bad number of read tokens ; expected 3 or 5:

                          error in both modes as well. The output before and after that message was different for me, though. Mine said:

                          Written 8909572 spots

                          From that it was easy to figure out what was happening. In Hadoop mode, for me, the input gut bacteria (is that right?) file is broken up into 21 files: 18 are legit with data, 2 are empty but still benign, but one file, part_00002, didn't have proper data in it; it had the text string above. So 20 tasks worked just fine, but the one trying to process that part_00002 file failed. So I just deleted that file, edited the shell script to pick up at that point, and voila, in Hadoop mode it went all the way to the end.
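
                          If you'd rather not eyeball all 21 files, a quick check like this flags any part file whose lines don't have the 3 or 5 tab-separated tokens the error message expects (an illustrative one-liner; it assumes the part files sit in the current directory):

                          awk -F'\t' 'NF != 3 && NF != 5 { print FILENAME ": line " FNR " has " NF " tokens"; nextfile }' part_*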

                          I'm doing everything with the keep-all option so the intermediate files are all kept, and I used dry-run mode so that the shell scripts that run things are all kept and I can peek at them and edit them as needed.

                          Now for me, it's on to the next step and figuring out what this all means on the biology side :-)

                          Enjoy.

                          -Shantanu
                          Last edited by karve; 02-17-2011, 09:41 AM.



                          • #14
                            Here is the command I am using:

                            $CROSSBOW_HOME/cb_local --input=small.manifest --preprocess --reference=/home/abi/bioinfo/crossbow/crossbow-1.2.0/crossbow-1.2.0/CROSSBOW_REFS/e_coli --output=output_small --all-haploids --cpus=1 --preprocess-output=preprocess_output --keep-all --fastq-dump=/home/abi/bioinfo/sratoolkit/sratoolkit.2.3.1-centos_linux64/bin/fastq-dump

                            (I tried it with version 1.1.1 as well.)

                            I get problems with the SRA Toolkit, even though it is at the path specified on the command line, and I have verified that my SRA Toolkit works.

                            ******
                            * Copy.pl: Retrived 0 counters from previous stages
                            * Copy.pl: Line: ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra 0
                            * Copy.pl: Not a comment line
                            * Copy.pl: Doing unpaired entry SRR014475.lite.sra
                            * Copy.pl: Fetching ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra SRR014475.lite.sra 0
                            * reporter:counter:Short read preprocessor,Read data fetched,0
                            * fastq-dump could not be found in SRATOOLKIT_HOME or PATH; please specify --sraconv
                            ******
                            Fatal error 1.1.1:M140: Aborting because child with PID 17272 exited abnormally

                            When requesting support, please include the full output printed here.
                            If a child process was the cause of the error, the output should
                            include the relevant error message from the child's error log. You may
                            be asked to provide additional files as well.
                            Non-zero exitlevel from Preprocess stage
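
                            Going only by the error text, one workaround would be to make fastq-dump visible via PATH (an untested sketch reusing the toolkit path from the command above; the --sraconv option named in the message may be another route):

                            export PATH=$PATH:/home/abi/bioinfo/sratoolkit/sratoolkit.2.3.1-centos_linux64/bin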



                            • #15
                              Okay, I fixed that error. I changed the code in Tools.pm at the relevant point.
