SEQanswers

Old 07-22-2010, 09:08 AM   #1
Michael Robinson
Junior Member
 
Location: Miami

Join Date: Jul 2010
Posts: 7
Default Crossbow 1.0.0 help please

I am very new to Crossbow and all its tools.

Following the Crossbow 1.0.0 manual instructions, I installed it and all the required tools. I am running Ubuntu on a laptop with 4 GB of RAM.

For the moment I would like to run it on a single node, without Hadoop.

Per the manual, here are the command I am using and the error I received.


michael@michael-laptop:~/crossbow_1/crossbow-1.0.0-beta4/example/e_coli$
perl $CROSSBOW_HOME/cb_local.pl -input=small.manifest -preprocess
-pre-output=preproc_small -reference=$CROSSBOW_REFS/e_coli
-output=output_small -cpus=1
Died at /home/michael/crossbow_1/crossbow-1.0.0-beta4/cb_emr.pl line 1290.

Any help will be appreciated.

Michael
Old 07-22-2010, 10:16 AM   #2
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Hi Michael,

Hmmm... Where did you get that version of Crossbow? I didn't release any versions between 0.1.3 and 1.0.4.

At any rate, please try the latest version (1.0.4), available from the Crossbow page:

http://bowtie-bio.sourceforge.net/crossbow/index.shtml

And let me know if there's still a problem,
Ben
Old 08-15-2010, 04:46 PM   #3
Michael Robinson
Junior Member
 
Location: Miami

Join Date: Jul 2010
Posts: 7
Default

Thanks very much for your help.

I downloaded version 1.0.4, installed it and all the corresponding programs, ran it on a single computer using the e_coli example, and everything worked fine. Then I created an Ubuntu virtual machine and repeated the same steps with the same results.

Now I am trying to run the same job using Hadoop (cb_hadoop), but I think I am missing at least one step.

Following the Crossbow manual, I ran cb_hadoop and got:

michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop
Must specify -reference

Then I ran:

cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar

which is the location of the jar file for e_coli. Then I got this error:

-------------------
michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar
Crossbow expects 'bowtie' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/bowtie on the workers
Crossbow expects 'soapsnp' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/soapsnp on the workers

Crossbow job
------------
Hadoop streaming commands in: /tmp/crossbow/invoke.scripts/cb.22704.hadoop.sh
Running...
==========================
Stage 1 of 3. Align
==========================
Sun Aug 15 17:54:31 EDT 2010
packageJobJar: [/home/michael/crossbow_1.0.4/crossbow-1.0.4/Get.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/Util.pm, /home/michael/crossbow_1.0.4/
crossbow-1.0.4/Tools.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/AWS.pm] [] /tmp/streamjob3580240183983830958.jar tmpDir=null
10/08/15 17:54:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/08/15 17:54:32 ERROR streaming.StreamJob: Error Launching job : Incomplete HDFS URI, no host: hdfs:/crossbow/intermediate/22704/align
Streaming Job Failed!
Non-zero exitlevel from Align streaming job
michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$
-------------------

Could you please tell me where I can find documentation about the step(s) I am missing?

My goal is to run Crossbow on multiple virtual machines using Hadoop.

Thank you

Michael
Old 08-16-2010, 07:53 AM   #4
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Hi Michael,

Quote:
Originally Posted by Michael Robinson View Post
Following the Crossbow manual I run cb_hadoop getting:

michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop
Must specify -reference

then I run:

cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar

which is the location of the jar files for e_coli, then I got this error:

-------------------
michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$ cb_hadoop.pl -reference=$CROSSBOW_REFS/e_coli.jar
Crossbow expects 'bowtie' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/bowtie on the workers
Crossbow expects 'soapsnp' to be at path /home/michael/crossbow_1.0.4/crossbow-1.0.4/bin/linux32/soapsnp on the workers

Crossbow job
------------
Hadoop streaming commands in: /tmp/crossbow/invoke.scripts/cb.22704.hadoop.sh
Running...
==========================
Stage 1 of 3. Align
==========================
Sun Aug 15 17:54:31 EDT 2010
packageJobJar: [/home/michael/crossbow_1.0.4/crossbow-1.0.4/Get.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/Util.pm, /home/michael/crossbow_1.0.4/
crossbow-1.0.4/Tools.pm, /home/michael/crossbow_1.0.4/crossbow-1.0.4/AWS.pm] [] /tmp/streamjob3580240183983830958.jar tmpDir=null
10/08/15 17:54:32 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/08/15 17:54:32 ERROR streaming.StreamJob: Error Launching job : Incomplete HDFS URI, no host: hdfs:/crossbow/intermediate/22704/align
Streaming Job Failed!
Non-zero exitlevel from Align streaming job
michael@HOST:~/crossbow_1.0.4/crossbow-1.0.4$
-------------------
You'll have to specify input and output directories using --input and --output as well. Depending on your version of Hadoop and how it's set up, you may need to specify HDFS URLs that include your namenode's address and port, e.g. --input=hdfs://localhost:9000/my/input.
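For illustration, a namenode-qualified invocation might look like the sketch below. The host, port, and HDFS paths are assumptions (the port matches a typical pseudo-distributed fs.default.name setting), not values taken from this thread:

```shell
# Sketch: assuming a namenode at localhost:9000 (check fs.default.name
# in conf/core-site.xml for your cluster's actual host and port).
NAMENODE=hdfs://localhost:9000

# A complete HDFS URI needs both a scheme and a host:port; a URI like
# "hdfs:/crossbow/..." (no host) causes the "Incomplete HDFS URI" error.
INPUT=$NAMENODE/crossbow/input/small.manifest
OUTPUT=$NAMENODE/crossbow/output_small

# The cb_hadoop invocation would then be along these lines:
#   cb_hadoop.pl --input=$INPUT --output=$OUTPUT \
#       --reference=$CROSSBOW_REFS/e_coli.jar

# Quick sanity check that each URI carries a host:port:
for uri in "$INPUT" "$OUTPUT"; do
  case "$uri" in
    hdfs://*:*/*) echo "ok: $uri" ;;
    *)            echo "missing host: $uri" ;;
  esac
done
```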

Hope this helps,
Ben
Old 10-19-2010, 04:32 PM   #5
Michael Robinson
Junior Member
 
Location: Miami

Join Date: Jul 2010
Posts: 7
Default Crossbow 1.1.0 with Hadoop 0.20.2 Help

Hi,

I am a newbie.

I have Hadoop 0.20.2 running on a multi-node cluster: one server, two nodes.

Following the Crossbow 1.1.0 installation instructions in the manual, I installed it on the server and tested it; no problems.
Now I want to install it (Bowtie and SOAPsnp) on the nodes, following these instructions:

"If you plan to run on a Hadoop cluster, you may need to manually copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes. You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share). You can also skip this step if Hadoop is installed in pseudo distributed mode, meaning that the cluster really consists of one node whose CPUs are treated as distinct slaves."

Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes," how is that related to the server install? Do you mean exactly the same path as the Crossbow path on the server?

Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)"?

Also, when testing previous Crossbow versions I needed to install other programs such as R, Bioconductor, samtools, etc. Are those programs no longer needed?


Thank you

Michael
Old 10-19-2010, 04:49 PM   #6
Ben Langmead
Senior Member
 
Location: Baltimore, MD

Join Date: Sep 2008
Posts: 200
Default

Hi Michael,

Quote:
Originally Posted by Michael Robinson View Post
I have Hadoop 0.20.2 running on a multi-node cluster, one server two nodes

Following Crossbow 1.1.0 installation instructions in the manual, I installed it in the server and tested it. no problems.
Now I want to install it (Bowtie and SOAPsnp) in the nodes following the same instructions:

"If you plan to run on a Hadoop cluster, you may need to manually copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes. You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share). You can also skip this step if Hadoop is installed in pseudo distributed mode, meaning that the cluster really consists of one node whose CPUs are treated as distinct slaves."

Could you please tell me: when you say "copy the bowtie and soapsnp files to the same path on each of your Hadoop cluster nodes." how are they related to the server install, do you mean an exact path as the Crossbow path in the server?
Yes, it's best to install 'bowtie' and 'soapsnp' at the same path on all nodes, including the server. It's not strictly necessary to install those tools on the server at all, but if you don't the "cb_hadoop --test" command will fail when run from the server.

Quote:
Could you give an example of "You can avoid this step by installing bowtie and soapsnp on a filesystem shared by all Hadoop nodes (e.g. an NFS share)."
All I really mean is that you can set up an NFS share so that all computers in the cluster "see" the same files in certain directories. E.g. you might set up your cluster so that the '/share/crossbow' directory contains a Crossbow install and is NFS-shared across all nodes in the cluster. If you do so, the path '/share/crossbow/bin/linux64/bowtie', for example, will be present on all nodes and you can specify that path using the --bowtie option.
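As an illustration of that NFS approach (everything below — the paths, the hostname 'master', and the subnet — is a hypothetical example, not taken from this thread):

```shell
# On the NFS server (e.g. the Hadoop master), export the install dir.
# Line to add to /etc/exports:
#   /share/crossbow 192.168.1.0/24(ro,sync,no_subtree_check)
exportfs -ra                      # reload the export table

# On every worker node, mount the share at the SAME absolute path:
mkdir -p /share/crossbow
mount -t nfs master:/share/crossbow /share/crossbow

# Now a path such as /share/crossbow/bin/linux64/bowtie resolves
# identically on all nodes and can be passed via --bowtie (and,
# presumably, the matching soapsnp path via --soapsnp).
```

This is a config fragment requiring root and a running NFS service, so treat it as a template rather than a script to run verbatim.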

Quote:
Also, testing previous Crossbow versions I needed to install other programs such as R, bioconductor, samtools, etc, are those programs not needed anymore?
You don't need samtools, no. You never needed R/Bioconductor for Crossbow - just for Myrna (a different though similar tool).

Hope this helps,
Ben
Old 10-19-2010, 05:21 PM   #7
Michael Robinson
Junior Member
 
Location: Miami

Join Date: Jul 2010
Posts: 7
Default Crossbow 1.1.0 with Hadoop 0.20.2 Help

Hi Ben,

I am impressed by how fast you replied.

Thanks very much

Michael
Old 10-23-2010, 04:59 PM   #8
Michael Robinson
Junior Member
 
Location: Miami

Join Date: Jul 2010
Posts: 7
Default

Hi Ben,

I went the NFS route, which I think is best because I will only need to modify the server for future updates of Crossbow. I can see the Crossbow folders from the client. Thanks.

I also added this to my .profile on the server and the nodes:
export CROSSBOW_HOME=<location where I installed Crossbow>

Now I have a new challenge: when I run cb_hadoop --test I get "program not found".

I can see cb_hadoop and I can also do a cat on it and read the code.

hadoop@Hadoop-Server:~/crossbow/crossbow$ ls
?? contrib ??H@@ ReduceWrap.pl
Align.pl Copy.pl LICENSE reftools
AWS.pm Counters.pl LICENSE_APACHE2 soapsnp
bin Counters.pm LICENSE_ARTISTIC Soapsnp.pl
BinSort.pl crossbow-1.1.0.zip LICENSE_GPL2 Tools.pm
cb_emr CrossbowIface.pm LICENSE_GPL3 TUTORIAL
CBFinish.pl crossbow-manual-v1-1-0.odt LICENSES Util.pm
cb_hadoop doc MANUAL VERSION
cb_local example MapWrap.pl Wrap.pm
CheckDirs.pl Get.pm NEWS
hadoop@Hadoop-Server:~/crossbow/crossbow$


Can you please tell me what I am doing wrong?

Thanks

Michael
Old 10-24-2010, 03:12 PM   #9
Michael Robinson
Junior Member
 
Location: Miami

Join Date: Jul 2010
Posts: 7
Default

I found the solution to the cb_hadoop error: I needed to add the location where I installed Hadoop to my PATH.

I am now running Crossbow with the e_coli data sample.
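For anyone hitting the same thing, the environment setup amounts to a few lines in ~/.profile on the server and nodes. Every path below is a placeholder, not the actual install location from this thread:

```shell
# Hypothetical ~/.profile additions (adjust every path to your setup):
export CROSSBOW_HOME="$HOME/crossbow/crossbow"    # Crossbow install
export CROSSBOW_REFS="$HOME/crossbow/reftools"    # reference archives
export HADOOP_HOME="$HOME/hadoop-0.20.2"          # Hadoop install

# cb_hadoop shells out to 'hadoop', so Hadoop's bin directory must be
# on PATH -- this was the cause of the "program not found" error above.
export PATH="$PATH:$CROSSBOW_HOME:$HADOOP_HOME/bin"
```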

Thanks
Old 10-27-2010, 02:26 PM   #10
carze
Junior Member
 
Location: Maryland

Join Date: Nov 2009
Posts: 2
Default

Hi Ben,

Sorry to hijack this thread, but since you have already answered questions here, I was wondering whether it is possible to get Bowtie to produce SAM output within the Crossbow pipeline. Whenever I pass the '--sam' flag to Bowtie via the '--bowtie-args' flag, I get a segmentation fault during the align step.

Thanks!
Old 11-03-2010, 09:16 PM   #11
rtgood
Junior Member
 
Location: australia

Join Date: May 2009
Posts: 1
Default

Hi Ben
I've installed Crossbow on a Sun 64-bit server running Fedora 11 and I'm getting the error below (no shell script was produced).
Any idea what I've done wrong?

Rob
[rtgood1@imokurok CROSSBOW_HOME]$ cb_local --input=RAL306.fq --preprocess --reference=$CROSSBOW_REFS/d_mel --output=testcb --all-haploids --cpus=2
print() on closed filehandle JSON at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1329.
print() on closed filehandle SH at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1331.
print() on closed filehandle HADOOP at /home/rtgood1/Data/CROSSBOW_HOME/CrossbowIface.pm line 1333.

Crossbow job
------------
Local commands in: /tmp/crossbow/invoke.scripts/cb.28975.sh
Running...
sh: /tmp/crossbow/invoke.scripts/cb.28975.sh: No such file or directory

[rtgood1@imokurok tmp]$ cd crossbow/
[rtgood1@imokurok crossbow]$ ls
invoke.scripts
[rtgood1@imokurok crossbow]$ cd invoke.scripts/
[rtgood1@imokurok invoke.scripts]$ ls
[rtgood1@imokurok invoke.scripts]$
Old 12-30-2010, 02:16 AM   #12
av_d
Member
 
Location: Pune, India

Join Date: Sep 2009
Posts: 12
Default crossbow error

I got some errors while running Crossbow. I've tried both cb_local and cb_hadoop with the example E. coli dataset provided with Crossbow.

Command and parameters:

"cb_local --input=reads --output=out_small --reference=e_coli --all-haploid"

It's giving the following error:


Align.pl: Retrived 0 counters from previous stages
* Align.pl: Read first line of stdin:
* @SRR014475.1 :1:1:108:111
* Bad number of read tokens ; expected 3 or 5:
* @SRR014475.1 :1:1:108:111
******
Fatal error 1.1.0:M140: Aborting because child with PID 15271 exited abnormally



Any suggestions?
Old 02-17-2011, 09:13 AM   #13
karve
Member
 
Location: Colorado

Join Date: Feb 2011
Posts: 12
Default Similar error in Hadoop - can make it work there

Well, another newbie here, to this stuff at least, but not to IT, so take my suggestions FWIW. On the other hand, I have got it to work all the way through the four stages, so...

I'm using Crossbow 1.1.1 btw.

I tried preprocess in both single machine and Hadoop modes and got this

Bad number of read tokens ; expected 3 or 5:

error in both modes as well. The output before and after that message was different for me, though.
Mine said:

Written 8909572 spots

From that it was easy to figure out what was happening. In Hadoop mode, for me, the input gut-bacteria (is that right?) file is broken up into 21 files: 18 are legit with data, 2 are empty but still benign, but one file, part_00002, didn't have proper data in it; it had the text string above. So 20 tasks worked just fine, but the one trying to process that part_00002 file failed. I just deleted that file, edited the shell script to pick up at that point, and voila, in Hadoop mode it went all the way to the end.
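That debugging step — finding the part file whose lines don't have the expected token count — can be scripted. A sketch in Python (it assumes the tab-separated 3-token unpaired / 5-token paired record format implied by the error message; the directory and file pattern are illustrative):

```python
import glob

def bad_lines(path):
    """Yield (line_no, line) for lines without 3 or 5 tab-separated
    fields -- the counts the Align stage's error message expects."""
    with open(path) as fh:
        for i, line in enumerate(fh, 1):
            line = line.rstrip("\n")
            if not line:
                continue  # empty lines/files are benign, skip them
            if len(line.split("\t")) not in (3, 5):
                yield i, line

# Scan every part file from the preprocess step and report the first
# offending line in each (directory and pattern are illustrative):
for part in sorted(glob.glob("preproc_output/part-*")):
    for lineno, line in bad_lines(part):
        print(f"{part}:{lineno}: {line[:60]}")
        break
```

Deleting (or repairing) only the files this reports, as described above, lets the remaining tasks run cleanly.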

I'm doing everything with the --keep-all option so the intermediate files are all kept, and I used dry-run mode so the shell scripts that run things are all kept and I can peek at them and edit them as needed.

Now, for me, it's on to the next step and figuring out what this all means on the biology side :-)

Enjoy.

-Shantanu

Last edited by karve; 02-17-2011 at 09:41 AM.
Old 05-06-2013, 03:22 PM   #14
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78
Default

Here is the command I am using:

$CROSSBOW_HOME/cb_local --input=small.manifest --preprocess --reference=/home/abi/bioinfo/crossbow/crossbow-1.2.0/crossbow-1.2.0/CROSSBOW_REFS/e_coli --output=output_small --all-haploids --cpus=1 --preprocess-output=preprocess_output --keep-all --fastq-dump=/home/abi/bioinfo/sratoolkit/sratoolkit.2.3.1-centos_linux64/bin/fastq-dump

(I tried it with version 1.1.1 as well.)

I get problems with the SRA Toolkit, though I do have it at the path specified on the command line, and I have tested that my SRA Toolkit works well.

******
* Copy.pl: Retrived 0 counters from previous stages
* Copy.pl: Line: ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra 0
* Copy.pl: Not a comment line
* Copy.pl: Doing unpaired entry SRR014475.lite.sra
* Copy.pl: Fetching ftp://ftp-trace.ncbi.nih.gov/sra/sra...14475.lite.sra SRR014475.lite.sra 0
* reporter:counter:Short read preprocessor,Read data fetched,0
* fastq-dump could not be found in SRATOOLKIT_HOME or PATH; please specify --sraconv
******
Fatal error 1.1.1:M140: Aborting because child with PID 17272 exited abnormally

When requesting support, please include the full output printed here.
If a child process was the cause of the error, the output should
include the relevant error message from the child's error log. You may
be asked to provide additional files as well.
Non-zero exitlevel from Preprocess stage
Old 05-07-2013, 09:44 AM   #15
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78
Default

Okay, I fixed that error. I changed the code in Tools.pm at the relevant point.
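An alternative to patching Tools.pm is to make fastq-dump discoverable the way the error message asks — via PATH or SRATOOLKIT_HOME. A sketch (the install directory below is an example, not the actual path from this thread):

```shell
# Point SRATOOLKIT_HOME at the SRA Toolkit install and put its bin
# directory on PATH so Crossbow's preprocessor can find fastq-dump.
# (Directory is illustrative; use your actual install path.)
export SRATOOLKIT_HOME="$HOME/sratoolkit.2.3.1-centos_linux64"
export PATH="$PATH:$SRATOOLKIT_HOME/bin"
```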
Old 05-07-2013, 09:45 AM   #16
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78
Default

@karve, how successful were you in analysing the output you got?
Old 05-07-2013, 10:08 AM   #17
narain
Member
 
Location: Washington DC

Join Date: Aug 2011
Posts: 78
Default

FYI @karve, all that manipulation is not needed in the latest release, version 1.2.0, of Crossbow.
All times are GMT -8.