SEQanswers

02-01-2016, 05:23 AM   #1
mcduryea
Junior Member
 
Location: Lund, Sweden

Join Date: Feb 2015
Posts: 3
Oases pipeline fails on second kmer

Hello all,

I am hoping for help running oases_pipeline.py. I can run the pipeline on a subset of my data for 4 k-mer values using the following command:

Code:
python oases_pipeline.py -m 21 -M 27 -s 2 -o oases_test -d '-fastq -shortPaired -separate trimm_15_F_paired.fq trimm_15_R_paired.fq' -p '-ins_length 160'

This runs successfully and produces output for K = 21 through 27. However, when I run the following command on my full dataset (538769828 sequences) with a larger range of K, it fails:

Code:
python oases_pipeline.py -m 21 -M 51 -s 2 -o oases_ALL -d '-fastq -shortPaired -separate ALL_F_paired.fq ALL_R_paired.fq' -p '-ins_length 160'

This command runs successfully for K=21, but then crashes on K=23 with this output:

Code:
[5141.379366] Inputting sequence 66000000 / 538769828
[5163.243577] Inputting sequence 67000000 / 538769828
[5170.824776]  === Sequences loaded in 997.337692 s
[5171.829179] Done inputting sequences
[5171.829187] Destroying splay table
[5173.870477] Splay table destroyed
[5175.177294] Command failed!
[5175.177304] rm -f oases_ALL_23/Sequences
Hash failed

I am at a loss as to why it runs for a subset of the data and for the first K, but crashes on the second.

Many thanks in advance for any input!
02-01-2016, 12:17 PM   #2
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

Is your computer running out of memory or disk space with the full data set?
02-02-2016, 01:29 AM   #3
mcduryea
Junior Member
 
Location: Lund, Sweden

Join Date: Feb 2015
Posts: 3

Thanks for the reply. I am running it on a computing node that has 128 GB of RAM, so I thought there should be sufficient memory, but I suppose this could be the case. I haven't received an error about memory, though.

I tried re-running it with the step size (-s) changed to 4. This runs through all the k-mers but never produces contigs or a merged assembly, and then dies mid-run. Could this be an indication that it is running out of memory?
02-02-2016, 02:49 AM   #4
mastal
Senior Member
 
Location: uk

Join Date: Mar 2009
Posts: 667

It looks like it's either running out of memory, or running out of disk space to write the Sequences file to.

How big are the fastq files with your reads, and if you are running this on a cluster, how much disk space are you allowed/have you requested?
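
Both numbers asked about above can be read off with the Python standard library. This is only a minimal sketch: the read file names are the ones from the first post, the helper names `free_gib` and `input_gib` are made up for illustration, and on a cluster the quota enforced by the scheduler may be lower than what the filesystem reports.

```python
import os
import shutil

GIB = 1024 ** 3

def free_gib(path="."):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB

def input_gib(paths):
    """Combined size, in GiB, of the files in `paths` that exist."""
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p)) / GIB

# File names taken from the first post; missing files are skipped.
reads = ["ALL_F_paired.fq", "ALL_R_paired.fq"]
print(f"free space here:  {free_gib():.1f} GiB")
print(f"read files found: {input_gib(reads):.1f} GiB")
```

Run it from the directory the pipeline writes to, so that the free-space figure refers to the right filesystem.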

I'm not really familiar with Oases. When I've run velvet over multiple kmers, it just makes one Sequences file for the first kmer, and then uses symbolic links to the first Sequences file for the other kmers. The output you posted above looks like it was trying to write a Sequences file for k=23 and failed at that point.
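
The symbolic-link idea can be sketched in Python as follows. This illustrates the layout only and is not what velveth or oases_pipeline.py do internally; the helper name `link_sequences` is hypothetical, and the `<root>_<k>` directory naming is an assumption based on the `oases_ALL_23` path in the log above. It assumes a Unix-like system where symlinks are allowed.

```python
import os
import tempfile

def link_sequences(root, ks, name="Sequences"):
    """Keep the real file in <root>_<first k>; symlink it into the other k directories."""
    target = os.path.abspath(os.path.join(f"{root}_{ks[0]}", name))
    for k in ks[1:]:
        k_dir = f"{root}_{k}"
        os.makedirs(k_dir, exist_ok=True)
        link = os.path.join(k_dir, name)
        if os.path.lexists(link):
            os.remove(link)  # replace a stale copy or an old link
        os.symlink(target, link)
    return target

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "oases_demo")
    os.makedirs(f"{root}_21")
    with open(os.path.join(f"{root}_21", "Sequences"), "w") as handle:
        handle.write(">read1\nACGT\n")
    link_sequences(root, [21, 23, 25])
    for k in (23, 25):
        path = os.path.join(f"{root}_{k}", "Sequences")
        print(k, os.path.islink(path), os.path.getsize(path))
```

With this layout only the first directory holds a full copy of the reads, so the per-k disk cost comes mainly from the Roadmaps and graph files.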

Velvet has an option to make a binary form of the Sequences file (see the manual). I'm not sure whether that works with Oases as well, but that would use less disk space.
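
As a rough sketch of how that could be tried: the binary option in the Velvet manual is velveth's `-create_binary` flag. The snippet below only builds and prints the command line from the first post with that flag appended to the `-d` string. Whether oases_pipeline.py passes the flag through to velveth unchanged, and whether Oases reads the binary file, are assumptions that have not been verified here. The helper `pipeline_command` is hypothetical.

```python
import shlex

def pipeline_command(k_min, k_max, step, out_root, fwd, rev, ins_length, binary=False):
    """Build the oases_pipeline.py argument list used in this thread."""
    data = f"-fastq -shortPaired -separate {fwd} {rev}"
    if binary:
        # Assumption: the -d string reaches velveth as-is, so its
        # -create_binary option can be appended here.
        data += " -create_binary"
    return [
        "python", "oases_pipeline.py",
        "-m", str(k_min), "-M", str(k_max), "-s", str(step),
        "-o", out_root,
        "-d", data,
        "-p", f"-ins_length {ins_length}",
    ]

cmd = pipeline_command(21, 51, 2, "oases_ALL",
                       "ALL_F_paired.fq", "ALL_R_paired.fq", 160, binary=True)
print(shlex.join(cmd))  # shlex.join needs Python 3.8 or newer
```

Testing this on the small subset first would show whether the later pipeline steps accept the binary file before committing the full dataset.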
02-02-2016, 05:11 AM   #5
mcduryea
Junior Member
 
Location: Lund, Sweden

Join Date: Feb 2015
Posts: 3

Thanks! Yes, I think this is the case! I have 500 GB of storage, but I realized each value of K produces a Sequences file that is ~60 GB and a Roadmaps file that is ~40 GB, so I believe I am running out of space. I will try to output the data as binary or run it in batches. Thanks again!
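
The arithmetic behind this can be checked with a short script. It is a back-of-the-envelope sketch using the approximate sizes quoted above, and it assumes that every K writes its own full Sequences and Roadmaps files and that `-M` is inclusive, as in the first post's description of the test run. Other output files are ignored, so the real footprint would be larger.

```python
SEQUENCES_GB = 60   # approximate size per K, from the post above
ROADMAPS_GB = 40    # approximate size per K, from the post above
QUOTA_GB = 500      # available storage, from the post above

def k_values(k_min, k_max, step):
    """K values for a run, assuming the upper bound is inclusive."""
    return list(range(k_min, k_max + 1, step))

def needed_gb(ks):
    """Disk needed if each K keeps its own Sequences and Roadmaps files."""
    return len(ks) * (SEQUENCES_GB + ROADMAPS_GB)

for step in (2, 4):
    ks = k_values(21, 51, step)
    print(f"step {step}: {len(ks)} k values, ~{needed_gb(ks)} GB needed, quota {QUOTA_GB} GB")

# Number of K values whose files fit in the quota at once.
print("k values per batch:", QUOTA_GB // (SEQUENCES_GB + ROADMAPS_GB))
```

Under these assumptions both the step-2 and the step-4 runs need more than the 500 GB available, which is why running in batches of a few K values, or shrinking the Sequences file, is worth trying.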
Tags
oases velvet, oases_pipeline.py

All times are GMT -8.