I'm new to de-novo assembly and hope to get some help with my memory issues. Here is what I'm trying to do:
I want to assemble RNA-seq data with velvet/oases. I have one HiSeq2000 lane of 100 bp PE reads (approx. 170 M read pairs). This dataset has been randomly split into 10%, 20%, 30% ... 90% subsets. A colleague of mine managed to assemble the 100% dataset with approx. 104 GB of RAM, and I'm supposed to run the assemblies for the subsets. I got the 10% and 20% subsets through velveth, velvetg, and oases without problems, and the 30% subset is currently in the oases step. The 40% subset, however, fails with one of the following error messages:
------
Exited with exit code 1.
Resource usage summary:
CPU time : 27907.75 sec.
Max Memory : 55595 MB
Max Swap : 92723 MB
Max Processes : 4
Max Threads : 52
The output (if any) follows:
velvetg: Can't malloc 540 ShortReadMarkers totalling 10800 bytes: Cannot allocate memory
------
or:
------
Exited with exit code 1.
Resource usage summary:
CPU time : 31115.64 sec.
Max Memory : 40391 MB
Max Swap : 42936 MB
Max Processes : 4
Max Threads : 52
The output (if any) follows:
velvetg: Can't malloc 267 ShortReadMarkers totalling 5340 bytes: Cannot allocate memory
-------
Velveth was run successfully with: ./velveth /Dir 27 -fastq -shortPaired /Dir/X.fastq -short /Dir/Y.fastq
Velvetg settings are: ./velvetg /Dir -ins_length 300 -min_pair_count 2 -read_trkg yes -unused_reads yes
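Not an answer to the malloc failures, but for planning the larger subsets it may help to estimate peak RAM up front. The Velvet documentation/FAQ gives a rough linear estimator; the sketch below uses it with placeholder inputs (the 100 Mb "genome" size is a pure guess for a transcriptome, and 136 M reads assumes the 40% subset of 170 M pairs):

```python
# Rough Velvet peak-RAM estimate (linear formula from the Velvet manual/FAQ).
# All inputs below are placeholder assumptions, not measured values.
def velvet_ram_gb(read_size_bp, genome_size_mb, num_reads_millions, k):
    """Return an approximate peak-RAM estimate in GB (the formula yields KB)."""
    kb = (-109635
          + 18977 * read_size_bp
          + 86326 * genome_size_mb
          + 233353 * num_reads_millions
          - 51092 * k)
    return kb / 1024 / 1024

# ~40% of 170 M pairs ~= 136 M reads; target size guessed at 100 Mb; k = 27
print(f"{velvet_ram_gb(100, 100, 136, 27):.1f} GB")  # -> 38.9 GB
```

Under these guesses the estimate lands in the same ballpark as the ~40–55 GB the failed jobs actually peaked at, so total RAM alone may not be the limiting factor.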
On our AMD Magny-Cours cluster (MEGWARE) I have 128 GB of memory, which should be enough (see above). The people responsible for the core facility suspect that the assembler allocates very many small chunks of data, and that even though there is enough total memory in the system, the allocator can no longer serve these small requests (memory fragmentation, if I understand their rather unscientific description correctly). They suggested building against the Boost (C++) libraries?
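One thing worth ruling out before rebuilding anything: a per-process address-space limit imposed by the batch system (or strict kernel overcommit) can make malloc fail for tiny allocations even when the node has plenty of free RAM. A minimal check, using generic Linux commands (nothing Velvet-specific), run in the same environment as the job:

```shell
# Run these inside the job script so they see the scheduler's limits.
ulimit -v                           # virtual address-space limit in KB ('unlimited' is ideal)
ulimit -d                           # data-segment limit
cat /proc/sys/vm/overcommit_memory  # 2 = strict accounting; can reject small mallocs early
free -g                             # actual free memory and swap on the node
```

If `ulimit -v` prints a finite number around 40–55 GB, that would match the two failures above, which died at different "Max Memory" values while asking for only a few KB.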
Additional information: the data is quality-filtered and trimmed; the reads given to -short are the singletons left over after trimming/filtering.
I would appreciate any information on this issue.
Thanks a lot in advance!