**rhall** · 04-23-2014, 02:24 PM

What version of SMRT Analysis are you using? Early versions of HGAP were limited in genome size (~100 Mb) because of this blasr limitation. SMRT Analysis 2.2 and HGAP.3 will not have this problem, as the data can be split into multiple blasr alignments (parameter: number of seed read chunks).

**tabotaab** · 04-24-2014, 01:33 AM

I am using SMRT Analysis 2.2 and HGAP.3, and I set the "targetChunks" parameter to 6, but I still get the same error.
Here are the parameters I set for the "P_PreAssemblerDagcon" module:

<module id="P_PreAssemblerDagcon">
<param name="computeLengthCutoff"><value>true</value></param>
<param name="minLongReadLength"><value>9000</value></param>
<param name="targetChunks"><value>6</value></param>
<param name="splitBestn"><value>11</value></param>
<param name="totalBestn"><value>24</value></param>
<param name="blasrOpts"><value> -noSplitSubreads -minReadLength 200 -maxScore -1000 -maxLCPLength 16 -minMatch 14</value></param>
</module>
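To double-check which values a settings fragment like the one above actually carries, it can be parsed with the Python standard library. This is only a minimal sketch: the helper name `module_params` is made up for illustration, and it assumes the fragment is well-formed XML exactly as pasted here.

```python
import xml.etree.ElementTree as ET

# The module block as quoted in this post.
MODULE_XML = """<module id="P_PreAssemblerDagcon">
<param name="computeLengthCutoff"><value>true</value></param>
<param name="minLongReadLength"><value>9000</value></param>
<param name="targetChunks"><value>6</value></param>
<param name="splitBestn"><value>11</value></param>
<param name="totalBestn"><value>24</value></param>
<param name="blasrOpts"><value> -noSplitSubreads -minReadLength 200 -maxScore -1000 -maxLCPLength 16 -minMatch 14</value></param>
</module>"""


def module_params(xml_text):
    """Return a {name: value} dict for every <param> in the fragment (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        param.get("name"): (param.findtext("value") or "").strip()
        for param in root.iter("param")
    }


params = module_params(MODULE_XML)
print(params["targetChunks"])
```

This only confirms what is written in the settings; it says nothing about whether the workflow engine actually applies the chunking at run time.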

This is the error I see in smrtpipe.log:

[ERROR] 2014-04-24 10:25:19,992 [smrtpipe.status refreshTargets 413] *** Failed task task://Anonymous/P_PreAssemblerDagcon/hgapAlignForCorrection
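For longer logs, a line like the one above can be picked out with a small script. The following is a rough sketch that assumes every failure line has the same shape as the single example quoted here; the regular expression and the helper names `failed_tasks` and `task_log_hint` are illustrative and not part of SMRT Analysis.

```python
import re

# Pattern modelled on the one "Failed task" line quoted in this thread (assumption:
# other failure lines look the same).
FAILED_TASK = re.compile(
    r"\[ERROR\]\s+\S+\s+\S+\s+\[.*?\]\s+\*\*\* Failed task task://(?P<task>\S+)"
)


def failed_tasks(lines):
    """Return (module, task) pairs for each 'Failed task' line found."""
    found = []
    for line in lines:
        match = FAILED_TASK.search(line)
        if not match:
            continue
        parts = match.group("task").split("/")
        if len(parts) >= 2:
            found.append((parts[-2], parts[-1]))
    return found


def task_log_hint(module, task):
    """Build the per-task log name in the form used later in this post."""
    return f"{module}/{task}.log"


example = (
    "[ERROR] 2014-04-24 10:25:19,992 [smrtpipe.status refreshTargets 413] "
    "*** Failed task task://Anonymous/P_PreAssemblerDagcon/hgapAlignForCorrection"
)
for module, task in failed_tasks([example]):
    print(task_log_hint(module, task))
```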

I checked the "P_PreAssemblerDagcon/hgapAlignForCorrection.log" file, and it seems that the input file is not being split for blasr:

Successfully found /PacBio/4th_run/data/filtered_subreads.fasta
Successfully found /PacBio/4th_run/filtered_longreads.fasta
Successfully validated input files
[INFO] 2014-04-24T10:25:15 [blasr] started.
ERROR! Reading fasta files greater than 4Gbytes is not supported.

FYI, "filtered_subreads.fasta" file is 14G and "filtered_longreads.fasta" is 11G.
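The size condition in that error message can also be checked before launching a job. Below is a minimal sketch: the threshold of 4 × 1024³ bytes is an assumption about what blasr means by "4Gbytes", and the function names are made up for illustration.

```python
import os

# Assumption: the "4Gbytes" in the blasr error is taken as 4 * 1024**3 bytes.
LIMIT_BYTES = 4 * 1024**3


def exceeds_limit(size_bytes, limit=LIMIT_BYTES):
    """True if a file of this size would be larger than the assumed limit."""
    return size_bytes > limit


def oversized(paths, limit=LIMIT_BYTES):
    """Return (path, size_in_bytes) for each existing file above the limit."""
    result = []
    for path in paths:
        size = os.path.getsize(path)
        if exceeds_limit(size, limit):
            result.append((path, size))
    return result


# Sizes reported in this post, read as GiB for the sake of the example.
for label, gib in [("filtered_subreads.fasta", 14), ("filtered_longreads.fasta", 11)]:
    print(label, exceeds_limit(gib * 1024**3))
```

With the sizes reported here, both files are well above the assumed limit, which is consistent with the error appearing when the inputs are passed to blasr unsplit.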

Any idea?

**yueluo** · 04-24-2014, 02:33 AM

I haven't used blasr, but this error seems to be about reading the reference (instead of reading your reads).

I did some googling and saw this:

https://github.com/PacificBiosciences/blasr/issues/2

Something that you may want to know.
First, the maximum reference genome size that blasr supports is 4G.
Second, blasr is designed to align PacBio reads to a genome, not genome to genome, and there is a limit on read length (e.g., <100K). If a read is too long, blasr may consume all the memory and cause problems.

Hope this helps.

**rhall** · 04-24-2014, 07:00 AM

The splitting is not being carried out, as I guess you are running in non-distributed mode? HGAP for a genome of this size has to be run using the '--distribute' parameter to smrtpipe.py. This is not really intuitive when executing on a single server with a high number of cores, but it is the only way to get the workflow engine to split the assembly into manageable chunks. SMRT Pipe command:

Code:

smrtpipe.py --distribute -D CLUSTER_MANAGER=BASH -D MAX_THREADS=6 -D NPROC=10 ...

**lastnameha** · 07-19-2014, 10:42 PM

tabotaab,

So, did you figure this out?
I have exactly the same problem.

HGAP error caused by blasr memory limit
