thomasblomquist 11-24-2015 04:49 PM

Partial Order Alignment Step
I'm running through Jared and Nick's Nature Methods de novo assembly approach on my Lambda burn-in FAST5 data, just to get the pipeline up and running and to gain some familiarity using a focused data set.

I've successfully converted FAST5 to FASTA using poretools. Using the nanocorrect pipeline, I've run the DALIGNER steps and am now running the Partial Order Alignment step with poaV2.

It is working and the corrected.fasta file is growing... slowly. I've been tailing the file, and the output, when BLASTed, gives me 95% accuracy against the NCBI RefSeq Lambda phage sequence. It's been chugging along for a day. I have a 16-core 3.7 GHz setup with 64 GB of RAM and plenty of SSD space to spare. It's only using a single thread (based on my system process utilization), and it has only used 150 MB of working RAM.

Wondering what others have done to parallelize this step, or what else can be done to speed it up?
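
To make the question concrete, here is a rough sketch of the kind of parallelization I have in mind. It assumes the correction step can be run independently on contiguous ranges of read indices, which is an assumption and not something confirmed here. The command template is a hypothetical placeholder, not the real nanocorrect or poaV2 command line.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n_reads, n_chunks):
    """Split read indices 0..n_reads-1 into contiguous half-open (start, end) ranges."""
    n_chunks = max(1, min(n_chunks, n_reads))
    base, extra = divmod(n_reads, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional read each.
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def run_chunk(cmd_template, start, end):
    """Run one external process for a single range and return its stdout."""
    # Each argument may contain {start} and {end} placeholders.
    cmd = [arg.format(start=start, end=end) for arg in cmd_template]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def correct_in_parallel(cmd_template, n_reads, workers):
    """Run one process per range; threads only wait on the child processes."""
    ranges = split_ranges(n_reads, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = pool.map(lambda r: run_chunk(cmd_template, *r), ranges)
        # pool.map preserves input order, so the output stays in read order.
        return "".join(outputs)


# Hypothetical usage (the command below is a placeholder, not a real CLI):
# fasta = correct_in_parallel(["my_correct_step", "--range", "{start}:{end}"],
#                             n_reads=5000, workers=16)
```

Threads are enough here because the heavy lifting happens in the child processes, so each range should end up on its own core if the underlying tool is single-threaded.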

ymc 11-24-2015 09:50 PM

I used PBcR and then nanopolish with 2D reads only and I got good results. Is there a much better pipeline than PBcR+nanopolish?

thomasblomquist 11-25-2015 03:55 AM

Nanocorrect (daligner + poa) is the step preceding the Celera assembly and nanopolish. That is to say, PBcR and nanopolish are next once the POA is done... when it gets done.

ymc 11-25-2015 05:57 AM

Thx. Let me give it a try

ymc 11-25-2015 06:04 AM

Ah. Nanocorrect outputs FASTA but PBcR requires FASTQ input. How do you deal with that?
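
One workaround I can think of, sketched below, is to attach a constant placeholder quality to every base. This is only a guess at what might work: the quality character "I" (Phred 40 in Phred+33 encoding) is an arbitrary choice, the function name is made up for illustration, and I have not confirmed that PBcR behaves sensibly with fake qualities.

```python
def fasta_to_fastq(fasta_lines, qual_char="I"):
    """Yield FASTQ lines for each FASTA record, using one constant quality character."""

    def record(name, seq_parts):
        seq = "".join(seq_parts)
        # FASTQ record: header, sequence, separator, quality string of equal length.
        return [f"@{name}", seq, "+", qual_char * len(seq)]

    name, seq_parts = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield from record(name, seq_parts)
            name, seq_parts = line[1:], []
        else:
            # Sequence may be wrapped over several lines.
            seq_parts.append(line)
    if name is not None:
        yield from record(name, seq_parts)


# Example with two short records, the first wrapped over two lines.
for out_line in fasta_to_fastq([">r1", "ACGT", "AC", ">r2", "GG"]):
    print(out_line)
```

The placeholder qualities carry no real information, so anything downstream that weights bases by quality would be treating every base as equally reliable.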

thomasblomquist 12-01-2015 08:21 AM

Interesting. I tried the combined PBcR MHAP pipeline with the oxford.spec and arrived at an assembly in 20 minutes with a 98% match to the NCBI RefSeq for Lambda.

The DALIGNER, POA and runCA route with the oxford.spec arrived at the assembly after 1.5 days with a >99% match to the NCBI RefSeq for Lambda.

The major difference seems to be that the latter is more accurate in homopolymer runs.

Still, for rapid identification and other purposes, the PBcR MHAP pipeline is more than adequate.
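
For anyone who wants to check the homopolymer difference on their own assemblies, a minimal sketch for locating homopolymer runs in a sequence follows. The function name and the default minimum run length of 3 are my own choices for illustration. Comparing the runs found in an assembly against those in the reference is left out here.

```python
from itertools import groupby


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of identical bases of at least min_len."""
    runs, pos = [], 0
    # groupby collapses consecutive identical characters into one group.
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


print(homopolymer_runs("ACGTTTTAAAC"))
# → [(3, 'T', 4), (7, 'A', 3)]
```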


ymc 12-01-2015 11:19 PM

How did you obtain the frg file needed for runCA? I suppose you only had one FASTA file from the nanocorrect pipeline without any qual file, right?

