igwill 11-06-2018 05:54 AM

Two sets of .fast5 files from Nanopore

I am trying to Nanopolish a draft Nanopore genome I assembled with Canu.
For my indexing step, I have two folders in the data given to me by the sequencing service that contain fast5 files.
Which is the correct one to use for the Nanopolish indexing step?

I have one set, in a folder called fast5, which contains folders 0 to 215, each containing fast5 files that are small (20 kb to 1000 kb).

I have a second set, in another folder called albacore-2.2.5-FLO-PRO001-SQK-LSK109-by_dir, which contains folders 1 to 215, each with nested subfolders (e.g. 1/workspace/0) containing fast5 files that are larger (33 kb to 10,000 kb).

The file names in fast5/1 and albacore*by_dir/1/workspace/0 are exactly the same, but the files differ in size.

I'm leaning toward the larger files as their folder shares a dir with the sequencing_summary.txt. But I'm really not sure. Anyone see this before?

Thank you

gringer 04-07-2019 03:07 PM

As far as I'm aware, it doesn't matter. Nanopolish uses the called fastq files together with the signal in the fast5 files, so both should work. I expect the main difference between the two fast5 file types is that one will include the called sequences as an additional folder within the fast5 files.
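For reference, here is a rough sketch of how the indexing call could be put together in Python. It only assembles the `nanopolish index` command line, using the `-d` (fast5 directory) and optional `-s` (sequencing summary) options. The helper name `build_index_command` and the paths `fast5`, `reads.fastq` and `sequencing_summary.txt` are placeholders I made up for illustration, so substitute your own.

```python
import shlex


def build_index_command(fast5_dir, fastq, summary=None):
    """Assemble the argument list for `nanopolish index`.

    fast5_dir: directory searched for fast5 files (passed to -d)
    fastq:     basecalled reads to index
    summary:   optional sequencing_summary.txt (passed to -s)
    """
    cmd = ["nanopolish", "index", "-d", str(fast5_dir)]
    if summary is not None:
        # The summary file is optional; it is only added when given.
        cmd += ["-s", str(summary)]
    cmd.append(str(fastq))
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; either fast5 folder could be passed as fast5_dir.
    cmd = build_index_command("fast5", "reads.fastq",
                              summary="sequencing_summary.txt")
    # Print the command so it can be checked before running it,
    # e.g. with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Pointing `fast5_dir` at either of the two folders should give an equivalent command, since only the directory argument changes.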

