Old 04-12-2020, 04:57 PM   #53
Junior Member
Location: Perú

Join Date: Jul 2019
Posts: 6
Normalized reads file empty

Hi Brian and everyone! I'm using BBNorm, but I keep running into a problem.

I ran the command:

$ -Xmx64g t=18 in=pt.raw.fastq out=pt.raw.normalized.fq target=90 mindepth=2

Everything looked fine, but when the process ended I realized that the file pt.raw.normalized.fq was empty.


I then ran the following command:

$ in=pt.raw.fastq out=pt.raw.normalized3.fq target=90 min=2

But at the end, my pt.raw.normalized3.fq file was still empty, like before T-T

I think the problem could be here:

The second pass reports:

Made hash table:        hashes = 3       mem = 65.66 GB         cells = 35.25B          used = 0.000%

Estimated unique kmers:         0

Table creation time:            17.804 seconds.
Started output threads.
Table read time:                0.012 seconds.          0.00 kb/sec
Total reads in:                 0               NaN% Kept
Total bases in:                 0               NaN% Kept
Error reads in:                 0               NaN%
Error type 1:                   0               NaN%
Error type 2:                   0               NaN%
Error type 3:                   0               NaN%
Total kmers counted:            0
Please, can someone tell me what I did wrong?
Thanks a lot in advance!


I just found one of your comments:
Originally Posted by Brian Bushnell
BBNorm cannot be used for raw PacBio data, as the error rate is too high; it is throwing everything away as the apparent depth is always 1, since all the kmers are unique. Normalization uses 31-mers (by default) which requires that, on average, the error rate is below around 1/40 (so that a large number of kmers will be error-free). However, raw PacBio data has on average a 1/7 error rate, so most programs that use long kmers will not work on it at all. BBMap, on the other hand, uses short kmers (typically 9-13) and it can process PacBio data, but does not do normalization - a longer kmer is needed.

PacBio CCS or "Reads of Insert" that are self-corrected, with multiple passes to drop the error rate below 3% or so, could be normalized by BBNorm. So, if you intentionally fragment your reads to around 3kbp or less, but run long movies, then self-correct, normalization should work.

PacBio data has a very flat coverage distribution, which is great, and means that typically it does not need normalization. But MDA'd single cells have highly variable coverage regardless of the platform, and approaches like HGAP to correct by consensus of multiple reads covering the same locus will not work anywhere that has very low coverage. I think your best bet is really to shear to a smaller fragment size, self-correct to generate "Reads of Insert", and use those to assemble. I doubt normalization will give you a better assembly with error-corrected single-cell PacBio data, but if it did, you would have to use custom parameters to not throw away low-coverage data (namely, "mindepth=0 lowthresh=0"), since a lot of the single-cell contigs have very low coverage. BBNorm (and, I imagine, all other normalizers) have defaults set for Illumina reads.
So I will try to normalize my PacBio HiFi (CCS) reads instead.

Or can't I just reduce the kmer length (default=31)?
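
To get a feel for the numbers in the quote, here is a rough sketch of how many kmers would be expected to be error-free at a given error rate. It assumes errors are independent and uniform per base, which is my own simplification and not something BBNorm documents; the function name error_free_fraction is just made up for illustration.

```python
def error_free_fraction(error_rate: float, k: int) -> float:
    """Expected fraction of kmers containing no error.

    Assumption (mine, for illustration): each base is wrong independently
    with probability error_rate, so a kmer is clean with probability
    (1 - error_rate) ** k.
    """
    return (1.0 - error_rate) ** k


# Error rates mentioned in the quote: ~1/40 (rule-of-thumb limit),
# ~1/7 (raw PacBio), and ~3% (self-corrected CCS reads).
cases = [("limit 1/40", 1 / 40), ("raw PacBio 1/7", 1 / 7), ("CCS ~3%", 0.03)]

for label, rate in cases:
    for k in (31, 13):  # 31 is the BBNorm default; 13 is a short BBMap-style kmer
        frac = error_free_fraction(rate, k)
        print(f"{label:15s} k={k:2d}  error-free kmers ~ {frac:.3f}")
```

Under this simplified model, less than 1% of 31-mers from raw reads at a 1/7 error rate would be error-free, which fits the "all the kmers are unique" explanation. A shorter kmer leaves more of them clean, but the quote also says normalization needs a longer kmer, so this alone does not tell me whether reducing k would actually work.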

Last edited by silverfox; 04-12-2020 at 06:25 PM.