KB* 05-06-2019 05:35 PM

Pooling libraries for NovaSeq: how many reads/coverage?

I need to pool 10 multiplexed libraries with different insert sizes (245 to 285 bp). Two of the libraries are inputs; the rest are IPs. We ordered 50 million reads for each library, 150 bp paired-end.

The core is now going to pool the libraries and asked what concentration I want each library at. I first planned to pool them at equimolar concentrations, but now I have doubts. We are sequencing the whole human genome (inputs) and fractions of that genome (IPs), so it looks like I need higher coverage for the whole genome (the inputs). The genome in these samples also carries multiple genetic aberrations.
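For equimolar pooling of libraries with different fragment sizes, a mass concentration (ng/µL) has to be converted to molarity first. A minimal sketch, assuming the common average of 660 g/mol per base pair of double-stranded DNA; the library names, concentrations, fragment lengths, and the 20 fmol amount are hypothetical, and the fragment length should be the whole library molecule (adapters included), not just the insert:

```python
# Sketch: equimolar pooling volumes from ng/uL readings.
# Assumption: 660 g/mol per bp of dsDNA (average); all names and numbers below are hypothetical.

AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (approximate)


def ng_per_ul_to_nM(conc_ng_per_ul: float, fragment_bp: float) -> float:
    """Convert a dsDNA concentration in ng/uL to nM, given the mean fragment length in bp."""
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * fragment_bp)


def equimolar_volumes(libraries: dict, fmol_each: float) -> dict:
    """uL of each library that contributes fmol_each femtomoles (1 nM == 1 fmol/uL)."""
    return {
        name: fmol_each / ng_per_ul_to_nM(conc, bp)
        for name, (conc, bp) in libraries.items()
    }


# Hypothetical libraries: (concentration in ng/uL, mean fragment length in bp incl. adapters)
libs = {
    "input_1": (2.0, 365),
    "IP_1": (1.5, 405),
}
for name, vol in equimolar_volumes(libs, fmol_each=20.0).items():
    print(f"{name}: {vol:.2f} uL")
```

Because a longer fragment weighs more per molecule, the same ng/µL of a longer library is fewer nM, so it needs a larger volume to contribute the same number of molecules.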

Different resources recommend different coverage for ChIP:
100X [1] or 10-40X [2]. The recommended number of reads is 20-40 million [3].
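The read-count and coverage recommendations can be compared by inverting the coverage formula used further down in this post (C = L_reads × N / G). A rough sketch, assuming 2 × 150 bp reads (300 bp per read pair), a ~3 × 10^9 bp genome, every pair usable, and that "reads" means read pairs; `pairs_needed` is a hypothetical helper name:

```python
# Invert C = L_reads * N / G to get the read pairs N needed for a target coverage C.
# Assumptions: 300 bp sequenced per read pair (2 x 150 bp), G ~ 3e9 bp, every pair usable.


def pairs_needed(target_coverage: float, bp_per_pair: float = 300, genome_bp: float = 3e9) -> float:
    """Number of read pairs that gives target_coverage mean coverage."""
    return target_coverage * genome_bp / bp_per_pair


for target in (10, 40, 100):
    print(f"{target}X needs about {pairs_needed(target) / 1e6:,.0f} million read pairs")
```

Under these assumptions 10X already needs about 100 million pairs, while 20-40 million pairs give only about 2-4X, so the read-count recommendation and the coverage recommendations cannot both be met under this simple formula.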

If I calculate coverage as
C = L_reads × N_reads / G, where
L_reads = 2 × 150 = 300 bp, the bases sequenced per read pair;
N_reads = 50 × 10^6, the number of read pairs;
G ≈ 3 × 10^9 bp, the length of the human genome;
then C = (300 × 50 × 10^6) / (3 × 10^9) = 5X.
Assuming only 50% of the reads are usable, that is just 2.5X.
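The same arithmetic as a small function, with the fraction of usable reads as a parameter. A minimal sketch, assuming 50 million read pairs of 2 × 150 bp and a ~3 × 10^9 bp genome as above; the 50% usable fraction is the post's own assumption, and `coverage` is a hypothetical helper name:

```python
def coverage(n_pairs: float, bp_per_pair: float = 300, genome_bp: float = 3e9,
             usable_fraction: float = 1.0) -> float:
    """Mean coverage C = L_reads * N / G, counting only the usable fraction of pairs."""
    return bp_per_pair * n_pairs * usable_fraction / genome_bp


print(coverage(50e6))                       # all pairs usable -> 5.0
print(coverage(50e6, usable_fraction=0.5))  # half the pairs usable -> 2.5
```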

This feels wrong, given how huge the number of reads is.

I could remove 5 of the samples. In theory, if the total output stays the same, that would double the coverage of the remaining libraries.
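If the lane or flow-cell output is fixed, dropping libraries simply splits the same total among fewer of them. A rough sketch, assuming a hypothetical fixed total of 500 million read pairs (10 × 50 million), a perfectly even split, and the same 300 bp per pair and 3 × 10^9 bp genome as above; `per_library_coverage` is a hypothetical helper name:

```python
TOTAL_PAIRS = 10 * 50e6  # hypothetical fixed output: 10 libraries x 50 million pairs


def per_library_coverage(total_pairs: float, n_libraries: int,
                         bp_per_pair: float = 300, genome_bp: float = 3e9) -> float:
    """Mean coverage per library when a fixed output is split evenly among n_libraries."""
    pairs_each = total_pairs / n_libraries
    return bp_per_pair * pairs_each / genome_bp


for n in (10, 5):
    print(f"{n} libraries: {per_library_coverage(TOTAL_PAIRS, n):.1f}X each")
```

Demultiplexed yields are rarely perfectly even, so treat this as an idealized estimate.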

Alternatively, I could increase the proportion of the input libraries in the pool. But by how much?
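One way to frame "how much": give each input a molar weight w relative to each IP (weight 1) and solve for the w that yields a chosen number of input reads. Each input then gets T·w / (n_in·w + n_ip) pairs, which gives w = N_in·n_ip / (T − n_in·N_in). A minimal sketch, assuming reads are distributed in proportion to each library's molar share of the pool (an idealization; size and quality differences can skew it), a hypothetical fixed total of 500 million read pairs, and a hypothetical target of 100 million pairs per input; the helper names are hypothetical:

```python
def read_shares(weights: dict, total_pairs: float) -> dict:
    """Expected read pairs per library when pooled in proportion to `weights` (molar ratios)."""
    total_weight = sum(weights.values())
    return {name: total_pairs * w / total_weight for name, w in weights.items()}


def input_weight(target_input_pairs: float, n_inputs: int, n_ips: int, total_pairs: float) -> float:
    """Molar weight of each input relative to an IP (weight 1) so each input gets target_input_pairs.

    Solves target = total * w / (n_inputs * w + n_ips) for w.
    """
    remaining = total_pairs - n_inputs * target_input_pairs
    if remaining <= 0:
        raise ValueError("the inputs alone would need more than the total output")
    return target_input_pairs * n_ips / remaining


TOTAL_PAIRS = 500e6  # hypothetical fixed output
w = input_weight(100e6, n_inputs=2, n_ips=8, total_pairs=TOTAL_PAIRS)
weights = {"input_1": w, "input_2": w, **{f"IP_{i}": 1.0 for i in range(1, 9)}}
shares = read_shares(weights, TOTAL_PAIRS)
print(f"input weight: {w:.2f}x an IP")
print(f"input_1: {shares['input_1'] / 1e6:.1f} M pairs, IP_1: {shares['IP_1'] / 1e6:.1f} M pairs")
```

With these hypothetical numbers each input would be pooled at about 2.7 times the molarity of each IP, leaving about 37.5 million pairs per IP.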

I am confused, please help.


