I am planning a library construction approach for one lane of 100 nt paired-end HiSeq sequencing. It's a population-level study of a non-model organism, so I'll need to multiplex. I'm hoping to get some confirmation (or not) that I can estimate sequence coverage as follows. My core (HiSeq 2000, v3 chemistry) told me they get 100-200M clusters per lane; I used 150M clusters for this estimate.
HiSeq output (Gb) = (150M clusters * 200 nt/cluster) / 1e9 = 30 Gb
Transcriptome size (Gb) = (25,000 genes * 1,500 nt) / 1e9 = 0.0375 Gb
Coverage estimate (for a library with 12 indexed sub-libraries; each sub-library is itself a pooled sample, btw):
(30 Gb / 12) / 0.0375 Gb = 66.67x
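The same arithmetic can be sketched in a few lines of Python, which also makes it easy to check the 100-200M cluster range the core quoted. This is a minimal back-of-envelope sketch, assuming each cluster yields a full 2 x 100 nt read pair, that reads split evenly across the 12 indexed sub-libraries, and that transcriptome size is approximated as 25,000 genes x 1,500 nt; the function names are hypothetical.

```python
# Back-of-envelope mean coverage for one multiplexed HiSeq lane.
# Assumptions (hypothetical, for illustration): every cluster gives a full
# 2 x 100 nt read pair, and reads are split evenly across sub-libraries.

def lane_output_bases(clusters: float, read_len: int = 100, paired: bool = True) -> float:
    """Total bases sequenced on the lane."""
    reads_per_cluster = 2 if paired else 1
    return clusters * read_len * reads_per_cluster

def transcriptome_bases(n_genes: int = 25_000, mean_len: int = 1_500) -> float:
    """Target size, approximated as number of genes x mean transcript length."""
    return n_genes * mean_len

def mean_coverage(clusters: float, n_libraries: int = 12) -> float:
    """Average fold coverage of the transcriptome per indexed sub-library."""
    per_library = lane_output_bases(clusters) / n_libraries
    return per_library / transcriptome_bases()

if __name__ == "__main__":
    for clusters in (100e6, 150e6, 200e6):
        print(f"{clusters / 1e6:.0f}M clusters: {mean_coverage(clusters):.1f}x")
```

Running it prints about 44.4x, 66.7x and 88.9x for 100M, 150M and 200M clusters. Note that the sketch returns only a lane-wide average and, by construction, ignores how reads are distributed among loci.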
So can I expect ~67x coverage? What I'm uncertain about is whether I need to account somehow for the number and distribution of reads, or whether this calculation alone means I can expect 67x coverage on average at each locus. Thanks in advance for any input.