04-25-2017, 04:24 PM   #17
Brian Bushnell

Hi Tom,

perfect_prob is the average probability that a read within that interval is error-free. It's related to avg_quality, but calculated independently. It might make more sense for me to calculate this for just the kmer being used to track uniqueness rather than the whole read, but it's easiest this way. I provide it because low-quality regions in the fastq file will show inflated uniqueness when uniqueness is tracked using this method, since sequencing errors produce kmers that have not been seen before.
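To illustrate why perfect_prob and avg_quality are related but not interchangeable, here is a minimal Python sketch of the standard Phred calculation. It is not the BBTools code; the function names, the Phred+33 offset, and the assumption that the value is the product of per-base correctness probabilities are all illustrative.

```python
def phred_to_error_prob(q):
    """Standard Phred definition: per-base error probability is 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def read_perfect_prob(quality_string, offset=33):
    """Probability that every base in one read is correct.

    Assumes a Phred+33 FASTQ quality string and independent per-base errors
    (illustrative assumptions, not taken from the BBTools source).
    """
    prob = 1.0
    for ch in quality_string:
        q = ord(ch) - offset
        prob *= 1.0 - phred_to_error_prob(q)
    return prob


def interval_perfect_prob(quality_strings, offset=33):
    """Mean error-free probability over all reads in one interval."""
    probs = [read_perfect_prob(qs, offset) for qs in quality_strings]
    return sum(probs) / len(probs) if probs else 0.0


def read_avg_quality(quality_string, offset=33):
    """Plain arithmetic mean of the Phred scores in one read."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a 150-bp read that is Q40 everywhere except for a single Q2 base still has an average quality near 40, but its error-free probability drops to roughly 0.36, because a single bad base is enough to make the read imperfect.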

It looks like you're down to about 70% uniqueness for each individual read, which would be at least ~100x coverage for 150-bp reads. That coverage estimate is weighted by the high-coverage genomes, though.

It's hard to say whether or not to sequence more based on this plot alone. You're obviously still generating more unique reads, but they might simply be giving more coverage to areas you can already assemble well. I think the best course is to assemble and see if you end up with a lot of short, low-coverage contigs (in addition to the high-coverage contigs that you will clearly generate). If you do, then you need to sequence more.