04-19-2017, 04:01 PM   #4
confurious
Junior Member
Location: california
Join Date: Apr 2017
Posts: 9

It's more like a collection of environmental microbiome datasets, so there are no real expectations for genome size or ploidy level.

@Brian, I have found that at this scale, normalization (I use BBNorm) becomes so demanding that it would almost certainly exceed my university cluster's 7-day walltime limit; the job could not finish even after down-sampling the total reads 10x. I suppose I could normalize each sample separately (since they were amplified individually), pool the results, and maybe run another round of normalization in case any "duplication" occurs between samples; a rough sketch of what I have in mind is below. It also seems to me that read binning would basically achieve something very similar to normalization anyway, and algorithmically I can't see it being more time- or memory-efficient?
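
In case it is useful, this is roughly the per-sample approach I am picturing (just a sketch; the file names, target depth, and memory settings are placeholders I have not validated at this scale):

    # Normalize each amplified sample on its own; sample_*.fq.gz and the
    # target/min values are placeholders, not settings I have tested.
    for s in sample_*.fq.gz; do
        bbnorm.sh in="$s" out="norm_${s%.fq.gz}.fq.gz" \
            target=100 min=5 prefilter=t threads=16 -Xmx100g
    done

    # Pool the normalized samples and optionally run a second pass to catch
    # redundancy between samples.
    cat norm_sample_*.fq.gz > pooled.fq.gz
    bbnorm.sh in=pooled.fq.gz out=pooled_norm.fq.gz target=100 min=5 prefilter=t -Xmx100g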

I was also unable to generate a k-mer depth histogram when working with the multi-TB datasets directly; perhaps you know of something much more efficient?
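
For reference, this is the kind of command I was attempting for the histogram (khist.sh and the hist= flag are my understanding of the usual BBTools way to do this; happy to be corrected):

    # Build only the k-mer depth histogram for the pooled reads; this is the
    # step that blows past the 7-day walltime on the full dataset.
    khist.sh in=pooled.fq.gz hist=khist.txt k=31 prefilter=t -Xmx100g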

Thanks