02-19-2015, 11:30 AM   #12
Brian Bushnell

Sorry, I somehow missed your post!

1) This is kind of tricky. Typically, though, I would recommend normalizing libraries independently if they differ (for example, in insert size), since they serve different purposes and you don't want them all mixed together anyway. If they are the same type - for example, 2x150bp reads with short inserts - then I would normalize them all together regardless of whether they came from a different platform or library, because they will all be used the same way.
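For two same-type paired-end libraries, a rough sketch would be something along these lines - file names here are placeholders, and it's worth confirming the flags against the bbnorm.sh usage message for your BBTools version:

Code:
# Combine same-type libraries (e.g. both 2x150bp, short-insert), keeping R1/R2 in the same order
cat libA_R1.fq.gz libB_R1.fq.gz > combined_R1.fq.gz
cat libA_R2.fq.gz libB_R2.fq.gz > combined_R2.fq.gz

# Normalize the pooled reads together to ~40x
bbnorm.sh in=combined_R1.fq.gz in2=combined_R2.fq.gz out=norm_R1.fq.gz out2=norm_R2.fq.gz target=40

A library with a different insert size (e.g. a mate-pair library) would get its own bbnorm.sh run instead.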

2) I recommend pre-processing (adapter trimming, contaminant removal, quality-trimming or filtering) prior to normalization, because those processes all remove spurious kmers that make it harder to determine read depth, and thus improve the normalization results.
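As a minimal sketch of that order of operations (BBDuk first, then BBNorm) - file names and the adapters.fa path are placeholders, and the trimming parameters are just reasonable starting points rather than fixed recommendations:

Code:
# 1) Adapter-trim and quality-trim/filter first
bbduk.sh in=raw_R1.fq.gz in2=raw_R2.fq.gz out=clean_R1.fq.gz out2=clean_R2.fq.gz \
    ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tbo tpe qtrim=rl trimq=10 minlen=50

# 2) Then normalize the cleaned reads
bbnorm.sh in=clean_R1.fq.gz in2=clean_R2.fq.gz out=norm_R1.fq.gz out2=norm_R2.fq.gz target=40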

3) If you target 40x coverage for a 1.2Gbp genome, BBNorm should output approximately 40*1.2Gbp, or roughly 48 Gbp, of data. Normally it will go a little over that to try to ensure that every region ends up with at least 40x coverage.
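As a quick sanity check on the expected output volume (just arithmetic, not a BBTools command):

Code:
# expected output bases ~= target coverage * genome size
echo $((40 * 1200000000))   # 48000000000, i.e. ~48 Gbp; expect slightly more in practice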

4) Normalizing RNA-seq data can certainly be done prior to assembly. But say you have two isoforms of a gene - one that uses exons 1, 2, and 3, and one that uses only exons 1 and 3 - and one of them is expressed 100x more highly than the other. After normalization, the less-expressed isoform may not get assembled, only the more abundant one, because the exons it shares with the abundant isoform already look high-depth, so most of its reads get discarded as well. So there are definite disadvantages. But it's worth trying if you get a bad assembly from the raw data.