01-17-2017, 02:46 PM   #51
Location: East Coast

Join Date: Jul 2016
Posts: 38

Originally Posted by Brian Bushnell
I don't really know where the high mutation rate comes from in supposedly isolate libraries. Unfortunately, there's no mechanism in Tadpole to fix them - it always halts at a branch and breaks into two contigs, even if it is a single SNP. You can use the "branchmult1" and "branchmult2" flags to adjust this, though. Dropping them to "bm1=6 bm2=2" can often substantially increase continuity. "bm1=6" means it will continue rather than stopping at a branch where the allele ratio of the next base is at least 6:1 rather than at least 20:1 which is the default. With an allele ratio of 1:1 like you have it will never continue, though. Instead, you might try a scaffolding program and gap-filling program that will use paired-end reads to glue the two contigs back together. I have not written one or tested any, but I know several are available.

"rinse" and "shave" can, in some cases, improve continuity by removing very low-depth errors, but that's not the case here. "shave=f" will not do anything, though. "t" and "f" stand for "true" and "false". The default for "shave" is already "false". So, you can try enabling these with "rinse shave" or "rinse=t shave=t" or "rinse=true shave=true", which are all equivalent.

Anyway - believe me, I would also prefer in your case for Tadpole to assemble this as a single contig, but it generally won't do that in the case of such evenly-split alleles. Though maybe with "bm1=1.1 bm2=1.05" it would; I'm not really sure what would happen in that case.
Thank you for the input, Brian. I tried bm1=8 bm2=2 earlier, as per the info available on JGI's website. Unfortunately it still splits, though I haven't tried bm1=6 yet. Of course, this is an instance where we want the contigs to be merged, but in other cases such lenient branching parameters wouldn't be ideal. For instance, if there were more 'SNPs' present on either side of this single mixed call in the above example, we may well have a mixed reaction.
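
To make the branch rule concrete, here is a minimal sketch of the allele-ratio test as Brian describes it for bm1. It is not Tadpole's actual code: the function name and the read counts are hypothetical, and it models only the single ratio threshold, since bm2's exact role isn't spelled out above.

```python
def continues_at_branch(major_count, minor_count, branch_mult=20.0):
    """Hypothetical model of the ratio rule: keep extending past a branch
    only if the major allele outnumbers the minor allele by at least
    branch_mult to 1 (20 being the default quoted above)."""
    if minor_count == 0:
        return True  # no competing allele, so there is no branch to stop at
    return major_count >= branch_mult * minor_count


# An evenly split site (1:1) fails the test at bm1=8 and at bm1=6 alike.
print(continues_at_branch(50, 50, branch_mult=8))
print(continues_at_branch(50, 50, branch_mult=6))

# A 6:1 site passes at bm1=6 but not at the default of 20.
print(continues_at_branch(60, 10, branch_mult=6))
print(continues_at_branch(60, 10, branch_mult=20))
```

Under this simplified reading, lowering bm1 from 8 to 6 cannot help at a 1:1 site, which matches the behaviour described in the quote.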

It's almost as if there needs to be an "if" rule added that takes into account other nearby basecalls. One mixed basecall per, say, 1000 bp isn't so worrisome, but 1 per 10 certainly is. Perhaps this is what scaffolding does?
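
A rough sketch of what such an "if" rule might look like, purely as an illustration: it labels each mixed basecall by how many other mixed calls fall within a window around it. The function name, the 1000 bp window, and the "isolated"/"clustered" labels are all assumptions made for this example; this is not a feature of Tadpole or of any scaffolder that I know of.

```python
from bisect import bisect_left, bisect_right


def classify_mixed_calls(positions, window=1000, max_neighbors=0):
    """Label each mixed-call position 'isolated' or 'clustered'.

    A call is 'isolated' if at most max_neighbors other mixed calls lie
    within +/- window bp of it; otherwise it is 'clustered'.
    """
    positions = sorted(positions)
    labels = {}
    for p in positions:
        lo = bisect_left(positions, p - window)
        hi = bisect_right(positions, p + window)
        neighbors = hi - lo - 1  # exclude the call itself
        labels[p] = "isolated" if neighbors <= max_neighbors else "clustered"
    return labels


# Hypothetical coordinates: three mixed calls 10 bp apart, plus one far away.
print(classify_mixed_calls([100, 110, 120, 5000]))
```

An isolated call would be a candidate for merging across, while a cluster would be left split as a possible sign of a mixed sample.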

For now, I'm using CLC for de novo assembly. I'm on a trial license right now, but my PI will have to eat the bill if this ends up being the best option. I have a feeling it's gonna cost a pretty penny x_x.

JVGen