12-18-2016, 05:04 PM
TomHarrop (Member; Location: New Zealand; Join Date: Jul 2014; Posts: 20)

No peak in BBNorm k-mer frequency histogram

Hi,

I'm working on de novo assembly of an insect genome. Our paired-end library was made from 10 ng of sheared DNA using a Rubicon ThruPLEX kit with 9 cycles of PCR. The Bioanalyzer trace shows a mean insert size of 476 bp, and we sequenced around 120 million read pairs (125 bases per read) from this library. We're expecting a genome size of around 500 Mbp, but that's a pretty rough guess.
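A quick sanity check on those numbers: 120 million pairs × 2 reads × 125 bases is about 30 Gbp, or roughly 60× coverage of a 500 Mbp genome. The sketch below also converts base coverage into expected k-mer coverage using the usual relation C_k = C × (L − k + 1) / L. The value k = 31 is an assumption (I believe it is BBNorm's default, but check the k actually used), and trimming shortens reads, so the real k-mer coverage would be somewhat lower than this estimate.

```python
def expected_base_coverage(read_pairs, read_length, genome_size):
    """Total sequenced bases divided by genome size (both reads of each pair counted)."""
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


def expected_kmer_coverage(base_coverage, read_length, k):
    """Each read of length L contributes L - k + 1 k-mers, so k-mer coverage
    is base coverage scaled by (L - k + 1) / L."""
    return base_coverage * (read_length - k + 1) / read_length


# Figures from the post; k = 31 is an assumed value, not taken from the post.
base_cov = expected_base_coverage(120e6, 125, 500e6)
kmer_cov = expected_kmer_coverage(base_cov, 125, 31)
print(f"base coverage ~{base_cov:.1f}x, k-mer coverage ~{kmer_cov:.1f}x")
```

If the genome-size guess is roughly right, a homozygous peak somewhere in the 40s would be a reasonable thing to look for in the pre-normalisation histogram.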

I'm having trouble getting contiguous assemblies. Among others, I've tried Edena (L50 = 287 bp, as reported by BBTools stats.sh) and Velvet (L50 = 482 bp). I'm new to de novo assembly, but 9 cycles of PCR sounds like a lot to me, and I'm wondering whether the library complexity is too low. Also, FastQC reports a GC content of around 30%, which I understand can exacerbate PCR bias.
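For readers comparing these numbers with other tools: BBTools stats.sh appears to report L50 as a contig length and N50 as a contig count, which is the reverse of the convention some other tools use. A minimal sketch of both statistics, using a hypothetical helper name:

```python
def contig_n50_l50(contig_lengths):
    """Return (count, length): the smallest number of longest contigs covering
    at least half the assembly, and the length of the last contig added.
    In BBTools stats.sh terms these appear to correspond to N50 and L50."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if running >= half_total:
            return count, length
    return 0, 0  # empty assembly


print(contig_n50_l50([100, 200, 300, 400]))
```

An L50 of a few hundred bases, as above, means half of the assembled sequence is in contigs shorter than a single read pair's insert.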

To troubleshoot, I'm looking at the k-mer frequency histograms that BBNorm generated before and after normalisation (attached below). I can't see a peak in either histogram, but I'm not sure what that means. Can anyone help me interpret these plots or suggest further troubleshooting steps?
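One rough way to check for a peak numerically rather than by eye is to skip the initial decline dominated by error k-mers and take the highest point after the first valley. This is a simple heuristic sketch, not what BBNorm itself does. The parser assumes a whitespace-separated histogram with depth in the first column, header lines starting with `#`, and a count column selected by the hypothetical `count_col` parameter; which column to use is an assumption to check against the actual file.

```python
def parse_khist(lines, count_col=1):
    """Parse (depth, count) pairs from histogram text lines, skipping '#' headers."""
    hist = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        hist.append((int(fields[0]), int(fields[count_col])))
    return sorted(hist)


def find_kmer_peak(hist):
    """Return the depth of the main peak after the error-k-mer decline,
    or None if counts never rise again (no visible peak)."""
    i = 0
    # Walk down the low-depth slope while counts keep falling (or stay flat).
    while i + 1 < len(hist) and hist[i + 1][1] <= hist[i][1]:
        i += 1
    if i + 1 >= len(hist):
        return None  # monotonically decreasing histogram
    # Highest count from the valley onwards.
    depth, _count = max(hist[i:], key=lambda dc: dc[1])
    return depth


example = [(1, 1000), (2, 300), (3, 100), (4, 80), (5, 120), (6, 200), (7, 150), (8, 50)]
print(find_kmer_peak(example))
```

A `None` result on the pre-normalisation histogram would match the "no peak" observation and suggests either coverage far below expectations or k-mer depth spread very unevenly.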

In case it's relevant, this is the processing I did before assembly:

1. Quality trimming (Q < 30 at the 3' end) and adaptor trimming (TruSeq indexed adaptor and TruSeq universal adaptor) with cutadapt.
2. Contaminant filtering against the PhiX, sequencing_artifacts and adapters_no_transposase references with BBDuk.
3. Normalisation and error correction to a target k-mer coverage of 57 with BBNorm.
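For context on the quality-trimming step: cutadapt's documentation describes its 3' trimming as the BWA-style algorithm, which subtracts the cutoff from every quality value, computes partial sums from each position to the 3' end, and cuts where that sum is minimal. The sketch below reimplements that idea for illustration only (it is not cutadapt's code), and assumes Phred+33 quality encoding; the function name is hypothetical.

```python
def quality_trim_3prime(qualities, cutoff, base=33):
    """Return the index at which to cut a read so that read[:index] is kept.

    Scans from the 3' end, accumulating (cutoff - quality). The cut point is
    where this running sum is largest; the scan stops once it goes negative,
    i.e. once the 3' tail is good enough on average.
    """
    running = 0
    best = 0
    cut = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        q = ord(qualities[i]) - base
        running += cutoff - q
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


read, quals = "ACGTAC", "IIII++"  # 'I' = Q40, '+' = Q10 in Phred+33
print(read[:quality_trim_3prime(quals, 30)])
```

With a cutoff of 30, a run of Q10 bases at the 3' end is removed, while an isolated low-quality base inside an otherwise high-quality read is kept.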

Please let me know if any more information would help.

Thanks for reading,

Tom
Attached: khist.png (25.8 KB), khist.pdf (134.0 KB)