
**Brian Bushnell** · 07-28-2017, 11:10 AM

With low-coverage long read data, you can also correct the long reads with the Illumina reads.

It's unlikely that you are completely missing coverage of 400 kbp of your genome. Rather, Illumina reads are too short to resolve many types of repeats, so those regions tend to get collapsed or broken into tiny contigs, short enough that they were ignored when the assembly statistics were calculated. It is unlikely that additional short-read coverage would help you (though to determine that properly, you'd need to map the reads back to the assembly and post the resulting coverage distribution).

Currently, we use either exclusively Illumina or exclusively PacBio for microbe assemblies, so I don't really know much about the current state of the art for hybrid assemblies, but assembling a bacterium into 1 perfect contig with pure PacBio is pretty easy. That said, 9.2 Mbp is huge, so maybe it would take ~4 SMRT cells for a pure PacBio assembly...

P.S. You can often improve a SPAdes assembly by preprocessing the Illumina data in various ways (error-correction, read merging, read extension, duplicate removal, quality-filtering, etc.), which is certainly the cheapest approach, though it won't give you a single-contig assembly.

**JodyFranke** · 07-28-2017, 01:13 PM

Thanks!

Our output from A5 says we have 460 scaffolds with a median coverage of 38X. The 10th percentile coverage is 20X. Our SPAdes runs have given a median coverage of 15X when we open the files in Bandage. Perhaps we need to do more preprocessing with SPAdes to get the outputs more consistent between programs. I’m not sure this is the info you asked for about the coverage distribution as a result of mapping.

Unfortunately, I do not have access to a PacBio system, but there is someone in the department who has done MinION sequencing and could help with that. I agree about the hybrid assembly. I have tried to look for a program to do this, but haven’t seen anyone really recommend anything. As I’ve been doing Illumina, and more of the exact same doesn’t sound like it will help, perhaps a mate-pair library would complement what I already have.

**Brian Bushnell** · 07-28-2017, 02:55 PM

If you only have 15X coverage, more coverage would definitely help. If you have 38X... maybe. But coverage estimates from alignment are generally more trustworthy than what assemblers report. E.g.:

Code:

bbmap.sh in=reads.fq ref=assembly.fa covhist=covhist.txt covstats=covstats.txt ambig=all delcov=f

...then you can plot the histogram in Excel and see how much low-coverage area you have (that assembled).
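If you'd rather not use Excel, here is a minimal Python sketch of the same check. It assumes the histogram file is plain text with two tab-separated columns (depth, then number of bases at that depth) and `#`-prefixed header lines; verify that against your own `covhist.txt` before relying on it. The function name `summarize_covhist` and the `low_cov` cutoff of 10 are made up for illustration, not part of BBMap.

```python
def summarize_covhist(lines, low_cov=10):
    """Summarize a depth histogram given as 'depth<TAB>base_count' lines.

    Assumption: header/comment lines start with '#'. low_cov is an
    arbitrary illustrative cutoff; pick one that suits your data.
    """
    hist = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and headers
        depth, count = line.split("\t")[:2]
        hist.append((int(depth), int(count)))

    hist.sort()
    total = sum(count for _, count in hist)
    if total == 0:
        raise ValueError("histogram contains no bases")

    # Bases in the assembly covered at a depth below the cutoff.
    low = sum(count for depth, count in hist if depth < low_cov)

    # Median depth: first depth at which the cumulative base count
    # reaches half of all bases.
    running = 0
    median = None
    for depth, count in hist:
        running += count
        if running * 2 >= total:
            median = depth
            break

    return {
        "total_bases": total,
        "low_cov_bases": low,
        "low_cov_fraction": low / total,
        "median_depth": median,
    }


if __name__ == "__main__":
    # Toy data only, not real results.
    toy = ["#Coverage\tnumBases", "0\t100", "5\t300", "20\t400", "40\t200"]
    print(summarize_covhist(toy))
```

For a real run you would pass the open file instead of the toy list, e.g. `with open("covhist.txt") as f: print(summarize_covhist(f))`. A large low-coverage fraction would suggest that more short-read depth could still help; a small one points back to repeats as the main problem.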

Long mate libraries are also useful in improving continuity, but can be more expensive and complicated to make. I'm not sure about the details; I've only heard that anecdotally (as in, that's the reason we moved away from long-mate libraries).

Question about next step in a de novo bacterial genome assembly
