12-08-2016, 08:48 AM   #1
Junior Member
Location: Fargo ND

Join Date: Nov 2015
Posts: 5
Troubleshooting a PB assembly generating a larger-than-expected contig

So I've got a newly assembled genome that was sequenced with PacBio to >100x coverage. It was assembled with HGAP.3 and for the most part looks great, except for one exceptionally large contig (the largest, actually). Our genome's largest chromosome is estimated at around 3.5 Mb (based on electrokaryograph), but this one contig is around 5 Mb and is low complexity across its entire length. BLASTing this contig doesn't return any matches of note at NCBI or against our reference genome.

Any suggestions on how I could adjust the assembly to get rid of this large contig that I am fairly sure is not real?
nwfungi is offline   Reply With Quote
12-08-2016, 09:56 AM   #2
Brian Bushnell
Super Moderator
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,707

Why don't you try gene-prediction on it to see what you get? Also, based on mapping, what kind of coverage does this contig have? Note, also, that it could be a bacterial symbiont, so you might want to try prokaryotic gene-calling. Bacteria don't usually have low complexity, though.
12-09-2016, 09:35 AM   #3
Senior Member
Location: San Francisco

Join Date: Aug 2012
Posts: 319

Looking at the coverage when remapping all the raw data would be the most telling: does the contig have 100x coverage, and is the coverage even?
I'm actually really intrigued. I've done a lot of HGAP.3 assemblies, but have never seen a 'junk' contig get anywhere near that big. It's possible it's just all the low complexity repeats getting overlapped together, but normally this would generate at most tens of kb of sequence. You can also look at the overlap graph to see what the origin of the contig is.
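
For anyone wanting to check evenness numerically rather than by eye, here is a minimal Python sketch. It assumes the raw reads have already been remapped and that per-base depth for the contig was exported as tab-separated `contig, position, depth` lines (the format `samtools depth -a` prints). The function names, the 10 kb window, and the use of the coefficient of variation as an evenness measure are my own choices for illustration, not anything HGAP.3 provides.

```python
from statistics import mean, pstdev


def window_means(depth_lines, window=10_000):
    """Average per-base depth in fixed-size windows along one contig.

    depth_lines: iterable of 'contig<TAB>pos<TAB>depth' lines with
    1-based positions, e.g. the output of `samtools depth -a`.
    """
    sums, counts = {}, {}
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        _contig, pos, depth = line.split("\t")[:3]
        idx = (int(pos) - 1) // window  # 0-based window index
        sums[idx] = sums.get(idx, 0) + int(depth)
        counts[idx] = counts.get(idx, 0) + 1
    return [sums[i] / counts[i] for i in sorted(sums)]


def evenness(means):
    """Return (mean depth, coefficient of variation) across windows."""
    avg = mean(means)
    cv = pstdev(means) / avg if avg else float("inf")
    return avg, cv
```

A mean near the expected 100x with a small coefficient of variation would point to a real, evenly covered sequence; a much lower or very uneven depth would support the 'junk' explanation. Windows with no rows in the input are skipped, so export depth with `-a` if zero-coverage stretches matter.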
rhall is offline   Reply With Quote
12-11-2016, 09:42 AM   #4
David Eccles (gringer)
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 833

Do you have Illumina reads that you can map to this contig in local mode (e.g. 'MagicBLAST' or 'Bowtie2 --local')? If not, you could try digitally fragmenting your PacBio reads into short reads and mapping.
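
A rough Python sketch of what "digitally fragmenting" could look like, assuming the PacBio reads are available as FASTA. The fragment length of 150 bp, the non-overlapping step, and all function names are illustrative assumptions; reads shorter than one fragment and any leftover tail are simply dropped.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fragment_read(name, seq, frag_len=150, step=150):
    """Yield (fragment_name, fragment_seq) pieces of one long read."""
    for start in range(0, len(seq) - frag_len + 1, step):
        yield f"{name}_frag{start}", seq[start:start + frag_len]


def fragment_fasta(in_handle, out_handle, frag_len=150, step=150):
    """Write fixed-length fragments of every read; return how many."""
    written = 0
    for name, seq in read_fasta(in_handle):
        for frag_name, frag in fragment_read(name, seq, frag_len, step):
            out_handle.write(f">{frag_name}\n{frag}\n")
            written += 1
    return written
```

The output is FASTA without quality values, so the mapper has to be told to expect FASTA input (Bowtie2 has the `-f` option for this). Raw PacBio reads carry a high error rate, so local-mode mapping of these fragments will be noisier than with real Illumina data.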

Map only to the single contig. As rhall has said, you should get a somewhat even coverage across this contig. Any big jumps in coverage indicate something that needs further investigation. A shift from one coverage level to another might indicate a misassembly, while a blip of extremely high coverage suggests transposon sequence that may be interfering with assembly.
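
To make the two patterns concrete, here is a small Python sketch that takes a list of windowed mean depths along the contig (however you computed them) and flags both kinds of event. The thresholds (3x the median for a spike, a 1.5-fold change between flanking medians for a shift) and the function names are arbitrary assumptions to tune for your data, not established cutoffs.

```python
from statistics import median


def flag_spikes(means, factor=3.0):
    """Indices of windows whose depth exceeds factor x the contig median."""
    base = median(means)
    if base <= 0:
        return []
    return [i for i, m in enumerate(means) if m > factor * base]


def flag_shifts(means, flank=5, ratio=1.5):
    """Indices where depth steps from one sustained level to another.

    Compares the median of the `flank` windows before index i with the
    median of the `flank` windows starting at i. Using medians means a
    single spiking window does not register as a shift.
    """
    hits = []
    for i in range(flank, len(means) - flank + 1):
        left = median(means[i - flank:i])
        right = median(means[i:i + flank])
        lo, hi = sorted((left, right))
        if hi > 0 and (lo == 0 or hi / lo >= ratio):
            hits.append(i)
    return hits
```

A run of consecutive indices from `flag_shifts` marks one candidate breakpoint worth inspecting as a possible misassembly, while isolated hits from `flag_spikes` are candidate repeat or transposon regions.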

assembly, pacbio, qcing, resequencing
