SEQanswers

12-08-2016, 09:48 AM   #1
nwfungi
Junior Member
 
Location: Fargo ND

Join Date: Nov 2015
Posts: 5
Troubleshooting a PacBio assembly that generates a larger-than-expected contig

So I've got a newly assembled genome that was sequenced with PacBio at >100x coverage. It was assembled with HGAP.3 and for the most part looks great, except for one exceptionally large contig (the largest, actually). Our genome's largest chromosome is estimated at around 3.5Mb (based on an electrophoretic karyotype), but this one contig is around 5Mb and is low complexity across its entire length. BLASTing this contig doesn't return any matches of note at NCBI or against our reference genome.

Any suggestions on how I could adjust the assembly to get rid of this large contig that I am fairly sure is not real?
12-08-2016, 10:56 AM   #2
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

Why don't you try gene-prediction on it to see what you get? Also, based on mapping, what kind of coverage does this contig have? Note, also, that it could be a bacterial symbiont, so you might want to try prokaryotic gene-calling. Bacteria don't usually have low complexity, though.
12-09-2016, 10:35 AM   #3
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 312

Looking at the coverage when remapping all the raw data would be the most telling. Does the contig have 100x coverage, and is the coverage even?
I'm actually really intrigued. I've done a lot of HGAP.3 assemblies, but have never seen a 'junk' contig get anywhere near that big. It's possible it's just all the low complexity repeats getting overlapped together, but normally that would generate at most tens of kb of sequence. You can also look at the overlap graph to see what the origin of the contig is: https://gist.github.com/rhallPB/2d962e700d83270b0109 .
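To put a number on "is the coverage even?", here is a minimal Python sketch. It assumes you have already remapped the raw reads and exported per-base depth as tab-separated `contig`, `position`, `depth` lines (the layout `samtools depth -a` writes). The function names, the 10 kb window size and the use of the coefficient of variation as an evenness score are my own choices for illustration, not part of HGAP or any PacBio tool.

```python
from statistics import mean, pstdev

def parse_depth_lines(lines, contig):
    """Collect per-base depths for one contig from 'name<TAB>pos<TAB>depth' lines."""
    depths = []
    for line in lines:
        name, _pos, depth = line.rstrip("\n").split("\t")
        if name == contig:
            depths.append(int(depth))
    return depths

def window_means(depths, window=10_000):
    """Mean depth of each consecutive, non-overlapping window (window size is an assumption)."""
    return [mean(depths[i:i + window]) for i in range(0, len(depths), window)]

def evenness(means):
    """Coefficient of variation of the window means; values near 0 indicate even coverage."""
    avg = mean(means)
    return pstdev(means) / avg if avg else float("inf")
```

Pass an open depth file as `lines`, then compare the overall mean of the window means against the ~100x expected for the rest of the assembly.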
12-11-2016, 10:42 AM   #4
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 799

Do you have Illumina reads that you can map to this contig in local mode (e.g. 'MagicBLAST' or 'Bowtie2 --local')? If not, you could try digitally fragmenting your PacBio reads into short reads and mapping.
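For the digital fragmentation, something along these lines should do. This is only a sketch: the 150 bp fragment length, the non-overlapping step, the `readname_offset` naming scheme and the function names are all assumptions on my part, and any tail shorter than the fragment length is simply dropped.

```python
def fragment_read(name, seq, frag_len=150, step=150):
    """Yield (fragment_name, fragment_seq) pieces of exactly frag_len bases from one read."""
    # Leftover bases at the end of the read that do not fill a whole fragment are skipped.
    for start in range(0, len(seq) - frag_len + 1, step):
        yield f"{name}_{start}", seq[start:start + frag_len]

def fragments_as_fasta(reads, frag_len=150, step=150):
    """Turn an iterable of (name, sequence) long reads into FASTA text of short fragments."""
    lines = []
    for name, seq in reads:
        for frag_name, frag_seq in fragment_read(name, seq, frag_len, step):
            lines.append(f">{frag_name}\n{frag_seq}")
    return "\n".join(lines)
```

The output is FASTA with no quality values, so the mapper has to be told to expect that (Bowtie2 takes `-f` for FASTA input).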

Map only to the single contig. As rhall has said, you should get a somewhat even coverage across this contig. Any big jumps in coverage indicate something that needs further investigation. A shift from one coverage level to another might indicate a misassembly, while a blip of extremely high coverage suggests transposon sequence that may be interfering with assembly.
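A rough way to pick out those jumps automatically, assuming you already have a list of mean depths per window along the contig. The 3x-median cutoff for a "blip" and the 1.5-fold cutoff for a step between neighbouring windows are arbitrary illustrative thresholds, and the function names are made up for this sketch.

```python
from statistics import median

def coverage_spikes(means, factor=3.0):
    """Indices of windows whose mean depth exceeds factor x the contig-wide median."""
    med = median(means)
    return [i for i, m in enumerate(means) if med > 0 and m > factor * med]

def coverage_steps(means, factor=1.5):
    """Indices i where window i differs from window i-1 by more than factor-fold."""
    steps = []
    for i in range(1, len(means)):
        lo, hi = sorted((means[i - 1], means[i]))
        # A drop to (or rise from) zero depth always counts as a step.
        if hi > 0 and (lo == 0 or hi / lo > factor):
            steps.append(i)
    return steps
```

A short high-coverage blip shows up as a spike flanked by two steps close together, while a lasting shift from one level to another shows up as a single step with no spike.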
Tags
assembly, pacbio, qcing, resequencing

All times are GMT -8.