SEQanswers

11-22-2017, 02:05 AM   #1
huan
Member
 
Location: China

Join Date: Oct 2010
Posts: 56
Is it possible to evaluate genome size with Sequel data?

We are doing a de novo assembly of a marine organism from whole-genome sequencing data generated on the Sequel system. As we all know, DNA extraction from marine organisms is very difficult because of contamination and degradation. Is there any way to estimate the genome size, heterozygosity rate, or repeat content from Sequel data?
__________________
happy
11-22-2017, 05:04 AM   #2
Markiyan
Senior Member
 
Location: Cambridge

Join Date: Sep 2010
Posts: 115
Use multipass PacBio reads for self error correction and k-mer counting.

First try selecting the multipass reads (those in which the polymerase went around the insert more than once) and using them for k-mer counting and self error correction.

Make sure to remove any mitochondrial or symbiont reads before doing the k-mer counting (identify and complete the respective genome(s) first).

Get some good-quality PCR-free Illumina 2x250 reads (or BGISEQ data, if it works in your hands) and use them to confirm the k-mer counting, self error correction, etc.

Short reads are very helpful for getting the contaminant/symbiont genomes to a good draft stage and for filtering them out of the main dataset.
This approach usually has to be done iteratively, with more input data added at each iteration.
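
As a rough illustration of the k-mer counting step, here is a minimal Python sketch of the usual spectrum-based estimate: count canonical k-mers, build the depth histogram, and divide the total number of k-mer occurrences by the depth of the main peak. The function names, the `min_depth` cutoff, and the in-memory counting are assumptions made for the example. A real dataset would need a dedicated k-mer counter, and the estimate is only meaningful on error-corrected reads with organelle and symbiont reads already removed.

```python
from collections import Counter

# Translation table for reverse-complementing DNA strings.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def count_kmers(reads, k):
    """Count canonical k-mers over an iterable of read sequences."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing anything other than A, C, G, T.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map depth -> number of distinct k-mers seen at that depth."""
    return Counter(counts.values())


def estimate_genome_size(histogram, min_depth=2):
    """Estimate genome size as (total k-mer occurrences) / (peak depth).

    Depths below min_depth are treated as sequencing errors and ignored;
    the cutoff is an assumption and should be chosen from the histogram.
    """
    solid = {d: n for d, n in histogram.items() if d >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    peak_depth = max(solid, key=solid.get)
    total_occurrences = sum(d * n for d, n in solid.items())
    return total_occurrences / peak_depth
```

With a heterozygous genome the histogram typically shows a second peak at about half the main depth, so the peak picked here should be checked by eye before trusting the number.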
11-22-2017, 04:29 PM   #3
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 343

Markiyan has alluded to it already: raw PacBio data are not suitable for k-mer-based genome size estimates. The error rates of the uncorrected raw reads are too high.
11-27-2017, 09:55 AM   #4
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 318

While a k-mer analysis is going to be difficult with the raw PacBio data, it is possible to estimate the (effective) genome size from overlap statistics, using the raw reads, the error-corrected preassembled reads, or the raw reads mapped to the assembled contigs.
Run an initial assembly with a small seed read length, then plot the preassembled read overlap histogram.
http://pb-falcon.readthedocs.io/en/l...pread-overlaps

http://pb-falcon.readthedocs.io/en/l...GM2017_BFX.pdf
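
For the third option (mapping raw reads back to the assembled contigs), a minimal Python sketch of the arithmetic might look like the following. It assumes per-base depths in a three-column, tab-separated form (contig, position, depth), such as the output of `samtools depth`, and estimates the effective genome size as total mapped bases divided by the modal depth. The function names and the `min_depth` cutoff are made up for the example, and this is not the FALCON overlap-histogram procedure itself.

```python
from collections import Counter


def parse_depth_lines(lines):
    """Extract the depth column from 'contig<TAB>position<TAB>depth' lines."""
    depths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    return depths


def estimate_effective_genome_size(depths, min_depth=1):
    """Estimate effective genome size as (sum of depths) / (modal depth).

    Positions below min_depth are ignored when locating the mode.
    A collapsed two-copy repeat shows up at about twice the modal depth,
    so it contributes about twice its assembled length to the estimate.
    """
    histogram = Counter(d for d in depths if d >= min_depth)
    if not histogram:
        raise ValueError("no positions at or above min_depth")
    modal_depth = max(histogram, key=histogram.get)
    total_bases = sum(depths)
    return total_bases / modal_depth
```

For example, ten assembled positions of which eight sit at depth 10 and two at depth 20 give an estimate of 12 bases rather than 10, because the two high-depth positions are counted as a collapsed repeat.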
11-28-2017, 06:37 PM   #5
huan
Member
 
Location: China

Join Date: Oct 2010
Posts: 56

I really appreciate your help! I will give it a try!
__________________
happy

Tags
denovo assembly, genome survey, sequel, smrt
