SEQanswers

Old 01-10-2018, 06:34 PM   #1
swatie
Junior Member
 
Location: Singapore

Join Date: Jan 2018
Posts: 1
Default How to evaluate the Polymerase read and subread statistics from the raw data?

Hi

How can I check, before assembly, that the PacBio data from my service provider is good? I have received the raw data and two types of statistics:
Polymerase read statistics
Subread statistics

Thanks

Last edited by swatie; 01-10-2018 at 06:36 PM. Reason: Want to add type of data
Old 01-11-2018, 10:39 AM   #2
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 314
Default

The subread statistics are the important numbers for assembly. You should have enough coverage in the total subread bases, and a good subread mean length and N50 read length.

The subread stats will depend on the library quality (how much long DNA was in the sample) as well as on general sequencing variables.
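
To make those numbers concrete, here is a minimal sketch of how coverage, mean length and N50 can be computed from a list of subread lengths. The function names (`n50`, `summarize`) and the way the lengths are obtained are assumptions for illustration, not part of any PacBio tool.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths, genome_size):
    """Collect the basic subread statistics for a given estimated genome size (in bases)."""
    total = sum(lengths)
    return {
        "bases": total,
        "reads": len(lengths),
        "mean_length": total / len(lengths),
        "n50": n50(lengths),
        "coverage": total / genome_size,  # fold coverage of the genome
    }
```

Calling `summarize(lengths, 36_000_000)` on the subread lengths of one SMRT cell would give that cell's fold coverage for a 36 Mb genome alongside its mean length and N50.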
Old 01-12-2018, 12:40 AM   #3
swatisinha
Junior Member
 
Location: Singapore

Join Date: Jan 2018
Posts: 1
Default

Hi, thank you for the reply.
So the subread statistics are:
Cell            Subread bases   Subreads   Subread N50   Avg. subread length
sample1_cell1   1,420,065,796   163,879    12,166        8,665
sample1_cell2   1,314,563,766   150,564    12,399        8,730

So the ratio of average subread length to subread N50 is 8,665/12,166 (0.71) for cell1 and 8,730/12,399 (0.70) for cell2.
Are these good? What threshold decides whether these values are good or bad? (The average genome size of such strains is 36 Mb.)

In addition, I ran the RS_Subreads protocol in SMRT Portal on the raw data. The loading P1 value for cell1 is 77.45% (P0 is 11.06% and P2 is 11.49%) and for cell2 is 73.59% (P0 is 9.48% and P2 is 16.94%). In general, I guess a good P1 is between 30% and 40%, and a P1 above 40% gives a large number of shorter contigs, right?

(The chemistry used was P6-C4.)
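
As a rough cross-check of the subread statistics quoted in this post, the arithmetic can be sketched as follows. The numbers are copied from the table in the post; the 36 Mb genome size is the poster's own estimate, and the dictionary layout and variable names are only illustrative.

```python
GENOME_SIZE = 36_000_000  # estimated genome size from the post (36 Mb)

# Subread statistics as reported by the service provider.
cells = {
    "sample1_cell1": {"bases": 1_420_065_796, "subreads": 163_879, "n50": 12_166, "mean": 8_665},
    "sample1_cell2": {"bases": 1_314_563_766, "subreads": 150_564, "n50": 12_399, "mean": 8_730},
}

per_cell = {}
for name, c in cells.items():
    per_cell[name] = {
        # Fold coverage contributed by this cell.
        "coverage": c["bases"] / GENOME_SIZE,
        # Ratio of mean subread length to subread N50.
        "mean_over_n50": c["mean"] / c["n50"],
        # Mean recomputed from bases and read count, to check the reported value.
        "recomputed_mean": c["bases"] / c["subreads"],
    }

total_coverage = sum(c["bases"] for c in cells.values()) / GENOME_SIZE

for name, stats in per_cell.items():
    print(f"{name}: {stats['coverage']:.1f}x, mean/N50 = {stats['mean_over_n50']:.2f}")
print(f"combined coverage: {total_coverage:.1f}x")
```

With these inputs the two cells together come out at roughly 76x coverage of a 36 Mb genome, and the recomputed means agree with the reported averages to within one base.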
Old 01-15-2018, 12:05 PM   #4
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 314
Default

The P1 is high, which is probably a sign of overloading. Overloading results in lower-quality data and limited subread lengths.
It's always worth assembling the data, even if the raw data isn't of the absolute highest quality. Simply run HGAP with 36 Mb as the estimated genome size.
With the preassembly stats and the assembly results, it should be possible to estimate how much the data quality has affected the results.