SEQanswers

Old 03-30-2015, 03:03 AM   #1
manjari.deshmukh
Member
 
Location: India

Join Date: Mar 2015
Posts: 11
PacBio assembly using SMRT Portal

Hi all,

I am trying to assemble a bacterial genome of almost 6 Mb using HGAP 3 in SMRT Portal.
I have 10 SMRT cells giving 64X coverage in total.
The data was provided from outside our lab, so I don't know the chemistry or sequencing process they used. I ran the HGAP 3 assembler with mostly default parameters, changing only the genome size, and got 240 scaffolds, which is very high. Please help me reduce the number of scaffolds.

Regards

Manjari
Old 03-30-2015, 03:13 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

Can you post the results from RS_subread protocol analysis? That would be useful to judge the quality of your data.

Have you tried to do the assembly with just one or two of the SMRTcells that have the longest subreads?
Old 03-30-2015, 03:42 AM   #3
manjari.deshmukh
Member
 
Location: India

Join Date: Mar 2015
Posts: 11

Hi GenoMax

Thanks for the quick reply.
I have attached the subread protocol analysis. No, I didn't try the assembly with just two or three cells.

Manjari
Attached Files
File Type: pdf subread_filter.pdf (61.6 KB, 29 views)
Old 03-30-2015, 03:55 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

Is that filter report from one cell or all 10 together (I hope the answer is one)?

If the answer is one, then try your assembly with one, two and three of the best SMRTcells (as separate runs) with a 4 kb seed (you will have to deselect the "automatic minimum length seed calculation" setting).
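To get a feel for what a fixed 4 kb seed cutoff leaves you with, here is a minimal sketch that estimates how much coverage the subreads at or above a cutoff would contribute. The function name and the example lengths are made up for illustration; this is not HGAP's own seed calculation, only the basic arithmetic.

```python
def seed_coverage(subread_lengths, genome_size, min_seed_length=4000):
    """Fold coverage contributed by subreads of at least min_seed_length bp."""
    seed_bases = sum(n for n in subread_lengths if n >= min_seed_length)
    return seed_bases / genome_size


# Hypothetical subread lengths in bp, for illustration only.
example_lengths = [1200, 3500, 4000, 5200, 8000, 900]

# Only the 4000, 5200 and 8000 bp subreads pass the 4 kb cutoff.
print(seed_coverage(example_lengths, genome_size=6_000_000))
```

If the value for a real cell comes out very low, that cell probably has too few long subreads to seed an assembly on its own.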
Old 03-30-2015, 04:07 AM   #5
manjari.deshmukh
Member
 
Location: India

Join Date: Mar 2015
Posts: 11

This report is from all 10 cells taken together. Is there any problem with the data?
Old 03-30-2015, 04:09 AM   #6
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

That is not a good amount of data from 10 SMRTcells.

Have you run independent RS_Subread filtering on each SMRTcell? Can you identify ones that look better in terms of mean subread length/total reads? Perhaps you can try to select only those for the assembly.

Dr. Hall from PacBio participates on this forum and he may have some suggestions later today.
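If the filtered subreads are exported per cell as FASTA, the per-cell comparison suggested above can be roughed out outside SMRT Portal. The following is a minimal sketch, assuming one FASTA file per SMRTcell; the function names and cell names are hypothetical, and ranking by total bases is just one possible choice.

```python
from statistics import mean


def subread_lengths(fasta_lines):
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize_cell(fasta_lines):
    """Return subread count, total bases and mean subread length for one cell."""
    lengths = list(subread_lengths(fasta_lines))
    return {
        "subreads": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": mean(lengths) if lengths else 0,
    }


def rank_cells(cells):
    """cells maps a cell name to its FASTA lines; best cell (most bases) first."""
    summaries = {name: summarize_cell(lines) for name, lines in cells.items()}
    return sorted(summaries.items(),
                  key=lambda item: item[1]["total_bases"], reverse=True)


# Tiny in-memory example; real input would be open file handles.
demo = {
    "cell_a": [">r1", "ACGT", "AC", ">r2", "A"],
    "cell_b": [">r1", "ACGTACGTAC"],
}
for name, stats in rank_cells(demo):
    print(name, stats)
```

With real data you could pass open file handles instead of lists, for example `{"cell1": open("cell1_subreads.fasta")}` (a placeholder file name), and then pick the top one to three cells for the trial assemblies.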
Old 03-30-2015, 09:27 AM   #7
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 322

Sorry, I missed the new thread.
A conservative estimate of the expected yield would be ~800x coverage for a 6 Mb genome from 10 cells.
Does the loading report look similar for all cells? Can you post an example?
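For reference, the coverage arithmetic behind these numbers can be sketched as below. It uses only the figures quoted in this thread (about 6 Mb genome, 10 cells, 64X observed, ~800x expected); the variable names are made up for illustration, and the genome size is rounded.

```python
GENOME_SIZE = 6_000_000  # about 6 Mb, as stated in the first post
CELLS = 10


def total_bases(coverage, genome_size):
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size


observed = total_bases(64, GENOME_SIZE)    # reported 64X across all cells
expected = total_bases(800, GENOME_SIZE)   # conservative ~800x estimate

print("observed bases per cell:", observed / CELLS)
print("expected bases per cell:", expected / CELLS)
print("observed / expected:", observed / expected)
```

On these figures the 10 cells together delivered under a tenth of the conservatively expected yield.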
Old 04-01-2015, 02:56 AM   #8
manjari.deshmukh
Member
 
Location: India

Join Date: Mar 2015
Posts: 11

Thanks, R Hall, for your response. I have attached the loading reports for 4 cells. They are more or less the same.
Attached Files
File Type: pdf SMRT_job_report-cell1.pdf (227.0 KB, 24 views)
File Type: pdf SMRT_job_report-cell6.pdf (222.8 KB, 10 views)
File Type: pdf SMRT_job_report-cell8.pdf (228.1 KB, 8 views)
File Type: pdf SMRT_job_report-cell9.pdf (220.5 KB, 8 views)
Old 04-01-2015, 04:04 AM   #9
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

@manjari: It is probably apparent by now, but these are poor runs. P1 loading should normally be in the 35-50% range. Inserts in your libraries appear to be small, so you should actually have got a lot more data.

Were these libraries made with size selection (e.g. BluePippin)? Did the sequencing provider try a clean-up to remove contaminants to see if the yield could be increased? This is a good example of where your local PacBio FAS would be a good resource to consult.
Old 04-01-2015, 04:22 AM   #10
manjari.deshmukh
Member
 
Location: India

Join Date: Mar 2015
Posts: 11

@GenoMax: Thanks for the quick response.
So, what should I do now? Can we go ahead with the de novo assembly, or should we ask the data provider for details of the run and some more data? I am lost.
Old 04-01-2015, 05:05 AM   #11
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,053

You can try doing assemblies with the data you have (have you tried varying any assembly parameters?). If you are lucky, perhaps one (or more) of the SMRTcells will have the critical long fragments needed to make a good assembly.

Since you appear to be doing this on the Amazon cloud, you may be limited by external constraints (cost) in how much you can try. Wait to see if Dr. Hall has any specific recommendations for parameters to try.

Did you make the libraries or did the provider make them? Having a constructive discussion with them about ways to improve the yield may be useful. Making new libraries should also be on the table if you must have a "finished" (or close to finished) genome.

Last edited by GenoMax; 04-01-2015 at 05:12 AM.
Old 04-02-2015, 08:53 AM   #12
rhall
Senior Member
 
Location: San Francisco

Join Date: Aug 2012
Posts: 322

The assembly is a lost cause. Of the 4 cell reports that you posted, 3 contain little to no sequencing of your sample and are almost all control reads. In the 'length between adapters' histogram, these cells show a single peak corresponding to the 4 kb sequencing control. A sheared library should have a continuous distribution of insert sizes. Cell 1 does have sequencing of the sample and not just the control, but not enough data to assemble (only about 1/10th of the expected sequencing yield).
I would not recommend sequencing more of the same library without some sample QC. At this point you should talk to whoever did the sequencing and, if possible, get in contact with your PacBio FAS (Field Applications Scientist) to discuss sample QC and loading.
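As a rough way to screen further cells for the same problem, one could check what fraction of insert lengths fall close to the 4 kb control. This is only a sketch of a simple heuristic: the function name, the 10% tolerance window and the example lengths are assumptions for illustration, not anything produced by SMRT Portal.

```python
def fraction_near_control(insert_lengths, control_length=4000, tolerance=0.1):
    """Fraction of inserts within +/- tolerance of the control length."""
    if not insert_lengths:
        return 0.0
    low = control_length * (1 - tolerance)
    high = control_length * (1 + tolerance)
    near = sum(1 for n in insert_lengths if low <= n <= high)
    return near / len(insert_lengths)


# Hypothetical insert lengths in bp, for illustration only.
control_like = [4000, 4100, 3900, 3950]   # single peak near 4 kb
sheared_like = [1000, 2000, 4000, 8000]   # spread over a wide range

print(fraction_near_control(control_like))
print(fraction_near_control(sheared_like))
```

A value close to 1 would suggest a cell dominated by control reads, while a sheared library should spread its inserts over a much wider range.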
All times are GMT -8.

