  • PacBio assembly using SMRT portal

    Hi all,

    I am trying to assemble a bacterial genome of almost 6 Mb using HGAP 3 from SMRT Portal.
    I have 10 SMRT cells giving 64X coverage in total.
    The data was provided from outside our lab, so I don't know the chemistry or sequencing process they used. I ran the HGAP 3 assembler with mostly default parameters, changing only the genome size, and got 240 scaffolds, which is very high. Please help me reduce the number of scaffolds.

    Regards

    Manjari

  • #2
    Can you post the results from RS_subread protocol analysis? That would be useful to judge the quality of your data.

    Have you tried to do the assembly with just one or two of the SMRTcells that have the longest subreads?

    • #3
      Hi GenoMax

      Thanks for the quick reply.
      I have attached the subreads protocol analysis. No, I didn't try the assembly with just two or three cells.

      Manjari
      Attached Files

      • #4
        Is that filter report from one cell or all 10 together (I hope the answer is one)?

        If the answer is one, then I would say try your assembly with one, two and three of the best SMRTcells (as separate runs) with a 4 kb seed (you will have to deselect the "automatic minimum length seed calculation" setting).
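
A rough way to judge what a fixed 4 kb seed leaves you with is to add up the bases in subreads at or above the cutoff and divide by the genome size. The sketch below assumes the subread lengths have already been extracted into a list; the function name and the toy numbers are made up for illustration and are not part of SMRT Portal or HGAP.

```python
def seed_coverage(subread_lengths, seed_cutoff, genome_size):
    """Coverage contributed by subreads at least `seed_cutoff` bases long."""
    seed_bases = sum(n for n in subread_lengths if n >= seed_cutoff)
    return seed_bases / genome_size

# Toy lengths (invented, not taken from the attached filter report).
lengths = [1500, 2500, 4000, 6000, 9000]
print(seed_coverage(lengths, seed_cutoff=4000, genome_size=10_000))  # 1.9
```

If the seed coverage from the real data turns out to be very low, lowering the cutoff trades longer seeds for more of them.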

        • #5
          This report is from all 10 cells taken together. Is there any problem with the data?

          • #6
            That is not a good amount of data from 10 SMRTcells.

            Have you run independent RS_Subread filtering on each SMRTcell? Can you identify ones that look better in terms of mean subread length/total reads? Perhaps you can try to select only those for the assembly.
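
The per-cell comparison suggested here could be tabulated along these lines. This is only a sketch: it assumes you already have the subread lengths for each SMRTcell in a dictionary, and the cell names, function name and numbers are hypothetical. Ranking by mean subread length with total bases as the tie-breaker is one possible choice, not a recommended standard.

```python
def summarize_cells(cells):
    """Return (name, n_subreads, total_bases, mean_length) per cell, best first."""
    rows = []
    for name, lengths in cells.items():
        total = sum(lengths)
        mean = total / len(lengths) if lengths else 0.0
        rows.append((name, len(lengths), total, mean))
    # Rank by mean subread length, breaking ties on total bases.
    return sorted(rows, key=lambda r: (r[3], r[2]), reverse=True)

# Invented example data, one list of subread lengths per cell.
cells = {"cell_A": [1000, 3000], "cell_B": [5000, 7000, 9000], "cell_C": [4000]}
for row in summarize_cells(cells):
    print(row)
```

The cells at the top of the list would be the first candidates for a one-, two- or three-cell assembly.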

            Dr. Hall from PacBio participates on this forum and he may have some suggestions later today.

            • #7
              Sorry, I missed the new thread.
              A conservative estimate would be ~800x coverage for a 6 Mb genome from 10 cells.
              Does the loading report look similar for all cells? Can you post an example?
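
To put the gap in numbers, coverage times genome size gives total bases. The sketch below uses only figures quoted in this thread (6 Mb genome, 64x observed, ~800x conservative estimate, 10 cells); the variable and function names are just for illustration.

```python
GENOME_SIZE = 6_000_000   # ~6 Mb, from the first post
OBSERVED_COVERAGE = 64    # reported for all 10 cells together
EXPECTED_COVERAGE = 800   # conservative estimate quoted above
N_CELLS = 10

def bases_for(coverage, genome_size):
    """Total sequenced bases implied by a given fold coverage."""
    return coverage * genome_size

observed = bases_for(OBSERVED_COVERAGE, GENOME_SIZE)
expected = bases_for(EXPECTED_COVERAGE, GENOME_SIZE)
print(observed / 1e6)            # 384.0  (Mb observed in total)
print(expected / N_CELLS / 1e6)  # 480.0  (Mb expected per cell)
print(observed / expected)       # 0.08
```

In other words, all 10 cells together delivered less than a single cell would be expected to under that estimate.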

              • #8
                Thanks, R Hall, for your response. I have attached the loading reports for 4 cells. They are more or less the same.
                Attached Files

                • #9
                  @manjari: It is probably apparent by now, but these are poor runs. P1 loading should normally be in the 35-50% range. The inserts in your libraries appear to be small, so you should actually have got a lot more data.

                  Were these libraries made with size selection (e.g. BluePippin)? Did the sequence provider try a clean-up to remove contaminants to see if the yield could be increased? This is a good example of where the local PacBio FAS would be a good resource to consult.

                  • #10
                    @GenoMax: Thanks for the quick response.
                    So what should I do now? Can we go ahead with the de novo assembly, or should we ask the data provider for details of the run and some more data? I am lost.

                    • #11
                      You can try doing assemblies with the data you have (have you tried varying any assembly parameters?). If you are lucky, perhaps one (or more) of the SMRTcells will have the critical long fragments needed to make a good assembly.

                      Since you appear to be doing this on the Amazon cloud, you may be limited by external constraints (cost) in how much you can try. Wait to see if Dr. Hall has any specific recommendations for parameters to try.

                      Did you make the libraries, or did the provider make them? Having a constructive discussion with them about ways to improve the yield may be useful. Making new libraries should also be on the table if you must have a "finished" (or close to it) genome.
                      Last edited by GenoMax; 04-01-2015, 05:12 AM.

                      • #12
                        The assembly is a lost cause. Of the 4 cell reports that you posted, 3 cells contain little to no sequencing of your sample and are all control reads. Looking at the 'length between adapters' histogram, these cells show a single peak corresponding to the 4 kb sequencing control. A sheared library should have a continuous distribution of insert sizes. Cell 1 does have sequencing and not just the control, but not enough data to assemble (only about 1/10th of the expected sequencing yield).
                        I would not recommend sequencing more of the same library without some sample QC. At this point you should talk to whoever did the sequencing, and if possible get in contact with your PacBio FAS (Field Applications Scientist) to discuss sample QC and loading.
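
The "single peak at the control size" observation can be roughly checked from a list of insert lengths by asking what fraction falls close to 4 kb. This is a hypothetical heuristic for illustration only, not a PacBio QC method: the function name, the +/-10% window and the example lengths are all assumptions.

```python
def fraction_near(insert_lengths, center=4000, tolerance=0.1):
    """Fraction of insert lengths within +/- tolerance of `center`."""
    if not insert_lengths:
        return 0.0
    lo, hi = center * (1 - tolerance), center * (1 + tolerance)
    near = sum(1 for n in insert_lengths if lo <= n <= hi)
    return near / len(insert_lengths)

# Invented examples: a cell dominated by a 4 kb control vs. a sheared library.
control_like = [3900, 4000, 4050, 4100, 3950]
sheared_like = [800, 1500, 2600, 4000, 5200, 7400, 9100, 12000]
print(fraction_near(control_like))  # 1.0
print(fraction_near(sheared_like))  # 0.125
```

A value close to 1 would be consistent with a cell that sequenced mostly control, while a sheared library should spread its inserts over a wide range.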
