  • pacbio sequence error correction

    Hi all,

    I have some PacBio long-read data, about 10x coverage of a 120 Mb genome. I already have a reference genome, but it is incomplete and contains many gaps. What I am trying to do is error-correct my PacBio reads and assemble the genome. Later on I will add more Illumina data to try to close the gaps.

    My question about the error correction is: can I use the incomplete reference genome to error-correct my PacBio data? My plan is to convert the genome FASTA into the frg format required by pacBioToCA, and then feed my PacBio data and the genome frg data to the correction pipeline to produce error-corrected reads. My concern is: will pacBioToCA accept relatively long genome scaffolds as high-identity sequence for correcting my PacBio data?

    Suggestions and help are greatly appreciated.

    Stuart

  • #2
    I am not able to figure out how to use the incomplete reference genome for error correction. It looks like FastaToCA converts a FASTQ file to an frg file so that it can be used as high-identity sequence for error correction. However, the incomplete genome assembly is in a FASTA file, and there are no quality score files for it. How can I get around this?

    Many thanks!

    Stuart
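
On the missing quality scores raised in #2: one possible workaround, offered only as a sketch, is to turn the FASTA assembly into FASTQ by attaching a constant placeholder quality to every base, so that a FASTQ-based converter has something to read. The function names `read_fasta` and `fasta_to_fastq` and the placeholder character `I` (Phred 40 in Phred+33 encoding) are assumptions for illustration, not part of pacBioToCA. The fake qualities say nothing about how accurate the assembly really is, and whether the correction step copes with scaffold-length sequences is not answered here.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else "unnamed"), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_fastq(fasta_handle, fastq_handle, qual_char="I"):
    """Write every FASTA record as FASTQ with a constant placeholder quality.

    qual_char is an assumed placeholder, not a measured quality value.
    Returns the number of records written.
    """
    count = 0
    for name, seq in read_fasta(fasta_handle):
        fastq_handle.write(f"@{name}\n{seq}\n+\n{qual_char * len(seq)}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo with made-up scaffolds.
    demo = io.StringIO(">scaffold_1\nACGTNNNN\nACGT\n>scaffold_2\nGGCC\n")
    out = io.StringIO()
    fasta_to_fastq(demo, out)
    print(out.getvalue(), end="")
```

For real files, the two handles would be opened with `open("assembly.fasta")` and `open("assembly.fastq", "w")`; both file names are placeholders.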

    • #3
      Perhaps use the pbjelly pipeline to fill gaps? Also, with an appropriate pipeline (quiver: https://github.com/PacificBiosciences/GenomicConsensus) you may not need error correction to call accurate consensus.

      cheers,
      -mark

      • #4
        Thanks for the tips, Mark! It looks like it will take me a while to figure this out. However, it sounds interesting when you say I might not need to do error correction for PacBio data, since it has a 15% error rate.

        Stuart

        • #5
          Some more tips: if you want to use pacBioToCA, the approach would be to use the raw Illumina data as input to the correction step, not the draft assembly. The advantage of going back to the raw data is you may be able to correct assembly errors. The disadvantage is it takes longer to run.

          If you want to keep the assembly as is, you can install SMRT Analysis and use AHA (a hybrid assembler) to scaffold it, provided your genome is less than about 200 Mb. For larger genomes, or to really focus on the gap-filling, you can use pbjelly.

          Finally, the "no error correction" suggestion refers to the new algorithm HGAP: http://www.pacbiodevnet.com/hgap. You'll need more PacBio coverage to go that route. The benefit is you may be able to close more gaps and get a final result that's potentially as accurate as Sanger finishing.

          • #6
            Thanks for your tips, jbingham! I am in the process of generating short-read Illumina data for the error correction. I don't think I have enough coverage to try the new algorithm, since my PacBio data gives only 3-4x coverage when I look into the data more carefully. The vast majority of the reads are shorter than 500 bp or 1,000 bp. The longest read is 13 kb. I will post my progress later.

            Thanks again to Winsettz and jbingham for helping out here!

            Stuart
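
The coverage figures in this thread (about 10x at first, then 3-4x on closer inspection) come down to total read bases divided by genome size. A minimal sketch of that check follows; `coverage_summary`, `n50` and the `min_length` cutoff are hypothetical names chosen for illustration, and the read lengths in the demo are made up rather than taken from the data discussed above.

```python
def n50(lengths):
    """Return the N50: the length L such that reads of length >= L
    contain at least half of all bases. Returns 0 for an empty input."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def coverage_summary(read_lengths, genome_size, min_length=0):
    """Summarise reads of at least min_length bases against a genome size."""
    kept = [n for n in read_lengths if n >= min_length]
    total = sum(kept)
    return {
        "reads": len(kept),
        "total_bases": total,
        "coverage": total / genome_size,  # mean depth, e.g. 10.0 for "10x"
        "longest": max(kept, default=0),
        "n50": n50(kept),
    }


if __name__ == "__main__":
    # Made-up read lengths; genome size matches the 120 Mb quoted in the thread.
    lengths = [450, 480, 900, 1200, 5000, 13000]
    print(coverage_summary(lengths, genome_size=120_000_000))
    # Counting only reads of 1 kb or more shows how much short reads contribute.
    print(coverage_summary(lengths, genome_size=120_000_000, min_length=1000))
```

Running the same summary with and without a length cutoff is one way to see why a data set dominated by sub-kilobase reads may deliver less usable coverage than the raw total suggests.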
