  • Should I try hybrid assembly with my PacBio data?

    Hi all,

    I recently had the genome of a bacterial strain I am working with sequenced using both PacBio and Illumina paired-end sequencing.

    I have managed to assemble the Illumina data into ~200 contigs using Soap2. The PacBio data came back already assembled into 22 contigs, which was a little disappointing, especially because other people in my lab have sequenced different strains of the same species and got their data back as one contig! The original idea was to map the Illumina reads to the PacBio assembly to look for errors.

    But anyway, now I am not sure what to do with the data I have. The longest four PacBio contigs cover ~97% of my estimated 4.5 Mb genome, but all the other contigs also map to the same species in the BLASR output, although some with low coverage. Now I'm not sure what is "real", and I don't want to underestimate the genome size.

    I have read that you can use PacBio sequences to scaffold Illumina contigs, so I am wondering if I should try that. But I can't really find any helpful tutorials or resources on how to do this. I'm not sure which PacBio data I should use (I have the CCS.fastq, filtered subread fastq and longest subread fastq files), whether I need to do anything to the data before using it, which program to use, etc.

    Any help would be appreciated, even if it's just a link to a good resource.

    Thanks in advance!

  • #2
    Rather than try a complex hybrid approach, which is unlikely to be any more successful than the 22-contig PacBio assembly, I would try to diagnose and optimize the PacBio assembly. How do the preassembly statistics (yield, N50, number of bases) compare to the other assemblies in your lab? Was the subread N50, or the number of bases in the filtered data, lower than for the assemblies that produced single contigs?
    With 22 contigs it is feasible to run bridgemapper to order the contigs using the remaining PacBio reads, then overlap the contigs using minimus2 and validate the result using resequencing. Think of it as manual finishing. I would then use the Illumina reads to check the final base accuracy.
    You mentioned contigs with lower coverage. Is it possible that the sample is not perfectly clonal, and you are seeing a minor population that is breaking the assembly?
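If it helps with comparing the preassembly statistics mentioned above, here is a minimal Python sketch for computing read count, total bases and N50 from a FASTQ file. It is not part of any PacBio tool. The function names are made up for illustration, and it assumes a plain, uncompressed FASTQ with exactly four lines per record.

```python
import sys


def fastq_lengths(path):
    """Return the sequence lengths from a FASTQ file.

    Assumes strict 4-line records (header, sequence, '+', qualities),
    so the sequence is always the second line of each record.
    """
    lengths = []
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 1:
                lengths.append(len(line.strip()))
    return lengths


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all bases. Returns 0 for no input."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def summarize(lengths):
    """Collect the basic yield statistics into a dictionary."""
    return {"reads": len(lengths), "bases": sum(lengths), "n50": n50(lengths)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_stats.py filtered_subreads.fastq
    # (both file names here are placeholders)
    print(summarize(fastq_lengths(sys.argv[1])))
```

Running this on the filtered subread file from each sequencing run in the lab should give numbers that can be compared side by side. The same `n50` function also works on a list of contig lengths.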
