  • Best way to perform assembly with two sets of raw data

    Best way to merge data from two separate sequence runs
    Hi,

    We have performed two sequencing runs of a bacterial organism with a genome of about 4.4 Mb. One was performed a few years ago by BGI, and we sequenced the same organism again a few weeks ago. I am trying to determine the best way to go about the assembly.


    1) Merge the 4 FASTQ files and perform the assembly as normal.

    2) Map the reads from the second run to a FASTA file of the first assembly.

    3) Align the two individually assembled genomes.

    4) Are there any other methods I haven't thought of?

    Just a side note: from looking at the raw data, I think the first run has less contamination than the second run (these organisms are prone to contamination with heterotrophic bacteria).

  • #2
    If the first data set was generated a few years ago, you would want to check what format the quality values are in. They may be in the older "illumina" (Phred+64) format and would need to be converted to "sanger" (Phred+33) quality if you are going to do any combined analysis. (Ref: http://en.wikipedia.org/wiki/FASTQ_format#Encoding)
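As a rough illustration of the check and conversion described above, here is a minimal Python sketch. It assumes the old data is either Phred+33 ("sanger") or Phred+64 ("illumina" 1.3 to 1.7); the function names `guess_offset` and `phred64_to_phred33` are made up for this example. The guess is only a heuristic based on the lowest quality character seen, so it should be run on a reasonably large sample of reads, and established conversion tools are probably the safer choice for real data.

```python
def guess_offset(quality_strings):
    """Heuristically guess the Phred offset from a sample of quality strings.

    Returns 33, 64, or None when the sample is ambiguous
    (for example, very old Solexa-scaled data).
    """
    lowest = min(ord(c) for q in quality_strings for c in q)
    if lowest < 59:
        # Characters below ';' (ASCII 59) cannot occur in +64 encodings.
        return 33
    if lowest >= 64:
        # Nothing below '@' (ASCII 64): consistent with Phred+64,
        # although a small all-high-quality Phred+33 sample looks the same.
        return 64
    return None


def phred64_to_phred33(quality):
    """Shift one Phred+64 quality string to Phred+33 (offset change of 31)."""
    return "".join(chr(ord(c) - 31) for c in quality)


if __name__ == "__main__":
    sample = ["hhhhB", "ggffB"]  # made-up Phred+64 quality lines
    if guess_offset(sample) == 64:
        print([phred64_to_phred33(q) for q in sample])
```

Only the fourth line of each FASTQ record (the quality line) should be converted; the sequence and header lines stay unchanged.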

    Depending on the amount of data available for each of those runs (and the time you can spend), you could try all of the options mentioned in parallel. With a bacterial genome it should not be a very time-consuming affair.

    • #3
      Depending on what type of data each run is in terms of SE/PE or 50bp/100bp, etc., it may be worth it to just completely ignore the old data. Though you don’t mention it, I’d bet you have absurd coverage, and it is possible to “over assemble” a genome with ridiculous coverage.

      Like GenoMax said, you might as well do all of them with a bacterial genome, but without more specifics it's hard to recommend which route is likely to be better.

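To put a number on the coverage point, here is a small Python sketch of the usual back-of-the-envelope estimate (total sequenced bases divided by genome size). The 4.4 Mb genome size comes from the original post; the read count, read length, and the 100x target are placeholder assumptions rather than values from this thread, and the function names are made up for the example.

```python
def estimate_coverage(n_reads, read_length, genome_size):
    """Approximate fold coverage: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def subsample_fraction(coverage, target):
    """Fraction of reads to keep to reach roughly `target` coverage."""
    return min(1.0, target / coverage)


if __name__ == "__main__":
    genome_size = 4_400_000   # ~4.4 Mb, from the original post
    n_reads = 20_000_000      # placeholder: total reads (both mates counted)
    read_length = 100         # placeholder: read length in bp

    cov = estimate_coverage(n_reads, read_length, genome_size)
    frac = subsample_fraction(cov, target=100)  # 100x is an arbitrary target
    print(f"approx. coverage: {cov:.0f}x, keep fraction: {frac:.2f}")
```

If the estimate comes out far above what the assembler needs, randomly subsampling the reads by the computed fraction (keeping pairs together) is one way to test whether lower coverage gives a cleaner assembly.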
      • #4
        Hi All,

        I have assembled both sets of data individually, and it seems the data from the second run is not as good, as there is some contamination. So I have a further question.

        If I were going to use my first assembly as a reference, how do I map the reads of the second run to it?

        Also, if I use my first assembly as a reference, can the contigs be lengthened using the new reads, or will my contigs be limited in size to the reference and unable to be extended, even with new reads?

        Cheers
