  • Best way to perform assembly with two sets of raw data

    Best way to merge data from two separate sequence runs
    Hi,

    We have performed two sequencing runs of a bacterial organism with a genome of about 4.4 Mb. The first was done a few years ago by BGI, and we sequenced the same organism again a few weeks ago. I am trying to determine the best way to go about the assembly.


    1) Merge the 4 fastq files and perform the assembly as normal.

    2) Map the reads of the second run to a FASTA file of the first assembly.

    3) Align the two individually assembled genomes.

    4) Are there any other methods I haven't thought of?

    Just a side note: from looking at the raw data, I think the first run has less contamination than the second run (these organisms are prone to contamination with heterotrophic bacteria).
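For option 1, a minimal sketch of merging the runs, assuming each run is a paired-end set (R1 and R2 files) and that all inputs are either plain text or all gzip-compressed. The function name and file names are hypothetical. The important point is that the R1 files and the R2 files are concatenated in the same run order, so that mates stay in sync.

```python
import shutil


def concatenate_files(input_paths, output_path):
    """Append the input files to output_path, byte for byte, in the given order.

    Assumes every input ends with a newline and that the inputs are all
    plain text or all gzip (concatenated gzip members remain a valid gzip file).
    """
    with open(output_path, "wb") as out:
        for path in input_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return output_path
```

A call such as `concatenate_files(["run1_R1.fastq", "run2_R1.fastq"], "merged_R1.fastq")` would be followed by the same call for the R2 files, with the runs listed in the same order.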

  • #2
    If the first data set was generated a few years ago, you should check what format the quality values are in. They may be in "Illumina" (Phred+64) format and would need to be converted to "Sanger" (Phred+33) quality if you are going to do any combined analysis. (Ref: http://en.wikipedia.org/wiki/FASTQ_format#Encoding)
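A rough sketch of how the quality encoding could be checked, assuming standard 4-line FASTQ records. The function names and thresholds are illustrative: characters below ASCII 59 only occur in Phred+33 data, while characters above ASCII 74 point to the older Phred+64 scheme. The conversion helper only applies to Phred-scaled Illumina 1.3 to 1.7 data, not to the earlier Solexa scores, which use a different scale.

```python
import gzip


def guess_quality_offset(fastq_path, max_records=10000):
    """Guess the Phred offset (33 or 64) from the first records of a FASTQ file.

    Returns 33, 64, or None when the sampled quality characters are ambiguous.
    """
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    lowest, highest = 255, 0
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= 4 * max_records:
                break
            if line_number % 4 == 3:  # fourth line of each record holds qualities
                codes = [ord(ch) for ch in line.rstrip("\n")]
                if codes:
                    lowest = min(lowest, min(codes))
                    highest = max(highest, max(codes))
    if lowest < 59:
        return 33  # such low characters are not used by the +64 encodings
    if highest > 74:
        return 64  # such high characters are not expected in Phred+33 Illumina data
    return None


def phred64_to_phred33(quality_string):
    """Shift a Phred+64 quality string to Phred+33 (subtract 31 from each code)."""
    return "".join(chr(ord(ch) - 31) for ch in quality_string)
```

If the old run comes back as 64 and the new one as 33, the old quality lines would need shifting before the two data sets are combined.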

    Depending on the amount of data available for each of those runs (and the time you can spend), you could try all of the options mentioned in parallel. With a bacterial genome it should not be a very time-consuming affair.

    • #3
      Depending on what type of data each run is (SE/PE, 50 bp/100 bp, etc.), it may be worth it to just ignore the old data completely. Though you don’t mention it, I’d bet you have absurd coverage, and it is possible to “over-assemble” a genome with ridiculous coverage.

      Like GenoMax said, with a bacterial genome you might as well try all of them, but without more specifics it’s hard to recommend which route is likely to be better.
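To put a number on the coverage question, a small sketch that estimates coverage as total sequenced bases divided by genome size (about 4.4 Mb here), again assuming 4-line FASTQ records. The function names are hypothetical, and any target coverage passed to `subsample_fraction` is an illustrative choice, not a value from this thread.

```python
import gzip


def count_bases(fastq_path):
    """Sum the lengths of all sequence lines in a 4-line-per-record FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    total = 0
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # second line of each record is the sequence
                total += len(line.strip())
    return total


def estimate_coverage(fastq_paths, genome_size_bp):
    """Approximate fold coverage: total bases across all files / genome size."""
    return sum(count_bases(path) for path in fastq_paths) / genome_size_bp


def subsample_fraction(coverage, target_coverage):
    """Fraction of reads to keep so that coverage drops to roughly the target."""
    if coverage <= target_coverage:
        return 1.0
    return target_coverage / coverage
```

For example, `estimate_coverage(["run2_R1.fastq", "run2_R2.fastq"], 4_400_000)` gives the coverage of one run (file names are placeholders), and the resulting fraction could be handed to whatever read-subsampling tool is preferred.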

      • #4
        Hi All,

        I have assembled both sets of data individually, and it seems the data from the second run is not as good, as there is some contamination. So I have a further question.

        If I were going to use my first assembly as a reference, how do I map the reads of the second run to it?

        Also, if I use my first assembly as a reference, can the contigs be lengthened using the new reads, or will my contigs be limited in size to the reference and no longer able to be extended, even with the new reads?

        Cheers
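One possible way to do the mapping, sketched under assumptions: the thread does not name a mapper, so BWA-MEM and samtools are used here purely as an example, the second run is assumed to be paired-end, and all file names are placeholders. The helper only builds the shell command lines (index the reference, map and sort, index the BAM) so they can be inspected before being run.

```python
import shlex


def build_mapping_commands(reference_fasta, reads_1, reads_2, out_bam, threads=4):
    """Return shell command strings for mapping paired-end reads to a reference.

    Assumes bwa and samtools are installed and on PATH; nothing is executed here.
    """
    q = shlex.quote  # protect paths that contain spaces or shell characters
    return [
        # Build the BWA index for the first assembly
        f"bwa index {q(reference_fasta)}",
        # Map the second run's read pairs and write a coordinate-sorted BAM
        f"bwa mem -t {int(threads)} {q(reference_fasta)} {q(reads_1)} {q(reads_2)}"
        f" | samtools sort -o {q(out_bam)} -",
        # Index the sorted BAM so viewers and downstream tools can use it
        f"samtools index {q(out_bam)}",
    ]
```

Printing the result of `build_mapping_commands("run1_contigs.fasta", "run2_R1.fastq", "run2_R2.fastq", "run2_vs_run1.bam")` shows the three commands in the order they would be run.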
