  • How to construct chromosomes from scaffolds of a subspecies

    Hi, I have a genome of a wild plant subspecies in the form of about 200K scaffolds, ranging in size from a few thousand bp to over 50 kb. I am trying to assemble these into chromosomes using the chromosome sequences of the nearest domesticated relative.

    I am using nucmer from the MUMmer package (http://mummer.sourceforge.net/manual/) and trying to get the settings right. I plan to use the tilings from nucmer -> show-tiling to construct the chromosomes, likely with Biopython.

    My questions are:
    1) Which settings should I use for '-c' [min cluster] and '-l' [min match]? I tried the defaults (-c 65 and -l 20), and that run took over 90 hours before I tried different settings.

    I have tried (150 50), (500 100), and (1000 50). There are still a lot of gaps in the alignment and unused scaffold sequences. I understand there will be gaps and insertions, but there are places with many thousands of bp missing (over 100k in some places).

    Is it safe (meaningful) to concatenate all scaffolds in the order of the tiling from nucmer, reverse complementing where needed of course?

    2) Are there any other programs designed to construct chromosomes from scaffolds given a reference? This seems like a routine/common task but I have not found much information on this specific problem.

    3) After constructing the new chromosome what is the best way to call SNPs?

    Thanks!
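For the concatenation step in question 1, here is a minimal sketch of how the tiling could be turned into a pseudo-chromosome. It assumes the default show-tiling text output: a `>reference` header line followed by one row per placed contig, with the gap to the next contig in column 3, the orientation (`+` or `-`) in column 7, and the contig ID in column 8. The function names are hypothetical, it uses plain Python rather than Biopython to stay self-contained, and it pads positive gaps with N while ignoring overlaps (negative gaps), which is a simplification.

```python
# Sketch only: stitch scaffolds into a pseudo-chromosome from a show-tiling table.
# Assumed column layout per contig row (check against your own output):
#   ref_start ref_end gap_to_next contig_len coverage identity orientation contig_id

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_tiling(lines):
    """Return {reference_id: [(gap_to_next, orientation, contig_id), ...]}."""
    tiling = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: first token after '>' is the reference ID.
            current = line[1:].split()[0]
            tiling[current] = []
            continue
        fields = line.split()
        tiling[current].append((int(fields[2]), fields[6], fields[7]))
    return tiling


def build_pseudochromosome(tiles, scaffolds):
    """Join scaffolds in tiling order, flipping '-' contigs and padding gaps with N.

    `scaffolds` is a dict of contig_id -> sequence string.
    Overlaps (negative gaps) are NOT trimmed here; the contigs are simply abutted.
    """
    parts = []
    for i, (gap, orientation, contig_id) in enumerate(tiles):
        seq = scaffolds[contig_id]
        if orientation == "-":
            seq = reverse_complement(seq)
        parts.append(seq)
        if i < len(tiles) - 1:
            parts.append("N" * max(gap, 0))
    return "".join(parts)
```

Scaffolds that never appear in the tiling are simply left out, so it is worth counting them separately to see how much sequence stays unplaced.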

  • #2
    Went with lastz

    I went with lastz from this paper.

    The command I used was:
    Code:
    lastz Chr01.fasta scaffolds.fasta M=254 K=4500 L=3000 Y=15000 C=2 T=2 --format=axt > chr1.axt
    I'm still looking for a good way to call SNPs though.



    • #3
      Originally posted by paulbible
      2) Are there any other programs designed to construct chromosomes from scaffolds given a reference? This seems like a routine/common task but I have not found much information on this specific problem.
      Another possible method is to use a tool designed to scaffold contigs by adding a long-mate-pair library. You don't have a LMP library but you can fake one by generating synthetic long-mate-pair reads from the reference. That won't give you full-chromosome-length sequences, but it may substantially improve your continuity. Of course, either method will cause misassemblies if there are major structural differences with the reference.
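As a rough illustration of the synthetic long-mate-pair idea, the sketch below slides a window along a reference sequence and emits the two ends of each window as a read pair. The function name and the default read length, insert size, and step are hypothetical, and the orientation used here (first read forward, second read reverse-complemented, pointing inward) is an assumption; check which orientation your scaffolder expects before using anything like this.

```python
# Sketch only: generate error-free synthetic mate pairs from a reference sequence.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def synthetic_mate_pairs(ref_seq, read_len=100, insert_size=5000, step=500):
    """Yield (start, read1, read2) for windows of `insert_size` bp along ref_seq.

    read1 is the first `read_len` bases of the window (forward strand);
    read2 is the last `read_len` bases, reverse-complemented (assumed convention).
    """
    for start in range(0, len(ref_seq) - insert_size + 1, step):
        fragment = ref_seq[start:start + insert_size]
        read1 = fragment[:read_len]
        read2 = reverse_complement(fragment[-read_len:])
        yield start, read1, read2
```

A smaller step gives higher physical coverage; the pairs would still need to be written out in whatever read format the chosen scaffolding tool accepts.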

      As for calling SNPs... not sure what the problem is. If you want to see how your organism differs from the reference, you should map the reads to the reference; you don't need to assemble it. The assembly won't have any SNPs with respect to itself, other than heterozygous ones, which you would still find with respect to the reference.



      • #4
        Thanks for your input. I considered making a lot of synthetic reads and using another tool, but I didn't think that would really get me much closer to a full chromosome sequence. I am basically trying to assemble and annotate a new genome that is a close relative of a known genome. I need to 1) assemble the chromosomes, 2) map the ESTs to the chromosomes to determine gene locations, and 3) try to determine SNPs if I can. I suppose I can extract these manually by parsing the alignments, looking for only small differences in a larger alignment from lastz (or can I?).

        Again I am not working with NGS data (yet). I know there are a lot of tools out there that do similar tasks for NGS data. I figured it would be better to use tools suited to my input data than to artificially try to tailor the data for other tools.
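The idea of parsing the lastz alignments for small differences could look roughly like the sketch below. It assumes the usual AXT layout: a summary line (`number ref_name ref_start ref_end query_name query_start query_end strand score`, 1-based coordinates) followed by the two aligned sequence lines, with `-` for gaps and `#` for comment lines. The function name is hypothetical, and the output is only a list of candidate substitutions: it carries no quality or heterozygosity information, so it is not a substitute for read-based SNP calling.

```python
# Sketch only: list single-base substitutions found in AXT alignment blocks.


def axt_mismatches(lines):
    """Yield (ref_name, ref_pos, ref_base, query_name, query_base) per substitution.

    ref_pos is 1-based on the reference (first) sequence. Gap columns and
    columns containing N are skipped rather than reported.
    """
    block = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        block.append(line)
        if len(block) < 3:
            continue
        header, ref_aln, qry_aln = block
        block = []
        fields = header.split()
        ref_name, ref_pos, qry_name = fields[1], int(fields[2]), fields[4]
        for r, q in zip(ref_aln.upper(), qry_aln.upper()):
            if r == "-":
                # Insertion in the query: the reference coordinate does not advance.
                continue
            if q != "-" and r != q and "N" not in (r, q):
                yield ref_name, ref_pos, r, qry_name, q
            ref_pos += 1
```

Restricting the scan to long, high-scoring blocks would be one way to approximate "small differences in a larger alignment", since short blocks are more likely to be spurious or paralogous hits.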



        • #5
          Welcome all

          Hi guys. Welcome, you all. Thanks for your informative post; it's very useful to us. I got some information about constructing chromosomes from scaffolds of a subspecies. Thanks in advance.
          Hadoop
