  • GS FLX data analysis software manual

    Hello,

    Could anybody explain in a little more detail the following overlap detection parameters available in the GS De Novo Assembler application, gsAssembler?

    Seed step – The number of bases between seed generation locations used in the exact k-mer matching part of the overlap detection
    Seed length – The number of bases used for each seed in the exact k-mer matching part of the overlap detection (i.e. the “k” value of the k-mer matching)

    Thank you very much!
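A minimal sketch of how the two parameters quoted from the manual could drive exact k-mer seed matching, assuming a simplified model: each read contributes a seed of `seed_len` bases every `seed_step` bases, and each seed is looked up in an index of every k-mer of the other reads. The function names and toy reads are hypothetical, and this is not gsAssembler's actual implementation; the defaults of 16 and 12 are taken from the Roche support answer later in this thread.

```python
from collections import defaultdict

def seeds(read, seed_len, seed_step):
    """Seeds of seed_len bases, one starting every seed_step bases along the read."""
    return [(start, read[start:start + seed_len])
            for start in range(0, len(read) - seed_len + 1, seed_step)]

def kmer_index(reads, k):
    """Index every k-mer at every position of every read: k-mer -> [(read_id, pos)]."""
    index = defaultdict(list)
    for read_id, read in reads.items():
        for pos in range(len(read) - k + 1):
            index[read[pos:pos + k]].append((read_id, pos))
    return index

def seed_hits(reads, seed_len=16, seed_step=12):
    """Exact seed matches between different reads as (query, q_off, target, t_pos)."""
    index = kmer_index(reads, seed_len)
    hits = []
    for query_id, read in reads.items():
        for q_off, kmer in seeds(read, seed_len, seed_step):
            for target_id, t_pos in index.get(kmer, []):
                if target_id != query_id:  # skip matches within the same read
                    hits.append((query_id, q_off, target_id, t_pos))
    return hits

# Toy reads (hypothetical): the last 6 bases of r1 equal the first 6 bases of r2.
reads = {"r1": "ACGTTGCAAGGC", "r2": "CAAGGCTTAACG"}
for hit in seed_hits(reads, seed_len=4, seed_step=2):
    print(hit)
```

With these toy reads, both seeds from `r1` that hit `r2` lie on the same diagonal (`q_off - t_pos == 6`), i.e. they agree that `r2` starts 6 bases into `r1`. A real assembler would typically go on to verify and extend such candidates with an alignment; that step is omitted here.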

  • #2
    I'm not an expert on assembly, but I'll try to help.

    When doing an overlap analysis, you want to know some parameters describing how good your overlap is. Is it nice and uniform, or does it have parts represented by only 2 or 3 seeds and parts covered by 100 seeds? But that's only coverage, which is too much of a simplification of assembly quality: 30x coverage with 500-mers is not the same as 30x coverage with 8-mers. That is where these 2 parameters come in.

    Seed step is the distance between the start of one overlapping segment and the start of the next. Say you find sequence #1 (a 12-mer, for example) starting at base 1 and sequence #2 (also a 12-mer) starting at base 6; then your seed step would be 5. The distribution of seed steps gives you an idea of how uniformly that part of your assembly is represented by actual reads. Ideally you would have a new read start at every base for the best alignment quality.

    Seed length is the k-mer length you are using. If your assembly consisted of uniform reads, all of the same length, your seed length wouldn't vary across the assembly. But reporting the seed length gives you an idea of the quality of the reads used in that part of your assembly. For instance, if one part of your assembly is made up of seeds that are much shorter than those in another, equally well covered part, you can say that the assembly quality is better at the site with the longer seeds. That's because longer reads are usually of higher quality; otherwise they would have been trimmed.

    But I think the power of these parameters lies in their combination. Large seed steps are okay as long as your k-mer length is also large. If your k-mers are small, you want small seed steps; otherwise the overall alignment quality is lower.

    I hope my rambling was useful.
    Cheers
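Under the manual's definition quoted in the question (a seed of `seed_len` bases starting every `seed_step` bases), the interplay described above can be made concrete: if the step is no larger than the seed length, consecutive seeds touch or overlap, while a larger step leaves gaps of `seed_step - seed_len` bases that no seed samples. The sketch below uses a hypothetical helper, `seed_coverage`, to compute the fraction of a read's bases that fall inside at least one seed; it illustrates the geometry only and is not a measure of assembly quality.

```python
def seed_coverage(read_len, seed_len, seed_step):
    """Fraction of read positions that fall inside at least one seed."""
    covered = [False] * read_len
    for start in range(0, read_len - seed_len + 1, seed_step):
        for pos in range(start, start + seed_len):
            covered[pos] = True
    return sum(covered) / read_len

# Hypothetical 100 bp read with three (seed_len, seed_step) choices.
for seed_len, seed_step in [(16, 12), (8, 12), (16, 20)]:
    print(seed_len, seed_step, seed_coverage(100, seed_len, seed_step))
```

With 16/12 every base of this read lies in some seed, while an 8 bp seed at the same 12 bp step samples only 64% of positions. Any bases left after the last full seed are never sampled.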



    • #3
      Very useful.
      Thank you!



      • #4
        Sorry, I'm still confused about these parameters.

        Here is some information I got by asking the same question to Dr. Michael Stiens, Manager Customer Support Genome Sequencing, Roche Diagnostics GmbH...

        Seed Step: It is the number of bases after which the next seed begins on the same read. Each seed is 16 bp in length (default) and the seed step is 12 bp, so there is an overlap of 4 bp between consecutive seeds on a read.

        One question would be: does the seed step parameter define an upper or a lower limit? While I found dePhi's answer very interesting (I never thought about the different mapping qualities w.r.t. seed length before), I don't see how it relates to the parameters used. That is, they talk about a "distribution of seed step"... so is the parameter the upper limit of that distribution?

        Cheers,
        Homepage: Dan Bolser
        MetaBase the database of biological databases.
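The default numbers in Dr. Stiens' answer (16 bp seeds starting every 12 bp) can be checked with a short sketch. `seed_intervals` is a hypothetical helper and the 100 bp read length is arbitrary; it lists the half-open interval each seed covers and confirms that consecutive seeds share `seed_len - seed_step` = 4 bases.

```python
def seed_intervals(read_len, seed_len=16, seed_step=12):
    """Half-open [start, end) intervals of the seeds placed along one read."""
    return [(s, s + seed_len) for s in range(0, read_len - seed_len + 1, seed_step)]

intervals = seed_intervals(100)
# Overlap of each seed with the next = end of the earlier seed - start of the later one.
overlaps = [end - nxt for (_, end), (nxt, _) in zip(intervals, intervals[1:])]
print(intervals[:3])  # [(0, 16), (12, 28), (24, 40)]
print(set(overlaps))  # {4}
```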
