  • drgoettel
    Junior Member
    • Jun 2009
    • 5

    GS FLX data analysis software manual

    Hello,

    Could anybody explain in a little more detail the following overlap detection parameters available in the GS de novo assembler application, gsAssembler?

    Seed step – The number of bases between seed generation locations used in the exact k-mer matching part of the overlap detection
    Seed length – The number of bases used for each seed in the exact k-mer matching part of the overlap detection (i.e. the “k” value of the k-mer matching)

    Thank you very much!
  • dePhi
    Junior Member
    • Feb 2009
    • 5

    #2
    I'm not an expert on assembly, but I'll try to help.

    When doing an overlap analysis, you want some measure of how good your overlap is. Is it nice and uniform, or are some parts represented by only 2 or 3 seeds and others covered by 100? But that's only coverage, which is too much of a simplification of assembly quality. 30-fold coverage with 500-mers is not the same as 30-fold coverage with 8-mers, which is where these two parameters come in.

    Seed step is the distance between the start of one overlapping segment and the start of the next. Say you find sequence #1 (a 12-mer, for example) starting at base number 1 and sequence #2 (also a 12-mer) starting at base number 6; then your seed step would be 5. The distribution of seed steps gives you an idea of how uniformly that part of your assembly is represented by actual reads. Ideally, a new read would start at each new base for the best alignment quality.

    Seed length is the k-mer length you are using. If your assembly consisted of uniform reads, all of the same length, your seed length wouldn't vary across your assembly. But reporting the seed length gives you an idea of the quality of the reads used in that part of your assembly. For instance, if one part of your assembly is made up of seeds that are much shorter than those in another part that is just as well covered, you can say that the quality of your assembly is better at the site with the larger seed length. That's because read quality is usually better in longer reads, or else they would have been trimmed.

    But the power of these parameters, I think, is in their combination. Having large seed steps is okay as long as your k-mer length is also large. If your k-mers are small, you want small seed steps; otherwise the total alignment quality is lower.

    I hope my rambling was useful.
    Cheers

    • drgoettel
      Junior Member
      • Jun 2009
      • 5

      #3
      Very useful.
      Thank you!

      • dan
        wiki wiki
        • Jul 2008
        • 194

        #4
        Sorry, I'm still confused about these parameters.

        Here is some information I got by asking the same question to Dr. Michael Stiens, Manager Customer Support Genome Sequencing, Roche Diagnostics GmbH...

        Seed Step: It is the number of bases after which the next seed begins on the same read. Each seed is 16 bp in length (default) and the seed step is 12 bp. So there is an overlap of 4 bp between consecutive seeds on a read.
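
        To make the arithmetic in that reply concrete, here is a minimal Python sketch of where seeds would fall on a single read. The defaults of 16 and 12 come from the reply quoted above. The function names, the choice to start the first seed at offset 0, and the choice to skip a seed that would run past the end of the read are my own assumptions for illustration, not documented gsAssembler behaviour.

```python
def seed_starts(read_length, seed_length=16, seed_step=12):
    """Return 0-based start offsets of every full-length seed on one read."""
    if seed_length <= 0 or seed_step <= 0:
        raise ValueError("seed_length and seed_step must be positive")
    # Assumption: the first seed starts at offset 0, and a seed that would
    # run past the end of the read is skipped.
    return list(range(0, read_length - seed_length + 1, seed_step))


def seed_overlap(seed_length=16, seed_step=12):
    """Bases shared by two consecutive seeds (0 if the step >= seed length)."""
    return max(0, seed_length - seed_step)


if __name__ == "__main__":
    read = "ACGT" * 25  # hypothetical 100-base read
    for start in seed_starts(len(read)):
        # Print each seed's start offset and its 16-base sequence.
        print(start, read[start:start + 16])
    print("overlap between consecutive seeds:", seed_overlap())
```

        Under these assumptions a 100-base read yields seeds starting at offsets 0, 12, 24 and so on, each sharing its last 4 bases with the start of the next one. If the step were larger than the seed length, consecutive seeds would no longer overlap and some bases would not fall inside any seed.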

        One question would be: does the seed step parameter define an upper or a lower limit? While I found dePhi's answer very interesting (I had never thought about the different mapping qualities w.r.t. seed length before), I don't see how it relates to the parameters used. That is, dePhi talks about a "distribution of seed step", so is the parameter the upper limit of that distribution?

        Cheers,
        Homepage: Dan Bolser
        MetaBase the database of biological databases.
