  • Selecting N50 and L50: implications for genome quality

    Hi Members,

    I'm stuck with N50 and L50.
    I have paired-end sequencing data from E. coli isolates, assembled with SPAdes and quality-checked with QUAST.

    My understanding is pieced together from information spread across different resources and blogs. Since I'm using QUAST, I also read its manual.


    Having read various sources, blogs, and posts about N50 and L50, I'm looking for a baseline or benchmark to assess genome quality.

    My question:
    If I were to judge the quality of my genome assembly based on L50 and N50, which of these would I aim for?

    1) Minimum L50, Or,
    2) Maximum N50, Or,
    3) Minimum N50, Or,
    4) Maximum L50

    And, most importantly: Why?

    I fail to understand what user Jeremy Leipzig suggests in his reply at Biostars.

    If I had to answer, I would say the minimum number of contigs, that is, the minimum L50,
    or the maximum N50 (the length).
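A small sketch may help with the "why". It assumes the common QUAST-style convention (N50 is a length, L50 is a count of contigs) and uses made-up contig lengths; the function name `n50_l50` is hypothetical. In this toy example, joining two contigs, as a more contiguous assembly would, raises N50 and lowers L50 together, which is why "maximum N50" and "minimum L50" point in the same direction:

```python
def n50_l50(lengths):
    """Return (N50, L50): walk contigs longest first and stop once the running
    total covers at least half of the assembly. N50 is the length of the contig
    reached at that point; L50 is how many contigs it took."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no contigs given")


# Toy assembly (contig lengths in bp); values are made up for illustration
fragmented = [200, 200, 150, 150, 100, 100, 50, 50]
# Same total sequence, but the two 150 bp contigs have been joined into one
merged = [300, 200, 200, 100, 100, 50, 50]

print(n50_l50(fragmented))  # (150, 3)
print(n50_l50(merged))      # (200, 2)
```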

    Of course, N50 and L50 cannot be considered a gold standard for judging genome quality.

    Any help would be greatly appreciated.
    Last edited by bio_informatics; 02-12-2016, 12:36 PM.
    Bioinformaticscally calm

  • #2
    What Jeremy Leipzig at Biostars says:

    you want L50 to be large and N50 to be small, so you wouldn't need many contigs to cover most of the assembly
    That could be confusing. Unfortunately, there is some disagreement in the field about what 'L' and 'N' mean. I think most people take 'N' to be the length and 'L' to be the number of contigs, but some people (e.g., Brian of BBMap fame) use 'N' for the number of contigs and 'L' for the weighted median length. Brian's use of 'L'ength and 'N'umber certainly makes sense from an English-usage standpoint, but it runs contrary to the long-standing use of 'N' for length.
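To make the two conventions concrete, here is a minimal sketch that computes the weighted median contig length and the contig count once, then labels them both ways. The convention names and the function names are hypothetical, and the "length-is-L" labelling only follows the description in this post, not any particular tool's output format:

```python
def weighted_median_stats(lengths):
    """Return (length, count): the contig length at which the longest-first
    cumulative sum first reaches half the assembly, and how many contigs that took."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no contigs given")


def labelled_stats(lengths, convention):
    """Attach N/L labels to the same two numbers under either naming convention."""
    length, count = weighted_median_stats(lengths)
    if convention == "length-is-N":  # the widespread usage (e.g. QUAST reports)
        return {"N50": length, "L50": count}
    if convention == "length-is-L":  # 'L'ength / 'N'umber, as described above
        return {"L50": length, "N50": count}
    raise ValueError(f"unknown convention: {convention}")


contigs = [200, 200, 150, 150, 100, 100, 50, 50]  # made-up lengths in bp
print(labelled_stats(contigs, "length-is-N"))  # {'N50': 150, 'L50': 3}
print(labelled_stats(contigs, "length-is-L"))  # {'L50': 150, 'N50': 3}
```

The numbers never change; only the labels swap, so it is worth checking which convention a tool or post uses before comparing values.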

    So, going back to the quote, it could be that Jeremy was using L and N in the Length and Number sense.


    Going back to your original question and using N as length: given the choice of a single statistic to optimize, I would go with a maximum N50 over a minimum L50. But this single statistic masks so many other considerations (e.g., does the assembly match the expected genome size? what percentage of bases are unknown? and so on) that it is basically worthless by itself.
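A rough sketch of looking beyond N50, assuming contigs are available as plain sequence strings and the expected genome size is known. `assembly_report`, the toy sequences, and the toy expected size are hypothetical. The two toy assemblies have identical N50 and L50, yet one of them is 30% unknown bases:

```python
def assembly_report(contigs, expected_size):
    """Summarise an assembly given its contig sequences (plain strings)."""
    lengths = sorted((len(seq) for seq in contigs), reverse=True)
    total = sum(lengths)
    if total == 0:
        raise ValueError("assembly is empty")
    unknown = sum(seq.upper().count("N") for seq in contigs)

    # N50/L50, with N50 taken as a length (QUAST-style convention)
    running = 0
    for count, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:
            n50, l50 = length, count
            break

    return {
        "total_length": total,
        "size_ratio": total / expected_size,   # close to 1.0 is what we hope for
        "percent_N": 100.0 * unknown / total,  # share of unknown bases
        "N50": n50,
        "L50": l50,
    }


expected = 1000  # toy "expected genome size" in bp
clean = ["A" * 600, "C" * 300, "G" * 100]
gappy = ["A" * 300 + "N" * 300, "C" * 300, "G" * 100]

for name, contigs in [("clean", clean), ("gappy", gappy)]:
    print(name, assembly_report(contigs, expected))
```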



    • #3
      Hi westerman,
      Thanks very much for your reply.
      I see; given the prevalent confusion over 'L' and 'N', what Jeremy suggested makes sense.

      I agree that N50 and L50 alone would be worthless if we overlooked other parameters such as genome length, number of Ns, etc.
      I've already filtered out poorly assembled genomes based on genome size, and wanted to zoom in further, if I could, for another task.
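A sketch of that two-step idea, assuming total assembly length per isolate is compared with an expected size and the survivors are then ranked by N50 (ties broken by smaller L50). The isolate names, contig lengths, the 10% tolerance, and the 4.6 Mb expected size (roughly that of E. coli K-12 MG1655; genome size varies between E. coli strains) are all illustrative assumptions:

```python
def n50_l50(lengths):
    """N50 as a length and L50 as a contig count (QUAST-style convention)."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if 2 * running >= total:
            return length, count
    raise ValueError("no contigs given")


def filter_and_rank(assemblies, expected_size, tolerance=0.10):
    """Keep assemblies whose total length is within `tolerance` (a fraction)
    of the expected size, then sort by N50 (descending) and L50 (ascending)."""
    kept = []
    for name, lengths in assemblies.items():
        total = sum(lengths)
        if abs(total - expected_size) <= tolerance * expected_size:
            n50, l50 = n50_l50(lengths)
            kept.append((name, total, n50, l50))
    kept.sort(key=lambda row: (-row[2], row[3]))
    return kept


# Hypothetical isolates with made-up contig lengths in bp
assemblies = {
    "isolate_A": [2_000_000, 1_500_000, 1_000_000],
    "isolate_B": [3_000_000, 1_600_000],
    "isolate_C": [1_000_000, 1_000_000, 1_000_000],  # far too small, filtered out
}

for row in filter_and_rank(assemblies, expected_size=4_600_000):
    print(row)
# ('isolate_B', 4600000, 3000000, 1)
# ('isolate_A', 4500000, 1500000, 2)
```

Filtering on size first keeps a suspiciously long or short assembly from winning on N50 alone.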

      Thank you for your reply.
