  • edge
    Senior Member
    • Sep 2009
    • 199

    What do N50 and N90 contig size refer to?

    What is the general explanation of N50 and N90 contig size?
    I am following the "Instructions for scaffolding MIRA 454 contigs & 25KB paired-end data with BAMBUS".
    In the MIRA assembly info and the Bambus scaffold info, what do the N50 and N90 contig sizes refer to?
    How can I obtain these values, and how are the N50 and N90 contig sizes calculated?
    Thanks a lot for any explanation and suggestions.
  • sklages
    Senior Member
    • May 2008
    • 628

    #2
    The N50 contig size is a weighted median value, defined as
    the length of the smallest contig S in the list of all contigs
    sorted from largest to smallest, where the cumulative length from
    the largest contig to contig S is at least 50% of the total length.

    cheers,
    Sven
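
The definition above can be tried on a toy example. This is only a minimal Perl sketch, not taken from any assembler: the helper name `nx` and the six contig lengths are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the Nx contig size for a list of lengths.
# $fraction is 0.5 for N50 and 0.9 for N90.
sub nx {
    my ($fraction, @lengths) = @_;
    my $total = 0;
    $total += $_ for @lengths;
    my $cumulative = 0;
    # Walk from the largest contig to the smallest.
    for my $len (sort { $b <=> $a } @lengths) {
        $cumulative += $len;
        # The first contig at which the running sum reaches the threshold is S.
        return $len if $cumulative >= $fraction * $total;
    }
    return;    # empty list: Nx is not defined
}

# Made-up contig lengths; the total is 290.
my @contigs = (80, 70, 50, 40, 30, 20);
printf "N50: %d\n", nx(0.5, @contigs);    # running sum 80, 150; 150 >= 145
printf "N90: %d\n", nx(0.9, @contigs);    # running sum reaches 270 >= 261
```

With these lengths the sketch prints N50: 70 and N90: 30.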


    • edge
      Senior Member
      • Sep 2009
      • 199

      #3
      Hi,

      Thanks for your info.
      Do you have any idea about N90?
      So for the N50 contig size, do I just take the smallest contig S in the sorted list of all contigs?
      Thanks again for your explanation.


      • sklages
        Senior Member
        • May 2008
        • 628

        #4
        Originally posted by edge View Post
        Do you have any idea about N90?
        I'd say,

        The N90 contig size is defined in the same way with a 90% threshold:
        the length of the smallest contig S in the sorted list of all
        contigs where the cumulative length from the largest contig to
        contig S is at least 90% of the total length.

        :-)

        Sven


        • edge
          Senior Member
          • Sep 2009
          • 199

          #5
          Thanks for your suggestion.

          I found that sometimes the maximum contig size is exactly the same as the N50 contig size.
          What is the reason for this?
          Thanks. I'm still new to bioinformatics and still learning, so I keep running into problems.


          • BENM
            Member
            • May 2009
            • 33

            #6
            N50 = length-weighted median: the size of the smallest contig such that 50% of the total assembled length is contained in contigs of size N50 or greater.
            N90 is the same with 90%.
            If you have finished the assembly and have the contigs in FASTA format, it is easy to calculate the N50 and N90 contig sizes, for example:
            Code:
            perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j];}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' contigs.fa
            Last edited by BENM; 10-07-2009, 01:38 AM.


            • edge
              Senior Member
              • Sep 2009
              • 199

              #7
              Hi BENM,

              I just tried the code that you gave me.
              It doesn't work.
              Did I miss anything, or is there a problem with the code?
              After I run the code, the output is empty.
              Thanks for your help ^^
              Originally posted by BENM View Post
              N50 = length-weighted median.: The size of the smallest contig such that 50% of the length of the genome is contained in contigs of size N50 or greater.
              N90 is 90%.
              If you have done the assembly work, and you have got the contigs in FASTA format, it is easy to calculate the N50 & N90 contig size, for example:
              Code:
              perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if($count>=$total/2){$half=$x[j];print "N50: $x[j]\n" if ($half==0);}elsif($count>=$total*0.9){print "N90: $x[j]\n";exit;}}'  contigs.fa


              • BENM
                Member
                • May 2009
                • 33

                #8
                hi edge,

                Sorry, there was a small mistake. You can put the code below into a Perl script:
                Code:
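                #!/usr/bin/perl -w
                use strict;
                # Collect the length of every sequence in the input.
                my ($len,$total)=(0,0);
                my @x;
                while(<>){
                	if(/^[\>\@]/){
                		# Header line: store the length of the previous sequence.
                		if($len>0){
                			$total+=$len;
                			push @x,$len;
                		}
                		$len=0;
                	}
                	else{
                		s/\s//g;
                		$len+=length($_);
                	}
                }
                # Store the last sequence.
                if ($len>0){
                	$total+=$len;
                	push @x,$len;
                }
                # Sort descending and accumulate from the largest contig down.
                @x=sort{$b<=>$a} @x;
                my ($count,$half)=(0,0);
                for (my $j=0;$j<@x;$j++){
                	$count+=$x[$j];
                	if (($count>=$total/2)&&($half==0)){
                		print "N50: $x[$j]\n";
                		$half=$x[$j];
                	}
                	# Checked separately from N50 so that a contig crossing
                	# both thresholds is reported for both.
                	if ($count>=$total*0.9){
                		print "N90: $x[$j]\n";
                		exit;
                	}
                }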
                Or run this command as before:
                Code:
                perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j];}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' contigs.fa
                Last edited by BENM; 10-07-2009, 01:40 AM.


                • edge
                  Senior Member
                  • Sep 2009
                  • 199

                  #9
                  Thanks BENM,
                  It works nicely now ^^
                  Thank you very much for your help.


                  • edge
                    Senior Member
                    • Sep 2009
                    • 199

                    #10
                    Hi BENM,

                    Have you used the MIRA software before?
                    I am having trouble understanding how it calculates the N50 and N90 in its assembly output.


                    • BENM
                      Member
                      • May 2009
                      • 33

                      #11
                      I am using this software, but I am not very familiar with it. There are *_out.padded.fasta and *_out.unpadded.fasta in the output directory "projectname_d_result". MIRA defines contigs of length >= 500 bp as large contigs. You can find the statistics in the file *_info_assembly.txt in the "projectname_d_info" directory.
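
If the statistics in *_info_assembly.txt are reported for large contigs only, numbers computed over all contigs will not match them. That is an assumption based on the 500 bp cutoff mentioned above, and it has not been checked against MIRA itself. The following minimal Perl sketch shows how such a cutoff shifts N50; the helper name `n50_with_cutoff` and the contig lengths are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: N50 over the contigs that are at least $min_len long.
sub n50_with_cutoff {
    my ($min_len, @lengths) = @_;
    # Drop contigs below the cutoff, then sort largest first.
    my @kept = sort { $b <=> $a } grep { $_ >= $min_len } @lengths;
    my $total = 0;
    $total += $_ for @kept;
    my $cumulative = 0;
    for my $len (@kept) {
        $cumulative += $len;
        return $len if $cumulative >= $total / 2;
    }
    return;    # nothing passed the cutoff
}

# Made-up lengths: three large contigs and twenty short ones of 400 bp.
my @contigs = (3000, 2000, 1000, (400) x 20);

printf "N50, all contigs:       %d\n", n50_with_cutoff(0,   @contigs);
printf "N50, contigs >= 500 bp: %d\n", n50_with_cutoff(500, @contigs);
```

For this made-up data the N50 is 400 over all contigs but 3000 once contigs shorter than 500 bp are excluded, so it is worth checking which set of contigs a reported figure refers to.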


                      • sklages
                        Senior Member
                        • May 2008
                        • 628

                        #12
                        *_out.unpadded.fasta should be your friend when calculating contig sizes.

                        As BENM mentioned, there is a lot of info in the info_assembly.txt file.

                        Sven


                        • edge
                          Senior Member
                          • Sep 2009
                          • 199

                          #13
                          Hi,

                          Do you know the difference in usage between *_out.padded.fasta and *_out.unpadded.fasta?
                          As far as I can tell, *_out.padded.fasta is all lowercase and *_out.unpadded.fasta is all uppercase, but otherwise both have exactly the same content.
                          I tried to reproduce the figures in *_info_assembly.txt (N50, N90, minimum contig size, maximum contig size, etc.) from the *.contig file in "projectname_d_result".
                          Unfortunately, the figures I get do not match those in *_info_assembly.txt.
                          So I am quite confused about how N50, N90, etc. are calculated in *_info_assembly.txt.


                          • edge
                            Senior Member
                            • Sep 2009
                            • 199

                            #14
                            Hi sklages,
                            Thanks for your suggestion.
                            I ran into problems when trying to work out the N50, N90, minimum contig size, maximum contig size, etc. from the *.contig file in "projectname_d_result".
                            The figures I get do not match those in *_info_assembly.txt.
                            Do you have any idea how the N50, N90, minimum contig size and maximum contig size in *_info_assembly.txt are calculated?


                            • edge
                              Senior Member
                              • Sep 2009
                              • 199

                              #15
                              Hi BENM,
                              If I have a long list like this:
                              scaff_123 20
                              scaff_223 60
                              scaff_122 1000
                              scaff_125 15
                              scaff_23 30
                              scaff_13 26
                              scaff_230 50
                              scaff_153 500
                              scaff_173 200

                              Based on column two, do you have any idea how to calculate the N50 and N90 from this list?
                              I need to sort the list in descending order before I calculate the N50 and N90, right?
                              Thanks again for your help.
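
For a two-column list like the one above, one way is to take the second column, sort it in descending order, and accumulate. This is a minimal Perl sketch that uses the nine lines from the post as inline data. The helper name `nx_from_lines` is made up, and the input is assumed to be whitespace-separated with the length in column two.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to "name length" lines and a list
# of percentages, and returns a hash such as (N50 => ..., N90 => ...).
sub nx_from_lines {
    my ($lines, @percents) = @_;
    my @lengths;
    for my $line (@$lines) {
        my (undef, $len) = split ' ', $line;
        # Skip blank or malformed lines (column two must be an integer).
        next unless defined $len && $len =~ /^\d+$/;
        push @lengths, $len;
    }
    @lengths = sort { $b <=> $a } @lengths;    # largest first
    my $total = 0;
    $total += $_ for @lengths;

    my %result;
    for my $pct (@percents) {
        my $cumulative = 0;
        for my $len (@lengths) {
            $cumulative += $len;
            if ($cumulative >= $total * $pct / 100) {
                $result{"N$pct"} = $len;
                last;
            }
        }
    }
    return %result;
}

my @list = (
    'scaff_123 20',  'scaff_223 60',  'scaff_122 1000',
    'scaff_125 15',  'scaff_23 30',   'scaff_13 26',
    'scaff_230 50',  'scaff_153 500', 'scaff_173 200',
);
my %nx = nx_from_lines(\@list, 50, 90);
print "$_: $nx{$_}\n" for sort keys %nx;
```

The nine values sum to 1901, so the sketch reports N50: 1000 and N90: 60. N50 equals the largest scaffold here because that scaffold alone holds more than half of the total length, which is also the situation in which the maximum contig size and the N50 coincide.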
