  • edge
    Senior Member
    • Sep 2009
    • 199

    What do the N50 and N90 contig sizes refer to?

    What is the general explanation of the N50 and N90 contig sizes?
    This is regarding the "Instructions for scaffolding MIRA 454 contigs & 25KB paired-end data with BAMBUS".
    Based on the MIRA Assembly Info/Bambus Scaffold info, what do the N50 and N90 contig sizes refer to?
    How can I obtain these values, and how are the N50 and N90 contig sizes calculated?
    Thanks a lot for all of your explanations and suggestions.
  • sklages
    Senior Member
    • May 2008
    • 628

    #2
    The N50 contig size is a weighted median value and defined as
    the length of the smallest contig S in the sorted list of all
    contigs where the cumulative length from the largest contig to
    contig S is at least 50% of the total length.

    cheers,
    Sven
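To make the definition concrete, here is a minimal Perl sketch; the sub name n50 and the contig lengths are made up for illustration. It sorts the lengths from largest to smallest and walks down the list until the running total reaches half of the total, so the answer is the smallest contig needed to reach the 50% mark, not the smallest contig in the whole assembly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# N50 as defined above: sort contig lengths from largest to smallest,
# accumulate, and report the length of the contig at which the running
# total first reaches at least 50% of the total assembly length.
sub n50 {
    my @lengths = sort { $b <=> $a } @_;   # largest contig first
    return undef unless @lengths;           # no contigs, no N50
    my $total = 0;
    $total += $_ for @lengths;
    my $cumulative = 0;
    for my $len (@lengths) {
        $cumulative += $len;
        return $len if $cumulative >= 0.5 * $total;
    }
}

# Hypothetical contig lengths, for illustration only.
my @contigs = (80, 70, 50, 40, 30, 20, 10);
print "N50: ", n50(@contigs), "\n";
```

With these lengths the total is 300, and the running total first reaches 150 at the second contig, so it prints N50: 70.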


    • edge
      Senior Member
      • Sep 2009
      • 199

      #3
      Hi,

      Thanks for your info.
      Do you have any idea about N90?
      And for the N50 contig size, do I just sort all contigs and take the smallest contig S in the sorted list?
      Thanks again for your explanation.


      • sklages
        Senior Member
        • May 2008
        • 628

        #4
        Originally posted by edge
        Do you have any idea about N90?
        I'd say,

        The N90 contig size is defined as
        the length of the smallest contig S in the sorted list of all
        contigs where the cumulative length from the largest contig to
        contig S is at least 90% of the total length.

        :-)

        Sven
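The same walk down the sorted list works for any threshold. A hedged sketch with a hypothetical helper nx that takes the percentage as its first argument, so nx(50, ...) gives the N50 and nx(90, ...) the N90:

```perl
use strict;
use warnings;

# Generalisation of the definition above to any percentage $x:
# Nx is the length of the smallest contig S, in the list sorted from
# largest to smallest, such that the contigs from the largest down to S
# together cover at least $x% of the total length.
sub nx {
    my ($x, @lengths) = @_;
    @lengths = sort { $b <=> $a } @lengths;  # largest contig first
    return undef unless @lengths;
    my $total = 0;
    $total += $_ for @lengths;
    my $cumulative = 0;
    for my $len (@lengths) {
        $cumulative += $len;
        return $len if $cumulative >= $total * $x / 100;
    }
    return $lengths[-1];  # guard against floating-point rounding near x = 100
}

my @contigs = (80, 70, 50, 40, 30, 20, 10);  # hypothetical lengths
printf "N50: %d\nN90: %d\n", nx(50, @contigs), nx(90, @contigs);
```

For these made-up lengths (total 300) the running total first reaches 150 at 70 and 270 at 30, so it prints N50: 70 and N90: 30. Because the 90% mark is never reached before the 50% mark, N90 is never larger than N50.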


        • edge
          Senior Member
          • Sep 2009
          • 199

          #5
          Thanks for your suggestion.

          I found that sometimes the maximum contig size is exactly the same as the N50 contig size.
          What is the reason for this?
          Thanks. I'm still new to bioinformatics and learning as I go, so I keep running into problems.

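On the question above: N50 equals the maximum contig size whenever the largest contig by itself makes up at least half of the total assembled length, because the running total already crosses the 50% mark at the first contig in the sorted list. (It can also happen when several contigs share the maximum length.) A small sketch with hypothetical lengths that checks the first condition; the sub name largest_covers_half is made up:

```perl
use strict;
use warnings;

# Returns 1 when the largest contig alone covers at least half of the
# total length; in that case the first step of the cumulative sum
# already crosses the 50% threshold, so N50 equals the maximum length.
sub largest_covers_half {
    my @lengths = @_;
    return 0 unless @lengths;
    my ($max, $total) = (0, 0);
    for my $len (@lengths) {
        $total += $len;
        $max = $len if $len > $max;
    }
    return $max >= 0.5 * $total ? 1 : 0;
}

# Hypothetical assemblies: one dominated by a single long contig, one not.
for my $set ([5000, 1200, 900, 400], [800, 700, 600, 500]) {
    printf "%s -> largest >= half of total? %s\n",
        join(',', @$set), largest_covers_half(@$set) ? 'yes' : 'no';
}
```

In the first set the 5000 bp contig is two thirds of the 7500 bp total, so N50 is 5000, the maximum; in the second set the 800 bp contig is less than half of 2600 bp, so N50 comes from further down the list.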

          • BENM
            Member
            • May 2009
            • 33

            #6
            N50 = length-weighted median: the size of the smallest contig such that at least 50% of the total assembly length is contained in contigs of size N50 or greater.
            N90 is defined the same way with 90%.
            If you have finished the assembly and have the contigs in FASTA format, the N50 and N90 contig sizes are easy to calculate, for example:
            Code:
            perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j]}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}'  contigs.fa
            Last edited by BENM; 10-07-2009, 01:38 AM.


            • edge
              Senior Member
              • Sep 2009
              • 199

              #7
              Hi BENM,

              I just tried the code that you gave me.
              It doesn't work.
              Did I miss anything, or is there a problem with the code?
              After I ran it, the output was empty.
              Thanks for your help ^^
              Originally posted by BENM
              N50 = length-weighted median.: The size of the smallest contig such that 50% of the length of the genome is contained in contigs of size N50 or greater.
              N90 is 90%.
              If you have done the assembly work, and you have got the contigs in FASTA format, it is easy to calculate the N50 & N90 contig size, for example:
              Code:
              perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if($count>=$total/2){$half=$x[j];print "N50: $x[j]\n" if ($half==0);}elsif($count>=$total*0.9){print "N90: $x[j]\n";exit;}}'  contigs.fa


              • BENM
                Member
                • May 2009
                • 33

                #8
                Hi edge,

                Sorry, there was a small mistake. You can put the code below into a Perl script:
                Code:
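                #!/usr/bin/perl -w
                # Usage: perl n50.pl contigs.fa  (the script name is arbitrary)
                # Prints the N50 and N90 contig sizes of the sequences in a FASTA file.
                use strict;
                my ($len,$total)=(0,0);   # length of the current sequence, total assembly length
                my @x;                    # lengths of all sequences
                while(<>){
                	if(/^[\>\@]/){        # header line: store the length of the previous sequence
                		if($len>0){
                			$total+=$len;
                			push @x,$len;
                		}
                		$len=0;
                	}
                	else{                 # sequence line: count bases, ignoring whitespace
                		s/\s//g;
                		$len+=length($_);
                	}
                }
                if ($len>0){              # store the last sequence
                	$total+=$len;
                	push @x,$len;
                }
                @x=sort{$b<=>$a} @x;      # largest contig first
                my ($count,$half)=(0,0);
                for (my $j=0;$j<@x;$j++){
                	$count+=$x[$j];       # cumulative length from the largest contig
                	if (($count>=$total/2)&&($half==0)){
                		print "N50: $x[$j]\n";
                		$half=$x[$j];
                	}
                	# separate test (not elsif), so one contig can be both the N50 and the N90
                	if ($count>=$total*0.9){
                		print "N90: $x[$j]\n";
                		exit;
                	}
                }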
                Or run this command as before:
                Code:
                perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j]}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' contigs.fa
                Last edited by BENM; 10-07-2009, 01:40 AM.


                • edge
                  Senior Member
                  • Sep 2009
                  • 199

                  #9
                  Thanks BENM,
                  It works nicely now ^^
                  Thank you very much for your help.


                  • edge
                    Senior Member
                    • Sep 2009
                    • 199

                    #10
                    Hi BENM,

                    Have you used the MIRA software before?
                    I'm having trouble understanding how MIRA calculates the N50 and N90 for its assembly output.


                    • BENM
                      Member
                      • May 2009
                      • 33

                      #11
                      I am using this software, but I'm not very familiar with it. There are *_out.padded.fasta and *_out.unpadded.fasta files in the output directory "projectname_d_result". MIRA defines contigs with a length >= 500 bp as large contigs. In the "projectname_d_info" directory, you can find this information in the *_info_assembly.txt file.


                      • sklages
                        Senior Member
                        • May 2008
                        • 628

                        #12
                        *_out.unpadded.fasta should be your friend when calculating contig sizes.

                        As BENM mentioned, there is a lot of info in the *_info_assembly.txt file.

                        Sven
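As far as I understand MIRA's output, the reason to prefer the unpadded file is that the padded consensus contains '*' pad (gap) characters at alignment gap positions, so counting every character overstates contig length. A sketch of a FASTA length reader that skips '*' as well as whitespace; the assumption that pads are written as '*' should be checked against your own files, and with an unpadded file there is simply nothing to skip. The sub names are hypothetical:

```perl
use strict;
use warnings;

# Length of a sequence line, ignoring whitespace and '*' pad characters.
# Assumption: pads (alignment gaps) in the padded FASTA are written as '*'.
sub unpadded_length {
    my ($seq) = @_;
    (my $bases = $seq) =~ s/[\s*]//g;   # drop whitespace and pads
    return length $bases;
}

# Read FASTA records from a filehandle; return contig lengths in file order.
sub fasta_lengths {
    my ($fh) = @_;
    my @lengths;
    my $len;
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {             # header line starts a new record
            push @lengths, $len if defined $len;
            $len = 0;
        } elsif (defined $len) {         # sequence line of the current record
            $len += unpadded_length($line);
        }
    }
    push @lengths, $len if defined $len; # last record
    return @lengths;
}

# Hypothetical two-contig padded FASTA, read from an in-memory string.
my $fasta = ">contig1\nACGT*ACGT\nAC*GT\n>contig2\nAAAA**CC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
print join(',', fasta_lengths($fh)), "\n";
close $fh;
```

For this made-up input it prints 12,6; counting the '*' characters as well would give 14,8.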


                        • edge
                          Senior Member
                          • Sep 2009
                          • 199

                          #13
                          Hi,

                          Do you know the difference in usage between *_out.padded.fasta and *_out.unpadded.fasta?
                          As far as I can see, *_out.padded.fasta is all lower case and *_out.unpadded.fasta is all upper case, and otherwise both have exactly the same content.
                          Following *_info_assembly.txt, I tried to calculate the figures it reports, such as N50, N90, minimum contig size and maximum contig size, based on the *.contig file in "projectname_d_result".
                          Unfortunately, the figures I got don't match the ones in *_info_assembly.txt,
                          so I'm quite confused about how N50, N90, etc. are calculated in *_info_assembly.txt.


                          • edge
                            Senior Member
                            • Sep 2009
                            • 199

                            #14
                            Hi sklages,
                            Thanks for your suggestion.
                            I ran into some problems when trying to work out the N50, N90, minimum contig size, maximum contig size, etc. from the *.contig file in "projectname_d_result".
                            The figures I get don't match the ones in *_info_assembly.txt.
                            Do you have any idea how the N50, N90, minimum contig size and maximum contig size in *_info_assembly.txt are calculated?
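One possible explanation for the mismatch, going by BENM's note that MIRA counts contigs of 500 bp or more as "large contigs", is that some figures in *_info_assembly.txt refer only to that subset, so statistics over every contig will differ. I have not verified exactly which contigs MIRA includes in each figure, so treat this as a sketch for comparing both ways; the 500 bp cut-off, the example lengths and the sub name contig_stats are assumptions for illustration:

```perl
use strict;
use warnings;

# Summary statistics over contigs of at least $min_len bases.
# Returns a hash ref with count, total, min, max, N50 and N90,
# or undef when no contig passes the filter.
sub contig_stats {
    my ($min_len, @lengths) = @_;
    my @kept = sort { $b <=> $a } grep { $_ >= $min_len } @lengths;
    return undef unless @kept;
    my $total = 0;
    $total += $_ for @kept;
    my %stats = (count => scalar @kept, total => $total,
                 max => $kept[0], min => $kept[-1]);
    my $cumulative = 0;
    for my $len (@kept) {
        $cumulative += $len;
        # Independent checks: the first contig reaching each threshold wins.
        $stats{N50} //= $len if $cumulative >= 0.5 * $total;
        $stats{N90} //= $len if $cumulative >= 0.9 * $total;
    }
    return \%stats;
}

# Hypothetical lengths: the same assembly without and with a 500 bp cut-off.
my @lengths = (2500, 1800, 900, 600, 450, 300, 120);
for my $cut (0, 500) {
    my $s = contig_stats($cut, @lengths);
    printf "cut-off %d: n=%d total=%d min=%d max=%d N50=%d N90=%d\n",
        $cut, @{$s}{qw(count total min max N50 N90)};
}
```

With these made-up lengths the minimum contig size changes from 120 to 600 and the N90 from 450 to 600 once the cut-off is applied, while the N50 stays at 1800, which is the kind of partial disagreement described above.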


                            • edge
                              Senior Member
                              • Sep 2009
                              • 199

                              #15
                              Hi BENM,
                              If I have a long list like this:
                              scaff_123 20
                              scaff_223 60
                              scaff_122 1000
                              scaff_125 15
                              scaff_23 30
                              scaff_13 26
                              scaff_230 50
                              scaff_153 500
                              scaff_173 200

                              Based on column two,
                              do you have any idea how to calculate the N50 and N90 from a long list like this?
                              I need to sort the list in descending order before calculating the N50 and N90, right?
                              Thanks again for your help.
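Yes, sorting in descending order is the first step. A sketch that takes the lengths from column two, sorts them largest first and accumulates until 50% and 90% of the total are reached; the table is hard-coded from the list above, and the sub names are hypothetical:

```perl
use strict;
use warnings;

# Take the lengths from column two of whitespace-separated "name length"
# lines; blank or malformed lines are skipped.
sub lengths_from_table {
    my @lengths;
    for my $line (@_) {
        my (undef, $len) = split ' ', $line;
        push @lengths, $len if defined $len && $len =~ /^\d+$/;
    }
    return @lengths;
}

# N50 and N90 in one pass over the lengths sorted largest first.
# Two independent checks (not if/elsif), so one entry can be both.
sub n50_n90 {
    my @sorted = sort { $b <=> $a } @_;
    my $total = 0;
    $total += $_ for @sorted;
    my ($n50, $n90);
    my $cumulative = 0;
    for my $len (@sorted) {
        $cumulative += $len;
        $n50 //= $len if $cumulative >= 0.5 * $total;
        if ($cumulative >= 0.9 * $total) { $n90 = $len; last; }
    }
    return ($n50, $n90);
}

my @table = ("scaff_123 20", "scaff_223 60", "scaff_122 1000",
             "scaff_125 15", "scaff_23 30", "scaff_13 26",
             "scaff_230 50", "scaff_153 500", "scaff_173 200");
my ($n50, $n90) = n50_n90(lengths_from_table(@table));
print "N50: $n50\nN90: $n90\n";
```

For this list the total is 1901. Sorted, the lengths are 1000, 500, 200, 60, ...; the running total is already 1000 (at least 950.5) after the first entry and first reaches 1710.9 at 1760, after adding 60, so it prints N50: 1000 and N90: 60. It also shows the situation asked about earlier in the thread: scaff_122 alone holds more than half of the total, so the N50 equals the maximum length.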
