**sklages** · 10-06-2009, 10:14 PM

The N50 contig size is a weighted median value and defined as
the length of the smallest contig S in the sorted list of all
contigs where the cumulative length from the largest contig to
contig S is at least 50% of the total length.

cheers,
Sven

**edge** · 10-06-2009, 10:31 PM

Hi,

Thanks for the info.
Do you know how N90 is defined?
And for the N50 contig size, does that mean I just pick the smallest contig S in the sorted list of all contigs?
Thanks again for the explanation.

**sklages** · 10-06-2009, 11:56 PM

Originally posted by edge View Post

Do you have any idea about N90?

I'd say,

The N90 contig size is the analogous length-weighted statistic, defined as
the length of the smallest contig S in the sorted list of all
contigs where the cumulative length from the largest contig to
contig S is at least 90% of the total length.

:-)

Sven
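
The N50 and N90 definitions differ only in the percentage, so they can be sketched as one routine. This is a minimal sketch, not code from any assembler: the sub name `nx`, its interface, and the contig lengths are made up for illustration. The comparison uses integers to avoid floating-point rounding at the threshold.

```perl
use strict;
use warnings;

# nx(\@lengths, $percent): length of the smallest contig S such that the
# contigs from the largest down to S cover at least $percent of the total.
# The sub name and interface are hypothetical, chosen for this sketch.
sub nx {
    my ($lengths, $percent) = @_;
    my @sorted = sort { $b <=> $a } @$lengths;    # largest first
    my $total = 0;
    $total += $_ for @sorted;
    my $cumulative = 0;
    for my $len (@sorted) {
        $cumulative += $len;
        # Integer form of: cumulative >= total * percent / 100
        return $len if $cumulative * 100 >= $total * $percent;
    }
    return;    # empty list: there is no Nx to report
}

# Made-up contig lengths with a total of 1000.
my @lengths = (400, 300, 150, 100, 50);
print "N50: ", nx(\@lengths, 50), "\n";    # 400 + 300 = 700 >= 500
print "N90: ", nx(\@lengths, 90), "\n";    # 400 + 300 + 150 + 100 = 950 >= 900
```

For these lengths the sketch reports an N50 of 300 and an N90 of 100.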

**edge** · 10-06-2009, 11:59 PM

Thanks for the suggestion.

I found that the maximum contig size is sometimes exactly the same as the N50 contig size.
What is the reason for that?
Thanks. I'm still new to bioinformatics and still learning, so I keep running into problems.
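
One situation where this follows directly from the definition: if the largest contig alone accounts for at least half of the total length, the cumulative sum reaches 50% at the very first contig, so N50 is the maximum contig size. It can also happen when several contigs tie for the largest length. A small sketch with made-up lengths (the sub name `largest_covers_half` is hypothetical):

```perl
use strict;
use warnings;
use List::Util qw(max sum);

# True when the largest contig alone covers at least half of the total
# length, i.e. the cumulative sum hits 50% at the first contig.
sub largest_covers_half {
    my @lengths = @_;
    return 2 * max(@lengths) >= sum(@lengths);
}

# Two made-up assemblies, both with a total length of 950.
for my $set ([600, 100, 100, 100, 50], [300, 250, 200, 100, 100]) {
    my $answer = largest_covers_half(@$set) ? "yes" : "no";
    print "(@$set): $answer\n";
}
```

In the first set the 600-base contig already exceeds half of 950, so N50 equals the maximum. In the second set the 300-base contig does not, and N50 is smaller than the maximum.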

**BENM** · 10-07-2009, 01:01 AM

N50 is a length-weighted median: the size of the smallest contig such that 50% of the total assembly length (the sum of all contig lengths) is contained in contigs of size N50 or greater.
N90 is the same with 90% instead of 50%.
Once the assembly is done and you have the contigs in FASTA format, it is easy to calculate the N50 and N90 contig sizes, for example:

Code:

perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j]}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' contigs.fa

**edge** · 10-07-2009, 01:12 AM

Hi BENM,

I just tried the code you gave me,
but it doesn't work.
Am I missing something, or is there a problem with the code?
When I run it, the output is empty.

Thanks for your help ^^

Originally posted by BENM View Post

N50 = length-weighted median.: The size of the smallest contig such that 50% of the length of the genome is contained in contigs of size N50 or greater.
N90 is 90%.
If you have done the assembly work, and you have got the contigs in FASTA format, it is easy to calculate the N50 & N90 contig size, for example:

Code:

perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if($count>=$total/2){$half=$x[j];print "N50: $x[j]\n" if ($half==0);}elsif($count>=$total*0.9){print "N90: $x[j]\n";exit;}}'  contigs.fa

**BENM** · 10-07-2009, 01:37 AM

Hi edge,

Sorry, there was a small mistake. You can save the code below as a Perl script:

Code:

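#!/usr/bin/perl -w
use strict;
# Read FASTA from the given files (or STDIN) and collect every contig length.
my ($len,$total)=(0,0);
my @x;
while(<>){
	if(/^[\>\@]/){
		# Header line: store the previous contig and start a new one.
		if($len>0){
			$total+=$len;
			push @x,$len;
		}
		$len=0;
	}
	else{
		s/\s//g;
		$len+=length($_);
	}
}
# Store the last contig in the file.
if ($len>0){
	$total+=$len;
	push @x,$len;
}
# Sort from largest to smallest, then walk the cumulative length.
@x=sort{$b<=>$a} @x;
my ($count,$half)=(0,0);
for (my $j=0;$j<@x;$j++){
	$count+=$x[$j];
	if (($count>=$total/2)&&($half==0)){
		print "N50: $x[$j]\n";
		$half=$x[$j];
	}
	# Separate test (not elsif), so N90 is still reported correctly
	# when a single contig crosses both the 50% and the 90% mark.
	if ($count>=$total*0.9){
		print "N90: $x[$j]\n";
		exit;
	}
}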

or run this command as before:

Code:

perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j]}if($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' contigs.fa

**edge** · 10-07-2009, 02:01 AM

Thanks BENM,
It works nicely now ^^
Thank you very much for your help.

**edge** · 10-07-2009, 02:03 AM

Hi BENM,

Have you used the MIRA software before?
I'm having trouble understanding how it calculates the N50 and N90 in its assembly output.

**BENM** · 10-07-2009, 02:31 AM

I use this software, but I am not very familiar with it. There are *_out.padded.fasta and *_out.unpadded.fasta files in the output directory "projectname_d_result". MIRA defines contigs with a length >= 500 bp as large contigs. In the "projectname_d_info" directory, you can find this information in the *_info_assembly.txt file.

**sklages** · 10-07-2009, 06:25 AM

*_out.unpadded.fasta should be your friend when calculating contig sizes.

As BENM mentioned, there is a lot of info in the info_assembly.txt file.

Sven

**edge** · 10-07-2009, 04:32 PM

Hi,

Do you know what the difference in usage is between *_out.padded.fasta and *_out.unpadded.fasta?
As far as I can tell, *_out.padded.fasta is all lower case and *_out.unpadded.fasta is all upper case; otherwise both seem to have exactly the same content.
I tried to reproduce the figures in *_info_assembly.txt, such as N50, N90, minimum contig size and maximum contig size, from the *.contig file in "projectname_d_result".
Unfortunately, the figures I get do not match those in *_info_assembly.txt.

So I am quite confused about how N50, N90, etc. are calculated in *_info_assembly.txt.

**edge** · 10-07-2009, 05:02 PM

Hi sklages,
Thanks for your suggestion.
I ran into some problems when trying to work out the N50, N90, minimum contig size, maximum contig size, etc. from the *.contig file in "projectname_d_result".
The figures I get do not match those in *_info_assembly.txt.
Do you have any idea how the N50, N90, minimum contig size and maximum contig size in *_info_assembly.txt are calculated?
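
One possible source of such a mismatch, offered as an assumption rather than a statement about MIRA's internals: a statistic computed only over the large contigs (>= 500 bp, as mentioned earlier in the thread) will differ from one computed over all contigs. A sketch with made-up lengths, where the sub name `n50_with_cutoff` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: N50 of a list of lengths, keeping only contigs of
# at least $min_len bases before computing it.
sub n50_with_cutoff {
    my ($lengths, $min_len) = @_;
    my @kept = sort { $b <=> $a } grep { $_ >= $min_len } @$lengths;
    my $total = 0;
    $total += $_ for @kept;
    my $cumulative = 0;
    for my $len (@kept) {
        $cumulative += $len;
        return $len if 2 * $cumulative >= $total;    # reached 50% of the total
    }
    return;    # nothing passed the cutoff
}

# Made-up lengths: three large contigs and many short ones.
my @lengths = (2000, 1500, 800, (400) x 10, (100) x 20);
print "N50, all contigs:       ", n50_with_cutoff(\@lengths, 0),   "\n";
print "N50, contigs >= 500 bp: ", n50_with_cutoff(\@lengths, 500), "\n";
```

With these numbers the N50 over all contigs is 400, while the N50 over the large contigs alone is 1500, so it is worth checking which set of contigs a reported figure refers to.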

**edge** · 10-07-2009, 05:17 PM

Hi BENM,
Suppose I have a long list like this:
scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200

Based on column two,
do you have any idea how to calculate the N50 and N90 from this long list?
I need to sort the list in descending order before I calculate the N50 and N90, right?
Thanks again for your help.
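
Sorting in descending order and then accumulating the lengths is indeed what the definition calls for. As a minimal sketch, here is that calculation applied to the nine entries listed in this post. The variable names are made up, and the list is embedded in the script only to keep the example self-contained.

```perl
use strict;
use warnings;

# The list from the post, as "name length" pairs.
my $list = <<'END';
scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200
END

# Take the second column of every non-empty line.
my @lengths = map { (split ' ', $_)[1] } grep { /\S/ } split /\n/, $list;

# Sort in descending order, then walk the cumulative sum.
my @sorted = sort { $b <=> $a } @lengths;
my $total = 0;
$total += $_ for @sorted;

my ($cumulative, $n50, $n90) = (0);
for my $len (@sorted) {
    $cumulative += $len;
    $n50 = $len if !defined $n50 && 2 * $cumulative >= $total;         # 50% reached
    $n90 = $len if !defined $n90 && 10 * $cumulative >= 9 * $total;    # 90% reached
}
print "Total: $total\nN50: $n50\nN90: $n90\n";
```

The nine lengths sum to 1901. The largest scaffold (1000) already covers more than half of that, so N50 is 1000. The cumulative sum first reaches 90% (1710.9) at 1000 + 500 + 200 + 60 = 1760, so N90 is 60.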

N50 and N90 contig size refer to?
