SEQanswers

N50 and N90 contig size refer to? (http://seqanswers.com/forums/showthread.php?t=2766)

edge 10-08-2009 12:13 AM

Yup...
Seems like we are facing the same problem :(
Maybe we need to ask other users to share their experience :)
I will share with you if I find a way to make all the figures match
*_info_contigstats.txt :)

Quote:

Originally Posted by sklages (Post 9107)
I do calculate N50 using *_info_contigstats.txt (which gives the same results as if I use the contigs.fasta file).
This gives the same as the N50 calculated in *_info_assembly.txt (Section "All Contigs"!).

Btw, the padded fasta output contains the sequences with pads (if there are pads). The unpadded sequence has had all pads removed and is usually used for further analysis (but this depends on what you are doing).

cheers,
Sven

Quote:

Originally Posted by sklages (Post 9110)
Well, you are right. I just checked for this as well. N50 and "largest contig" are "correct" (in the sense of how I calculated them), all other numbers differ slightly from what I was counting ...

No idea why ..
Sven


BENM 10-08-2009 03:32 AM

1 Attachment(s)
Quote:

Originally Posted by edge (Post 9101)
Hi BENM,
If I have a long list of contents:
scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200

Based on column two,
do you have any idea how to calculate the N50 and N90 from this long list of contents?
I need to sort this long list in descending order before I calculate the N50 and N90, right?
Thanks again for your help :)

hi edge,

I have written a Perl script that reports length and GC content statistics for a FASTA/FASTQ file; I hope it helps you.
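
For a two-column list like the one quoted above, N50 and N90 can also be worked out by hand or with a few lines of code. The following is a minimal Python sketch, not the attached Perl script (whose contents are not shown in this thread). It assumes the common definition: sort the lengths in descending order and take the length of the contig at which the running total first reaches 50% (or 90%) of the summed length. The names `parse_lengths` and `nxx` are made up for illustration.

```python
def parse_lengths(text):
    """Collect the second whitespace-separated column of each line as an int."""
    lengths = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            lengths.append(int(fields[1]))
    return lengths


def nxx(lengths, percent):
    """Return the Nxx value (percent=50 for N50, percent=90 for N90).

    Assumed definition: the length of the contig at which the running
    total of the descending-sorted lengths first reaches `percent`
    percent of the total length.
    """
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return ordered[-1]


# The list from the quoted post.
example = """\
scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200
"""

lengths = parse_lengths(example)
n50 = nxx(lengths, 50)
n90 = nxx(lengths, 90)
print("total length:", sum(lengths))  # 1901
print("N50:", n50)                    # 1000
print("N90:", n90)                    # 60
```

So yes, under this definition the list is sorted in descending order first. Here the total is 1901; the largest scaffold alone (1000) already covers half of it, and the running total only passes 90% once the 60 bp scaffold is added.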

edge 10-08-2009 05:51 PM

Hi BENM,

Thanks a lot for your script.
I just ran it.
It worked very well and was fast.
sklages and I are facing the same problem with the MIRA output.
We are still looking for a solution.
Thanks again for your explanation and script :)

BENM 10-08-2009 06:58 PM

Quote:

Originally Posted by edge (Post 9142)
Hi BENM,

Thanks a lot for your script.
I just ran it.
It worked very well and was fast.
sklages and I are facing the same problem with the MIRA output.
We are still looking for a solution.
Thanks again for your explanation and script :)

hi Edge,

The "calengc.pl" script had a bug in the GC content statistics; it has been corrected.

edge 10-08-2009 07:04 PM

Hi BENM,
Thanks for the reminder. Can I ask what you mean by "The "calengc.pl" has a bug in stat. gc content, it has been corrected."?
Are you an expert Perl programmer?
Seems like the Perl script you wrote can process data quite fast :)

BENM 10-08-2009 07:22 PM

Hi Edge,

I don't think I am an expert Perl programmer, just a junior learner.
It is the same with Python, C/C++, or any other programming language: I think the most important thing is the algorithm, the idea behind it. Perl is just easy to write.
I found another bug, this time in the length statistics. Sorry for my carelessness.

edge 10-08-2009 07:25 PM

I really appreciate your script.
I prefer Perl too.
Besides that, I find awk and sed quite useful sometimes as well.
Can I ask what the bug or error was in the GC content and length statistics?

milo0615 07-13-2015 03:13 PM

Hi BENM,

Did you post the corrected 'calengc.pl' script?

Quote:

Originally Posted by BENM (Post 9143)
hi Edge,

The "calengc.pl" script had a bug in the GC content statistics; it has been corrected.



All times are GMT -8.