SEQanswers

10-08-2009, 12:13 AM   #21
edge
Senior Member
Location: China
Join Date: Sep 2009
Posts: 199

Yup...
It seems we are facing the same problem.
Maybe we need to ask other users to share their experience.
I will let you know if I find a way to make all the figures match
*_info_contigstats.txt

Quote:
Originally Posted by sklages
I calculate N50 using *_info_contigstats.txt (which gives the same results as using the contigs.fasta file).
This matches the N50 reported in *_info_assembly.txt (section "All Contigs"!).

Btw, the padded FASTA output contains the sequences with pads (if there are any). The unpadded sequence has had all pads removed and is usually the one used for further analysis (but this depends on what you are doing).

cheers,
Sven
Quote:
Originally Posted by sklages
Well, you are right. I just checked this as well. N50 and "largest contig" are "correct" (in the sense that they match how I calculated them); all other numbers differ slightly from what I counted ...

No idea why ..
Sven

10-08-2009, 03:32 AM   #22
BENM
Member
Location: PRC
Join Date: May 2009
Posts: 33

Quote:
Originally Posted by edge
Hi BENM,
Suppose I have a long list like this:
scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200

Do you have any idea how to calculate the N50 and N90 from column two of this list?
I need to sort the list in descending order before calculating the N50 and N90, right?
Thanks again for your help.

Hi edge,

I have written a Perl script that reports length statistics and GC content for a FASTA/FASTQ file. I hope it helps you.
Attached Files
File Type: pl calengc.pl (4.2 KB, 194 views)

Last edited by BENM; 12-15-2009 at 12:59 AM.
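
For readers following the question quoted above, here is a minimal Python sketch of the usual N50/N90 calculation on a two-column "name length" list. It is not the attached calengc.pl script, whose source is not shown in this thread; the function names `nx` and `read_lengths` are made up for illustration. It assumes the common definition: sort the lengths in descending order, add them up, and report the length at which the running sum first reaches 50% (or 90%) of the total.

```python
def read_lengths(lines):
    """Parse 'name length' lines and return the lengths from column two."""
    lengths = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            lengths.append(int(fields[1]))
    return lengths


def nx(lengths, x):
    """Return Nx: the length at which the running sum of the lengths,
    taken in descending order, first reaches x percent of the total."""
    if not lengths:
        raise ValueError("no lengths given")
    ordered = sorted(lengths, reverse=True)  # yes, descending order is needed
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# The list from the quoted post (hypothetical scaffold names and lengths).
EXAMPLE = """\
scaff_123 20
scaff_223 60
scaff_122 1000
scaff_125 15
scaff_23 30
scaff_13 26
scaff_230 50
scaff_153 500
scaff_173 200
"""

lengths = read_lengths(EXAMPLE.splitlines())
print("N50:", nx(lengths, 50))
print("N90:", nx(lengths, 90))
```

For the nine lengths in the quoted list (total 1901) this prints N50: 1000 and N90: 60. Other tools may define N50 slightly differently (for example by filtering short contigs first), which is one possible reason for numbers that do not match.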

10-08-2009, 05:51 PM   #23
edge
Senior Member
Location: China
Join Date: Sep 2009
Posts: 199

Hi BENM,

Thanks a lot for your script.
I just ran it.
It worked very well and was fast.
sklages and I are facing the same problem with the MIRA output.
We are still looking for a solution.
Thanks again for your explanation and script.

10-08-2009, 06:58 PM   #24
BENM
Member
Location: PRC
Join Date: May 2009
Posts: 33

Quote:
Originally Posted by edge
Hi BENM,

Thanks a lot for your script.
I just ran it.
It worked very well and was fast.
sklages and I are facing the same problem with the MIRA output.
We are still looking for a solution.
Thanks again for your explanation and script.

Hi Edge,

The "calengc.pl" script had a bug in its GC content statistics; it has been corrected.
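
As a point of comparison only, per-sequence length and GC content for a FASTA file can be computed along these lines. This is a rough Python sketch, not calengc.pl and not a description of its bug (neither is shown in this thread). The helper names `parse_fasta` and `gc_fraction` and the demo records are invented, and it assumes GC content is taken over unambiguous bases only, so N and other symbols are ignored.

```python
def parse_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq):
    """Fraction of G and C among A, C, G, T (assumption: N etc. are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if gc + at else 0.0


# Invented demo records, only to show the call pattern.
DEMO = """\
>contig_1 demo
ACGTACGT
GGCC
>contig_2
ATATNN
"""

records = list(parse_fasta(DEMO.splitlines()))
for name, seq in records:
    print(name, len(seq), f"{gc_fraction(seq):.3f}")
```

Whether N bases count toward the denominator is a design choice; a script that includes them will report lower GC values for sequences with many Ns.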

10-08-2009, 07:04 PM   #25
edge
Senior Member
Location: China
Join Date: Sep 2009
Posts: 199

Hi BENM,
Thanks for the reminder. Can I ask what you mean by the bug in the GC content statistics of "calengc.pl" that you corrected?
Are you an expert Perl programmer?
It seems the Perl script you wrote can process data quite fast.

10-08-2009, 07:22 PM   #26
BENM
Member
Location: PRC
Join Date: May 2009
Posts: 33

Hi Edge,

I don't consider myself an expert Perl programmer, just a junior learner.
It is the same with Python, C/C++, or any other programming language: I think the most important thing is the algorithm, the way of thinking. Perl is just easy to write.
I found another bug, this time in the length statistics. Sorry for my carelessness.

10-08-2009, 07:25 PM   #27
edge
Senior Member
Location: China
Join Date: Sep 2009
Posts: 199

I really appreciate your script.
I prefer Perl too.
Besides that, I find awk and sed quite useful sometimes as well.
Can I ask what the bugs in the GC content and length statistics were?

07-13-2015, 03:13 PM   #28
milo0615
Member
Location: Walnut, California
Join Date: Dec 2012
Posts: 39

Hi BENM,

Did you post the corrected 'calengc.pl' script?

Quote:
Originally Posted by BENM
Hi Edge,

The "calengc.pl" script had a bug in its GC content statistics; it has been corrected.

All times are GMT -8.