SEQanswers

Old 09-12-2013, 04:37 PM   #1
AdrianP
Senior Member
 
Location: Ottawa

Join Date: Apr 2011
Posts: 130
Stats on assembly

Hello,

I was wondering whether there is a script, or whether one could be written, that quickly evaluates a given FASTA file of contigs.

Something like what VelvetOptimiser prints:

Total number of contigs: 49644
n50: 546
length of longest contig: 21951
Total bases in contigs: 25417931
Number of contigs > 1k: 3773
Total bases in contigs > 1k: 6821103

I know you can get the N50 (and N90) from a simple Perl one-liner:
Code:
perl -e 'my ($len,$total)=(0,0);my @x;while(<>){if(/^[\>\@]/){if($len>0){$total+=$len;push@x,$len;};$len=0;}else{s/\s//g;$len+=length($_);}}if ($len>0){$total+=$len;push @x,$len;}@x=sort{$b<=>$a}@x; my ($count,$half)=(0,0);for (my $j=0;$j<@x;$j++){$count+=$x[$j];if(($count>=$total/2)&&($half==0)){print "N50: $x[$j]\n";$half=$x[$j]}elsif($count>=$total*0.9){print "N90: $x[$j]\n";exit;}}' CONTIGS.fasta
Does anyone know whether such a thing exists? I am not much of a scripter myself, so I can't write it on my own right now.
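
For illustration, the six statistics listed above could be computed with a short Python script along these lines. This is only a minimal sketch, not VelvetOptimiser's own code: the function names (read_contig_lengths, nx, assembly_stats) and the min_len parameter are made up for this example, N50 is taken the same way as in the one-liner (the contig length at which the running total, longest first, reaches half of all bases), and the input is assumed to be plain FASTA.

```python
import sys


def read_contig_lengths(lines):
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths = []
    current = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Drop any internal whitespace before counting bases.
            current += len("".join(line.split()))
    if current is not None:
        lengths.append(current)
    return lengths


def nx(lengths, fraction):
    """Contig length at which the running total, longest first,
    reaches `fraction` of all bases (0.5 gives N50, 0.9 gives N90)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total * fraction:
            return length
    return 0  # no contigs at all


def assembly_stats(lengths, min_len=1000):
    """Summarise contig lengths; min_len is a hypothetical cutoff parameter."""
    long_contigs = [n for n in lengths if n > min_len]
    return {
        "Total number of contigs": len(lengths),
        "n50": nx(lengths, 0.5),
        "length of longest contig": max(lengths, default=0),
        "Total bases in contigs": sum(lengths),
        f"Number of contigs > {min_len}": len(long_contigs),
        f"Total bases in contigs > {min_len}": sum(long_contigs),
    }


# Only runs when a FASTA path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        stats = assembly_stats(read_contig_lengths(handle))
    for label, value in stats.items():
        print(f"{label}: {value}")
```

Saved under a name of your choosing (say assembly_stats.py, a hypothetical file name), it would be run as "python assembly_stats.py CONTIGS.fasta". Everything is derived from the list of contig lengths, so other cutoffs or N90 only need a different argument to assembly_stats or nx.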
Old 09-12-2013, 05:03 PM   #2
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 431

The assemblathon_stats.pl script from here is what I am using:
https://github.com/ucdavis-bioinform...thon2-analysis

It requires the FAlite.pm module from the same page.
Old 09-12-2013, 11:56 PM   #3
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153

You can use Biopieces (www.biopieces.org):


https://code.google.com/p/biopieces/...embled_contigs
Old 09-13-2013, 01:09 AM   #4
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Or Quast (even without a reference): http://quast.bioinf.spbau.ru/
Old 09-13-2013, 02:53 AM   #5
Blahah404
Member
 
Location: Cambridge, UK

Join Date: Dec 2011
Posts: 48

leaff --stats is also nice, and very fast.
Old 09-13-2013, 05:04 AM   #6
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

You show a pretty mongo Perl one-liner but "aren't good at scripting"? Perhaps you are selling yourself short.
All times are GMT -8.

