SEQanswers

12-10-2013, 05:19 AM   #1
mmmm
Senior Member
Location: UK
Join Date: Jul 2013
Posts: 131
script to get basic statistics from velvet contigs.fa

Could I get a script that generates some basic statistics from the Velvet output file (contigs.fa)?
12-10-2013, 05:30 AM   #2
sphil
Senior Member
Location: Stuttgart, Germany
Join Date: Apr 2010
Posts: 192

Please be more specific about what you need and what you understand by "statistics from the velvet output". Do you mean N50, number of contigs, longest, shortest...?
12-10-2013, 05:52 AM   #3
mmmm
Senior Member
Location: UK
Join Date: Jul 2013
Posts: 131

Thanks. To be more specific, I would like to see the following statistics for the Velvet output FASTA file:

- Statistics for contig lengths:
Min contig length
Max contig length
Mean contig length
standard deviation of contig length
Median contig length
N50 contig length

- Statistics for no. of contigs:
No. of contigs
No. of contigs >=1kb
No. of contigs in N50

- Statistics for bases in the contigs:
No. of bases in all contigs
No. of bases in contigs >=1kb
GC content of contigs

- Simple dinucleotide repeats:
No. of contigs with over 70% dinucleotide repeats
AT
CG
AC
TG
AG
TC

- Simple mononucleotide repeats:
No. of contigs with over 50% mononucleotide repeats
AA
TT
CC
GG
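The first three groups (contig lengths, contig counts, bases) can be computed directly from the FASTA file. Below is a minimal sketch in Python using only the standard library; it is not one of the scripts linked later in the thread. Assumptions: N50 is the length of the contig at which the cumulative length of contigs sorted longest-first first reaches half of the total; the standard deviation is the sample standard deviation; GC content is computed over A/C/G/T only, so Velvet's N gap characters are ignored. The function names are hypothetical.

```python
import statistics
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA lines; sequences are upper-cased."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Return (N50 length, number of contigs needed to reach half the total)."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:
            return length, count
    return 0, 0


def contig_stats(seqs, long_cutoff=1000):
    """Length, count and base statistics for a non-empty list of upper-case sequences."""
    if not seqs:
        raise ValueError("no sequences")
    lengths = [len(s) for s in seqs]
    long_seqs = [s for s in seqs if len(s) >= long_cutoff]
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    acgt = gc + sum(s.count("A") + s.count("T") for s in seqs)
    n50_len, n50_count = n50(lengths)
    return {
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": statistics.mean(lengths),
        "sd_len": statistics.stdev(lengths) if len(lengths) > 1 else 0.0,  # sample SD
        "median_len": statistics.median(lengths),
        "n50_len": n50_len,
        "num_contigs": len(lengths),
        "num_contigs_ge_1kb": len(long_seqs),  # uses long_cutoff (default 1000)
        "num_contigs_in_n50": n50_count,
        "bases_all": sum(lengths),
        "bases_ge_1kb": sum(len(s) for s in long_seqs),
        "gc_percent": 100 * gc / acgt if acgt else 0.0,  # N and other codes excluded
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        sequences = [seq for _, seq in read_fasta(handle)]
    for name, value in contig_stats(sequences).items():
        print(f"{name}\t{value}")
```

Saved under any name (say `contig_stats.py`), it would be run as `python contig_stats.py contigs.fa` and prints one tab-separated name/value pair per line.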
12-10-2013, 06:02 AM   #4
GenoMax
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,082

Take your pick:

https://github.com/ajmazurie/velvet-stats

http://milkweedgenome.org/?q=node/73

http://korflab.ucdavis.edu/datasets/...athon_stats.pl
12-10-2013, 06:03 AM   #5
sphil
Senior Member
Location: Stuttgart, Germany
Join Date: Apr 2010
Posts: 192

I don't know which of your interests will be covered, but have a look here: velvet std summary. I find the way they do it quite neat. However, I think you will have to script something yourself to get all the desired information.
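The repeat counts in the request are the part most likely to need custom scripting, partly because "over 70% dinucleotide repeats" has no single standard definition. One possible reading, sketched below in Python as an assumption rather than an established method: a contig counts toward a motif if more than the given fraction of its bases lie in tandem runs of that motif. The minimum number of copies that makes a run (`min_copies`) and all function names are hypothetical choices.

```python
import re

DINUCLEOTIDES = ["AT", "CG", "AC", "TG", "AG", "TC"]
MONONUCLEOTIDES = ["A", "T", "C", "G"]  # runs of these give AA, TT, CC, GG


def repeat_fraction(seq, unit, min_copies):
    """Fraction of seq covered by tandem runs of `unit` with >= min_copies copies."""
    if not seq:
        return 0.0
    # e.g. unit "AT", min_copies 3 -> regex (?:AT){3,}; matches do not overlap
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq.upper()))
    return covered / len(seq)


def count_repeat_contigs(seqs, units, threshold, min_copies):
    """For each unit, count contigs whose repeat fraction is above threshold."""
    return {
        unit: sum(1 for s in seqs if repeat_fraction(s, unit, min_copies) > threshold)
        for unit in units
    }
```

With a list `seqs` of contig sequences, `count_repeat_contigs(seqs, DINUCLEOTIDES, 0.70, min_copies=3)` would fill the dinucleotide table and `count_repeat_contigs(seqs, MONONUCLEOTIDES, 0.50, min_copies=4)` the mononucleotide one. Runs are matched in the motif's written phase, so a run can lose a base at either end.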
12-11-2013, 01:10 AM   #6
maasha
Senior Member
Location: Denmark
Join Date: Apr 2009
Posts: 153

Using Biopieces (www.biopieces.org):

Assembled contigs can be analyzed to get some stats using analyze_assembly. We include a filtering step to discard contigs shorter than 200 bases:

Code:
read_fasta -i contigs.fna |
grab -e "SEQ_LEN>=200" |
analyze_assembly -x
And the output:

Code:
N50: 9082
MAX: 52038
MIN: 200
MEAN: 4170
TOTAL: 3057214
COUNT: 733
---
There are many other options for stats as well. Look here:

https://code.google.com/p/biopieces/...aning_NGS_data
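Without Biopieces installed, a rough stand-alone equivalent of the pipeline above (keep contigs of at least 200 bases, then report N50, MAX, MIN, MEAN, TOTAL and COUNT) can be sketched in Python. This is an approximation, not Biopieces' own code: MEAN is floored to an integer (3057214 / 733 is about 4170.8, consistent with the 4170 shown above), and N50 uses the usual half-of-total definition, which may differ from analyze_assembly in edge cases. The function names are hypothetical.

```python
import sys


def read_lengths(lines):
    """Return the length of each FASTA record, in file order."""
    lengths, current = [], None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start a new record
        elif current is not None:
            current += len(line)  # sequence line (blank lines add 0)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_summary(lengths, min_len=200):
    """Summarise contigs of at least min_len bases; empty dict if none pass."""
    kept = sorted((n for n in lengths if n >= min_len), reverse=True)
    if not kept:
        return {}
    total = sum(kept)
    running, n50 = 0, 0
    for n in kept:
        running += n
        if 2 * running >= total:  # first contig reaching half the total
            n50 = n
            break
    return {
        "N50": n50,
        "MAX": kept[0],
        "MIN": kept[-1],
        "MEAN": total // len(kept),  # floored, not rounded
        "TOTAL": total,
        "COUNT": len(kept),
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as handle:
        for key, value in assembly_summary(read_lengths(handle)).items():
            print(f"{key}: {value}")
```

Only lengths are kept in memory, so this stays light even for large contig files; the 200-base cutoff mirrors the `grab -e "SEQ_LEN>=200"` step.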
All times are GMT -8.