SEQanswers


kga1978 01-11-2012 02:35 PM

Plotting length distribution of contigs?
 
Hi all,

I have a FASTA file of contigs and would like to plot the distribution of sequence lengths. The only tool I have found that does this is prinseq-graphs, but I believe it requires installing a bunch of Perl modules (I can't find any installation information). Is there an easier way of doing this, preferably a simple script or Java program? Here's an example of some of my data:

Code:

>NODE_4_length_492_cov_13.477642
ACGGAGTATGACTTTGTATTGGTGGGTCCTTGCACTGAACCAGCCCCTCTGGTTGTGCAT
AGGGGAGGCTTGTGGGAATGTGGAAAGAAATTGGCGTCCTTTACACCTGTTATACAAGAC
CAGGATCTTGAAGTATTTGTGAGAGAGGTTGGGGACACTTCGTCTGACCTGCTGATTGGG
GCATTGAGTGATATGATGATAGACAGGCTGGGGTTAAGGGTGCAGTGGTCAGGGGTGGAC
ATTGTCTCCACACTTAGGGCTGCAGCGCCGAACTGCGAGGGGATCTTGAGTGCGGTTCTT
GAGGCAGTGGACAACTGGGTGGAGTTCAAAGGTTATGCTCTCTGTTATAGTAAGTCAAAG
GGGAAGGTGATGGTGCAGTCAAGTGGTGGTAAATTGAGACTGAAGGGCAGAACATGTGAG
GAGTTGACTAGGAAGGATGAATGCATCGAAGACATTGAGTAGTCTCCTGGCGATGGTTGG
CTCCCCCGGGGGGGCCCCCGGCGGGGGGTCCCCC
>NODE_7_length_554_cov_17.906137
ATTTATTTTGAGTCTTATGTGAAACCACGTGAAGGACCCCAATGTTCTTGTAGTCGCAAC
AAATGGTCTCACATAAGACTCAAAATAAATCTGCCTCATGAAATTGTCAACAGCATCACT
AGTGCTCACCACTCTTTCCTCCACTATGGGTTCATGTGTCCTACTGTGAGACAGCCTCAA
TTCAGATGATAACACAATGTAATGTTCCTCTCTTTTCCATTTCACAATATGTGAGACAAG
AGATAAGGCTTCACAGTTAACATCCAACGCAACACAGAGATCTAGGAATTTTATTCTAGG
TGACCACTTCATTTTGGTTGACGCTAGATCACTCATGAATGGCAATATGTGCTTCTCAAA
CACCGATGGGTACAGCCTTCTCAAAGAATGAATGATGTGATTCAAACCAACCCTATCCTC
TAATAGTTTTGATGCAGTTGGCTTTAAAGGAAAATAGTCACAAGGGTTATGCTTGAAAAA
ATCCAATACCTTAACTGTCTTAGGTTCCCCTAAGACCCATGCACCCAACTCTATTGCAGT
TGATAAGGAGATGCACATATAATCCCATAACAAGGG
>NODE_8_length_274_cov_16.138685
CCAAAATAAGTTGTCTTCCACTTTCACTCGAGGTGCGCAGAAATTGCTATCTGAAGCTAT
CAACAAGTCTGCATTCCAGAGCTCCATTGCATCTGGCTTTGTGGGGTTATGCAGAACATT
GGGTAGCAAATGTGTTCGGGGACCAAATAAGGAGAATCTGTATATTAAGTCCATTCAGTC
TCTGATTTCTGATGTCAAGGGAATCAAATTATTGACAAATTCTAATGGCATTCAGTATTG
GCGGGTTCCGCTAGAACTTAGAGATGGGAGTGGAAGTGAAAGTGTGGTCAGTTATT

I would like to plot the lengths similar to this:
http://f.cl.ly/items/0x150a1N1v1f142z0r2q/png.png

maasha 01-11-2012 11:57 PM

Using Biopieces you can use plot_distribution like this:

Code:

read_fasta -i contigs.fna | plot_distribution -k SEQ_LEN -x


                                  Distribution
      +          +          +          +          +          +          +
  14 ++*---------+----------+----------+-----------+----------+----------+-++
      |*                                                                  |
  12 ++*                                                                  ++
      |*                                                                  |
      |*                                                                  |
  10 ++*                                                                  ++
      |*                                                                  |
  8 ++*                                                                  ++
      |** *                                                                |
  6 ++****                                                                ++
      |****                                                                |
  4 ++*******                                                            ++
      |*******                                                            |
      |***********      * *                                              |
  2 ++**********************                                              ++
      |******************************************* **  * *            *    *
  0 ++*******************************************-**--*-*----+-------*--+-*+
      +          +          +          +          +          +          +
      0        10000      20000      30000      40000      50000      60000
                                    SEQ_LEN


It is also possible to output to an X11 terminal or to PNG, PDF, PS, or SVG.

Installing Biopieces requires a working setup of Perl with the relevant modules and Ruby with the relevant gems. However, it should be worth the trouble, since Biopieces is a nice toolbox IMHO.

maubp 01-16-2012 02:22 AM

There's a "Histogram of sequence lengths" example in the Biopython Tutorial. Not Java, but also not Perl ;)
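
For readers who want something with no dependencies at all, here is a minimal sketch of the same idea in plain Python. It is not the Biopython Tutorial's example; the function names (`fasta_lengths`, `bin_lengths`, `print_histogram`, `plot_histogram`, `main`), the bin width, and the output filename are all made up for illustration. It counts the bases of each record directly instead of trusting the length in the header, and only the optional `plot_histogram` helper assumes matplotlib is installed.

```python
from collections import Counter


def fasta_lengths(lines):
    """Yield (header, sequence_length) for each record in FASTA-formatted lines."""
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        elif name is not None:
            # Sequence may be wrapped over several lines; sum them up.
            length += len(line)
    if name is not None:
        yield name, length


def bin_lengths(lengths, bin_width=100):
    """Count sequences per bin; each key is the lower edge of its bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def print_histogram(counts, bin_width=100):
    """Print one row of asterisks per non-empty bin (a crude text plot)."""
    for start in sorted(counts):
        end = start + bin_width - 1
        print(f"{start:>8}-{end:<8} {'*' * counts[start]}")


def plot_histogram(lengths, out_png="contig_lengths.png", bins=50):
    """Optional: write a PNG histogram. Assumes matplotlib is installed."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    plt.hist(lengths, bins=bins)
    plt.xlabel("Sequence length (bp)")
    plt.ylabel("Number of contigs")
    plt.savefig(out_png)


def main(path, bin_width=100):
    """Read a FASTA file and print a text histogram of its sequence lengths."""
    with open(path) as handle:
        lengths = [n for _, n in fasta_lengths(handle)]
    print_histogram(bin_lengths(lengths, bin_width), bin_width)
    return lengths
```

Calling `main("contigs.fna")` prints the text histogram; passing the returned list to `plot_histogram` should give a PNG along the lines of the plot linked in the first post.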

