SEQanswers

Old 01-11-2012, 01:35 PM   #1
kga1978
Senior Member
 
Location: Boston, MA

Join Date: Nov 2010
Posts: 100
Plotting length distribution of contigs?

Hi all,

I have a FASTA file of contigs and I would like to plot the distribution of the sequence lengths. The only tool I have found that can do this is prinseq-graphs, but I believe it requires me to install a bunch of Perl modules (I can't find any installation information). Is there an easier way of doing this? Preferably a simple script or Java program. Here's an example of some of my data:

Code:
>NODE_4_length_492_cov_13.477642
ACGGAGTATGACTTTGTATTGGTGGGTCCTTGCACTGAACCAGCCCCTCTGGTTGTGCAT
AGGGGAGGCTTGTGGGAATGTGGAAAGAAATTGGCGTCCTTTACACCTGTTATACAAGAC
CAGGATCTTGAAGTATTTGTGAGAGAGGTTGGGGACACTTCGTCTGACCTGCTGATTGGG
GCATTGAGTGATATGATGATAGACAGGCTGGGGTTAAGGGTGCAGTGGTCAGGGGTGGAC
ATTGTCTCCACACTTAGGGCTGCAGCGCCGAACTGCGAGGGGATCTTGAGTGCGGTTCTT
GAGGCAGTGGACAACTGGGTGGAGTTCAAAGGTTATGCTCTCTGTTATAGTAAGTCAAAG
GGGAAGGTGATGGTGCAGTCAAGTGGTGGTAAATTGAGACTGAAGGGCAGAACATGTGAG
GAGTTGACTAGGAAGGATGAATGCATCGAAGACATTGAGTAGTCTCCTGGCGATGGTTGG
CTCCCCCGGGGGGGCCCCCGGCGGGGGGTCCCCC
>NODE_7_length_554_cov_17.906137
ATTTATTTTGAGTCTTATGTGAAACCACGTGAAGGACCCCAATGTTCTTGTAGTCGCAAC
AAATGGTCTCACATAAGACTCAAAATAAATCTGCCTCATGAAATTGTCAACAGCATCACT
AGTGCTCACCACTCTTTCCTCCACTATGGGTTCATGTGTCCTACTGTGAGACAGCCTCAA
TTCAGATGATAACACAATGTAATGTTCCTCTCTTTTCCATTTCACAATATGTGAGACAAG
AGATAAGGCTTCACAGTTAACATCCAACGCAACACAGAGATCTAGGAATTTTATTCTAGG
TGACCACTTCATTTTGGTTGACGCTAGATCACTCATGAATGGCAATATGTGCTTCTCAAA
CACCGATGGGTACAGCCTTCTCAAAGAATGAATGATGTGATTCAAACCAACCCTATCCTC
TAATAGTTTTGATGCAGTTGGCTTTAAAGGAAAATAGTCACAAGGGTTATGCTTGAAAAA
ATCCAATACCTTAACTGTCTTAGGTTCCCCTAAGACCCATGCACCCAACTCTATTGCAGT
TGATAAGGAGATGCACATATAATCCCATAACAAGGG
>NODE_8_length_274_cov_16.138685
CCAAAATAAGTTGTCTTCCACTTTCACTCGAGGTGCGCAGAAATTGCTATCTGAAGCTAT
CAACAAGTCTGCATTCCAGAGCTCCATTGCATCTGGCTTTGTGGGGTTATGCAGAACATT
GGGTAGCAAATGTGTTCGGGGACCAAATAAGGAGAATCTGTATATTAAGTCCATTCAGTC
TCTGATTTCTGATGTCAAGGGAATCAAATTATTGACAAATTCTAATGGCATTCAGTATTG
GCGGGTTCCGCTAGAACTTAGAGATGGGAGTGGAAGTGAAAGTGTGGTCAGTTATT
I would like to plot the lengths similar to this:
Old 01-11-2012, 10:57 PM   #2
maasha
Senior Member
 
Location: Denmark

Join Date: Apr 2009
Posts: 153

Using Biopieces you can use plot_distribution like this:

Code:
read_fasta -i contigs.fna | plot_distribution -k SEQ_LEN -x


                                  Distribution
      +          +          +          +           +          +          +
  14 ++*---------+----------+----------+-----------+----------+----------+-++
      |*                                                                   |
  12 ++*                                                                   ++
      |*                                                                   |
      |*                                                                   |
  10 ++*                                                                   ++
      |*                                                                   |
   8 ++*                                                                   ++
      |** *                                                                |
   6 ++****                                                                ++
      |****                                                                |
   4 ++*******                                                             ++
      |*******                                                             |
      |***********       * *                                               |
   2 ++**********************                                              ++
      |******************************************* **  * *            *    *
   0 ++*******************************************-**--*-*----+-------*--+-*+
      +          +          +          +           +          +          +
      0        10000      20000      30000       40000      50000      60000
                                     SEQ_LEN

It is also possible to output to an X11 terminal or to PNG, PDF, PS, and SVG.

Installing Biopieces requires a working setup of Perl with the relevant modules and Ruby with the relevant gems. However, it should be worth the trouble, since Biopieces is a nice toolbox IMHO.
Old 01-16-2012, 01:22 AM   #3
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

There's a "Histogram of sequence lengths" example in the Biopython Tutorial. It's not Java, but also not Perl.
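
If installing extra modules is the obstacle, the same idea can be sketched with only the Python standard library. This is not the Biopython Tutorial example, just a minimal sketch: the function names (fasta_lengths, length_histogram, print_histogram), the bin width, and the DEMO records are all made up for illustration. It prints a rough text histogram rather than a graphical plot.

```python
import io
from collections import Counter


def fasta_lengths(handle):
    """Yield (name, length) for each record read from a FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            name, length = line[1:].strip(), 0
        elif name is not None:
            # Sequence lines may be wrapped, so add them up per record.
            length += len(line)
    if name is not None:
        yield name, length


def length_histogram(lengths, bin_width=100):
    """Count lengths per bin; each key is the lower edge of its bin."""
    return Counter((n // bin_width) * bin_width for n in lengths)


def print_histogram(hist, bin_width=100):
    """Print one row of asterisks per non-empty bin."""
    for start in sorted(hist):
        print(f"{start:>8}-{start + bin_width - 1:<8} {'*' * hist[start]}")


# Hypothetical demo records, not the contigs from the original post.
DEMO = """>contig_a
ACGTACGTAC
ACGT
>contig_b
ACGTA
>contig_c
ACGTACGTACGTACGTACGTACGTA
"""

if __name__ == "__main__":
    lengths = [n for _, n in fasta_lengths(io.StringIO(DEMO))]
    print_histogram(length_histogram(lengths, bin_width=10), bin_width=10)
```

To run it on a real file, replace io.StringIO(DEMO) with an opened file, e.g. open("contigs.fna"), and pick a bin width that suits the contig sizes. The list of lengths could also be passed to a plotting library if a graphical histogram is needed.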