#1 |
Senior Member
Location: Boston, MA Join Date: Nov 2010
Posts: 100
Hi all,
I have a FASTA file of contigs and I would like to plot the distribution of the sequence lengths. The only tool I have found that can do this is prinseq-graphs, but it seems to require installing a bunch of Perl modules (I believe; I can't find any installation information). Is there an easier way of doing this, preferably a simple script or Java program? Here's an example of some of my data:
Code:
>NODE_4_length_492_cov_13.477642
ACGGAGTATGACTTTGTATTGGTGGGTCCTTGCACTGAACCAGCCCCTCTGGTTGTGCAT
AGGGGAGGCTTGTGGGAATGTGGAAAGAAATTGGCGTCCTTTACACCTGTTATACAAGAC
CAGGATCTTGAAGTATTTGTGAGAGAGGTTGGGGACACTTCGTCTGACCTGCTGATTGGG
GCATTGAGTGATATGATGATAGACAGGCTGGGGTTAAGGGTGCAGTGGTCAGGGGTGGAC
ATTGTCTCCACACTTAGGGCTGCAGCGCCGAACTGCGAGGGGATCTTGAGTGCGGTTCTT
GAGGCAGTGGACAACTGGGTGGAGTTCAAAGGTTATGCTCTCTGTTATAGTAAGTCAAAG
GGGAAGGTGATGGTGCAGTCAAGTGGTGGTAAATTGAGACTGAAGGGCAGAACATGTGAG
GAGTTGACTAGGAAGGATGAATGCATCGAAGACATTGAGTAGTCTCCTGGCGATGGTTGG
CTCCCCCGGGGGGGCCCCCGGCGGGGGGTCCCCC
>NODE_7_length_554_cov_17.906137
ATTTATTTTGAGTCTTATGTGAAACCACGTGAAGGACCCCAATGTTCTTGTAGTCGCAAC
AAATGGTCTCACATAAGACTCAAAATAAATCTGCCTCATGAAATTGTCAACAGCATCACT
AGTGCTCACCACTCTTTCCTCCACTATGGGTTCATGTGTCCTACTGTGAGACAGCCTCAA
TTCAGATGATAACACAATGTAATGTTCCTCTCTTTTCCATTTCACAATATGTGAGACAAG
AGATAAGGCTTCACAGTTAACATCCAACGCAACACAGAGATCTAGGAATTTTATTCTAGG
TGACCACTTCATTTTGGTTGACGCTAGATCACTCATGAATGGCAATATGTGCTTCTCAAA
CACCGATGGGTACAGCCTTCTCAAAGAATGAATGATGTGATTCAAACCAACCCTATCCTC
TAATAGTTTTGATGCAGTTGGCTTTAAAGGAAAATAGTCACAAGGGTTATGCTTGAAAAA
ATCCAATACCTTAACTGTCTTAGGTTCCCCTAAGACCCATGCACCCAACTCTATTGCAGT
TGATAAGGAGATGCACATATAATCCCATAACAAGGG
>NODE_8_length_274_cov_16.138685
CCAAAATAAGTTGTCTTCCACTTTCACTCGAGGTGCGCAGAAATTGCTATCTGAAGCTAT
CAACAAGTCTGCATTCCAGAGCTCCATTGCATCTGGCTTTGTGGGGTTATGCAGAACATT
GGGTAGCAAATGTGTTCGGGGACCAAATAAGGAGAATCTGTATATTAAGTCCATTCAGTC
TCTGATTTCTGATGTCAAGGGAATCAAATTATTGACAAATTCTAATGGCATTCAGTATTG
GCGGGTTCCGCTAGAACTTAGAGATGGGAGTGGAAGTGAAAGTGTGGTCAGTTATT
#2 |
Senior Member
Location: Denmark Join Date: Apr 2009
Posts: 153
Using Biopieces you can use plot_distribution like this:
Code:
read_fasta -i contigs.fna | plot_distribution -k SEQ_LEN -x
[ASCII histogram titled "Distribution": x-axis SEQ_LEN from 0 to 60000, y-axis counts from 0 to 14]
It is also possible to output to X11 terminal or PNG, PDF, PS, and SVG. Installing Biopieces requires a working setup of Perl w. relevant modules and Ruby w. relevant gems. However, it should be worth the trouble since Biopieces is a nice toolbox IMHO.
#3 |
Peter (Biopython etc)
Location: Dundee, Scotland, UK Join Date: Jul 2009
Posts: 1,543
There's a "Histogram of sequence lengths" example in the Biopython Tutorial. Not Java, but also not Perl.
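For anyone who wants to avoid installing anything beyond Python itself, here is a minimal sketch of the same idea using only the standard library. It is not the Biopython Tutorial example or the Biopieces method; the function names (fasta_lengths, length_histogram, print_histogram), the bin size, and the tiny demo records are made up for illustration. It reads FASTA records, sums the sequence lines per record, and prints a crude text histogram.

```python
from collections import Counter
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA file handle."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, length  # emit the previous record
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)  # sequence may span several lines
    if name is not None:
        yield name, length  # emit the last record


def length_histogram(lengths, bin_size=100):
    """Count lengths per bin; each key is the lower edge of its bin."""
    return Counter((n // bin_size) * bin_size for n in lengths)


def print_histogram(hist, bin_size=100):
    """Print one row of asterisks per non-empty bin."""
    for start in sorted(hist):
        print(f"{start:>7}-{start + bin_size - 1:<7} {'*' * hist[start]}")


# Hypothetical toy input standing in for a real contigs file.
demo = io.StringIO(">a\nACGT\nAC\n>b\nACGTACGTACGT\n>c\nACG\n")
lengths = [n for _, n in fasta_lengths(demo)]
hist = length_histogram(lengths, bin_size=5)
print_histogram(hist, bin_size=5)
```

For a real file, replace the io.StringIO demo with something like `with open("contigs.fna") as handle:` and pick a bin size that suits your contig lengths. If matplotlib happens to be installed, the lengths list could be passed to its hist function for a graphical plot instead of the text output.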