![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Illumina fragment length distribution | delphi_ote | Genomic Resequencing | 3 | 05-18-2012 02:59 AM |
Plotting length distribution of contigs? | kga1978 | Bioinformatics | 2 | 01-16-2012 02:22 AM |
Demultilplexing Sequences according to barcode --- Homemade script | Giorgio C | Bioinformatics | 9 | 10-14-2011 08:17 AM |
Script for calculating the average distribution of tags per gene from ChIP-seq?? | Giles | General | 1 | 09-12-2011 03:25 AM |
Compute paired-end distance distribution? | krobison | Bioinformatics | 11 | 11-12-2009 09:30 AM |
![]() |
|
Thread Tools |
![]() |
#1 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Hi all,
i have a question for you: Is there any tool to identify in a fasta file the read lenght distribution ?? I mean for exemple talking about miRNAs: " How many reads have 18 nucleotides, how many 19 nt, 20 nt....until 28 nt !! Can you suggest me something also a little script please ??? At the end i need to do a graph showing this information. Thank you very much |
![]() |
![]() |
![]() |
#2 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
![]() Quote:
Code:
#!/usr/bin/perl use warnings; use strict; use Statistics::Descriptive; my $stat = Statistics::Descriptive::Full->new(); my (%distrib); my @bins = qw/18 19 20 21 22 23 24 25 26 27 28/; my $fastaFile = shift; open (FASTA, "<$fastaFile"); $/ = ">"; my $junkFirstOne = <FASTA>; while (<FASTA>) { chomp; my ($def,@seqlines) = split /\n/, $_; my $seq = join '', @seqlines; $stat->add_data(length($seq)); } %distrib = $stat->frequency_distribution(\@bins); print "Total reads:\t" . $stat->count() . "\n"; print "Total nt:\t" . $stat->sum() . "\n"; print "Mean length:\t" . $stat->mean() . "\n"; print "Median length:\t" . $stat->median() . "\n"; print "Mode length:\t" . $stat->mode() . "\n"; print "Max length:\t" . $stat->max() . "\n"; print "Min length:\t" . $stat->min() . "\n"; print "Length\t# Seqs\n"; foreach (sort {$a <=> $b} keys %distrib) { print "$_\t$distrib{$_}\n"; } |
|
![]() |
![]() |
![]() |
#3 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Amazing !!! Heartfelt thanks !!!
|
![]() |
![]() |
![]() |
#4 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Do you know how to modify the @bins array to know the total number of reads > of 28 nt and the tolat number of reads < of 18 nt....Is there possibile in one time ???
Thanks |
![]() |
![]() |
![]() |
#5 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
![]()
From the module documentation for the frequency_distribution method:
Quote:
|
|
![]() |
![]() |
![]() |
#6 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Infact it wasn't clear to me but now works fine for my purpose; Thank you very much again you have been very helpful !
|
![]() |
![]() |
![]() |
#7 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
![]()
other bioperl script
Code:
#!/usr/bin/perl # # distribution length of a fasta file # use warnings; use strict; use Bio::SeqIO; my $i = $ARGV[0]; my $o = $ARGV[1]; open (OUT, '>',$o); my $seq_in = Bio::SeqIO->new( -format => 'fasta',-file => $i); my %length = (); while( my $seq = $seq_in->next_seq() ) { my $seq1 = $seq->seq; chomp($seq1); #print($seq1); my $l = length($seq1); #print $l."\n"; my $tmp = $length {$l}; $length {$l} = $tmp+1; } my @x = keys(%length); my $sum = 0; foreach my $k (@x) { $sum = $sum + $length{$k}; } @x = sort{$a <=> $b}@x; print(@x); foreach my $k (@x) { print OUT $k."\t".$length{$k}."\t".$length{$k}*100/$sum."\n"; } close (OUT); |
![]() |
![]() |
![]() |
#8 |
Member
Location: ITALY Join Date: Oct 2010
Posts: 89
|
![]()
Thanks you, i'm trying it !!!
|
![]() |
![]() |
![]() |
#9 |
Junior Member
Location: france Join Date: Sep 2010
Posts: 24
|
![]()
--hi
how to adjust @bins to print the report of all the length distribution ? thank you -- Laurent -- |
![]() |
![]() |
![]() |
Thread Tools | |
|
|