I'm not so familiar with perl, is there script can fetch specific length of sequence from a fasta file? many thanks!
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
its name is bp_aacomp.pl, this script can count the aminoacids of a FASTA format file. This doesn't work good with multifasta file, because it takes count of all the aminoacids of the file. If you want to count the aminoacids of 1 sequence I would recomend you. the usage is really simple
perl bp_aacomp.pl yourfile.fasta
if you need to bp, you can try this LINK where the is another script.
#!perl
use strict;
use warnings;
use Carp;
use Bio::SeqIO;
use Getopt::Long;
use Bio::SeqUtils;
use Bio::Tools::IUPAC;
my $table = new Bio::SeqUtils;
my @BASES = $table->valid_aa(0);
my %all = $table->valid_aa(2);
my ($file,$format,$help) = ( undef, 'fasta');
GetOptions(
'i|in:s' => \$file,
'f|format:s' => \$format,
'h|help|?' => \$help,
);
my $USAGE = "usage: aacomp [-f format] filename\n\tdefault format is fasta\n";
$file = shift unless $file;
die $USAGE if $help;
my $seqin;
if( defined $file ) {
print "Could not open file [$file]\n$USAGE" and exit unless -e $file;
$seqin = new Bio::SeqIO(-format => $format,
-file => $file);
} else {
$seqin = new Bio::SeqIO(-format => $format,
-fh => \*STDIN);
}
my %composition;
my $total;
foreach my $base ( @BASES ) {
$composition{$base} = 0;
}
while ( my $seq = $seqin->next_seq ) {
if( $seq->alphabet ne 'protein' ) {
confess("Must only provide amino acid sequences to aacomp...skipping this seq");
next;
}
foreach my $base ( split(//,$seq->seq()) ) {
$composition{uc $base}++;
$total++;
}
}
printf("%d aa\n",$total);
printf("%5s %4s\n", 'aa', '#' );
my $ct = 0;
foreach my $base ( @BASES ) {
printf(" %s %s %3d\n", $base, $all{$base}, $composition{$base} );
$ct += $composition{$base};
}
printf( "%6s %s\n", '','-'x5);
printf( "%6s %3d\n", '',$ct);
__END__
=head1 NAME
bp_aacomp - amino acid composition of protein sequences
=head1 SYNOPSIS
bp_aacomp [-f/--format FORMAT] [-h/--help] filename
or
bp_aacomp [-f/--format FORMAT] < filename
or
bp_aacomp [-f/--format FORMAT] -i filename
=head1 DESCRIPTION
This scripts prints out the count of amino acids over all protein
sequences from the input file.
=head1 OPTIONS
The default sequence format is fasta.
The sequence input can be provided using any of the three methods:
=over 3
=item unnamed argument
bp_aacomp filename
=item named argument
bp_aacomp -i filename
=item standard input
bp_aacomp < filename
=back
=head1 FEEDBACK
=head2 Mailing Lists
User feedback is an integral part of the evolution of this and other
Bioperl modules. Send your comments and suggestions preferably to
the Bioperl mailing list. Your participation is much appreciated.
[email protected] - General discussion
http://bioperl.org/wiki/Mailing_lists - About the mailing lists
=head2 Reporting Bugs
Report bugs to the Bioperl bug tracking system to help us keep track
of the bugs and their resolution. Bug reports can be submitted via the
web:
=head1 AUTHOR - Jason Stajich
Email [email protected]
=head1 HISTORY
Based on aacomp.c from an old version of EMBOSS
=cut
Comment
Latest Articles
Collapse
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist on Modified Bases...-
Channel: Articles
Yesterday, 07:01 AM -
-
by seqadmin
Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...-
Channel: Articles
04-04-2024, 04:25 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 04-11-2024, 12:08 PM
|
0 responses
39 views
0 likes
|
Last Post
by seqadmin
04-11-2024, 12:08 PM
|
||
Started by seqadmin, 04-10-2024, 10:19 PM
|
0 responses
41 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 10:19 PM
|
||
Started by seqadmin, 04-10-2024, 09:21 AM
|
0 responses
35 views
0 likes
|
Last Post
by seqadmin
04-10-2024, 09:21 AM
|
||
Started by seqadmin, 04-04-2024, 09:00 AM
|
0 responses
55 views
0 likes
|
Last Post
by seqadmin
04-04-2024, 09:00 AM
|
Comment