
**krobison** · 03-04-2013, 02:05 PM

Code:

#!/usr/bin/perl
use strict;
use Bio::SeqIO;

## debugging/tuning left as exercise for student :-)

my $reader=new Bio::SeqIO(-format=>'fasta',-file=>$ARGV[0]);
my $writer=new Bio::SeqIO(-format=>'fasta',-file=>$ARGV[0]."trimmed");
while (my $rec=$reader->next_seq)
{
  ## works only on forward strand & 0 mismatches!!!!
   if ($seq->seq=~/(ATAGCCGGCACCCTGGT.*GGCCATATGAGTGGGCC)/i)
   {
   $rec->seq($1);  
   $writer->write_seq($rec);
  } 
  else
  {
     print STDERR "Could not find head and/or tail sequences for ",$seq->id,"\n";
   }
}

**baika** · 03-04-2013, 02:34 PM

Thanks Krobison for the perl script. It gives an error-
Global symbol "$seq" requires explicit package name at ../../Scripts/trim_fastaseq.pl line 12.
Global symbol "$seq" requires explicit package name at ../../Scripts/trim_fastaseq.pl line 19.

Baika

**kmcarr** · 03-04-2013, 07:25 PM

Sorry baika; I was going to answer until I saw this line in Keith's code:

## debugging/tuning left as exercise for student :-)

Keith, did you stick the bug in there on purpose?

**A_Morozov** · 03-04-2013, 09:12 PM

No fun, guys. Baika might be (and claims to be, in the introduction section) a non-bioinformatician stuck with something outside his/her area of expertise. If you work in a genomics/bioinformatics lab, you should at least learn some Perl/Python, but for now: it seems it should be $rec, not $seq, in the regexp line (the scary thingy with // and lots of uppercase).
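
For anyone wondering where that message comes from: under `use strict`, Perl refuses to compile code that uses a variable never declared with `my`. A minimal sketch, where the two-statement snippet is a made-up stand-in for the script and only the `$rec`/`$seq` mix-up is taken from it:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the posted script: the variable is
# declared as $rec but then used as $seq.
my $snippet = 'my $rec = "ACGT"; my $len = length($seq); 1;';

# String eval compiles the snippet at run time, so the compile error
# ends up in $@ instead of stopping this program.
my $ok    = eval $snippet;
my $error = $@;

print "compiled: ", ($ok ? "yes" : "no"), "\n";
print "error: $error";
```

Declaring and using the same name everywhere (all `$rec` or all `$seq`) makes the error go away.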

**kmcarr** · 03-05-2013, 03:47 AM

Originally posted by A_Morozov

No fun, guys. Baika might be (and claims to be, in the introduction section) a non-bioinformatician stuck with something outside his/her area of expertise. If you work in a genomics/bioinformatics lab, you should at least learn some Perl/Python, but for now: it seems it should be $rec, not $seq, in the regexp line (the scary thingy with // and lots of uppercase).

Oh, it was late and I was feeling a tad impish; I wasn't going to leave baika hanging long. Anyway the better solution is to change line #9, naming the first object $seq.

Code:

Change

while (my $rec=$reader->next_seq)

to

while (my $seq=$reader->next_seq)

**d1antho** · 03-05-2013, 05:57 AM

Not that it should make a difference, because it is unlikely to match in the middle of a sequence, but the match operator should be anchored to the start and the end of the sequence read. Honestly, though, I doubt it makes any difference at all.

So this

Code:

if ($seq->seq=~/(ATAGCCGGCACCCTGGT.*GGCCATATGAGTGGGCC)/i)
{
    $seq->seq($1);
    $writer->write_seq($seq);
}

Should then be

Code:

if ($seq->seq=~/(^ATAGCCGGCACCCTGGT.*GGCCATATGAGTGGGCC$)/i)
{
    $seq->seq($1);
    $writer->write_seq($seq);
}
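
Whether the anchors matter depends on the reads. If a read carries extra bases before the head or after the tail, the unanchored pattern trims them off, while the anchored one does not match at all. A small sketch to check this: the head and tail are the ones from the script, but the two reads and the sub names `trim_unanchored` and `trim_anchored` are made up for illustration:

```perl
use strict;
use warnings;

my $head = 'ATAGCCGGCACCCTGGT';
my $tail = 'GGCCATATGAGTGGGCC';

# Hypothetical reads: one with flanking bases, one that starts and ends
# exactly on the head and tail sequences.
my $exact   = $head . 'ACGTACGT' . $tail;
my $flanked = 'TTTT' . $exact . 'GGGG';

# Unanchored: head..tail may sit anywhere inside the read.
sub trim_unanchored {
    my ($dna) = @_;
    return $dna =~ /($head.*$tail)/i ? $1 : undef;
}

# Anchored: the read must begin with the head and end with the tail.
sub trim_anchored {
    my ($dna) = @_;
    return $dna =~ /^($head.*$tail)$/i ? $1 : undef;
}

for my $read ($exact, $flanked) {
    print "unanchored: ", (trim_unanchored($read) // 'no match'), "\n";
    print "anchored:   ", (trim_anchored($read)   // 'no match'), "\n";
}
```

So if the aim is to cut away whatever sits outside the head and tail, the unanchored form is the one that does the trimming.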

**baika** · 03-05-2013, 11:03 AM

finally working

Thanks krobison for writing this script, and kmcarr for pointing out the error. After incorporating all your suggestions and help from my friend Robert, finally it is working.

Thank you all

baika

Code:

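#!/usr/bin/perl

# Usage: trim_fasta.pl YOUR_FASTA_FILE.fasta OUT_FILE_TRIMMED.FASTA

use strict;
use warnings;
use Bio::SeqIO;

die "Usage: $0 YOUR_FASTA_FILE.fasta OUT_FILE_TRIMMED.FASTA\n" unless @ARGV == 2;

# Read from the first argument; the leading '>' opens the second one for writing
my $reader = Bio::SeqIO->new(-format => 'fasta', -file => $ARGV[0]);
my $writer = Bio::SeqIO->new(-format => 'fasta', -file => ">$ARGV[1]");

while (my $seq = $reader->next_seq)
{
    ## works only on forward strand & 0 mismatches!!!!
    ## keeps everything from the head sequence through the tail sequence
    if ($seq->seq =~ /(CCAGTATTTGGTA.*AGTTGATAACTGGGAA)/i)
    {
        $seq->seq($1);
        $writer->write_seq($seq);
    }
    else
    {
        print STDERR "Could not find head and/or tail sequences for ", $seq->id, "\n";
    }
}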
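
On the "forward strand only" comment in the script: one possible way around it is to try the reverse complement whenever the forward search fails. This is only a sketch working on plain strings, not a tested replacement for the script. The sub names `revcomp` and `trim_either_strand` are made up here, and mismatches are still not handled:

```perl
use strict;
use warnings;

# Reverse-complement a DNA string. Only A/C/G/T (either case) are
# complemented; any other character is left as it is.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Look for head..tail on the read as given, then on its reverse
# complement. Returns the trimmed sequence in head-to-tail orientation,
# or undef if neither strand matches.
sub trim_either_strand {
    my ($dna, $head, $tail) = @_;
    for my $candidate ($dna, revcomp($dna)) {
        return $1 if $candidate =~ /(\Q$head\E.*\Q$tail\E)/i;
    }
    return;
}

my $head = 'CCAGTATTTGGTA';
my $tail = 'AGTTGATAACTGGGAA';

# Hypothetical read, plus the same read as seen from the other strand.
my $forward = 'GG' . $head . 'ACGT' . $tail . 'TT';
my $reverse = revcomp($forward);

for my $read ($forward, $reverse) {
    my $trimmed = trim_either_strand($read, $head, $tail);
    print defined $trimmed ? "$trimmed\n" : "no match\n";
}
```

In the script itself, the string returned by `trim_either_strand($seq->seq, ...)` could be passed to `$seq->seq(...)` before writing the record.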

trimming FASTA file