**zhidkov.ilia** · 10-22-2013, 01:59 AM

Sounds like a question for the PerlMonks forum; you can ask there how to properly use the 'substr' function for your task.

**sklages** · 10-22-2013, 02:32 AM

You need to get an idea of a) how to parse multi-FASTA files and b) how to split each individual sequence found in your file.

a) http://lmgtfy.com/?q=perl+parse+fasta+file
b) http://lmgtfy.com/?q=perl+split+large+genome+sequence

It's a good exercise for a beginner ..
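
For step a), a minimal core-Perl sketch of a multi-FASTA parser might look like the following. It assumes plain FASTA input where every record starts with a `>` header line; `read_fasta` is a hypothetical helper name, not a library function, and the demo data is made up.

```perl
use strict;
use warnings;

# Parse FASTA text from a filehandle into a list of [id, sequence] pairs.
# read_fasta is a hypothetical helper name, not from any library.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;              # strip newline and trailing whitespace
        next if $line eq '';             # skip blank lines
        if ($line =~ /^>(\S*)/) {        # header line: the ID is the first word
            push @records, [$1, ''];
        } elsif (@records) {
            $records[-1][1] .= $line;    # a sequence may span many lines
        }
    }
    return @records;
}

# Demo on an in-memory "file" (made-up data)
my $demo = ">contig1 first\nACGT\nACGT\n>contig2\nGGCC\n";
open my $fh, '<', \$demo or die "cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    printf "%s\t%d bp\n", $rec->[0], length $rec->[1];
}
close $fh;
```

To read a real file, open it with `open my $fh, '<', $path` instead of the in-memory string.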

**pony2001mx** · 10-22-2013, 06:51 AM

Dear zhidkov.ilia and sklages,
THANKS A LOT for your replies. I thought about it again. I can use a simplified method, i.e., combine all contigs into one sequence (I can do this), then split that sequence into 2 kb fragments (I need a script for this). Would you or others please write this script for me? I GREATLY APPRECIATE YOUR HELP!!
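
The simplified approach described here (join everything, then cut at fixed intervals) could be sketched roughly as below. `split_fixed` is a hypothetical helper name, the contigs are made-up stand-ins, and the demo uses a piece size of 4 so the result is easy to inspect; 2000 would give 2 kb fragments. Note that joining contigs first produces fragments that span contig boundaries, which may not be what you want.

```perl
use strict;
use warnings;

# Split a string into consecutive, non-overlapping pieces of at most $size
# characters; the last piece may be shorter.
# split_fixed is a hypothetical helper name.
sub split_fixed {
    my ($seq, $size) = @_;
    die "size must be a positive integer\n" unless $size =~ /^[1-9]\d*\z/;
    my @pieces;
    for (my $i = 0; $i < length($seq); $i += $size) {
        push @pieces, substr($seq, $i, $size);   # (string, offset, length)
    }
    return @pieces;
}

my @contigs = ('ACGTAC', 'GGT', 'TTAAC');   # stand-ins for real contigs
my $joined  = join '', @contigs;            # one long sequence
my @pieces  = split_fixed($joined, 4);      # use 2000 for 2 kb fragments
print "$_\n" for @pieces;
```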

**pony2001mx** · 10-22-2013, 06:54 AM

perl script: break contig into 2kb sequences

Dear zhidkov.ilia and sklages,
THANKS A LOT for your replies. I thought about it again. I can use a simplified method, i.e., combine all contigs into one sequence (I can do this), then split that sequence into 2 kb fragments (I need a script for this). Would you or others please write this script for me? I GREATLY APPRECIATE YOUR HELP!!

**bruce01** · 10-22-2013, 07:13 AM

I would use something like the for loop below:

Code:

for (my $i=0;$i<length($seq);$i+=1900){
     # substr's third argument is a length, not an end position
     print OUT substr($seq,$i,2000),"\n";
}

But I don't think anyone is going to write your whole script for you!
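
A self-contained sketch of the same sliding-window idea, for anyone who wants to try it on a toy string first. Perl's `substr` takes an offset and a length, so the window size is passed directly; `overlapping_windows` is a hypothetical name, and the size of 4 with an overlap of 1 is chosen only to keep the demo readable.

```perl
use strict;
use warnings;

# Return overlapping windows of $size characters, starting a new window
# every ($size - $overlap) characters.
# overlapping_windows is a hypothetical helper name.
sub overlapping_windows {
    my ($seq, $size, $overlap) = @_;
    die "overlap must be smaller than size\n" unless $overlap < $size;
    my $step = $size - $overlap;
    my @windows;
    for (my $i = 0; $i < length($seq); $i += $step) {
        # substr takes (string, offset, LENGTH), not an end position
        push @windows, substr($seq, $i, $size);
        last if $i + $size >= length($seq);   # this window reached the end
    }
    return @windows;
}

# Toy example: windows of 4 that overlap by 1
print "$_\n" for overlapping_windows('ABCDEFGHIJ', 4, 1);
```

For the 2 kb case in this thread the call would be `overlapping_windows($seq, 2000, 100)`, which matches the step of 1900 used in the loop.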

**pony2001mx** · 10-22-2013, 07:19 AM

Thank you very much!

**krobison** · 10-22-2013, 07:31 AM

Originally posted by bruce01

I would use something like the for loop below:

Code:

for (my $i=0;$i<length($seq);$i+=1900){
     # substr's third argument is a length, not an end position
     print OUT substr($seq,$i,2000),"\n";
}

But I don't think anyone is going to write your whole script for you!

Sounds like a dare! It's really a trivial program & good template for writing other programs that transform sequence data. A good exercise is to use Getopt::Long to set the cutoff size and overlap size.

Code:

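use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;
my $cutSize=2000; my $overlapSize=100;
# All fragments go into one FASTA file
my $writer=Bio::SeqIO->new(-file=>">splits.fa",-format=>"fasta");
foreach my $arg(@ARGV)
{
   my $rdr=Bio::SeqIO->new(-file=>$arg);
   while (my $seqObj=$rdr->next_seq)
   {
      # subseq() coordinates are 1-based and inclusive at both ends
      for (my $i=1; $i<=$seqObj->length; $i+=$cutSize-$overlapSize)
      {
          my $endPoint=$i+$cutSize-1;
          $endPoint=$seqObj->length if ($endPoint>$seqObj->length);
          my $subseq=$seqObj->subseq($i,$endPoint);
          $writer->write_seq(Bio::Seq->new(-id=>$seqObj->id.".$endPoint",-seq=>$subseq));
          # stop once a fragment reaches the end, so IDs stay unique
          last if ($endPoint==$seqObj->length);
      }
   }
}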

Typo correction & debugging left as exercise for the student
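
As for the Getopt::Long exercise mentioned above, one possible starting point is sketched here. The option names `--cut` and `--overlap` and the helper `parse_args` are illustrative choices, not something specified in this thread; the defaults are the 2000 and 100 used in the script.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse --cut and --overlap from an argument list; whatever is left over
# is treated as the input FASTA files.
# parse_args and the option names are illustrative choices.
sub parse_args {
    my @args = @_;
    my %opt = (cut => 2000, overlap => 100);   # defaults used in the thread
    GetOptionsFromArray(\@args,
        'cut=i'     => \$opt{cut},
        'overlap=i' => \$opt{overlap},
    ) or die "usage: $0 [--cut N] [--overlap N] file.fa ...\n";
    die "--cut must be positive and --overlap must not be negative\n"
        unless $opt{cut} > 0 && $opt{overlap} >= 0;
    die "--overlap must be smaller than --cut\n"
        unless $opt{overlap} < $opt{cut};
    return (\%opt, @args);
}

my ($opt, @files) = parse_args(@ARGV);
print "cut=$opt->{cut} overlap=$opt->{overlap} files=@files\n";
```

The overlap check matters because the loop advances by cut size minus overlap; if that step were zero or negative, the loop would never finish.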

**bruce01** · 10-22-2013, 08:06 AM

Originally posted by krobison

Sounds like a dare!

Good on you krobison! Wasn't being mean, I would have given it a go but had a bit much in front of me. Debugging is the hardest bit when learning.

**sklages** · 10-22-2013, 10:48 PM

I do not have the impression that the OP wants to learn too much ..
So he/she could use google to find some ready-to-use solutions, in perl or whatever language, e.g. http://cpansearch.perl.org/src/CJFIE...p_split_seq.pl ..

I still think it would be a great exercise for learning perl (in "bioinformatics"). Though I usually try to avoid bioperl ;-)

**pony2001mx** · 10-22-2013, 11:34 PM

Thank you all for your input. As a true beginner in Perl (I am mostly involved in bench work), I will persist in learning Perl. THANKS for your help!

ask perl script: break contigs into overlapping sequences
