  • How do I split one multi-FASTA file into several FASTA files using Perl?

    Hi
    This is my first time working with Perl, so I don't know much about the language.
    I need a script that splits a multi-FASTA file into several separate FASTA files.
    For example:

    The multi-FASTA file contains:

    >seq1
    CTGACTGGGAGTACGAAGGCCGCCTGCACAAGACAACGGGGCAGCGAACCTTCTTCTGCACCGGCACGGA
    CGACGCCGAGATGCCTCGACCTGGAGAACCTCGGCCGCGGCGAACCGCTCGCCCATGT
    >seq2
    GACGTGCTGCTGGAGATCGCCACGCCGGGTCGCTCGTTCTGTAAGCGGATGTCGATATTGGTTGACTGAT
    AGCTGGCGCGCGGTAGCATCTCGAACATGCGTTCGAGACCAGAGACGGGGCATGTAGA
    >seq3
    CTGACTGGGAGTACGAAGGCCGCCTGCACAAGACAACGGGGCAGCGAACCTTCTTCTGCACCGGCACGGA
    CGACGCCGAGATGCCTCGACCTGGAGAACCTCGGCCGCGGCGAACCGCTCGCCCATGT

    I want to split this multi-FASTA file into several output FASTA files:

    file1.fasta
    >seq1
    CTGACTGGGAGTACGAAGGCCGCCTGCACAAGACAACGGGGCAGCGAACCTTCTTCTGCACCGGCACGGA
    CGACGCCGAGATGCCTCGACCTGGAGAACCTCGGCCGCGGCGAACCGCTCGCCCATGT

    file2.fasta
    >seq2
    GACGTGCTGCTGGAGATCGCCACGCCGGGTCGCTCGTTCTGTAAGCGGATGTCGATATTGGTTGACTGAT
    AGCTGGCGCGCGGTAGCATCTCGAACATGCGTTCGAGACCAGAGACGGGGCATGTAGA

    file3.fasta
    >seq3
    CTGACTGGGAGTACGAAGGCCGCCTGCACAAGACAACGGGGCAGCGAACCTTCTTCTGCACCGGCACGGA
    CGACGCCGAGATGCCTCGACCTGGAGAACCTCGGCCGCGGCGAACCGCTCGCCCATGT

    Remember: it must be in Perl.
    Last edited by iMarcelo; 05-05-2016, 10:25 AM.

  • #2
    Is this a homework assignment? If it is, you may want to show what you have done so far to get help.

    • #3
      Code:
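      #!/usr/bin/env perl
      
      use strict;
      use warnings;
      use IO::File;
      
      my $usage = "\nUSAGE: perl $0 <Fasta>\n";
      die $usage unless @ARGV;
      
      # Open the input multi-FASTA file; stop with a message if it cannot be read
      my $fh = IO::File->new($ARGV[0], 'r') or die "Cannot open '$ARGV[0]': $!\n";
      my $out;    # handle of the current output file; undef until the first header
      
      while (defined(my $line = $fh->getline))
      {
          chomp($line);
          if ($line =~ /^>/)
          {
              # A new record starts: close the previous output file, if any
              $out->close if defined $out;
      
              # Name the output after the header text, e.g. ">seq1" -> "seq1.fa"
              (my $name = $line) =~ s/^>//;
              $out = IO::File->new("$name.fa", 'w') or die "Cannot write '$name.fa': $!\n";
              print $out "$line\n";
          }
          elsif (defined $out)
          {
              # Sequence line: append it to the current record's file
              print $out "$line\n";
          }
      }
      
      $fh->close;
      $out->close if defined $out;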
      Last edited by vivek_; 05-06-2016, 06:37 AM. Reason: Reposted code
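The script above names each output file after its header (e.g. seq1.fa), while the question asked for file1.fasta, file2.fasta, and so on. A minimal sketch of that numbered variant follows, assuming records are numbered in the order they appear in the input. The subroutine name split_fasta and the default "file" prefix are hypothetical choices, and the sketch uses only Perl's built-in open/print/close rather than IO::File.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split a multi-FASTA file into one file per record,
# named <prefix>1.fasta, <prefix>2.fasta, ... in input order.
# Uses only built-in open/print/close (no modules).
# Returns the number of records (files) written.
sub split_fasta {
    my ($infile, $prefix) = @_;
    $prefix = 'file' unless defined $prefix;    # assumed default prefix

    open(my $in, '<', $infile) or die "Cannot open '$infile': $!\n";
    my $out;          # current output handle; undef before the first header
    my $count = 0;    # number of records seen so far

    while (my $line = <$in>) {
        $line =~ s/\r?\n\z//;        # strip an LF or CRLF line ending
        next unless length $line;    # skip empty lines

        if ($line =~ /^>/) {
            # A new record starts: close the previous file, open the next one
            if (defined $out) {
                close($out) or die "Cannot close output file: $!\n";
            }
            $count++;
            my $outfile = "$prefix$count.fasta";
            open($out, '>', $outfile) or die "Cannot write '$outfile': $!\n";
            print {$out} "$line\n";
        }
        elsif (defined $out) {
            # Sequence line: append it to the current record's file
            print {$out} "$line\n";
        }
        else {
            die "Sequence data before the first '>' header in '$infile'\n";
        }
    }

    if (defined $out) {
        close($out) or die "Cannot close output file: $!\n";
    }
    close($in);
    return $count;
}

# Run as a command-line script only when an input file is given
if (@ARGV) {
    my $n = split_fasta($ARGV[0]);
    print "Wrote $n file(s)\n";
}
```

Run it as: perl split_fasta.pl input.fasta (the script file name is arbitrary). With the example input from the question it should write file1.fasta, file2.fasta and file3.fasta to the current directory. It skips empty lines and stops with an error if sequence lines appear before the first '>' header.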

      • #4
        @vivek: This may be a homework assignment. In that case you don't want to provide ready-made code; @iMarcelo is not going to learn this way.

        • #5
          My bad, I thought it might be a technician struggling with a task. I'll remove it, but if he is subscribed to e-mail notifications, he'll get the post by e-mail anyway.

          • #6
            That is OK. Your code uses modules, and a beginner student probably would not be expected to know them. In that case, iMarcelo may not be able to use that code.

            The requirement that it be in Perl is why I thought this might be homework.

            • #7
              No, it isn't a homework assignment. It's a task for my job. I have no knowledge of Perl;
              I have always done my tasks in Java or C, but the person who works with Perl is on vacation, so my boss gave this task to me. I have been researching it, but with no success so far.
              To anyone who can help: thank you.

              • #8
                While @vivek puts his code back up, you can use the faSplit utility from Jim Kent (http://hgdownload.cse.ucsc.edu/admin...x86_64/faSplit) to do this very efficiently.

                If you know Java or C, the end result would be the same as with Perl if you used one of those languages.

                • #9
                  Originally posted by iMarcelo:
                      No, it isn't a homework assignment. It's a task for my job. I have no knowledge of Perl;
                      I have always done my tasks in Java or C, but the person who works with Perl is on vacation, so my boss gave this task to me. I have been researching it, but with no success so far.
                      To anyone who can help: thank you.
                  I have reposted the code.

                  • #10
                    Originally posted by vivek_:
                        I have reposted the code.
                    Thank you so much @vivek.

                    I have difficulty working with languages I've never worked with before, and I had to do this task for my boss urgently.

                    I will keep studying this language so I can learn more for future tasks.

                    • #11
                      Originally posted by GenoMax:
                          While @vivek puts his code back up, you can use the faSplit utility from Jim Kent (http://hgdownload.cse.ucsc.edu/admin...x86_64/faSplit) to do this very efficiently.

                          If you know Java or C, the end result would be the same as with Perl if you used one of those languages.
                      Yes, the end result is the same, but my boss wanted it in Perl; I don't know why.

                      Thank you for your help.
