  • EMBL-like file to FASTA conversion

    Hello all, I want to convert an EMBL-like file to FASTA. I cannot use BioPerl because the file is not in complete EMBL format, so please suggest how to get this done.

    Code:
    ID   013789-0068 
    PS   TBD 
    OO   huringiensis 
    OS   ringiensis 
    OX 
    SI   68 
    RA 
    RL   2010. OKAYAMA UNIVERSITY,JAPAN LAMB CO LTD 
    FT   source          1..1176 
    MT 
    AC   67106 
    SV 
    CT 
    PN   013789 
    PT   PROTEIN PRODUCTION METHOD, FUSION PROTEIN, AND ANTISERUM 
    PA   AMA UNIVERSITY,JAPAN LAMB CO LTD. 
    PI   HAYAKAWA TORU (JP) SAKAI, HIROSHI, HAYAKAWA, TORU 
    P8 
    P4   10013789 
    P5   0 
    PC   International Classification: \nUS Classification: \nEuropean Classification: C12N15/62; C07K14/47A25 
    PR   80199166; 
    PE   199166 
    AN   09JP63603 
    KC   1 
    P1   ng the DNA into a host bacterium to transform the host bacterium; and (c) causing the expression of the fusion protein in the transformed host bacterium.; The method may further comprise a step of removing the peptide chain (B) from the fusion protein. \n \n 
    P7 
    P9   112 
    PO 
    PM   10013789; 
    PB   10013789 
    PQ   10013789; 
    EM   esentative 
    W1   PRT 
    D1   0204 
    D2   0217 
    D3   0730 
    D4   0801 
    D5   0204 
    HL   [L[P9_GQ;0;3,WO2010013789,45,67]] [L[PM_PN_GQNUC;0;12,WO2010013789]] [L[PQ_PN_GQNUC;0;12,WO2010013789]] 
    CC   mer C1-1-f FH   Key             Location/Qualifiers Copyright (c)Inc. 2011 
    LS   Application 
    L2   Publ. Of int. appl. w4 
     
      MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQI 
      EQLINQRIEEFARNQAISRLEGLSNLYQIYAESFREWEADPTNPALREEMRIQFNDMNSALTTAIPLLAVQNYQVPLLSV 
      YVQAANLHLSVLRDVSVFGQRWGFDAATINSRYNDLTRLIGNYTDYAVRWYNTGLERVWGPDSRDWVRYNQFRRELTLTV 
      LDIVALFSNYDSRRYPIRTVSQLTREIYTNPVLENFDGSFRGMAQRIEQNIRQPHLMDILNSITIYTDVHRGFNYWSGHQ 
      ITASPVGFSGPEFAFPLFGNAGNAAPPVLVSLTGLGIFRTLSSPLYRRIILGSGPNNQELFVLDGTEFSFASLTTNLPST 
      IYRQRGTVDSLDVIPPQDNSVPPRAGFSHRLSHVTMLSQAAGAVYTLRAPTFSWQHRSAEFNNIIPSSQITQIPLTKSTN 
      LGSGTSVVKGPGFTGGDILRRTSPGQISTLRVNITAPLSQRYRVRIRYASTTNLQFHTSIDGRPINQGNFSATMSSGSNL 
      QSGSFRTVGFTTPFNFSNGSSVFTLSAHVFNSGNEVYIDRIEFVPAEVTFEAEYDLERAQKAVNELFTSSNQIGLKTDVT 
      DYHIDQVSNLVECLSDEFCLDEKQELSEKVKHAKRLSDERNLLQDPNFRGINRQLDRGWRGSTDITIQGGDDVFKENYVT 
      LLGTFDECYPTYLYQKIDESKLKAYTRYQLRGYIEDSQDLEIYLIRYNAKHETVNVPGTGSLWPLSAQSPIGKCGEPNRC 
      APHLEWNPDLDCSCRDGEKCAHHSHHFSLDIDVGCTDLNEDLGVWVIFKIKTQDGHARLGNLEFLEEKPLVGEALARVKR 
     
    // 
     
    ID   0223489-0068 
    PS   TBD 
    OO   huringiensis 
    OS   ringiensis 
    OX 
    SI   68 
    RA 
    RL   2010. OKAYAMA UNIVERSITY,JAPAN LAMB CO LTD 
    FT   source          1..1176 
    MT 
    AC   67106 
    SV 
    CT 
    PN   013789 
    PT   PRN METHOD, FUSION PROTEIN, AND ANTISERUM 
    PA   AMERSITY,JAMB CO LTD. 
    PI   HAYAKAWA TORU (JP) SAKAI, HIROSHI, HAYAKAWA, TORU 
    P8 
    P4   10013789 
    P5   0 
    PC   International Classification: \nUS Classification: \nEuropean Classification: C12N15/62; C07K14/47A25 
    PR   80199166; 
    PE   199166 
    AN   09JP63603 
    KC   1 
    P1   ng the DNA into a host bacterium to transform the host bacterium; and (c) causing the expression of the fusion protein in the transformed host bacterium.; The method may further comprise a step of removing the peptide chain (B) from the fusion protein. \n \n 
    P7 
    P9   112 
    PO 
    PM   10013789; 
    PB   10013789 
    PQ   10013789; 
    EM   esentative 
    W1   PRT 
    D1   0204 
    D2   0217 
    D3   0730 
    D4   0801 
    D5   0204 
    HL   [L[P9_GQ;0;3,WO2010013789,45,67]] [L[PM_PN_GQNUC;0;12,WO2010013789]] [L[PQ_PN_GQNUC;0;12,WO2010013789]] 
    CC   mer C1-1-f FH   Key             Location/Qualifiers Copyright (c)Inc. 2011 
    LS   Application 
    L2   Publ. Of int. appl. w4 
     
      VLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQI 
      EQLINQRIEEFARNQAISRLEGLSNLYQIYAESFREWEADPTNPALREEMRIQFNDMNSALTTAIPLLAVQNYQVPLLSV 
       
     LLGTFDECYPTYLYQKIDESKLKAYTRYQLRGYIEDSQDLEIYLIRYNAKHETVNVPGTGSLWPLSAQSPIGKCGEPNRC 
      APHLEWNPDLDCSCRDGEKCAHHSHHFSLDIDVGCTDLNEDLGVWVIFKIKTQDGHARLGNLEFLEEKPLVGEALARVKR

    The output should be in FASTA format: a header line built from the ID, PT and PA fields, followed by the sequence. The two slashes ("//") are the dividing line between two EMBL entries.

    Code:
    >013789-0068 ;  PROTEIN PRODUCTION METHOD, FUSION PROTEIN, AND ANTISERUM PA ;   AMA UNIVERSITY,JAPAN LAMB CO LTD. 
    MDNNPNINECIPYNCLSNPEVEVLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQI 
      EQLINQRIEEFARNQAISRLEGLSNLYQIYAESFREWEADPTNPALREEMRIQFNDMNSALTTAIPLLAVQNYQVPLLSV 
      YVQAANLHLSVLRDVSVFGQRWGFDAATINSRYNDLTRLIGNYTDYAVRWYNTGLERVWGPDSRDWVRYNQFRRELTLTV 
      LDIVALFSNYDSRRYPIRTVSQLTREIYTNPVLENFDGSFRGMAQRIEQNIRQPHLMDILNSITIYTDVHRGFNYWSGHQ 
      ITASPVGFSGPEFAFPLFGNAGNAAPPVLVSLTGLGIFRTLSSPLYRRIILGSGPNNQELFVLDGTEFSFASLTTNLPST 
      IYRQRGTVDSLDVIPPQDNSVPPRAGFSHRLSHVTMLSQAAGAVYTLRAPTFSWQHRSAEFNNIIPSSQITQIPLTKSTN 
      LGSGTSVVKGPGFTGGDILRRTSPGQISTLRVNITAPLSQRYRVRIRYASTTNLQFHTSIDGRPINQGNFSATMSSGSNL 
      QSGSFRTVGFTTPFNFSNGSSVFTLSAHVFNSGNEVYIDRIEFVPAEVTFEAEYDLERAQKAVNELFTSSNQIGLKTDVT 
      DYHIDQVSNLVECLSDEFCLDEKQELSEKVKHAKRLSDERNLLQDPNFRGINRQLDRGWRGSTDITIQGGDDVFKENYVT 
      LLGTFDECYPTYLYQKIDESKLKAYTRYQLRGYIEDSQDLEIYLIRYNAKHETVNVPGTGSLWPLSAQSPIGKCGEPNRC 
      APHLEWNPDLDCSCRDGEKCAHHSHHFSLDIDVGCTDLNEDLGVWVIFKIKTQDGHARLGNLEFLEEKPLVGEALARVKR 
     
    >0223489-0068 ; PRN METHOD, FUSION PROTEIN, AND ANTISERUM PA  ; AMERSITY,JAMB CO LTD. 
    VLGGERIETGYTPIDISLSLTQFLLSEFVPGAGFVLGLVDIIWGIFGPSQWDAFPVQIMNSALTTAIPLLAVQREEMRIQLE 
      EQLINQRIEEFARNQAISRLEGLSNLYQIYAESFREWEADPTNPALREEMRIQFNDMNSALTTAIPLLAVQNYQVPLLSV 
      LLGTFDECYPTYLYQKIDESKLKAYTRYQLRGYIEDSQDLEIYLIRYNAKHETVNVPGTGSLWPLSAQSPIGKCGEPNRC 
      APHLEWNPDLDCSCRDGEKCAHHSHHFSLDIDVGCTDLNEDLGVWVIFKIKTQDGHARLGNLEFLEEKPLVGEALARVKR
    I hope I am making sense.
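
One possible way to do the conversion described above, as a minimal Python sketch rather than a tested solution. It rests on assumptions that are not stated in the post: every field line starts with a two-character tag (such as ID, PT or P4) followed by whitespace or nothing, every sequence line contains only letters, and the header is written as `>ID ; PT ; PA`. The names `parse_records` and `to_fasta` and the 80-column wrap width are made up for this illustration.

```python
import re
import textwrap

# Assumption: a field line is a two-character tag (ID, PT, P4, ...),
# optionally followed by whitespace and free text.
FIELD_RE = re.compile(r"^([A-Z][A-Z0-9])(?:\s+(.*))?$")
# Assumption: a sequence line holds only letters ('*' allowed for stops).
SEQ_RE = re.compile(r"^[A-Za-z*]+$")


def parse_records(text):
    """Yield (fields, sequence) for each '//'-separated record."""
    fields, seq = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":
            # End of record: emit what was collected and start over.
            if fields or seq:
                yield fields, "".join(seq)
            fields, seq = {}, []
            continue
        m = FIELD_RE.match(line)
        if m:
            # Keep the first value seen if a tag repeats within a record.
            fields.setdefault(m.group(1), (m.group(2) or "").strip())
        elif SEQ_RE.match(line):
            seq.append(line)
        # Anything else is ignored.
    # The last record may not be followed by '//'.
    if fields or seq:
        yield fields, "".join(seq)


def to_fasta(text, width=80):
    """Return FASTA text with '>ID ; PT ; PA' headers (hypothetical layout)."""
    out = []
    for fields, seq in parse_records(text):
        header = " ; ".join(fields.get(tag, "") for tag in ("ID", "PT", "PA"))
        out.append(">" + header)
        # The sequence has no spaces, so wrap() cuts it every `width` letters.
        out.extend(textwrap.wrap(seq, width))
    return "\n".join(out) + "\n"
```

To use it, read the whole input file into a string and write out the result of `to_fasta(...)`. One known limitation of classifying lines by content: a sequence line that is exactly two residues long would be mistaken for an empty field. If the real file keeps field tags in column 0 and indents the sequence lines, testing for leading whitespace would be a more robust check.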

  • #2
    You got several answers to your question here:


    It is nice to mention you've asked the question elsewhere, and report back with any solution, so that people here don't waste their time.
