  • Concatenating Sequences Within a single Fasta File

    Hi all,
    I need to concatenate a bunch of sequences in a FASTA file. I have a file of extracted introns and would like to essentially splice them all together for use in a program. Is there any way to do either of the following using Perl (preferably) or Python (if necessary):

    1. Join all the introns of a single gene, preserving the FASTA heading for that gene.

    > Gene 1 Intron 1
    GTACGCC....CTGATAGAG
    >Gene 1 Intron 2
    GTCCAGGAC.....CTGAGTAAG

    Becomes
    > Gene 1 Intron 1
    GTACGCC....CTGATAGAGGTCCAGGAC.....CTGAGTAAG

    or
    2. Join a number of introns together (not accounting for what genes they came from) under a non-specific FASTA formatted heading?

    Basically I want to splice together a bunch of intron sequences like they were exons so that I can run them through a program that doesn't like how short they are. The first way would be the most biologically relevant and useful for my purposes, but if it can't be done I can live with it haha. Any help would be greatly appreciated. Thanks a lot!

  • #2
    I don't think that there is a program out there that will do what you want. In part this is because FastA headers are not standardized. Basically all you need for FastA is the '>'. So it would be hard to create a general purpose program that would combine FastA sequences together based on some regular yet arbitrary criteria. That said it would be trivial (for a programmer, at least) to create a one-shot specific program that would do the combining.

    Now that I think of it, such a program would be a good one to give to a beginning programmer. Straightforward yet not so trivial as to be boring.
    Last edited by westerman; 12-04-2012, 07:54 AM. Reason: Added words, "one-shot specific"
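
For what it's worth, here is a minimal sketch of such a one-shot program in Python. It assumes every header looks like "Gene 1 Intron 1" (a gene name, the word "Intron", then a number), which is only a guess based on the example in the first post. The function names `read_fasta`, `gene_key` and `concat_by_gene` are made up for this illustration. Introns are joined in the order they appear in the file, and the first header seen for each gene is kept.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gene_key(header):
    """Return the gene part of a header.

    Assumed layout: "<gene name> Intron <number>". Headers that do not
    match are treated as their own gene.
    """
    m = re.match(r"(.+?)\s+Intron\s+\d+", header, flags=re.IGNORECASE)
    return m.group(1) if m else header


def concat_by_gene(lines):
    """Join all intron sequences of each gene, keeping the first header."""
    merged = {}  # plain dicts keep insertion order in Python 3.7+
    for header, seq in read_fasta(lines):
        key = gene_key(header)
        if key not in merged:
            merged[key] = [header, []]
        merged[key][1].append(seq)
    return [(first_header, "".join(parts)) for first_header, parts in merged.values()]


# Small in-memory demonstration; with a real file, pass an open file handle.
demo = [
    "> Gene 1 Intron 1", "GTACGCC", "CTGATAGAG",
    ">Gene 1 Intron 2", "GTCCAGGAC",
    ">Gene 2 Intron 1", "AAAA",
]
for header, seq in concat_by_gene(demo):
    print(">" + header)
    print(seq)
```

If the real headers follow a different pattern, only the regular expression in `gene_key` should need to change.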



    • #3
      A little fiddly but you can use the text manipulation tool in Galaxy.

      Convert your FASTA files to tabular format, then use "cut columns" and "paste files side by side" to get your sequences into the same file. You can then merge the columns and convert back to FASTA.
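
Outside Galaxy, the FASTA-to-tabular round trip can be sketched in a few lines of Python. This is only an illustration of the conversion step, not what Galaxy does internally; `fasta_to_tabular` and `tabular_to_fasta` are hypothetical helper names, and the 60-character line width is an arbitrary choice.

```python
def fasta_to_tabular(lines):
    """Turn FASTA lines into 'header<TAB>sequence' rows, one per record."""
    rows, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                rows.append(header + "\t" + "".join(chunks))
            header, chunks = line[1:].strip(), []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        rows.append(header + "\t" + "".join(chunks))
    return rows


def tabular_to_fasta(rows, width=60):
    """Turn 'header<TAB>sequence' rows back into wrapped FASTA lines."""
    out = []
    for row in rows:
        header, seq = row.rstrip("\n").split("\t", 1)
        out.append(">" + header)
        # Wrap the sequence at `width` characters per line.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return out


rows = fasta_to_tabular([">Gene 1 Intron 1", "GTAC", "GCC", ">Gene 1 Intron 2", "GTCC"])
print("\n".join(rows))
print("\n".join(tabular_to_fasta(rows)))
```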



      • #4
        Jackie: I'll agree that conversion to tabular is a good first step -- and this can be done via the command line as well using the FastX tools -- but I don't see how a merger could be done automatically. Let's say after the conversion I have

        Gene 1 Intron 1 <tab> GTACGCC....CTGATAGAG
        Gene 1 Intron 2 <tab> GTCCAGGAC.....CTGAGTAAG
        Gene 2 Intron 1 <tab> sequence
        Gene 3 Intron 1 <tab> sequence
        Gene 3 Intron 2 <tab> sequence
        Gene 3 Intron 3 <tab> sequence
        etc.

        How to pull all of the Gene1, Gene2, Gene3, etc. sequences together without some programming? I'm not familiar enough with Galaxy to follow your "cut column and paste files" suggestion but if it is anything like the normal Unix 'cut' and 'paste' commands then I just don't see how to do the cut'n'paste automatically.

        Perhaps if the tabular file becomes:

        Gene<tab>1<tab>Intron<tab>1<tab>Sequence
        Gene<tab>1<tab>Intron<tab>2<tab>Sequence
        etc.

        Then maybe cut'n'paste would work.
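
With a little programming the grouping becomes straightforward. A rough Python sketch, assuming the five-column layout shown above (label, gene number, the word "Intron", intron number, sequence) and assuming the intron number is an integer; `merge_tabular` is a hypothetical name, and rows that do not have exactly five fields are silently skipped:

```python
from collections import defaultdict


def merge_tabular(rows):
    """Group five-column rows by gene and join sequences in intron order."""
    by_gene = defaultdict(list)
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        if len(fields) != 5:
            continue  # row does not match the assumed layout
        label, gene_id, _intron_label, intron_no, seq = fields
        by_gene[(label, gene_id)].append((int(intron_no), seq))

    merged = []
    for (label, gene_id), introns in by_gene.items():
        introns.sort(key=lambda pair: pair[0])  # order by intron number
        merged.append((label + " " + gene_id, "".join(seq for _, seq in introns)))
    return merged


rows = [
    "Gene\t1\tIntron\t2\tCCC",
    "Gene\t1\tIntron\t1\tAAA",
    "Gene\t2\tIntron\t1\tGGG",
]
for header, seq in merge_tabular(rows):
    print(">" + header)
    print(seq)
```

Sorting by intron number means the input rows do not have to be in order, which plain cut'n'paste would not handle.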
