  • Giorgio C
    Member
    • Oct 2010
    • 89

    Convert a text file to FASTA with decollapsing

    Hi all,

    I have this file:

    TGAGGTAGTAGATTGTATAGTT 424866
    TAGCTTATCAGACTGATGTTGA 359141
    TAGCTTATCAGACTGATGTTGAC 276052
    TGAGGTAGTAGGTTGTATAGTT 268735
    ACAGTAGTCTGCACATTGGTT 209280
    ACAGTAGTCTGCACATTGGTTA 178652
    TAGCTTATCAGACTGATGTTG 166159
    TGAGGTAGTAGGTTGTGTGGTT 105275
    TGAGGTAGTAGGTTGTATGGTT 102447
    AGCAGCATTGTACAGGGCTATGA 91296
    TGAGGTAGTAGGTTGTGTGGTTT 63300
    TGAGGTAGTAGTTTGTACAGTT 61604
    TGAGGTAGTAGATTGTATAGT 61492
    TAGCACCATCTGAAATCGGTTA 60637
    TTCAAGTAATCCAGGATAGGCT 52300
    TGAGGTAGTAGATTGTATAGTTA 50905
    TGAGGTAGTAGGTTGTATAGT 48150
    TACAGTAGTCTGCACATTGGTT 47534
    TCTACAGTCCGACGATC 45803
    ................

    These are sequences, and the numbers are their respective occurrence counts. I would like to convert this file to FASTA format, decollapsing the sequences and naming them like this:

    >Sample1_0
    TGAGGTAGTAGATTGTATAGTT
    >Sample1_1
    TGAGGTAGTAGATTGTATAGTT
    >Sample1_2
    TGAGGTAGTAGATTGTATAGTT
    .....
    repeated 424866 times, ending with
    >Sample1_424865
    TGAGGTAGTAGATTGTATAGTT

    then
    >Sample1_424866
    TAGCTTATCAGACTGATGTTGA (the second sequence)

    The same goes for the remaining sequences, in order. Is there a script for this purpose?

    Thanks in advance,
    Giorgio
  • vivek_
    PhD Student
    • Jul 2012
    • 164

    #2
    Code:
    awk '{for(i=0;i<=($2-1);i++) print ">Sample"NR"_"i"\n"$1}' file.txt
    This might work!
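
Note that this one-liner restarts the index for each input line and puts the line number in the name (Sample1_0 ..., Sample2_0 ...), whereas the question asks for a single prefix with one running index across all sequences. A minimal sketch of that variant follows, assuming the input has whitespace-separated sequence and count columns. The function name `decollapse` and the prefix argument are illustrative and not from the thread.

```shell
# Hypothetical helper: expand "SEQUENCE COUNT" lines into FASTA records,
# numbering them with one running index shared by all sequences.
decollapse() {
    # $1 = input file, $2 = name prefix (assumed default: "Sample1")
    awk -v prefix="${2:-Sample1}" '
        NF >= 2 {                      # skip blank or single-field lines
            for (i = 0; i < $2; i++)   # repeat the sequence COUNT times
                printf(">%s_%d\n%s\n", prefix, n++, $1)   # n starts at 0
        }
    ' "$1"
}

# Example: decollapse file.txt Sample1 > file.fasta
```

A quick sanity check is that the number of header lines in the output (`grep -c '^>' file.fasta`) equals the sum of the count column in the input.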


    • Giorgio C
      Member
      • Oct 2010
      • 89

      #3
      Thanks vivek, it works great!


      • maasha
        Senior Member
        • Apr 2009
        • 153

        #4
        This can also be done with Biopieces (www.biopieces.org):

        Code:
        read_tab -i in.tab -k SEQ,COUNT | duplicate_record -k COUNT | add_ident -k SEQ_NAME -p Sample1_ | write_fasta -x
