  • looping sed commands

    Greetings, I was wondering if anyone could provide any clues or assistance in automating a series of commands. Currently, I am formatting Illumina data for a UPARSE/USEARCH pipeline. After quality filtering and trimming, the samples need to have their FASTA headers changed from
    ">Sample1_sequence1"

    to

    ">barcodelabel=Sample1;Sample1_sequence1"

    This is easy to do with sed:

    sed 's/>Sample1/>barcodelabel=Sample1;Sample1/g' sample.fna > sample.rename.fna

    However, the file I have contains 48 different samples, and I was wondering if there is a way to automate or loop sed so that it reads the sample names from a text file and replaces each of them in the new file.

    Any suggestions are welcome. Thanks,

    -tony

  • #2
    Use the following commands to loop either line by line through a file or through all files in a given directory. From there you can use e.g. "basename" to get the file name without the path, or parse it in any other way you need.

    To loop line by line through a file, use:
    Code:
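    # Read the sample-name file one line at a time
    # (IFS= keeps leading/trailing whitespace, -r keeps backslashes literal)
    while IFS= read -r line; do
        sampleName=$line   # no spaces around "=" in a shell assignment
        ...
    done < yourSampleNameFile.txt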
    If you want to loop through all files in a certain directory, use:
    Code:
    # Expand the glob directly in the loop so paths containing spaces stay intact
    for f in /path/to/your/sample/files/*
    do
    echo "Processing file $f..."
    ...
    done
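
Combining the line-wise loop with the sed substitution from the original question, a minimal sketch might look like the following. The function name `relabel_headers`, the file names, and the one-name-per-line layout of the names file are assumptions for illustration, and sample names are assumed to contain only letters and digits (no characters that are special to sed). Instead of running sed once per sample, the loop collects one `-e` expression per name and runs sed a single time.

```shell
#!/usr/bin/env bash
# Sketch: relabel FASTA headers for every sample listed in a names file.
# relabel_headers and all file names are hypothetical.
relabel_headers() {
    local names_file=$1 in_fasta=$2 out_fasta=$3
    local script=() name

    # Build one sed expression per sample name (one name per line).
    while IFS= read -r name || [ -n "$name" ]; do
        [ -z "$name" ] && continue   # skip blank lines
        # >Sample1_... becomes >barcodelabel=Sample1;Sample1_...
        # The trailing "_" keeps Sample1 from matching Sample10.
        script+=(-e "s/^>${name}_/>barcodelabel=${name};${name}_/")
    done < "$names_file"

    # With no names there is nothing to substitute: copy the input unchanged.
    if [ "${#script[@]}" -eq 0 ]; then
        cp "$in_fasta" "$out_fasta"
        return
    fi

    # Apply all expressions in a single pass over the FASTA file.
    sed "${script[@]}" "$in_fasta" > "$out_fasta"
}

# Example call (file names are placeholders):
#   relabel_headers sample_names.txt sample.fna sample.rename.fna
```

Running sed once keeps the cost at a single pass over the FASTA file regardless of how many samples are listed.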


    • #3
      sed 's/^>\([^_]*\)_/>barcodelabel=\1;\1_/g' sample.fna > sample.rename.fna

      This should work for any number of sample names, as long as each name
      sits between a leading ">" and the first underscore.
      It runs in one pass and needs no separate list of sample names.

      Pitfalls to watch out for are lines that match the start and end pattern
      but are not sample definition lines, and empty sample names.
      Both can be addressed by making the "\([^_]*\)" part of the pattern more
      specific; as written it simply means "accept anything but an underscore".
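
As one possible tightening of the pattern, the sketch below requires the sample name to be at least one letter or digit, so headers with an empty name (such as ">_orphan") are left untouched. The character class is an assumption about how samples are named, and `relabel_strict` is a hypothetical wrapper name; adjust the class if your names contain other characters.

```shell
#!/usr/bin/env bash
# Sketch: same one-pass substitution, but the captured sample name must be
# one or more letters/digits. relabel_strict is a hypothetical name.
relabel_strict() {
    # [A-Za-z0-9][A-Za-z0-9]* means "one or more" in basic regular expressions.
    # Reads the files given as arguments, or standard input if there are none.
    sed 's/^>\([A-Za-z0-9][A-Za-z0-9]*\)_/>barcodelabel=\1;\1_/' "$@"
}

# Example call (file names are placeholders):
#   relabel_strict sample.fna > sample.rename.fna
```

Because the pattern is anchored with "^", at most one match per line is possible, so the "g" flag is not needed here.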
