  • looping sed commands

    Greetings, I was wondering if anyone could provide any clues or assistance in automating a series of commands. Currently, I am formatting Illumina data for a UPARSE/USEARCH pipeline. After quality filtering and trimming, each sample needs its FASTA header changed from
    ">Sample1_sequence1"

    to

    ">barcodelabel=Sample1;Sample1_sequence1"

    This is easy to do with sed:

    sed 's/>Sample1/>barcodelabel=Sample1;Sample1/g' sample.fna > sample.rename.fna

    However, the file I have contains 48 different samples, and I was wondering if there is a way to automate or loop sed so that it reads the sample names from a text file and replaces them in the new file.

    Any suggestions are welcome. Thanks,

    -tony

  • #2
    Use the following commands to loop either line by line through a file or through all files in a given directory. From there you can use, e.g., "basename" to get the filename without the path, or parse it in any other way you need.

    To loop line by line through a file, use:
    Code:
    # Read the file one line at a time; IFS= and -r keep each line intact
    while IFS= read -r line; do
    # No spaces around "=" in a shell assignment
    sampleName="$line"
    ...
    done < yourSampleNameFile.txt
    If you want to loop through all files in a certain directory, use:
    Code:
    # Glob directly in the for loop so paths containing spaces stay intact
    for f in /path/to/your/sample/files/*
    do
    echo "Processing file $f..."
    ...
    done
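
    Putting the line-by-line loop together with the sed command from the question, a minimal sketch might look like the following. The function name relabel_fasta and its three arguments are hypothetical, and it assumes the sample names contain only letters, digits, or hyphens (nothing that sed would treat as a special character), one name per line.

```shell
# Hypothetical helper, not part of any existing tool.
# Usage: relabel_fasta names.txt input.fna output.fna
relabel_fasta() {
    names_file=$1
    in_fasta=$2
    out_fasta=$3

    # Build one substitution per sample name. Each pattern is anchored at
    # the start of the header and ends at the underscore, so "Sample1"
    # cannot accidentally match "Sample10".
    sed_script=$(
        while IFS= read -r name || [ -n "$name" ]; do
            [ -n "$name" ] || continue    # skip blank lines
            printf 's/^>%s_/>barcodelabel=%s;%s_/\n' "$name" "$name" "$name"
        done < "$names_file"
    )

    if [ -z "$sed_script" ]; then
        echo "no sample names found in $names_file" >&2
        return 1
    fi

    # Apply all substitutions in a single pass over the FASTA file.
    sed -e "$sed_script" "$in_fasta" > "$out_fasta"
}
```

    It would be called as, for example, relabel_fasta yourSampleNameFile.txt sample.fna sample.rename.fna. Header lines whose sample name is not listed in the names file are passed through unchanged.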



    • #3
      sed 's/^>\([^_]*\)_/>barcodelabel=\1;\1_/g' sample.fna > sample.rename.fna

      This should work for any number of sample names, as long as each name
      sits between a leading ">" and the first underscore.
      It runs in one pass and needs no separate list of sample names.

      Pitfalls to watch out for: lines that match the start and end pattern
      but are not sample definition lines, and empty sample names.
      Both can be addressed by making the capture group
      "\([^_]*\)" more specific; currently it just means "accept anything but an underscore".
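
      As one possible way to tighten the pattern, the sketch below requires the sample name to be at least one letter or digit. The function name relabel_strict is hypothetical, and the character class is an assumption about what valid sample names look like; adjust it to your own naming scheme.

```shell
# Hypothetical stricter variant: reads FASTA on stdin, writes to stdout.
# The sample name must be one or more letters or digits, so headers with
# an empty name (">_seq") or with spaces in the name are left untouched.
relabel_strict() {
    sed 's/^>\([A-Za-z0-9][A-Za-z0-9]*\)_/>barcodelabel=\1;\1_/'
}
```

      Usage would be relabel_strict < sample.fna > sample.rename.fna. Header lines that do not fit the stricter pattern pass through unchanged, so they can be found afterwards by searching for headers that lack "barcodelabel=".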
