  • question to scripting gurus

    I came across a simple one-line command that takes the files in a folder one by one and pipes them into another command, but I cannot find that thread. Can someone help? For example, to decompress hundreds of gzipped files in a folder after downloading a database, or to pipe each file to another process.

  • #2
    Originally posted by yaximik
    I came across a simple one-line command that takes the files in a folder one by one and pipes them into another command, but I cannot find that thread. Can someone help? For example, to decompress hundreds of gzipped files in a folder after downloading a database, or to pipe each file to another process.
    Let's see if this puts you in the right direction...

    To uncompress all the gzip files in the current directory this may suffice:
    Code:
    gunzip *.gz ## Use gunzip -r to descend into subdirectories
    To do something with each gzipped file, e.g. print out the first 15 lines and write them to a file:

    Code:
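    # NUL-separated names keep paths with spaces intact (bash, with find -print0)
    find mydir/ -name '*.gz' -print0 | while IFS= read -r -d '' gz
    do
        gunzip -c "$gz" | head -n 15 > "${gz}.head"
    done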
    Dario



    • #3
      This should do it: http://www.cyberciti.biz/faq/bash-loop-over-file/

      There will be analogous commands for tcsh, if that is your favorite.



      • #4
        My first impression was that you are talking about the 'find' command.
        The find command has much more functionality than a simple bash loop. However, for simplicity I use bash loops more often.
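As a rough sketch of what find can do on its own, the following uses its -exec action to run a small command on every match, subdirectories included. The scratch directory and the file names are made up for the demo, and the 2-line head is arbitrary.

```shell
# Hypothetical scratch directory with two small gzipped files (demo only).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'a\nb\nc\n' | gzip > "$demo_dir/one.gz"
printf 'd\ne\nf\n' | gzip > "$demo_dir/sub/two.gz"

# For every *.gz below demo_dir, write the first 2 decompressed lines
# to <name>.head. find hands each path to the inline sh script as $1.
find "$demo_dir" -name '*.gz' \
    -exec sh -c 'gunzip -c "$1" | head -n 2 > "$1.head"' sh {} \;
```

Because find passes each path as a single argument, names containing spaces should not need special handling here.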

        Here is the manpage of find

        Best,
        Simon



        • #5
          or ls | xargs ...
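One possible way to fill in the xargs idea is sketched below. It swaps ls for find with -print0 and xargs -0 (extensions available in GNU and BSD tools), because plain ls | xargs splits names at spaces. The scratch directory and file names are assumptions for the demo.

```shell
# Hypothetical scratch directory (demo only); one name contains a space on purpose.
demo_dir=$(mktemp -d)
printf 'x\n' | gzip > "$demo_dir/a.gz"
printf 'y\n' | gzip > "$demo_dir/b c.gz"

# -print0 and -0 separate file names with NUL bytes, so spaces are safe.
# -n 1 makes xargs start one gunzip per file.
find "$demo_dir" -name '*.gz' -print0 | xargs -0 -n 1 gunzip
```

Dropping -n 1 would let xargs pass many files to a single gunzip call, which is usually fine for commands that accept multiple file arguments.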



          • #6
            Originally posted by dariober
            To uncompress all the gzip files in the current directory this may suffice:
            Code:
            gunzip *.gz ## Use gunzip -r to descend into subdirectories
            To do something with each gzipped file, e.g. print out the first 15 lines and write them to a file:

            Code:
            # NUL-separated names keep paths with spaces intact (bash, with find -print0)
            find mydir/ -name '*.gz' -print0 | while IFS= read -r -d '' gz
            do
                gunzip -c "$gz" | head -n 15 > "${gz}.head"
            done
            Dario
            Yep.
            And to get the results in the same file, you can do

            Code:
            for gz in *.gz ; do gunzip -c "$gz" | head -n 15 ; done > outfile



            • #7
              Thanks everyone. I realized I did not express the question clearly. Gzip is just one application; I was really asking about general looping over files:

              Code:
              while/for/if
                  something is in the folder
              do
                 something to each file
              until
                 all are processed
              done
              The beauty of what I saw was that it fit on just one line and was pretty general, so it could be applied to other commands that do not accept '*' for multiple files.
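The general shape being asked about can be sketched as a one-line for loop: for f in <pattern>; do <command> "$f"; done. In the demo below, wc -l stands in for "something to each file", and the scratch directory and .txt files are made up.

```shell
# Hypothetical scratch directory with two text files (demo only).
demo_dir=$(mktemp -d)
printf '1\n2\n' > "$demo_dir/a.txt"
printf '1\n2\n3\n' > "$demo_dir/b.txt"

# The one-liner: the shell expands the glob, the loop body runs once per file,
# and everything the loop prints is collected in a single output file.
for f in "$demo_dir"/*.txt; do wc -l < "$f"; done > "$demo_dir/counts.out"
```

Quoting "$f" keeps names with spaces in one piece, and the redirection after done applies to the whole loop rather than to each iteration.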



              • #8
                In fact, you can also write for-loop one-liners with ";" or even with "&" to fork each iteration into the background.
                For example looping over Fasta files (*.fa) and counting the headers (i.e. number of sequences) could be done on one line like so:

                Code:
                for f in *.fa; do grep -c ">" "$f"; done
                If you want to parallel unzip a bunch of files you could fork the loop like so:
                Code:
                for f in *.gz; do gunzip "${f}" & done; wait
                This runs all gunzip calls in parallel and 'wait's for them to be finished (optional)

                I learned the hard way: while those one-liners are nice and quick when working in a terminal, never use them in longer bash scripts, because a) at some point you will make a typo and then have a very hard time finding it, and b) there is essentially no need to save space or shorten commands in a bash script.
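For comparison, here is one way the parallel gunzip loop might look when spelled out for a script. This is only a sketch: it assumes bash (for arrays), and the scratch directory, file names, and the pids and failed variables are made up for the demo.

```shell
# Hypothetical scratch directory with two gzipped files (demo only).
demo_dir=$(mktemp -d)
printf 'x\n' | gzip > "$demo_dir/a.gz"
printf 'y\n' | gzip > "$demo_dir/b.gz"

failed=0
pids=()

# Start one background gunzip per file and remember its process ID.
for f in "$demo_dir"/*.gz
do
    gunzip "$f" &
    pids+=("$!")
done

# Wait for each job separately so that a failing gunzip is noticed.
for pid in "${pids[@]}"
do
    wait "$pid" || failed=1
done

echo "failed=$failed"
```

Waiting on each PID gives the script a per-job exit status, which a bare wait at the end of a one-liner does not report.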

                Best
                Simon
                Last edited by sisch; 01-17-2013, 01:14 AM. Reason: for loop code corrected



                • #9
                  Hi yaximik,

                  I use this in my bash scripts:

                  Code:
                  #collect all the filenames in a bash array (a glob avoids parsing ls output)
                  files=(*.gz)
                  
                  #iterate over these
                  for i in "${files[@]}"
                  do
                  	echo "gunzipping file $i"
                  	gunzip "$i"
                  done
                  cheers

                  Micha



                  • #10
                    Thanks a lot to everyone who provided input. That was very educational in general, as I now have a better idea of how I can use simple loops. I wonder, would it be useful to have something like WiKiBits (or a better name) as a collection of ingenious little solutions?
