SEQanswers

Old 01-16-2013, 05:53 AM   #1
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
Default question to scripting gurus

I came across a simple one-line command that takes the files in a folder one by one and pipes them into another command, but I cannot find the thread. Can someone help? Say, decompressing hundreds of gzipped files in a folder after downloading a database, or piping them to another process.
Old 01-16-2013, 06:07 AM   #2
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 311
Default

Quote:
Originally Posted by yaximik View Post
I came across a simple one-line command that takes the files in a folder one by one and pipes them into another command, but I cannot find the thread. Can someone help? Say, decompressing hundreds of gzipped files in a folder after downloading a database, or piping them to another process.
Let's see if this puts you in the right direction...

To uncompress all the gzip files in the current directory this may suffice:
Code:
gunzip *.gz ## Use gunzip -r to descend into subdirectories
To do something with each gzipped file, e.g. print out the first 15 lines and write them to a file:

Code:
for gz in `find mydir/ -name '*.gz'`
do
    gunzip -c "$gz" | head -n 15 > "${gz}.head"
done
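A variant of the same loop that tolerates spaces in filenames pipes find into a while-read loop instead of word-splitting find's output; this is a sketch, with made-up paths standing in for mydir/:

```shell
# Same loop as above, but robust to spaces in filenames: find prints one
# path per line and the while-read loop consumes each line whole.
# /tmp/wsdemo and its contents are created here purely for illustration.
mkdir -p "/tmp/wsdemo/my dir"
printf 'hi\n' | gzip > "/tmp/wsdemo/my dir/a b.gz"
find /tmp/wsdemo -name '*.gz' | while IFS= read -r gz
do
    gunzip -c "$gz" | head -n 15 > "${gz}.head"
done
```

(It still breaks on newlines in filenames; for those, find's -print0 output with a null-delimited reader is needed.)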
Dario
Old 01-16-2013, 06:07 AM   #3
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,022
Default

This should do it: http://www.cyberciti.biz/faq/bash-loop-over-file/

There will be analogous constructs for tcsh, if that is your preferred shell.
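The pattern on that page is the plain bash for loop; a minimal sketch (the directory, files, and per-file command below are made up for illustration):

```shell
# Loop over every matching file in a directory and run one command per file.
# /tmp/loopdemo and wc -c are placeholders for your own path and command.
mkdir -p /tmp/loopdemo
printf 'hello\n' > /tmp/loopdemo/a.txt
printf 'world\n' > /tmp/loopdemo/b.txt
for f in /tmp/loopdemo/*.txt
do
    wc -c < "$f"   # replace with any command that takes one file
done
```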
Old 01-16-2013, 06:49 AM   #4
sisch
Member
 
Location: Dusseldorf, Germany

Join Date: Jun 2011
Posts: 29
Default

My first impression was that you are talking about the 'find' command.
find has much more functionality than a simple bash loop; however, for simplicity I use bash loops more often.

Here is the manpage of find
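As a sketch of what find alone can do, its -exec action runs a command on every match without any shell loop; the directory names below are examples only:

```shell
# find -exec applies a command to each match; {} is replaced by the path
# and \; terminates the command. /tmp/finddemo is created just for the demo.
mkdir -p /tmp/finddemo/sub
printf 'one\n' | gzip > /tmp/finddemo/a.gz
printf 'two\n' | gzip > /tmp/finddemo/sub/b.gz
find /tmp/finddemo -name '*.gz' -exec gunzip {} \;
```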

Best,
Simon
Old 01-16-2013, 07:21 AM   #5
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

or ls | xargs ...
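A sketch of that idea: xargs turns lines on stdin into command arguments, e.g. `ls *.gz | xargs gunzip`. Plain `ls | xargs` breaks on filenames containing spaces, so the null-delimited find/xargs spelling below is the safer form (paths are made up for illustration):

```shell
# Feed filenames to gunzip as arguments via xargs; -print0/-0 delimit
# paths with NUL bytes so whitespace in names is harmless.
mkdir -p /tmp/xargsdemo
printf 'aaa\n' | gzip > /tmp/xargsdemo/x.gz
printf 'bbb\n' | gzip > /tmp/xargsdemo/y.gz
find /tmp/xargsdemo -name '*.gz' -print0 | xargs -0 gunzip
```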
Old 01-16-2013, 07:41 AM   #6
syfo
Just a member
 
Location: Southern EU

Join Date: Nov 2012
Posts: 103
Default

Quote:
Originally Posted by dariober View Post
To uncompress all the gzip files in the current directory this may suffice:
Code:
gunzip *.gz ## Use gunzip -r to descend into subdirectories
To do something with each gzipped file, e.g. print out the first 15 lines and write them to a file:

Code:
for gz in `find mydir/ -name '*.gz'`
do
    gunzip -c "$gz" | head -n 15 > "${gz}.head"
done
Dario
Yep.
And to get the results in the same file, you can do

Code:
for gz in *.gz ; do gunzip -c "$gz" | head -n 15 ; done > outfile
Old 01-16-2013, 03:01 PM   #7
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
Default

Thanks everyone. I realized I did not express the question clearly. Gzip is just one application; I was really asking about looping over files in general:

Code:
while/for/if
    something is in the folder
do
   something to each file
until
   all are processed
done
The beauty of what I saw was that it fit on one line and was quite general, so it could be applied to other commands that do not accept '*' for multiple files.
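The general one-line shape is `for f in DIR/PATTERN; do COMMAND "$f"; done` — a sketch with made-up paths, running gunzip once per file exactly as one would run a command that cannot take '*':

```shell
# One-line loop: each file is handed to the command individually, which
# also works for tools that only accept a single file argument.
# /tmp/onelinedemo is created here just for the demonstration.
mkdir -p /tmp/onelinedemo
printf 'a\n' | gzip > /tmp/onelinedemo/1.gz
printf 'b\n' | gzip > /tmp/onelinedemo/2.gz
for f in /tmp/onelinedemo/*.gz; do gunzip "$f"; done
```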
Old 01-17-2013, 12:12 AM   #8
sisch
Member
 
Location: Dusseldorf, Germany

Join Date: Jun 2011
Posts: 29
Default

In fact, you can also write for loops as one-liners with ";", or even with "&" to fork each iteration to the background.
For example, looping over FASTA files (*.fa) and counting the headers (i.e. the number of sequences) can be done on one line like so:

Code:
for f in *.fa; do grep -c ">" "$f"; done
If you want to unzip a bunch of files in parallel, you can fork the loop like so:
Code:
for f in *.gz; do gunzip "${f}" & done; wait
This runs all gunzip calls in parallel; the (optional) 'wait' blocks until they have all finished.
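An alternative to forking every job at once is xargs -P, which caps the number of simultaneous jobs and returns only when all are done; a sketch with example paths and a limit of 4:

```shell
# Run up to 4 gunzip processes at a time over all matches; xargs itself
# waits for every child before exiting. /tmp/pardemo is demo scaffolding.
mkdir -p /tmp/pardemo
for i in 1 2 3 4; do printf '%s\n' "$i" | gzip > /tmp/pardemo/$i.gz; done
find /tmp/pardemo -name '*.gz' -print0 | xargs -0 -P 4 -n 1 gunzip
```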

I learned the hard way: while these one-liners are nice and quick at an interactive terminal, never use them in longer bash scripts, because (a) at some point you will have a typo and then a very hard time finding it, and (b) there is essentially no need to save space or shorten commands in a bash script.

Best
Simon

Last edited by sisch; 01-17-2013 at 12:14 AM. Reason: for loop code corrected
Old 01-21-2013, 12:58 AM   #9
mbayer
Member
 
Location: Dundee, Scotland

Join Date: Mar 2009
Posts: 29
Default

Hi yaximik,

I use this in my bash scripts:

Code:
#collect all the filenames in an array
#(a glob is safer than parsing 'ls -l' output, which breaks on spaces)
files=(*.gz)

#iterate over these
for i in "${files[@]}"
do
	echo "gunzipping file $i"
	gunzip "$i"
done
cheers

Micha
Old 01-23-2013, 06:47 AM   #10
yaximik
Senior Member
 
Location: Oregon

Join Date: Apr 2011
Posts: 205
Default

Thanks a lot to everyone who provided input. This was very educational in general, as I now have a better idea of how I can use simple loops. I wonder, would it be useful to have something like WiKiBits (or a better name) as a collection of little ingenious solutions?