SEQanswers

Old 04-02-2010, 08:52 PM   #1
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 276
Hello - I used to think I was good with a computer

I wonder how many people are in the same boat as me.

1) Institute bought a couple of GAIIs
2) No one has money to use them
3) Institute has internal competition to pay for a couple of runs (makes the donors feel better about their donation if someone uses the machines), and you are lucky enough to get funded
4) You send a couple of samples off to never-never land and someone sends back a terabyte drive or two with "next-gen sequencing data"
5) You quickly realize people that used to do survival curves in your bioinformatics core don't really know that Illumina fastq is different from Sanger fastq and the analysis they provide is limited at best
6) Now what do you do?
7) Google > seqanswers > let the misery begin

So what have I learned this week,

A) My boss should have made me read and do the "Unix and Perl for Biologists" tutorial years ago. Google it if you are new and a bench/gene jockey (Sanger sequencing/microarray person) like me with no Unix experience; it is an excellent use of a day
B) A place called SourceForge exists
C) If I had a MAQ for my TOPHAT and a BOWTIE to go with my BWA I'd be better off than GERALD and his SAMTOOLS
D) just type "make" to compile...Oops that doesn't work if Xcode is not installed yet
E) No Mac OS comes with Xcode installed, and if you have a Leopard machine you had better know where the OS install disks are, as the only version you can download is the new one for Snow Leopard, which is not compatible...One would think that pancreatic cancer survivor Steve Jobs would try to make my life easier not harder
F) The genome is not the genome, Ensembl is the place to get chromosomes but 1000 genomes is the place to get the genome.
G) BWA can align on my laptop...cool...next-gen/2nd gen alignment on a laptop and I thought I needed a supercomputer

One step forward, one backward

PS - I generally believe in the KISS principle, so I'll try to come back and list my solutions as I bumble my way to something. But in a week I've learned enough Unix to actually like it and got a couple of lanes of PE data into IGV so I can take a look see

Last edited by Jon_Keats; 04-12-2010 at 09:50 PM.
Old 04-02-2010, 08:58 PM   #2
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,283
Default

Can I nominate this as the best post on seqanswers? You really deserve a prize.
Old 04-03-2010, 05:21 AM   #3
mangrove
Junior Member
 
Location: East Central US

Join Date: Mar 2010
Posts: 1
Default

I agree... it is prizeworthy. Even at the price, the promise of all that sequence data is alluring, but no one I have talked to has gotten the data and NOT been overwhelmed. I hope this goes away as we get larger, faster computers, a guru to install all the programs (fortunately we have that), and eventually, the realization from NSF that lots of $ and many months will be required to actually use all those TBs. Good luck to us all (as Tiny Tim would say).
Old 04-04-2010, 06:33 AM   #4
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,296
Default

Nice post again Jonathan,

This is actually a great post/series of posts to kick off the "Basics in Bioinformatics" subforum that we've discussed in the Site Feedback forum. Don't be alarmed if I do some rearranging/forum creating later today.
Old 04-04-2010, 08:11 AM   #5
RockChalkJayhawk
Senior Member
 
Location: Rochester, MN

Join Date: Mar 2009
Posts: 191
Default

Jon,

I was in your exact situation 6 months ago. It gets better (slowly). The "Unix and Perl for Biologists" course was really helpful for me, as was this forum. You're headed in the right direction, just keep it up!

And I too agree that this is the best post on the site!
Old 04-05-2010, 08:16 AM   #6
drio
Senior Member
 
Location: 41°17'49"N / 2°4'42"E

Join Date: Oct 2008
Posts: 323
Default

Just catching up with SA posts.

This post is brilliant. Printing right now and posting in my cubicle. Instant classic.
__________________
-drd
Old 04-05-2010, 11:21 AM   #7
MQ-BCBB
Member
 
Location: Maryland

Join Date: May 2009
Posts: 22
Default loved your post

I felt exactly like that not too long ago. And yes, I still remember the great feeling when I successfully loaded my data into IGV.
Old 04-05-2010, 04:28 PM   #8
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 276
Default

Thanks I'm glad to see people find my attempt at a bit of science geek humor funny, even with the typical spelling mistakes
Old 04-06-2010, 12:59 PM   #9
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 276
Default

Getting Started: Unix and Xcode

As I said in the first post I'm going to drop in a couple of posts over the next couple of days to outline my experiences to date.

In clinical training they have a mantra of "See one, Do one, Teach one" but on the research side it seems to be more "Need to do one, Figure one out, Maybe Teach one" so this will be my lame teaching attempt, or at the very least a place where others in our research group can get some basic instructions to replicate the pipeline I'm starting to put together. Hopefully this will be relevant to a number of people and will make some people's lives easier.

I've tested most of the following steps on both my laptop and the workstation we have in the lab (still waiting for Apple to release new Mac Pros… come on Steve, I'll buy an iPad if you release them in April). Obviously, I'm a Mac guy so these instructions are Mac oriented, but they should be compatible with any Unix/Linux environment, though that is only a guess.

*Workstation = Mac Pro with two dual core Intel Xeon 5150 CPUs at 2.66GHz and 8GB of 667MHz DDR2 RAM running Mac OS X Leopard 10.5.8*
*Laptop = MacBook Pro with an Intel Core 2 Duo CPU at 2.66GHz and 4GB of 1067MHz DDR3 RAM running Mac OS X Snow Leopard 10.6.3*

Okay, so today you got some terabyte drives with Illumina data and you want to do something with it. The following instructions should get you ready to do something:

The first thing to do is to familiarize yourself with Unix and the Terminal application (Applications>Utilities>Terminal) on your Mac. I would highly recommend working through at least the Unix portion of the "Unix and Perl for Biologists" course made public by Keith Bradnam and Ian Korf at UC Davis (http://groups.google.com/group/unix-...for-biologists). I'd recommend going to their website and getting the entire course package (http://korflab.ucdavis.edu/Unix_and_Perl/index.html); it is well worth a night or two of your time, I promise.
If you are not going to do that, you need to understand a handful of commands to get going:

To get the manual for any command type "man command". Press "space" to page down, "b" to page back up, and "q" to quit
To see which folder you are currently in type "pwd"
To see which folders and files exist in the current directory type "ls"
To move into a folder in the current directory type "cd myfolder". Note: you can move multiple levels downstream with "cd myfolder/myfolder2"
To go back up one directory type "cd ..". Note: you can move up multiple levels with "cd ../.."
To copy a file from the current directory to a downstream folder type "cp myfile myfolder/", or "cp myfile ../" to copy it up one directory
To move a file instead of copying it, use "mv" in place of "cp"
A path that starts with "/" is read from the root directory (ie. the absolute top of the tree), so "cd /something" looks for the folder "something" directly below the root directory no matter where you currently are
The current directory can always be written as "./"
You will need to change the permissions of the compiled applications with "chmod 755 myfile". This makes the file readable and executable by everyone, but only you can write to it. Alternatively use 777 so anyone can do everything.
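
The commands above can be tried safely in a throwaway folder. This is only a minimal sketch: the names "sandbox", "myfolder", "myfolder2" and "myfile" are placeholders, and "mkdir" and "mktemp" are used just to create something to practice on.

```shell
# Work in a throwaway directory so nothing in your real folders is touched.
sandbox=$(mktemp -d)
cd "$sandbox"

pwd                            # show the folder you are in
mkdir -p myfolder/myfolder2    # create a folder with a subfolder inside it
echo "test" > myfile           # create a small file to play with
ls                             # list the files and folders here

cp myfile myfolder/            # copy the file into a downstream folder
cd myfolder/myfolder2          # move two levels down in one step
cd ../..                       # and two levels back up
mv myfile myfolder/myfolder2/  # move (not copy) the file
chmod 755 myfolder/myfile      # owner can write; everyone can read and execute
ls -l myfolder                 # the entry for myfile should start with -rwxr-xr-x
```

After this runs there is one copy of the file in myfolder and one in myfolder/myfolder2, and none left in the starting directory, which is the difference between "cp" and "mv".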

To run many of the applications (Maq, BWA, SAMtools, etc.) you need to either place the applications in a PATH directory, add additional PATH locations, or give the location of the application each time you call it. To see the current PATH directories used by Unix type "echo $PATH" and you should get a printout similar to the following:
/sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin
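
For anyone who finds the colon-separated printout hard to read, here is a small sketch of two ways to inspect the PATH. It assumes nothing beyond a standard shell; "ls" is used only because it exists on every system, and "bwa" only as an example of a tool that may not be installed yet.

```shell
# Print each PATH directory on its own line, in the order the shell searches them.
echo "$PATH" | tr ':' '\n'

# Ask the shell which file it would run for a given command name.
ls_location=$(command -v ls)
echo "ls resolves to: $ls_location"

# A name that is not in any PATH directory resolves to nothing.
if ! command -v bwa >/dev/null 2>&1; then
  echo "bwa is not in the PATH yet"
fi
```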

NOTE: These folders are listed by their full path from the root directory and are the places Unix looks for applications to run. To run any application you download and compile yourself, such as BWA, you need to either type "./bwa" while the current directory contains the bwa application, or put the application in one of the PATH directories. Assuming you have administrator rights to your machine, the way I initially got around this was to place the applications in one of the PATH directories as follows:

1) copy application to a PATH directory, type "sudo cp myfile /usr/bin". You will be prompted for your password (sudo = superuser … yes, today you are SUPERMAN)
2) make the file executable, type "sudo chmod 755 /usr/bin/myfile"

NOTE: After running into an issue installing BFAST I'd suggest the following NOT the previous (Actually suggestion from Nils Homer...thanks)
1) create a directory in your home directory for the applications "mkdir -p $HOME/local/bin"
2) edit your .profile file so this directory is in your PATH directories
>open terminal
>type "ls -a" (You should see a file called .profile)
>open with nano "nano .profile"
*** Add the following lines to your .profile file, DO NOT remove things in the current version ***

export PATH=$HOME/local/bin:$PATH

> To save edits "control-O"
> To exit nano "control-X"

# Subsequently when you install things place the executables in this directory so they are in a $PATH directory
# Either copy the application to the directory $HOME/local/bin
# Or, if the application uses a configure script, run "./configure --prefix=$HOME/local"

This no longer requires sudo (Guess we shouldn't always be Superman)
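
The whole $HOME/local/bin setup can be rehearsed without touching your real account. In this sketch DEMO_HOME is a made-up stand-in for $HOME and "mytool" is a two-line placeholder script, not a real application; the grep guard is an addition that keeps the export line from being appended to .profile twice.

```shell
# DEMO_HOME stands in for your real $HOME so this sketch is safe to run.
DEMO_HOME=$(mktemp -d)
mkdir -p "$DEMO_HOME/local/bin"

# Append the export line to .profile only if it is not already there.
profile="$DEMO_HOME/.profile"
line='export PATH=$HOME/local/bin:$PATH'
touch "$profile"
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"
grep -qxF "$line" "$profile" || echo "$line" >> "$profile"   # second run adds nothing

# Put a tiny stand-in "application" in the directory and make it executable.
printf '#!/bin/sh\necho "hello from mytool"\n' > "$DEMO_HOME/local/bin/mytool"
chmod 755 "$DEMO_HOME/local/bin/mytool"

# With the directory on the PATH, the tool runs by name from any folder.
PATH="$DEMO_HOME/local/bin:$PATH"
mytool_output=$(mytool)
echo "$mytool_output"
```

On a real machine the .profile change only takes effect in Terminal windows opened after the edit, or after typing ". ~/.profile" in the current one.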

Second you need to install Xcode on your Mac System so you can compile the various applications
- Download the current version, Xcode3.2, at (http://developer.apple.com/technolog...ols/xcode.html). You will have to become a member otherwise find your OS install discs and do it from the disc install option.
NOTE: This version is only compatible with Mac OSX Snow Leopard 10.6.x
- If you have a Leopard system go find the OS install discs (you need Disc 2) and install the package
> Mac OS X Install Disc 2 > open Xcode Tools folder > double click XcodeTools.mpkg

Third, for some applications like BFAST it will help to install "Fink" (...Another Nils suggestion) or "MacPorts" (seems more up to date)
- Download and install the current version from (http://www.finkproject.org/) or (http://www.macports.org/)
- To install BFAST you should at least install the md5deep package: "fink install md5deep" or "sudo port install md5deep"

Next step get the applications you need…

See the next post,

Jonathan

Last edited by Jon_Keats; 02-23-2011 at 07:21 AM. Reason: Updated to reflect changes that occurred over time
Old 04-07-2010, 01:10 AM   #10
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 192
Default

rofl I shld do a version with SOLiD data with the rainbow assorted myriad of problems with colorspace. Good Post!
Old 04-07-2010, 07:41 AM   #11
nilshomer
Nils Homer
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,283
Default

Quote:
Originally Posted by KevinLam View Post
rofl I shld do a version with SOLiD data with the rainbow assorted myriad of problems with colorspace. Good Post!
I don't understand why you say there are problems with colorspace? Aligners (like BFAST) will convert your csfasta/qual files into FASTQ, will align your data sensitively, output to the SAM format, and then any SNP caller can be used without modification. Beyond specifying one command line option (to say the data is color space) during alignment there is no difference between Illumina/SOLiD (basespace/colorspace) data in terms of processing. It's the same workflow. Also the theoretical and practical benefits of colorspace (low false discovery rate) are rarely mentioned.

Sorry, just a slight pet peeve from a very happy SOLiD user.

Nils
Old 04-07-2010, 06:33 PM   #12
KevinLam
Senior Member
 
Location: SEA

Join Date: Nov 2009
Posts: 192
Default

Quote:
Originally Posted by nilshomer View Post
I don't understand why you say there are problems with colorspace? Aligners (like BFAST) will convert your csfasta/qual files into FASTQ, will align your data sensitively, output to the SAM format, and then any SNP caller can be used without modification. Beyond specifying one command line option (to say the data is color space) during alignment there is no difference between Illumina/SOLiD (basespace/colorspace) data in terms of processing. It's the same workflow. Also the theoretical and practical benefits of colorspace (low false discovery rate) are rarely mentioned.

Sorry, just a slight pet peeve from a very happy SOLiD user.

Nils
Hi Nils,
No offence meant! It's all in good fun.
by problems I think I meant it more as caveats that you should watch for.

firstly it seems terribly important to understand dual base encoding but actually you just need an overview.

2ndly you are stuck with color space aware progs unless you wanna throw the benefits of colorspace away by direct conversion to base space and risk 3' ends being wrongly converted.
for de novo assembly with velvet you have to double encode your file into a format that looks exactly like 25 bp basespace fasta files. which can be misleading if someone else comes across the file and doesn't read the documentation you left there.

and it doesn't help that ABI's documentation for their software rarely exceeds 3 pages in pdf.

Other than that I am nearly as happy a SOLiD user as you
Old 04-07-2010, 08:09 PM   #13
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 276
Default

Getting setup and compiling the applications

As promised here is my next installment on getting a working environment going, or at least my poor excuse for one. The first thing I set up was a series of folders to manage the data off my terabyte drives and move it around as each step is completed. To make my examples clearer I've set up the following folders and subfolders:

Main working directory called "ngs" in my $HOME directory (/Users/MeOrYou/) from which all steps and scripts will be called.
With primary subfolders:
/ngs/analysisnotes
/ngs/applications
/ngs/bwase
/ngs/bwape
/ngs/finaloutputs
/ngs/scripts
With a number of secondary subfolders in each primary directory (See create_ngs_directorystructure_v3.sh script for full details)

I'm slowly building pipeline scripts to feed data from the input folders to final outputs that I'll try and post when complete.

The basic idea is that I have some raw reads from our Illumina GAIIs (exon capture and RNAseq PE data with each sample on two flowcell lanes) and I want to process them with BWA and view the alignments in the IGV browser. To the best of my understanding I need to do a couple of things. Step one is to convert the Illumina raw data files, which should look like "s_1_sequence.txt", to Sanger fastq format files. To understand the differences please see the following references (http://en.wikipedia.org/wiki/FASTQ_format) or (http://maq.sourceforge.net/fastq.shtml) or (Cock, PJA et al. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res. 2010 38(6):1767-1771). The nice thing about the conversion is that the converted files do not repeat the read name on the "+" line above the quality values, so the files are significantly smaller.
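
To make the format difference concrete: Illumina 1.3+ files store each quality value as a character with ASCII code (quality + 64), while Sanger files use (quality + 33), so converting means shifting every quality character down by 31. The sketch below does this with awk. It is not the maq ill2sanger code; the function name ill64_to_sanger33 is made up for illustration, and it assumes plain four-line fastq records and true Illumina 1.3+ input (older Solexa-scaled files need a different conversion).

```shell
# Convert Phred+64 quality characters (Illumina 1.3+) to Phred+33 (Sanger)
# by shifting every character on each record's 4th line down by 31.
ill64_to_sanger33() {
  awk '
    BEGIN {
      # Lookup table: ASCII 64..126 maps to ASCII 33..95
      for (i = 64; i <= 126; i++) map[sprintf("%c", i)] = sprintf("%c", i - 31)
    }
    NR % 4 == 0 {
      out = ""
      n = length($0)
      for (j = 1; j <= n; j++) {
        c = substr($0, j, 1)
        out = out ((c in map) ? map[c] : c)
      }
      print out
      next
    }
    { print }   # header, sequence and "+" lines pass through unchanged
  ' "$1"
}

# Tiny made-up record: qualities 40, 40, 2 and 0 in Illumina 1.3+ encoding.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nhhB@\n' > "$demo_dir/s_1_sequence.txt"
ill64_to_sanger33 "$demo_dir/s_1_sequence.txt" > "$demo_dir/s_1_sanger.fastq"
cat "$demo_dir/s_1_sanger.fastq"
```

For the demo record the quality line "hhB@" becomes "II#!", the same four quality values written in the Sanger encoding.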

Step 1 - Download the following source files and patches

UPDATE - This step is no longer necessary as recent versions of BWA can do this on the fly during alignment using the -I option of "bwa aln". I'll leave this here in case people need a way to convert Illumina 1.3-1.7 fastq into Sanger format. Watch the version of CASAVA used by your core in the coming year, as the new Illumina 1.8 pipeline will output files in Sanger format so conversion will no longer be needed.

A) Maq (http://sourceforge.net/projects/maq/) (TO REALLY CONFUSE YOU THE MAIN DOWNLOAD IS ACTUALLY BWA, argh...)

NOTE: I only downloaded this to use the ill2sanger command to convert Illumina 1.3+ fastq files (ie. s_1_sequence.txt) to Sanger fastq format. Other options exist (BioPerl, Biopython) but I couldn't figure them out

- click on "View all files"
- click on "maq" folder
- click on newest version "0.7.1" (ASSUMPTION: This should be a long standing version since it appears that the development of Maq is dead with Heng Li releasing BWA)
- click on the download file "maq-0.7.1.tar.bz2"

Now download the ill2sanger patch that is needed to convert illumina 1.3+ fastq files to Sanger fastq

- click on "Develop" tab
- click on "Tracker" tab and select the "Patches" dropdown menu

NOTE: There are two versions to download the historic one by "daweonline" and a new alternative by "joelmartin" I used the original patch as I could find install instructions on Seqanswers

- click on ID 2841164 "illumina to sanger conversion"
- click download to download "maq-ill2sanger.patch" (current version submitted 2009-08-20)

Now Compile, patch, and re-compile the application

- move "maq-0.7.1.tar.bz2" to your "NGS/ApplicationDownloads" folder
- move "maq-ill2sanger.patch" to your "NGS/ApplicationDownloads" folder as well (it needs to sit next to the decompressed "maq-0.7.1" folder for the patch command below to find it)
- double click "maq-0.7.1.tar.bz2" to decompress the file
- open up "Terminal"
- navigate to the decompressed folder "cd Documents/NGS/ApplicationDownloads/maq-0.7.1"
- compile as per option 2 in Maq Manual (Release 0.5.0) (http://maq.sourceforge.net/maq-man.shtml)
> enter the following command "make -f Makefile.generic"
- a bunch of "Stuff" will come up, check to ensure no errors are listed!
- apply the patch as follows:
> step back one directory "cd .."
> run "ls" to ensure the current directory contains the folder "maq-0.7.1" and the patch "maq-ill2sanger.patch"
> install the patch with the following command "cd maq-0.7.1; patch -p1 < ../maq-ill2sanger.patch" found on seqanswers (http://seqanswers.com/forums/showthread.php?t=2499)
* You should get the following messages:
patching file fastq2bfq.c
patching file main.c
patching file main.h
- recompile the maq application
> "make -f Makefile.generic"

Now check to see if the conversion patch was successful
> enter "./maq"

***This should print the maq command options in the Terminal window, check that ill2sanger is available under the "Format Converting" section***

B) BWA (http://sourceforge.net/projects/bio-bwa/files/)

- click on the download link for the newest version "bwa-0.5.9"
- download the file "bwa-0.5.9.tar.bz2"
- move "bwa-0.5.9.tar.bz2" to your "NGS/ApplicationDownloads" folder
- double click "bwa-0.5.9.tar.bz2" to decompress the file
- open up "Terminal"
- navigate to the decompressed folder "cd Documents/NGS/ApplicationDownloads/bwa-0.5.9"
- compile with make command
> enter the following command "make" (Really it's that simple, this command line stuff's not that scary)
- a bunch of "Stuff" will come up, check to ensure no errors are listed!

Now check to see if the install was successful
> enter "./bwa"

***This should print the bwa command options in the Terminal window***

C) SAMtools (http://sourceforge.net/projects/samtools/)

- click on the download link for the newest version "samtools-0.1.12a"
- download the file "samtools-0.1.12a.tar.bz2"
- move "samtools-0.1.12a.tar.bz2" to your "NGS/ApplicationDownloads" folder
- double click "samtools-0.1.12a.tar.bz2" to decompress the file
- open up "Terminal"
- navigate to the decompressed folder "cd Documents/NGS/ApplicationDownloads/samtools-0.1.12a"
- compile with make command
> enter the following command "make" (Really it's that simple)
- a bunch of "Stuff" will come up, check to ensure no errors are listed!

Now check to see if the install was successful
> enter "./samtools"

***This should print the samtools command options in the Terminal window***

The next step is to put each application in a PATH directory and make it executable, as per the options in the previous post, and we are just about ready to go.
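
A quick way to confirm that step worked is to ask the shell whether it can find each tool by name. This is only a sketch; check_tools is a hypothetical helper, not part of any of the packages.

```shell
# Report which of the named tools the shell can find.
# Returns non-zero if any of them are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if location=$(command -v "$tool" 2>/dev/null); then
      echo "found:   $tool -> $location"
    else
      echo "missing: $tool (not in any PATH directory)"
      missing=1
    fi
  done
  return "$missing"
}

check_tools maq bwa samtools || echo "Copy the missing binaries into a PATH directory and try again."
```

If a tool is reported as missing straight after you copied it, open a new Terminal window so the updated PATH is picked up.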

Since we will use BWA we need to download the reference genome to align against. The simplest place to get the data seems to be Ensembl (http://www.ensembl.org/info/data/ftp/index.html) but we run into a problem with the full human genome file (Homo_sapiens.GRCh37.57.dna.toplevel.fa.gz) as it exceeds the maximum character length allowed by the bwa index command. To get around this problem, if you want to use a GRCh37/hg19 genome version, the best option seems to be the 1000 genomes version (ftp://ftp.sanger.ac.uk/pub/1000genom...ect_reference/), file human_g1k_v37.fasta.gz. Copy the human_g1k_v37.fasta.gz file to the NGS/RefGenomes folder and then decompress it by double clicking on the file.
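
The same step can be done from the command line. The sketch below is an assumption-laden illustration: prepare_reference is a made-up helper name, the demo runs on a tiny invented fasta file rather than the real genome, and the "bwa index -a bwtsw" call (the indexing algorithm the BWA manual describes for large genomes) only runs when you explicitly pass "index" as the second argument.

```shell
# prepare_reference FILE.gz [index]
# Decompresses the reference, reports how many sequences it holds,
# and optionally builds the BWA index.
prepare_reference() {
  ref_gz=$1
  ref_fa=${ref_gz%.gz}
  # Decompress but leave the original .gz file in place
  gunzip -c "$ref_gz" > "$ref_fa"
  # Each sequence (chromosome or contig) starts with a ">" header line
  echo "sequences in $ref_fa: $(grep -c '^>' "$ref_fa")"
  if [ "${2:-}" = "index" ]; then
    bwa index -a bwtsw "$ref_fa"
  fi
}

# Demo on a two-sequence toy file (no indexing).
ref_demo=$(mktemp -d)
printf '>1\nACGTACGT\n>2\nTTGGCCAA\n' | gzip -c > "$ref_demo/tiny.fasta.gz"
ref_report=$(prepare_reference "$ref_demo/tiny.fasta.gz")
echo "$ref_report"
```

For the real file the call would look like "prepare_reference human_g1k_v37.fasta.gz index", run from the folder holding the download. Expect the indexing to take a long time on a human-sized genome.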

Test Question: Do you know the difference between UCSC mapping versus NCBI/Ensembl....Hint: It's a difference of 0 and 1 but it can really ruin your day when the commercial software manufacturer doesn't know the difference!!
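
One way to read the hint: UCSC tables and BED files count positions from 0 and exclude the end position, while NCBI/Ensembl coordinates count from 1 and include it. Under that reading, converting an interval only changes the start. The sketch below assumes three-column, tab-separated input; bed_to_one_based is a hypothetical name, and it deliberately leaves the chromosome name alone even though UCSC writes "chr1" where Ensembl and the 1000 genomes reference write "1".

```shell
# Convert 0-based, half-open intervals (UCSC/BED style)
# to 1-based, inclusive intervals: start + 1, end unchanged.
bed_to_one_based() {
  awk 'BEGIN { OFS = "\t" } { print $1, $2 + 1, $3 }'
}

# The first 100 bases of a chromosome: 0..100 in BED, 1..100 in 1-based terms.
one_based=$(printf 'chr1\t0\t100\n' | bed_to_one_based)
echo "$one_based"
```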

Next step, making the computer do some of the work

See next post,

Jonathan

Last edited by Jon_Keats; 02-23-2011 at 07:35 AM. Reason: Updated to reflect changes occurring with time
Old 04-07-2010, 08:37 PM   #14
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 276
Default

The following script will create all the directories noted in the previous post if you want to replicate the pipeline I'm putting together...

NOTE - THIS SCRIPT HAS BEEN UPDATED TO VERSION 3 IN A LATER POST

Code:
#!/bin/sh

# Create_NGS_DirectoryStructureV1.sh
# Created by Jonathan Keats on 4/5/10.
# This file will create the directory structure needed for the subsequent pipeline
# To get this script working do one of the following:
# Option 1 - Open Terminal
#		Navigate to directory of interest, "Documents" in my case (cd Documents/)
#		Type "nano Create_NGS_DirectoryStructureV1.sh"
#                        (This open the unix nano text editor)
#		Paste from "#!/bin/sh" to "echo Pipeline Directory Structure Created"
#		Control-O to save, Control-X to exit
#		Make executable, Type "chmod 755 Create_NGS_DirectoryStructureV1.sh"
# Option 2 -  Open Xcode
#		Click "File" and select "New File"
#		In "Choose a template for your new file" select "Shell Script"
#		In new file dialogue enter File Name:"Create_NGS_DirectoryStructureV1.sh"
#		Change location to directory of interest, "Documents" in my case
#		Click "Finish"
#		Paste from "#!/bin/sh" to "echo Pipeline Directory Structure Created"
#		Click "File" and select "Save"
#		Close the file, which should already be executable
# Regardless of Option used - type "./Create_NGS_DirectoryStructureV1.sh" to launch

echo "***Creating Pipeline Directory Structure***"
pwd
ls
mkdir NGS
cd NGS/
mkdir AnalysisNotes
mkdir ApplicationDownloads
mkdir BAMfiles
mkdir FinalOutputs
mkdir InputSequence
mkdir RefGenomes
mkdir SAMfiles
mkdir Scripts
cd BAMfiles/
mkdir Merged
mkdir Original
mkdir Sorted
cd ../FinalOutputs/
mkdir AlignmentResults
mkdir Illumina
mkdir SangerFastq
mkdir SortedBAMfiles
mkdir MergedBAMfiles
cd Illumina/
mkdir Read1
mkdir Read2
cd ../../InputSequence/
mkdir Illumina
mkdir SangerFastq
cd Illumina/
mkdir Read1
mkdir Read2
cd ../../RefGenomes
mkdir BFAST_Indexed
mkdir BOWTIE_Indexed
mkdir BWA_Indexed
mkdir GenomeDownloads
cd ../Scripts
mkdir ScriptBackups
cd ../..
cp Create_NGS_DirectoryStructureV1.sh NGS/Scripts/ScriptBackups/
cd NGS/
pwd
ls
echo Pipeline Directory Structure Created

Last edited by Jon_Keats; 09-08-2010 at 12:29 PM.
Old 04-09-2010, 09:30 AM   #15
Michael.James.Clark
Senior Member
 
Location: Palo Alto

Join Date: Apr 2009
Posts: 213
Default

I went from being a pipette jockey who did qPCR for a living to writing an algorithm for SV detection in SOLiD data and publishing a whole genome sequence in a major journal.

YOU CAN DO IT TOO!

And here's how:
Attached Images
File Type: jpg Giant-Coffee-Cup.jpg (27.8 KB, 256 views)
Old 04-09-2010, 10:25 AM   #16
Kurt
Junior Member
 
Location: Baltimore

Join Date: Aug 2009
Posts: 3
Default

I'm LMAO since it appears that I'm coming to these revelations at just about the exact same time you were/are. It's definitely comforting to see somebody is figuring out the exact same stuff that I am at basically the same time I am. Everytime I talk about this stuff here, people look at me like I'm ALF or something.
Old 04-10-2010, 07:06 AM   #17
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Do yourself a favor & swear off shell scripts for Perl/Python/Ruby -- once you start trying to have any sort of conditional programming, shell becomes a nightmare.

Converting your script above to Perl is pretty easy (but left as an exercise for the student :-)
Old 04-10-2010, 10:18 AM   #18
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 276
Default

Sounds like I should finish working through the "Unix and Perl for biologist" course to see if I can figure Perl out. I managed to figure out a solution for my problem about half way through the Unix section so I started with that but maybe I shouldn't have jumped the gun so much. I'll see what I can come up with this weekend.
Old 04-12-2010, 10:23 AM   #19
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747
Default

Aaah, I'm too big a softie. Also, IMHO the perl is more readable than shell to understand what is going on (yikes! something less readable than perl!)

Personally, I prefer directory names in all lowercase. Brevity is valuable too -- "bam" & "sam" would be my choices for those directories & I probably wouldn't bother having separate ones anyway. But that is all personal preference & you should pick what suits you.

Also: DO NOT try to use a subdirectory to backup your scripts -- it is easy to install & learn git (or a similar system) to control versions. It is much too easy to accidentally copy your file the wrong way and wipe something out; version control programs track all your changes & allow you to safely move backwards & forwards through the history of your code. They can also be a handy way to replicate code across machines (and keep them in sync) as well as a path towards collaborative coding.
Code:
#!/usr/bin/perl
use strict;
use warnings;

print "***Creating Pipeline Directory Structure***\n"; # \n is newline
my $root = "NGS";

# Top-level directories that have no subdirectories of their own
makeSubdirs($root, "AnalysisNotes", "ApplicationDownloads", "SAMfiles");
makeSubdirs("$root/BAMfiles", "Merged", "Original", "Sorted");
makeSubdirs("$root/FinalOutputs",
            "AlignmentResults", "Illumina", "SangerFastq",
            "SortedBAMfiles", "MergedBAMfiles");
makeSubdirs("$root/FinalOutputs/Illumina", "Read1", "Read2");
makeSubdirs("$root/InputSequence", "Illumina", "SangerFastq");
makeSubdirs("$root/InputSequence/Illumina", "Read1", "Read2");
makeSubdirs("$root/RefGenomes",
            "BFAST_Indexed", "BOWTIE_Indexed", "BWA_Indexed", "GenomeDownloads");
makeSubdirs("$root/Scripts", "ScriptBackups");  # DANGEROUS!!!
print "Pipeline Directory Structure Created\n";

# Create $parentDir if it does not exist, then each named subdirectory inside it
sub makeSubdirs
{
  my @subdirs = @_;
  my $parentDir = shift(@subdirs);
  mkdir($parentDir) if (! -d $parentDir);
  foreach my $subDir (@subdirs)
  {
     my $path = "$parentDir/$subDir";
     mkdir($path) if (! -d $path);
  }
}

#cd ../Scripts
#mkdir ScriptBackups
#cd ../..
#cp Create_NGS_DirectoryStructureV1.sh NGS/Scripts/ScriptBackups/
#cd NGS/
#pwd
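
On the version-control suggestion in this post, a minimal sketch of what tracking a script with git looks like. It assumes git is installed; the file name create_dirs.sh, the demo identity and the commit messages are all placeholders, and the whole thing runs in a throwaway folder.

```shell
scripts_demo=$(mktemp -d)
cd "$scripts_demo"
printf '#!/bin/sh\necho "hello"\n' > create_dirs.sh

git init -q .
git add create_dirs.sh
# -c sets the author identity for this one command only; normally you would
# set it once with "git config --global user.name ..." and user.email.
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q -m "First version of create_dirs.sh"

# Edit the script and record the change as a second commit.
printf '#!/bin/sh\necho "hello again"\n' > create_dirs.sh
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q -am "Change the greeting"

git log --oneline                # lists both commits, newest first
git show HEAD~1:create_dirs.sh   # the first version is still retrievable
```

Because every committed version stays retrievable, there is no need for a ScriptBackups folder and no risk of copying a file the wrong way over a good one.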
Old 04-12-2010, 11:11 PM   #20
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 276
Default

Okay, I'm still working on figuring out the Perl version, but it seems pretty obvious apart from some of the Perl syntax. However, I'm nothing if not competitive so here is the three line version. Not sure if krobison's Perl version is fewer characters or not but this must win for number of lines.
I guess reading the manuals for commands helps; "mkdir -p" sure does! I'll work on figuring out git version control and a bit more Perl but I can't wait to see krobison's response to the 400+ line pipeline shell script. At least I took the suggestion to ditch the mixed upper and lower case directory names, so everything is now lower case.

NOTE - THIS SCRIPT HAS BEEN UPDATED TO VERSION 3 IN A SUBSEQUENT POST

Code:
#!/bin/sh

#Create_NGS_DirectoryStructureV2.sh
#Written by Jonathan Keats 04/12/2010

echo "***Creating Pipeline Directory Structure***"
mkdir -p ngs/analysisnotes ngs/applicationdownloads ngs/bamfiles/merged ngs/bamfiles/original ngs/bamfiles/sorted ngs/finaloutputs/alignmentresults ngs/finaloutputs/illumina/read1 ngs/finaloutputs/illumina/read2 ngs/finaloutputs/mergedbamfiles ngs/finaloutputs/sangerfastq ngs/finaloutputs/sortedbamfiles ngs/inputsequences/illumina/read1 ngs/inputsequences/illumina/read2 ngs/inputsequences/sangerfastq ngs/refgenomes/bfast_indexed ngs/refgenomes/bowtie_indexed ngs/refgenomes/bwa_indexed ngs/refgenomes/genomedownloads ngs/samfiles ngs/scripts/scriptbackups
cp Create_NGS_DirectoryStructureV2.sh ngs/scripts/scriptbackups/
echo "***Pipeline Directory Structure Created***"
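
If the single long mkdir line gets hard to edit, the same "mkdir -p" idea can be written with one directory per line. This is just an alternative sketch of the V2 script above, not the later V3 script; NGS_ROOT is a hypothetical variable that points at a throwaway location here so the example can be run without creating anything in your home directory.

```shell
# One directory per line is easier to read and change than one long command.
NGS_ROOT=$(mktemp -d)/ngs
while read -r dir; do
  # mkdir -p creates any missing parent folders along the way
  mkdir -p "$NGS_ROOT/$dir"
done <<'EOF'
analysisnotes
applicationdownloads
bamfiles/merged
bamfiles/original
bamfiles/sorted
finaloutputs/alignmentresults
finaloutputs/illumina/read1
finaloutputs/illumina/read2
finaloutputs/mergedbamfiles
finaloutputs/sangerfastq
finaloutputs/sortedbamfiles
inputsequences/illumina/read1
inputsequences/illumina/read2
inputsequences/sangerfastq
refgenomes/bfast_indexed
refgenomes/bowtie_indexed
refgenomes/bwa_indexed
refgenomes/genomedownloads
samfiles
scripts/scriptbackups
EOF

# Show the resulting tree of directories.
find "$NGS_ROOT" -type d | sort
```

To build the structure for real, set NGS_ROOT to "$HOME/ngs" instead of the temporary location.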

Last edited by Jon_Keats; 09-08-2010 at 12:31 PM. Reason: Updating to match subsequent posts
Tags
bwa, illumina, newbie, samtools, unix
