SEQanswers





Old 04-13-2010, 06:27 AM   #21
krobison
Senior Member
 
Location: Boston area

Join Date: Nov 2007
Posts: 747

Better metrics have to do with readability & maintainability rather than utter brevity -- it is quite possible to write extremely short code in Perl which looks like hieroglyphics.

Your program is straightforward -- but as it grows it will get less so. One thing in particular I addressed is what happens when you rename a directory -- in your shell script some of the directories are listed twice, which is an opportunity for error should you decide to rename them.

P.S. Only a terminal -- no internal -- Asparagine on my name!
Old 04-13-2010, 07:26 AM   #22
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78

Hi all,

Let me do my best to help out with the Perl.

A lot of basic operations have already been written in Perl. You can easily use these bits of code. If you need something done, check http://search.cpan.org/. This goes for bioinformatics things as well: check http://www.bioperl.org for stuff like dealing with sequence files.

For your Perl script I used File::Path (http://search.cpan.org/~jhi/perl-5.8.0/lib/File/Path.pm). It took me about 5 minutes to write.
Code:
#!/usr/bin/perl
use strict;
use File::Path; # CPAN modules are this easy to use

# the dirs
my $dirs = [ 'NGS/AnalysisNotes',
	  'NGS/ApplicationDownloads',
	  'NGS/InputSequence',
	  'NGS/SAMfiles', 
	  'NGS/BAMfiles/Merged', 'NGS/BAMfiles/Original', 'NGS/BAMfiles/Sorted', 
	  'NGS/FinalOutputs/AlignmentResults', 'NGS/FinalOutputs/Illumina', 'NGS/FinalOutputs/SangerFasta', 'NGS/FinalOutputs/SortedBAMfiles', 'NGS/FinalOutputs/MergedBAMfiles', 
	  'NGS/RefGenomes/BFAST_Indexed', 'NGS/RefGenomes/BOWTIE_Indexed', 'NGS/RefGenomes/GenomeDownloads',
	  'NGS/Scripts/ScriptBackups' ];

print "***Creating Pipeline Directory Structure***\n";

# eval will evaluate whatever is inside the {}; any errors are stored in $@
# mkpath is the File::Path function that creates the given directories
# it takes three arguments: the dirs, a verbose flag (1 prints each directory as it is created, 0 is silent), and a file-permissions mode, which I omitted
eval { mkpath ( $dirs, 1 ); };

# now check if there were errors. If so, print.
if ( $@ ) {
  print "Couldn't create NGS directories: $@\n";
}
print "Pipeline Directory Structure Created\n";
Hope this helps?

Cheers,
Wil
Old 04-13-2010, 09:50 AM   #23
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

To Genome or Not to Genome

I'm sure I'm not the only person facing a conundrum over which genome version to use for alignment. In general I'm a believer in the "out with the old, in with the new" mentality, so my gut is pushing me to use the newest genome build GRCh37/hg19/Ensembl55+ over the old version NCBI36/hg18/Ensembl38-54. Plus, CCDS has been on GRCh37 for nearly a year now, and they just released dbSNP131 on GRCh37, so it's getting harder and harder to ignore the newest build.

BUT, for my dataset I have paired copy-number data on Agilent arrays, which still use hg18 design files. So after wasting a couple of days aligning things to GRCh37 (1000 Genomes version), I'm going back to redo it with hg18 so my coordinates match between the two datasets, though "wasting" might be the wrong word, as I'm betting I'll want to be on the new version in a month.

To get this genome reference file I decided to use the UCSC chromosome files (http://hgdownload.cse.ucsc.edu/golde...8/chromosomes/) since I'd seen someone mention the advantage of their simple naming structure. I did not use the various haplotype builds or the random contigs.
- Download each individual chromosome file chr1-chr22, chrX, chrY, and chrM.
- Place them all in my "ngs/refgenomes/genome_downloads/hg18" folder
- Navigate to the folder in Terminal: "cd ngs/refgenomes/genome_downloads/hg18"
- Decompress the files using gunzip: type "gunzip *.fa.gz"
- Merge the files using the unix cat command [cat (infile1) (infile2) (infileN) > (outfile)]
- Type "cat chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chr20.fa chr21.fa chr22.fa chrX.fa chrY.fa chrM.fa > hg18.fasta"
- Check that the output file contains all chromosomes in the correct order using the unix grep command to pull out the fasta header lines
- Type "grep ">" hg18.fasta"
- Which should output the following if correct:
>chr1
>chr2
>chr3
>chr4
>chr5
>chr6
>chr7
>chr8
>chr9
>chr10
>chr11
>chr12
>chr13
>chr14
>chr15
>chr16
>chr17
>chr18
>chr19
>chr20
>chr21
>chr22
>chrX
>chrY
>chrM

A similar process should work with any genome version or source, such as Ensembl. Just make sure the end platform supports the version you are using. And be careful with annotation files if you get them from UCSC, as the positions are 0-based.
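The merge-and-check steps above can be sketched in a few shell commands. This is an illustrative sketch only: it generates tiny stand-in chr*.fa files so it runs anywhere, whereas in real use those files are the gunzipped UCSC downloads.

```shell
# Work in a scratch directory; the chr*.fa files below are stand-ins for
# the real gunzipped UCSC chromosome downloads.
cd "$(mktemp -d)"
for c in $(seq 1 22) X Y M; do printf '>chr%s\nNNNN\n' "$c" > "chr$c.fa"; done

# Brace expansion keeps chr1-chr22 in karyotype order without typing each
# name (in real use: gunzip *.fa.gz first, then this same cat)
cat chr{1..22}.fa chrX.fa chrY.fa chrM.fa > hg18.fasta

# Verify the headers come out in the expected order
grep ">" hg18.fasta
```

The same brace-expansion trick shortens the long cat command above, and grep -c ">" hg18.fasta should report 25 if nothing is missing.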

Last edited by Jon_Keats; 09-08-2010 at 12:35 PM. Reason: Found error in suggested cat command
Old 04-13-2010, 10:02 AM   #24
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

The Shell Script from Hell, or My Masterpiece: You Decide

I'll preface this by acknowledging all the help from the various people who have replied to this thread, and I promise to try to figure out Perl over shell scripts. But two weeks ago the "Hello World!" script in any programming language was Greek to me, so bear with my ignorance for a week or two more. In the interest of sharing, here is my current bwa samse pipeline. I'm stuck at not being able to merge the sorted BAM files, but at least this lets me get back to the lab and frees up some of my time to write the backlog of papers I need to polish off so I can get my own lab and make someone else deal with these issues
***Currently, this uses all default settings, so modify commands as you see fit***

Code:
#!/bin/sh

# Batch_BWAsamse_V1.sh
# Created by Jonathan Keats on 04/05/10.
# This script is designed to take a batch of raw Illumina 1.3+ reads to sorted BAM files.
# It is designed to be initiated from a folder called "NGS" with a specific subdirectory structure
# Use the script called "Create_NGS_DirectoryStructureV1.sh" or the other variations (see Jon_Keats SeqAnswers Introduction Thread) to create the analysis directory structure
# To run this script you MUST first place your reference file in "NGS/RefGenomes/BWA_Indexed" and have run the "bwa index" command to create the BWT index files
# If you are NOT using a reference genome called "hg18.fa" you will NEED to edit lines 124 and 150 accordingly!
# The script is based on having ***RENAMED*** Illumina files in NGS/InputSequence/Illumina/Read1 and if available NGS/InputSequence/Illumina/Read2 if paired end runs
# The renamed format MUST be "YourSampleName_RunX_LaneX_R1.txt" and "YourSampleName_RunX_LaneX_R2.txt" otherwise files will be overwritten, paired-end analysis will not be possible, and lane merging will not be possible 
# At each step it queries specific folders for available files and passes them to the next analysis module
# After each step the filename extension of the output files are corrected. (ie. "MySequenceFile.txt.fastq" to "MySequenceFile.fastq")
# Order of Embedded Steps	- Converts Illumina 1.3+ fastq files "s_1_sequence.txt" to Sanger fastq using "maq ill2sanger" command
#							- Aligns created fastq files to reference genome using "bwa aln" command
#							- Generates SAM files from alignment files using "bwa samse" command
#							- Converts SAM files to BAM files using "samtools view" command
#							- Sorts BAM files using "samtools sort" command
#							- Final output files are archived then the input and analysis directories are cleaned-up and readied for the next analysis batch
#    ***DURING CLEAN-UP THE ENTIRE CONTENTS OF SEVERAL FOLDERS ARE DELETED, SO ONLY PLACE ADDITIONAL FILES IN THE "NGS/FINALOUTPUTS" DIRECTORY OR OUTSIDE THE NGS FOLDER SUBDIRECTORIES***
# The script creates a log file in "NGS/AnalysisNotes" to track the steps completed and the time each step started and finished
# Some of the log events will print to both the terminal screen and the log file so you can see what is going on
# On our workstation this will process two lanes of PE data overnight or a full run over the weekend

# Should you merge files (lane 1 and 2) before analysis as Illumina fastq files, or after sorting the BAM files?
# If you merge before analysis, does this mess up a paired-end analysis?

# Carabelle - Can you have a fail safe start query to kill the script if it is not initiated from the "NGS" folder? Probably doesn't run anyways
# Carabelle - Also can you check if destination folders are empty and if not kill the script?
# Carabelle - Such as if directory NGS/SAMfiles is NOT empty kill script and print "The NGS/SAMfiles directory is not empty please remove files otherwise they will be deleted"
# Ligne=Line in French (I had a significant amount of help from our French post-doc's husband, who is a unix programmer in Paris)

#Starting directory = NGS/

echo ***Starting Analysis Batch***
date '+%m/%d/%y %H:%M:%S'

#The following step creates the log file in the AnalysisNotes subdirectory the first time the script is run
#On subsequent runs the results are printed at the bottom of the pre-existing log file

echo ***Starting Analysis Batch*** >> AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> AnalysisNotes/Analysis.log

#In this step we will convert the original Illumina 1.3+ fastq format files to Sanger fastq format
#Input files from NGS/InputSequences/Illumina/Read1 and NGS/InputSequences/Illumina/Read2
#Output files to NGS/InputSequences/SangerFastq

echo Starting Step1a - Read1 Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1a - Read1 Fastq Conversion with maq ill2sanger >> AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> AnalysisNotes/Analysis.log
cd InputSequence/Illumina/Read1
#Current directory = NGS/InputSequence/Illumina/Read1
echo Converting the following Illumina files from the Read1 folder:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Converting the following Illumina files from the Read1 folder: >> ../../../AnalysisNotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.txt`
do
maq ill2sanger $ligne ../../SangerFastq/$ligne.fastq
done
echo Finished Step1a - Read1 Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1a - Read1 Fastq Conversion >> ../../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../AnalysisNotes/Analysis.log
cd ../Read2
#Current directory = NGS/InputSequence/Illumina/Read2
echo Starting Step1b - Read2 Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1b - Read2 Fastq Conversion with maq ill2sanger >> ../../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../AnalysisNotes/Analysis.log
echo Converting the following Illumina files from the Read2 folder:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Converting the following Illumina files from the Read2 folder: >> ../../../AnalysisNotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.txt`
do
maq ill2sanger $ligne ../../SangerFastq/$ligne.fastq
done
cd ../../SangerFastq/
#Current directory = NGS/InputSequence/SangerFastq
#The next step will change the file extensions of the Sanger Fastq outputs from "File_Run1_Lane1_R1.txt.fastq" to "File_Run1_Lane1_R1.fastq"
old_ext=txt.fastq
new_ext=fastq
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step1b - Read2 Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1b - Read2 Fastq Conversion >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log

#In the next step we will align the converted sanger fastq format files to the reference genome (hg18.fa)

echo Starting Step2 - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step2 - bwa aln process >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
echo The following fastq files will be aligned:
for ligne in `ls *.fastq`
do                                                                     
echo $ligne
done
echo The following fastq files will be aligned: >> ../../AnalysisNotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.fastq`
do
bwa aln ../../RefGenomes/BWA_Indexed/hg18.fa $ligne > $ligne.sai 	 
done
echo Finished Step2 - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step2 - bwa aln process >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log

#In the next step we will generate SAM files for the alignments using bwa samse

echo Starting Step3 - bwa samse process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step3 - bwa samse process >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
echo The following alignment files will be converted to SAM files:
for ligne in `ls *.sai`
do                                                                     
echo $ligne
done
echo The following alignment files will be converted to SAM files: >> ../../AnalysisNotes/Analysis.log
for ligne in `ls *.sai`
do
echo $ligne >> ../../AnalysisNotes/Analysis.log
done
#To get this step to work I had to trick the system into the conventional (bwa samse <database.fasta> <aln.sai> <input.fastq> > aln.sam) command, as I can only loop over one file list at a time to feed files into the command
for ligne in `ls *.fastq`
do
bwa samse ../../RefGenomes/BWA_Indexed/hg18.fa $ligne.sai $ligne > ../../SAMfiles/$ligne.sam
done
#This step renames the bwa aln output files from "File_Run1_Lane1_R1.fastq.sai" to "File_Run1_Lane1_R1.sai"
#This is done on purpose at this point to allow the preceding step to work
old_ext=fastq.sai
new_ext=sai
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
cd ../../SAMfiles/
#Current directory = NGS/SAMfiles
#This step renames the bwa samse output files from "File_Run1_Lane1_R1.fastq.sam" to "File_Run1_Lane1_R1_bwase.sam"
#We append "_bwase" to indicate the files are bwa alignments converted to SAM files with the bwa samse command
old_ext=.fastq.sam
new_ext=_bwase.sam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step3 - bwa samse process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step3 - bwa samse process >> ../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../AnalysisNotes/Analysis.log

#In the next step we will convert each SAM file to a BAM file

echo Starting Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Starting Step4 - samtools SAM to BAM conversion >> ../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../AnalysisNotes/Analysis.log
echo The following SAM files will be converted to BAM files:
for ligne in `ls *.sam`
do                                                                     
echo $ligne
done
echo The following SAM files will be converted to BAM files: >> ../AnalysisNotes/Analysis.log
for ligne in `ls *.sam`
do                                                                     
echo $ligne >> ../AnalysisNotes/Analysis.log
done
for ligne in `ls *.sam`
do
samtools view -bS -o ../BAMfiles/Original/$ligne.bam $ligne
done
cd ../BAMfiles/Original
#Current directory = NGS/BAMfiles/Original
#This step renames the samtools view output files from "File_Run1_Lane1_R1_bwase.sam.bam" to "File_Run1_Lane1_R1_bwase.bam"
old_ext=sam.bam
new_ext=bam
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step4 - samtools SAM to BAM conversion >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log

#In the next step we will sort the BAM file by chromosome coordinate

echo Starting Step5 - samtools BAM sorting process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step5 - samtools BAM sorting process >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
echo The following BAM files will be sorted:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo The following BAM files will be sorted: >> ../../AnalysisNotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
samtools sort $ligne ../Sorted/$ligne
done
cd ../Sorted
#Current directory = NGS/BAMfiles/Sorted
#This step renames the samtools sort output files from "File_Run1_Lane1_R1_bwase.bam.bam" to "File_Run1_Lane1_R1_bwase_sorted.bam
old_ext=.bam.bam
new_ext=_sorted.bam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step5 - samtools BAM sorting process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step5 - samtools BAM sorting process >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
cd ../..
#Current directory = NGS/

#In the next step we will move files from the input and analysis directories to their respective "NGS/FinalOutputs" subdirectory so the pipeline can be used another time.

echo Cleaning up Input and Analysis Directories
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input and Analysis Directories >> AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> AnalysisNotes/Analysis.log
cd InputSequence/Illumina/Read1
#Current directory = NGS/InputSequence/Illumina/Read1
echo Moving the following Illumina Fastq Files from NGS/InputSequence/Illumina/Read1 to NGS/FinalOutputs/Illumina/Read1:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving Illumina Fastq Files from NGS/InputSequence/Illumina/Read1 to NGS/FinalOutputs/Illumina/Read1 >> ../../../AnalysisNotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../FinalOutputs/Illumina/Read1/
done
cd ../Read2
#Current directory = NGS/InputSequence/Illumina/Read2
echo Moving the following Illumina Fastq Files from NGS/InputSequence/Illumina/Read2 to NGS/FinalOutputs/Illumina/Read2:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving Illumina Fastq Files from NGS/InputSequence/Illumina/Read2 to NGS/FinalOutputs/Illumina/Read2 >> ../../../AnalysisNotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../FinalOutputs/Illumina/Read2/
done
echo Moving Illumina Input Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Illumina Input Files Complete >> ../../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../AnalysisNotes/Analysis.log
cd ../../SangerFastq
#Current directory = NGS/InputSequence/SangerFastq
echo Moving the following Sanger Format Fastq Files from NGS/InputSequence/SangerFastq to NGS/FinalOutputs/SangerFastq:
for ligne in `ls *.fastq`
do
echo $ligne
done
echo Moving the following Sanger Format Fastq Files from NGS/InputSequence/SangerFastq to NGS/FinalOutputs/SangerFastq: >> ../../AnalysisNotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.fastq`
do
mv $ligne ../../FinalOutputs/SangerFastq/
done
echo Moving Sanger Format Fastq Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sanger Format Fastq Files Complete >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
echo Moving the following Alignment .sai Files from NGS/InputSequence/SangerFastq to NGS/FinalOutputs/AlignmentResults:
for ligne in `ls *.sai`
do
echo $ligne
done
echo Moving the following Alignment .sai Files from NGS/InputSequence/SangerFastq to NGS/FinalOutputs/AlignmentResults: >> ../../AnalysisNotes/Analysis.log
for ligne in `ls *.sai`
do
echo $ligne >> ../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.sai`
do
mv $ligne ../../FinalOutputs/AlignmentResults/
done
echo Moving Alignment Results Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Alignment Results Files Complete >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
cd ../../BAMfiles/Sorted/
#Current directory = NGS/BAMfiles/Sorted/
echo Moving the following Sorted BAM files from NGS/BAMfiles/Sorted/ to NGS/FinalOutputs/SortedBAMfiles:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Moving the following Sorted BAM files from NGS/BAMfiles/Sorted/ to NGS/FinalOutputs/SortedBAMfiles: >> ../../AnalysisNotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.bam`
do
mv $ligne ../../FinalOutputs/SortedBAMfiles/
done
echo Moving Sorted BAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sorted BAM Files Complete >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
cd ../../SAMfiles
#Current directory = NGS/SAMfiles
echo Deleting the following SAM Files from NGS/SAMfiles:
for ligne in `ls *.sam`
do
echo $ligne
done
echo Deleting the following SAM Files from NGS/SAMfiles: >> ../AnalysisNotes/Analysis.log
for ligne in `ls *.sam`
do
echo $ligne >> ../AnalysisNotes/Analysis.log
done
for ligne in `ls *.sam`
do
rm $ligne
done
echo Deleting SAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting SAM Files Complete >> ../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../AnalysisNotes/Analysis.log
cd ../BAMfiles/Original/
#Current directory = NGS/BAMfiles/Original
echo Deleting the following BAM Files from NGS/BAMfiles/Original:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Deleting the following BAM Files from NGS/BAMfiles/Original: >> ../../AnalysisNotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../AnalysisNotes/Analysis.log
done
for ligne in `ls *.bam`
do
rm $ligne
done
echo Deleting BAM Files from NGS/BAMfiles/Original Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting BAM Files from NGS/BAMfiles/Original Complete >> ../../AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../AnalysisNotes/Analysis.log
cd ../..
#Current directory = NGS/
echo Cleaning up Input and Analysis Directories Complete
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input and Analysis Directories Complete >> AnalysisNotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> AnalysisNotes/Analysis.log
echo ***Analysis Batch Complete***
echo ***Analysis Batch Complete*** >> AnalysisNotes/Analysis.log

# Future development: Merge BAM files based on filename structure "RunX_LaneX_RX", Index files for IGV viewing
# Figure out Perl or Python so I don't look like such an amateur :)
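On the fail-safe questions in the comments near the top of the script, a guard along these lines might work. This is an untested sketch assuming the NGS layout above, and ngs_guard is a made-up name:

```shell
# Hypothetical guard for the pipeline above: refuse to run unless started
# from the NGS folder, and refuse to run if NGS/SAMfiles already holds
# files that the clean-up steps would delete.
ngs_guard() {
    if [ "$(basename "$PWD")" != "NGS" ]; then
        echo "Please start this script from the NGS directory" >&2
        return 1
    fi
    if [ -n "$(ls -A SAMfiles 2>/dev/null)" ]; then
        echo "The NGS/SAMfiles directory is not empty, please remove files otherwise they will be deleted" >&2
        return 1
    fi
}

# Near the top of the pipeline:  ngs_guard || exit 1
```

Calling it before the first echo would stop the script before anything gets overwritten or deleted.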
Old 04-14-2010, 12:39 PM   #25
greigite
Senior Member
 
Location: Cambridge, MA

Join Date: Mar 2009
Posts: 141

Quote:
Originally Posted by Jon_Keats View Post
The Shell Script from Hell or My Masterpiece you Decide

I'll preface this with acknowledging all the help from various people who have replied to this thread and I'll promise to try and figure out perl over shell scripts, but two weeks ago the "Hello World!" script in any programing language was "Greek" to me so bare with my ignorance for a week or two more. But in the interest of sharing here is my current bwa samse pipeline. I'm stuck at not being able to merge the sorted bam files but at least this lets me get back to the lab and opens up some of my time to write the back log of papers I need to polish off so I can get my own lab and make someone else deal with these issues
**Currently, this uses all default settings so modify commands as you see fit***
lol, one of the main benefits of new-PIship is being able to tell someone else to do all the boring stuff -- in fact, a lot of science revolves around finding other people to do your work

That said, I do actually kind of enjoy dealing with these kinds of informatics problems, and I'm glad you are providing an introduction that many folks will find useful. I wanted to point out another alignment program you haven't mentioned: Mosaik (from the Marth lab at BC), which has some of the best documentation of any of the short-read aligners I've used. The other advantage of Mosaik is that it allows much higher levels of divergence between the read and the reference than BWA/Bowtie do. I've found this useful as I'm working with reads from related bacterial strains that can diverge substantially from the reference yet still have alignments supported at high depth. Mosaik alignments can be easily converted to SAM/BAM format for viewing in IGV or SNP calling with SAMtools, and there is also an associated SNP caller (Gigabayes).
I am also a big fan of IGV due to its flexibility. It is relatively easy to write Perl scripts that create your own custom tracks for viewing using the GFF format. For example, I created a custom track that shows me the location of all SNPs above certain depth/quality thresholds.
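An awk one-liner can get you surprisingly far with this kind of track. As a hypothetical sketch (the four-column snps.txt layout -- chrom, position, depth, quality -- is invented for illustration, not a real caller output format), filtering SNPs at depth >= 10 and quality >= 20 into a GFF track might look like:

```shell
cd "$(mktemp -d)"
# Toy input: chrom, position, depth, quality (an assumed layout)
printf 'chr1\t100\t15\t30\nchr1\t200\t5\t30\nchr2\t300\t12\t10\n' > snps.txt

# Emit one GFF line (seqid, source, type, start, end, score, strand,
# phase, attributes) per SNP passing the depth/quality thresholds
awk -F'\t' '$3 >= 10 && $4 >= 20 {
    printf "%s\tsnp_filter\tSNP\t%d\t%d\t%s\t.\t.\tdepth=%s\n", $1, $2, $2, $4, $3
}' snps.txt > snp_track.gff
cat snp_track.gff
```

The resulting snp_track.gff can then be loaded as a custom track.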
Old 04-15-2010, 05:00 AM   #26
NextGenSeq
Senior Member
 
Location: USA

Join Date: Apr 2009
Posts: 482

Thank god we have CLCBio for assembly and variant detection. This post has shown how archaic the freeware tools are.

We can assemble and detect variants on 30x human whole exome data overnight (admittedly that is the easy part now).
Old 04-15-2010, 09:46 PM   #27
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

Making Some Progress

Getting ready to head out for AACR and get depressed about how far ahead the big sequencing centers are in the sequencing world. I'm betting we'll be seeing at least 100 cancer genomes discussed, any other guesses? Well, I know of at least 40, so that might be on the low side.
The good news on the home front is that I've cleaned up a bunch of disk space on our workstation and will have the poor beast going for the next 3-4 days processing data, so there should be a big pile of data to sift through when I get back. One lesson I learned is that pipelines are great, but you need to remember how much disk space the whole pipeline uses....or your morning starts with an error message of "0 bytes remaining". What the hell are bytes? I know of these things called megabytes, gigabytes, and terabytes, and I roughly remember kilobytes...I thought the sticker on our machine said Mac Pro, not Apple IIe. Therefore, version 2 of the pipeline will need some kind of check for the available disk space relative to the number of samples to be processed. Oh goodie, one more thing to learn, and I thought I was smart enough already!
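For what it's worth, that disk-space check could be little more than comparing df output against a per-batch estimate. A sketch, where the 200 GB figure is a placeholder rather than a measured per-run number:

```shell
# df -Pk reports available space in 1 KB blocks in column 4 of its
# second output line; compare it against a rough per-batch estimate.
required_kb=$((200 * 1024 * 1024))   # placeholder: ~200 GB
avail_kb=$(df -Pk . | awk 'NR==2 {print $4}')
echo "Available space: ${avail_kb} KB (want at least ${required_kb} KB)"
if [ "$avail_kb" -lt "$required_kb" ]; then
    echo "Warning: probably not enough disk space for a full batch" >&2
fi
```

A real version would scale the estimate by the number of input lanes before starting the batch.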

RE: Comments

MOSAIK: funny you mention it, the guys were up at the Mothership giving a talk on Wednesday that we tuned into over the webcast. Sounded pretty good from what they showed, but it did sound like it is a bit too much of a memory hog for our current workstation. However, they mentioned that an upcoming release will be less memory intensive.

CLCBio: I'm all for turnkey solutions, and would be the first to say that if you can drop thousands of dollars on data generation you need to be willing to pay for analysis...otherwise what is the point. But as a new user, it seems feasible for most sequencing cores to have basic pipelines in place using open-source software that can generate the basic outputs most investigators are looking to get back.

Important Note: If you use the open-source software packages, please remember that someone put a great deal of work into even the bad ones, and many of the developers are great citizens on this forum. So if you see requests like Nils' request for grant support letters for BFAST, or similar requests from other developers, please take a couple of minutes to help the people that help you!
Old 04-28-2010, 10:53 PM   #28
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279

Remove the Duplicates

Well, the meeting is over and I'm disappointed we didn't see a bigger impact of sequencing. Boo hoo to all the people who only talked about published datasets; what's the point of a meeting if people only talk about their Nature paper from 3 months ago...I read that already! But a couple of little birdies told me about several big multi-patient papers that are already accepted for publication...so does that mean 40 tumor-normal pairs is an Oncogene paper now? Okay, enough complaining.

So if you have been following the thread, I've managed to get from raw Illumina reads to sorted BAM files by pipelining the following commands in a unix shell script:
bwa index -a bwtsw hg18.fasta (done manually as it only needs to be done once)
maq ill2sanger s_x_sequence.txt s_x_sequence.fastq
bwa aln hg18.fasta s_x_sequence.fastq > s_x_sequence.sai
bwa samse hg18.fasta s_x_sequence.sai s_x_sequence.fastq > s_x_sequence.sam
samtools view -bS -o s_x_sequence.bam s_x_sequence.sam
samtools sort s_x_sequence.bam s_x_sequence_sorted (the second argument is an output prefix; samtools appends .bam)

But I'm still stuck at having multiple BAM files per sample and having to manually merge the files after the current pipeline finishes using:

samtools merge PatientX_MergedBAM.bam s_1_sequence_sorted.bam s_2_sequence_sorted.bam

Then I create index files (needed for IGV) using:

samtools index PatientX_MergedBAM.bam

And then I drop them into the IGV browser (anyone else find it is faster to navigate around with chromosome coordinates than gene names?). FYI - I use the binary distribution version so I can use more than 2GB of RAM, since I can't use the 10GB version on our 8GB workstation. I've modified the igv_mac-intel.sh script to -Xmx6144m and then launch it from Terminal to open IGV with 6 GB of RAM.
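Until the pipeline grows a merge step, that manual merge/index round can at least be generated automatically. A dry-run sketch: it only prints the samtools commands rather than running them, and it assumes sorted files named like "PatientX_Run1_Lane1_R1_bwase_sorted.bam" with the sample name before the first underscore.

```shell
# Group sorted BAMs by the leading sample name and print one
# samtools merge plus one samtools index command per sample.
for sample in $(ls *_sorted.bam 2>/dev/null | cut -d_ -f1 | sort -u); do
    echo "samtools merge ${sample}_Merged.bam" ${sample}_*_sorted.bam
    echo "samtools index ${sample}_Merged.bam"
done
```

Piping the output through sh would execute the commands once they look right.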

After looking at the data in IGV I came to appreciate how many duplicates are present in my dataset... So I'm back to Google and SEQanswers trying to figure out what to do. What I've come to understand is that most people, including Heng Li who built bwa/samtools, recommend using Picard over samtools to mark/remove duplicates, as Picard also catches duplicate pairs whose reads map to different chromosomes. So back to sourceforge.net to download Picard (http://sourceforge.net/projects/picard/files/). I downloaded the current version, placed the folder in my "NGS/ApplicationDownloads/" folder, and decompressed it. Right now I can't figure out how to put Java .jar files on my classpath (I'm sure I can figure it out, but I'm getting lazy and just copy the .jar files around as needed, bad I know).

I've encountered one issue: my BWA-samtools BAM files give an error because the unmapped reads have characters in the CIGAR string, which makes Picard grumpy. So you need to add one additional option to the command to set the file validation stringency to silent:

java -Xmx4g -jar MarkDuplicates.jar INPUT=PatientX_MergedBAM.bam OUTPUT=PatientX_MergedBAM_NoDups.bam METRICS_FILE=PatientX_NoDups_Stats.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT

Next getting piled up in my pileup files...

Last edited by Jon_Keats; 09-02-2010 at 07:43 PM. Reason: updated to match entire thread
Jon_Keats is offline   Reply With Quote
Old 05-05-2010, 09:09 AM   #29
golharam
Member
 
Location: Philadelphia, PA

Join Date: Dec 2009
Posts: 55
Default

I'll one up you...just a cleaner mkdir command.

#!/bin/sh

#Create_NGS_DirectoryStructureV3.sh
#Written by Ryan Golhar 05/05/2010

echo ***Creating Pipeline Directory Structure***
mkdir -p ngs/{analysisnotes,applicationdownloads,samfiles}
mkdir -p ngs/bamfiles/{merged,original,sorted}
mkdir -p ngs/finaloutputs/{alignmentresults,mergedbamfiles,sangerfastq,sortedbamfiles}
mkdir -p ngs/finaloutputs/illumina/{read1,read2}
mkdir -p ngs/inputsequences/illumina/{read1,read2,sangerfastq}
mkdir -p ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,genomesdownloads}
mkdir -p ngs/scripts/scriptbackups

cp Create_NGS_DirectoryStructureV3.sh ngs/scripts/scriptbackups/
echo ***Pipeline Directory Structure Created***
golharam is offline   Reply With Quote
Old 05-07-2010, 10:05 AM   #30
Kurt
Junior Member
 
Location: Baltimore

Join Date: Aug 2009
Posts: 3
Default Re: Removing the duplicates

Noticed this a second ago. It looks like you are working with single-end sequencing data, for which you wouldn't want to remove duplicates.

http://sourceforge.net/apps/mediawik...rged_alignment

(Item #6)

The duplicate removal across chromosomes would only apply for paired end data.
Kurt is offline   Reply With Quote
Old 05-07-2010, 10:52 AM   #31
nilshomer
Nils Homer
 
nilshomer's Avatar
 
Location: Boston, MA, USA

Join Date: Nov 2008
Posts: 1,285
Default

Quote:
Originally Posted by Kurt View Post
Noticed this a second ago. It looks like you are working with single-end sequencing data, for which you wouldn't want to remove duplicates.

http://sourceforge.net/apps/mediawik...rged_alignment

(Item #6)

The duplicate removal across chromosomes would only apply for paired end data.
You should definitely remove duplicates on single end data if your coverage is not too high. The point is if you have 200x coverage, then you expect many reads to have the same start position, while for low-coverage, this happens by random chance infrequently.
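This point can be made concrete with a back-of-the-envelope number: the expected count of reads starting at any given (position, strand) is roughly coverage / (2 × read length). Assuming 76 bp single-end reads:

```shell
# Expected reads per start site = coverage / (2 * read_length)
read_len=76
for cov in 5 200; do
  awk -v c="$cov" -v l="$read_len" \
      'BEGIN { printf "%dx coverage: %.2f reads per start site\n", c, c / (2 * l) }'
done
```

At 5x coverage the expected count per site is about 0.03, so reads sharing a start are almost certainly PCR duplicates; at 200x it is about 1.32, so identical starts arise routinely by chance.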
nilshomer is offline   Reply With Quote
Old 05-07-2010, 11:09 AM   #32
Kurt
Junior Member
 
Location: Baltimore

Join Date: Aug 2009
Posts: 3
Default

Quote:
Originally Posted by nilshomer View Post
You should definitely remove duplicates on single end data if your coverage is not too high. The point is if you have 200x coverage, then you expect many reads to have the same start position, while for low-coverage, this happens by random chance infrequently.
Would this still apply for a capture enrichment technology (say Agilent's SureSelect platform or RainDance)? We haven't done those here for single-end (and I'm not sure if we ever would), but I'm just wondering out loud at this point, I guess. Sorry, I know this doesn't necessarily apply to Keats's post.

Last edited by Kurt; 05-07-2010 at 11:13 AM. Reason: clarification
Kurt is offline   Reply With Quote
Old 05-16-2010, 10:39 PM   #33
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

In my case the data we have is paired-end, but the first alignment test I did used bwa in single-end mode. Odd choice, I know, but this is actually an mRNA-seq dataset, and when it is aligned to the genome, paired-end mode causes a lot of artifacts as it tries to pair reads across exons at distances that often exceed the typical insert size.
Regardless, I would always recommend removing duplicates, whether single-end, paired-end, or mate-pair for that matter. A real-life example: a whole-genome sequence (multiple runs), one library, duplicates removed per run but NOT across all runs; the interesting biological hit turned out to be a PCR artifact (an identical read present in multiple runs).
Remember that for single-end reads, duplicate removal limits your coverage to a maximum of your read length x2. It can obviously be higher for paired-end reads, where one read may be identical but its mate is different.
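The x2 cap follows from keeping at most one read per start position on each strand: a base can be spanned by reads starting at read_length positions on the forward strand plus read_length on the reverse. A quick sanity check, assuming 76 bp reads:

```shell
read_len=76                  # assumed single-end read length
max_cov=$((2 * read_len))    # one surviving read per start position per strand
echo "$max_cov"              # 152
```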

I like your version golharam, thanks for sharing

Last edited by Jon_Keats; 01-05-2011 at 10:31 PM. Reason: found error
Jon_Keats is offline   Reply With Quote
Old 06-02-2010, 08:39 PM   #34
Fabrice ODEFREY
Member
 
Location: Melbourne

Join Date: May 2010
Posts: 21
Default

Thanks heaps for your posts Jonathan, this is very useful.
I'm now in the same position you were a few months ago (on a SOLiD) and took the approach of first learning Linux and Perl before doing any analysis (mainly because the data aren't there yet...). Looking forward to more interesting posts from you soon, as it really helps newbies (me at least) get a better overview of the pipeline to implement.
Fabrice ODEFREY is offline   Reply With Quote
Old 09-08-2010, 09:44 AM   #35
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

Time to Git a Linux Machine

I'm slowly getting back up and running after moving from my post-doc to an independent position. Other than learning how damn expensive everything is, I'm slowly deciding to swear off the idea of a new MacPro workstation in favor of a Linux workstation, given all the issues I seem to run into with the "not quite so standard Mac OS X 10.6 implementation of Unix". But with the idea of sharing my ongoing experiences, and leaving a trail I can follow to build my next machine, I thought I'd update my thread. I hope some people have found it useful...

Some new ideas and an update

1) I'm becoming increasingly certain that I'm getting good enough at the command line to REALLY mess up my system
2) My list of used programs continues to increase as I try each new sequencing method
3) As per issue 1 - I'm also not reading instructions very well. New Rule - If at first you don't succeed...Go back and read the damn instructions again because most likely you didn't follow them correctly!

New Applications to Install

As previously stated in this thread, if you are using a Mac OS environment you need to do a couple of special things:

A) Install Xcode on your system (See earlier post)
B) Install Fink on your system (See earlier post)
- Install the following Fink packages: md5deep and pkgconfig
- "fink install md5deep" (needed for bfast install)
- "fink install pkgconfig" (needed for fastx-toolkit install)
C) Install Git on your system (http://git-scm.com/)
D) Create a $PATH Directory and update this directory in your .profile (See earlier post for instructions)
- In my case "$HOME/local/bin"
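For reference, the .profile line that makes $HOME/local/bin a $PATH directory looks like this (a sketch; restart Terminal or `source ~/.profile` afterwards for it to take effect):

```shell
# Prepend the personal bin directory to the search path
export PATH="$HOME/local/bin:$PATH"
# Quick sanity check that the directory is now on the path
case ":$PATH:" in *":$HOME/local/bin:"*) echo "on PATH" ;; esac
```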

1) Install FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/)

***Why did I get this package***
Because I have some Illumina mate-pair data that I want to analyze with BWA using sampe. To my understanding the reads need to be reverse complemented to pair correctly, so I'm using the fastx_reverse_complement application in the package, which is very fast and correctly reverse complements the reads while reversing the quality values.

Instructions:
- Go to download page and download the following:
a) fastx_toolkit-0.0.13.tar.bz2
b) libgtextutils-0.6.tar.bz2
- Move both to ngs/applications folder and unpack both packages
- In Terminal navigate to the libgtextutils folder "cd ngs/applications/libgtextutils-0.6"
- Install the package as follows:
./configure
make
sudo make install (this will ask for your password; requires an admin-level account)

- Move to fastx_toolkit folder "cd ../fastx_toolkit-0.0.13"
- Install the package as follows:
./configure --prefix=$HOME/local/bin
make
make install

- Test install by typing "fastx_uncollapser -h", this should pop up a usage documentation for this app

2) Install Bfast, DNAA, and Breakway

*** Why these packages***
As you might guess from the above install I now have some mate-pair data and want to try out the Breakway package from the UCLA group but it depends on two of their other packages Bfast and DNAA

Bfast - This package seems to be the bane of my existence, but thankfully Nils and the help list have been amazingly helpful

Mac Related Issues:
a) You must have fink and have installed the md5deep package otherwise "make check" will fail
b) The current sourceforge version (0.6.4e) does not install correctly, though the previous version does. However, this is a known issue that has been fixed in the master branch (if that's a new term to you, we're in the same boat), which means you need to use the git repository version
c) Using "./configure --prefix=$HOME/local" works but makes DNAA mad when you install it, so use sudo (time to be superman again)

Instructions:
a) In Terminal navigate to ngs/applications
b) Get current Bfast version from Git (restart Terminal after git installation)
- type "git clone git://bfast.git.sourceforge.net/gitroot/bfast/bfast"
- this will create a folder called "bfast" in the current directory
- Move into the directory "cd bfast"
- Install bfast by typing the following:
sh autogen.sh
./configure
make
make check
sudo make install (requests a password with admin level privileges)

- Test install and check current version by typing "bfast" in Terminal

c) Navigate back to ngs directory by typing "cd ../"
d) Get current version of DNAA from Git
- type "git clone git://dnaa.git.sourceforge.net/gitroot/dnaa/dnaa dnaa"
- this will create a directory named "dnaa" in the current directory (ngs/applications)
- move into the dnaa directory by typing "cd dnaa"
e) Because this package depends on both BFAST and SAMTOOLS you need to provide links to these application directories even though you already have them in a $PATH directory (/usr/local/bin and $HOME/local/bin respectively)
- create a link to the BFAST package you just installed by typing "ln -s ../bfast bfast"
- create a link to your current SAMTOOLS package by typing "ln -s ../samtools-0.1.8 samtools"
f) Install DNAA by typing the following:
sh autogen.sh
./configure
make
sudo make install (requests a password with admin level privileges)

g) Download the current version of BREAKWAY from sourceforge (http://sourceforge.net/projects/breakway/), move it to the ngs/applications folder, and unpack it; you should be ready to go

Last edited by Jon_Keats; 02-17-2011 at 09:18 PM. Reason: found typo in url
Jon_Keats is offline   Reply With Quote
Old 09-13-2010, 04:55 PM   #36
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default

Building a Paired-End Pipeline

Up till now I've been frustrated because I could not automate the various pairing steps that occur while processing raw data to BAM files, usually either the SAMPE step of BWA or merging multiple lanes into one BAM file. For merging, I've convinced myself that I can simply use "cat" to concatenate the lanes before processing, which is a simple solution as long as all the lanes are available at the same time. For the SAMPE pairing, I spent some time with my Unix guru from France when he came over to visit his wife, and I now have a workable solution as long as a specific file tree structure is used in conjunction with two Unix scripts: one that processes each pair from raw data to two sorted BAM files (with and without duplicates), and a second that pulls each sample into the analysis framework and launches the first script. Since this requires a specific directory structure, I've updated my directory structure script to version 3.
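The "cat" approach to merging lanes can be sketched as follows; the lane file names are hypothetical stand-ins, and the tiny FASTQ records exist only to make the example self-contained:

```shell
# Two stand-in lane files, one 4-line FASTQ record each
printf '@read1\nACGT\n+\nIIII\n' > lane1_R1.txt
printf '@read2\nTGCA\n+\nIIII\n' > lane2_R1.txt
# Concatenate the lanes into a single per-sample read-1 file before alignment
cat lane1_R1.txt lane2_R1.txt > sample_R1.txt
wc -l < sample_R1.txt   # 8 lines = 2 records
```

This works because FASTQ is a simple line-oriented format; the same command with the read-2 lane files (in the same order!) produces the matching sample_R2.txt.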

Code:
#!/bin/sh

# create_ngs_directorystructure_v3.sh
# 
#
# Created by Jonathan Keats on 9/3/10 based on suggestion from Ryan Golhar on my Seqanswers thread.
# Translational Genomics Research Institute
#
#########################################################################
#  CREATES A DIRECTORY STRUCTURE TO SUPPORT A VARIETY OF NGS PIPELINES  #
#########################################################################
#
# Designed for a Mac OS environment and requires initiation from your home folder (/Users/You/)

# Check to confirm current location is $HOME/ (i.e. /Users/You/)

echo "Confirming Script Initiation Directory"
var1=$HOME
if [ "`pwd`" != "$var1" ] 
	then 
	echo " The script must be launched from your home directory "
	echo " The script was automatically killed due to a launch error - See Above Error Message" 
	exit 2                              
fi
echo "1) Launch Location is Correct ($HOME/)"

# Create required directories to support pipelines (BWAse, BWApe, and others to come...)

echo ***Creating Pipeline Directory Structure***
mkdir -p ngs/{analysisnotes,applications,scripts}
mkdir -p ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,genome_downloads}
mkdir -p ngs/refgenomes/genome_downloads/{hg18,hg19}
mkdir -p ngs/finaloutputs/{alignmentresults_bwa,illumina,sangerfastq}
mkdir -p ngs/finaloutputs/bamfiles/{merged,sorted,nodups}
mkdir -p ngs/bwase/inputsequences/{illumina,sangerfastq}
mkdir -p ngs/bwase/samfiles
mkdir -p ngs/bwase/bamfiles/{merged,original,sorted,nodups}
mkdir -p ngs/bwape/samfiles
mkdir -p ngs/bwape/bamfiles/{merged,original,sorted,nodups}
mkdir -p ngs/bwape/inputsequences/{illumina,sangerfastq,hold}
mkdir -p ngs/bwape/inputsequences/illumina/{read1,read2}
mkdir -p ngs/bwape/inputsequences/sangerfastq/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/{lane1,lane2,lane3,lane4,lane5,lane6,lane7,lane8}
mkdir -p ngs/bwape/inputsequences/hold/lane1/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane2/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane3/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane4/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane5/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane6/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane7/{read1,read2}
mkdir -p ngs/bwape/inputsequences/hold/lane8/{read1,read2}

mv create_ngs_directorystructure_v3.sh ngs/scripts/
echo ***Pipeline Directory Structure Created***

Last edited by Jon_Keats; 09-13-2010 at 04:58 PM.
Jon_Keats is offline   Reply With Quote
Old 09-22-2010, 02:01 PM   #37
jdanderson
Member
 
Location: Davis, CA

Join Date: Sep 2010
Posts: 45
Default

Just wanted to chime in and affirm the sentiment that several people have already expressed: thank you for your posts, they are quite helpful. I am just starting out, and this forum and your posts have been invaluable. Thank you for taking the time to post!
jdanderson is offline   Reply With Quote
Old 09-26-2010, 01:20 AM   #38
maverick123
Junior Member
 
Location: India

Join Date: Sep 2010
Posts: 1
Default

Hello, I am Ronnie. I am from Chandigarh city... currently I am a B.Tech student... and this is a nice, informative thread...
maverick123 is offline   Reply With Quote
Old 09-28-2010, 07:36 PM   #39
Jon_Keats
Senior Member
 
Location: Phoenix, AZ

Join Date: Mar 2010
Posts: 279
Default [B]BWA SAMPE Pipeline Version[/B]

As I mentioned before, it's taken a while to sort out a method that automates a paired-end analysis using BWA, but it seems to work now. Feel free to use the scripts below in conjunction with the "create_ngs_directorystructure_v3.sh" script, which creates the required directory structure.
The following two scripts can be used to automate a paired-end analysis with BWA, from the raw "s_x_sequence.txt" files to aligned, indexed, duplicate-removed BAM files. The design of the pipeline has a couple of requirements:

1) You need to have all the required applications in a $PATH directory. As detailed in this thread I personally use "$HOME/local/bin".
2) You will need; MAQ with ill2sanger patch installed, BWA, SAMTOOLS, and PICARD MarkDuplicates.jar in this path directory.
NOTE: If you use a different path directory you need to alter line 623 of BWApe_hg18_v1.sh as MarkDuplicates.jar is being called specifically from this directory while all others are being called through the $PATH directory. ****If you know how to put a directory in the JAVA path on a Mac drop me a line****
3) Both shell scripts are designed to be in your $PATH directory so you can call them from the ngs directory using "BWApe_hg18_v1.sh" for a single sample analysis or "multi_bwape_analysis_v1.sh" for a multiple sample analysis. Alternatively, you can place them in the "/ngs" folder and call them directly using "./BWApe_hg18_v1.sh" or "./multi_bwape_analysis_v1.sh" (NOTE: If you do this you need to modify the lines that launch BWApe_hg18_v1.sh to include the direct launch indicator "./")
4) The input file names must be unique and end with a "_R1.txt" read identifier such as "YourSample_R1.txt" and "YourSample_R2.txt"

NOTE: The name BWApe_hg18_v1.sh only reflects the reference genome used while developing the script. You can easily change to whatever genome (mouse, human, etc.) you want; you just need to generate the bwa index and update the BWApe_hg18_v1.sh script as indicated in the script.

NOTE: If using "BWApe_hg18_v1.sh" you need to place the raw Illumina files in "ngs/bwape/inputsequences/illumina/read1" and "ngs/bwape/inputsequences/illumina/read2". If using "multi_bwape_analysis_v1.sh" you need to place the raw Illumina files in "ngs/bwape/inputsequences/hold/laneX/read1" and "ngs/bwape/inputsequences/hold/laneX/read2" as appropriate to your sample set. The script is only designed for 8 lanes/samples, so if you have more you need to copy/paste to extend the script. After completing each lane/sample it checks whether there is data in the next sequential lane/sample folder, processing it if available or ending the script if it is empty, so you need to fill the hold/lane1 through lane8 read folders in order.
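The "_R1.txt" naming convention means the sample name and the matching read-2 file can always be derived with shell parameter expansion, which is essentially what the pairing in the scripts relies on (the file name here is a hypothetical example):

```shell
file=YourSample_R1.txt
sample=${file%_R1.txt}      # strip the read identifier suffix
read2=${sample}_R2.txt      # name of the paired read-2 file
echo "$sample $read2"       # YourSample YourSample_R2.txt
```

This is why the names must be unique and end exactly in "_R1.txt"/"_R2.txt"; any other suffix breaks the pairing.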

Code:
#!/bin/sh

# BWApe_hg18_V1.sh
# Created by Jonathan Keats
# Translational Genomics Research Institute

# This script is designed to take a batch of raw Illumina 1.3+ reads to sorted and indexed BAM files with and without duplicates using BWA in paired end mode.
# It is designed to be initiated from a folder called "ngs" in your $HOME folder with a specific subdirectory structure
# To create the directory structure launch "create_ngs_directorystructure_v3.sh" from your "$HOME" folder

####################################################################################################
##  To Run This Script You Must Have The Following Applications In One Of Your $PATH Directories  ##
##						1) MAQ with ill2sanger patch installed									  ##
##						2) BWA																	  ##
##						3) SAMTOOLS																  ##
##						4) PICARD - MarkDuplicates.jar (Must be in $HOME/local/bin)				  ##
####################################################################################################

# To run this script you MUST first place your reference file in ngs/refgenomes/bwa_indexed and have run the "bwa index" command to create the BWT index files

######################################################################################################
# WARNING - YOU MUST ENSURE THE NAME OF YOUR REFERENCE GENOME FILE MATCHES LINES (274, 310, and 367) #
###################################################################################################### 

# The script is based on having ***RENAMED*** Illumina files in "ngs/bwape/inputsequences/illumina/read1" and "ngs/bwape/inputsequences/illumina/read2"
# The renamed format ***MUST*** be "YourSampleName_R1.txt" and "YourSampleName_R2.txt" otherwise pairing and renaming will not occur correctly
# Multiple lanes should be concatenated together before initiating the script, unless you want to manually merge in samtools 
# At each step it queries specific folders for available files and passes them to the next analysis module
# After each step the filename extension of the output files are corrected. (ie. "MySequenceFile_R1.txt.fastq" to "MySequenceFile_R1.fastq")
# Order of Embedded Steps	- Converts Illumina 1.3+ fastq files "s_1_sequence.txt" to Sanger fastq files "s_1_sequence.fastq" using "maq ill2sanger" command
#							- Aligns created fastq files to reference genome using "bwa aln" command
#							- Generates SAM files from alignment files using "bwa sampe" command
#							- Converts SAM files to BAM files using "samtools view" command
#							- Sorts BAM files using "samtools sort" command
#							- Indexes the sorted BAM files for use in IGV browser using "samtools index" command
#							- Removes duplicates from the sorted bam files using "picard - MarkDuplicates.jar" command
#							- Indexes the no duplicates BAM files for use in IGV browser using "samtools index" command
#							- Final output files are archived then the input and analysis directories are cleaned-up and readied for the next analysis batch
# The script creates a log file in /ngs/analysisnotes to track the steps completed and the time each step started and finished
# Some of the log events will print to both the terminal screen and the log file so you can see what is going on
# Much of this would not be possible without the help of a former colleague's husband, who is a Unix programmer in France, so I've kept some French terms such as "ligne" instead of "line" in his honor (thanks Charabelle)

#Starting directory = $HOME/ngs

#In this step	- We check that you are launching the script from the correct location, in case you are running it from a path directory
#				- We check that the destination directories used by the script are empty, to prevent erroneous file deletion and unexpected analysis events
#				- Hope to add a check for available disk space

echo ***Checking Directory Structure***

#List of directories to check
var1=$HOME/ngs
var2=$HOME/ngs/bwape/samfiles
var3=$HOME/ngs/bwape/bamfiles/merged
var4=$HOME/ngs/bwape/bamfiles/original
var5=$HOME/ngs/bwape/bamfiles/sorted
var6=$HOME/ngs/bwape/bamfiles/nodups
var7=$HOME/ngs/bwape/inputsequences/sangerfastq/read1
var8=$HOME/ngs/bwape/inputsequences/sangerfastq/read2

#Checking if launch location is correct

if [ "`pwd`" != "$var1" ] 
	then 
	echo " The script must be launched from the NGS directory "
	echo " The script was automatically killed due to a launch error - See Above Error Message" 
	exit 2                              
fi
echo "1) Launch Location is Correct ($HOME/ngs)"

#Checking if analysis directories are empty


if [ `ls $var2 | wc -l` != 0 ]       
	then 
	echo " The bwape/samfiles directory is not empty - Any data in this directory would be deleted by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "2) bwape/samfiles directory is empty as required"
if [ `ls $var3 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/merged directory is not empty - Any data in this directory would be moved by the script to ngs/finaloutputs/bamfiles/merged "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "3) bwape/bamfiles/merged directory is empty as required"
if [ `ls $var4 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/original directory is not empty - Any data in this directory would be deleted by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "4) bwape/bamfiles/original directory is empty as required"
if [ `ls $var5 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/sorted directory is not empty - Any data in this directory would be moved to ngs/finaloutputs/bamfiles/sorted by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "5) bwape/bamfiles/sorted directory is empty as required"
if [ `ls $var6 | wc -l` != 0 ]       
	then 
	echo " The bwape/bamfiles/nodups directory is not empty - Any data in this directory would be moved to ngs/finaloutputs/bamfiles/nodups by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "6) bwape/bamfiles/nodups directory is empty as required"
if [ `ls $var7 | wc -l` != 0 ]       
	then 
	echo " The bwape/inputsequences/sangerfastq/read1 directory is not empty - Any data in this directory would be processed by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "7) bwape/inputsequences/sangerfastq/read1 directory is empty as required"
if [ `ls $var8 | wc -l` != 0 ]       
	then 
	echo " The bwape/inputsequences/sangerfastq/read2 directory is not empty - Any data in this directory would be processed by the script "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "8) bwape/inputsequences/sangerfastq/read2 directory is empty as required"

echo ***Pre Run Check Completed Successfully***

#Current directory=ngs
echo ***Starting BWA SAMPE Analysis Batch***
date '+%m/%d/%y %H:%M:%S'

#The following step creates the log file in the analysisnotes subdirectory the first time the script is run
#On subsequent runs the results are printed at the bottom of the pre-existing log file

echo ***Starting BWA SAMPE Analysis Batch*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we convert the "Read1" illumina fastq files to sanger fastq files using the maq ill2sanger script

echo Starting Step1a - Read1 Illumina to Sanger Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1a - Read1 Illumina to Sanger Fastq Conversion with maq ill2sanger >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log
cd bwape/inputsequences/illumina/read1
#Current directory = ngs/bwape/inputsequences/illumina/read1
echo Converting the following Illumina files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
maq ill2sanger $ligne ../../sangerfastq/read1/$ligne.fastq
done

#In the next step we clean up the Illumina Read1 folder so it is ready for the next analysis batch

echo Cleaning up Input Sequences Illumina Read1
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input Sequences Illumina Read1 >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read1 to ngs/finaloutputs/illumina:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read1 to ngs/finaloutputs/illumina >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../../finaloutputs/illumina
done

#In the next step we rename the "Read1" sanger format fastq files from ".txt.fastq" extensions to ".fastq"

cd ../../sangerfastq/read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
old_ext=txt.fastq
new_ext=fastq
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step1a - Illumina to Sanger Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1a - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we convert the "Read2" illumina fastq files to sanger fastq files using the maq ill2sanger script

cd ../../illumina/read2
#Current directory = ngs/bwape/inputsequences/illumina/read2
echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger
date '+%m/%d/%y %H:%M:%S'
echo Starting Step1b - Read2 Illumina to Sanger Fastq Conversion with maq ill2sanger >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Converting the following Illumina files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Converting the following Illumina files: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
maq ill2sanger $ligne ../../sangerfastq/read2/$ligne.fastq
done

#In the next step we clean up the Illumina Read2 folder so it is ready for the next analysis batch

echo Cleaning up Input Sequences Illumina Read2
date '+%m/%d/%y %H:%M:%S'
echo Cleaning up Input Sequences Illumina Read2 >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read2 to ngs/finaloutputs/illumina:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving the following Illumina Fastq Files from ngs/bwape/inputsequences/illumina/read2 to ngs/finaloutputs/illumina >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../../finaloutputs/illumina
done

#In the next step we rename the "Read2" sanger format fastq files from ".txt.fastq" extensions to ".fastq"

cd ../../sangerfastq/read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
old_ext=txt.fastq
new_ext=fastq
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step1b - Illumina to Sanger Fastq Conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step1b - Illumina to Sanger Fastq Conversion >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will align the converted "Read1" sanger fastq format files to the reference genome

echo Starting Step2a - Read1 bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step2a - Read1 bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
cd ../read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
echo The following fastq files will be aligned:
for ligne in `ls *.fastq`
do                                                                     
echo $ligne
done
echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai 	 
done

#In the next step we will rename the "Read1" alignment files

old_ext=.fastq.sai
new_ext=_bwa.sai
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step2a - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step2a - bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will align the converted "Read2" sanger fastq format files to the reference genome

echo Starting Step2b - Read2 bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step2b - Read2 bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
cd ../read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
echo The following fastq files will be aligned:
for ligne in `ls *.fastq`
do                                                                     
echo $ligne
done
echo The following fastq files will be aligned: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
bwa aln ../../../../refgenomes/bwa_indexed/hg18.fasta $ligne > $ligne.sai 	 
done

#In the next step we will rename the "Read2" alignment files

old_ext=.fastq.sai
new_ext=_bwa.sai
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step2b - bwa aln process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step2b - bwa aln process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will generate SAM files for the alignments using bwa sampe

echo Starting Step3 - bwa sampe process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step3 - bwa sampe process >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log
echo The following alignment files will be converted to SAM files:

cd ../read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
for ligne in `ls *.sai`
do                                                                     
aln1=`echo $ligne`
done
echo $aln1
for ligne in `ls *.fastq`
do                                                                     
read1=`echo $ligne`
done
echo $read1

cd ../read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
for ligne in `ls *.sai`
do                                                                     
aln2=`echo $ligne`
done
echo $aln2
for ligne in `ls *.fastq`
do                                                                     
read2=`echo $ligne`
done
echo $read2
echo The following alignment files will be converted to SAM files: >> ../../../../analysisnotes/Analysis.log
echo $aln1 >> ../../../../analysisnotes/Analysis.log
echo $read1 >> ../../../../analysisnotes/Analysis.log
echo $aln2 >> ../../../../analysisnotes/Analysis.log
echo $read2 >> ../../../../analysisnotes/Analysis.log
cd ../../../samfiles
#Current directory = ngs/bwape/samfiles
#(bwa sampe <database.fasta> <aln1.sai> <aln2.sai> <input1.fastq> <input2.fastq> > aln.sam)
bwa sampe ../../refgenomes/bwa_indexed/hg18.fasta ../inputsequences/sangerfastq/read1/$aln1 ../inputsequences/sangerfastq/read2/$aln2 ../inputsequences/sangerfastq/read1/$read1 ../inputsequences/sangerfastq/read2/$read2 > $read1.sam
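One optional refinement at this step: `bwa sampe` accepts a `-r` option to embed a read-group line in the SAM header, which downstream tools such as Picard generally expect. The sketch below is a dry run that only builds and prints the command; the sample name SAMPLE1 and the file names are placeholders, not from the pipeline.

```shell
#!/bin/sh
# Sketch (dry run): add a read group to the bwa sampe call via -r.
# All file names below are placeholders for illustration.
ref=../../refgenomes/bwa_indexed/hg18.fasta
rg='@RG\tID:lane1\tSM:SAMPLE1\tPL:ILLUMINA'
cmd="bwa sampe -r '$rg' $ref aln1_bwa.sai aln2_bwa.sai read1.fastq read2.fastq"
echo "$cmd"   # drop the echo (and run the string) to execute for real
```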

#In the next step we will rename the SAM files generated by bwa sampe analysis of the "Read1" and "Read2" alignment files

old_ext=_R1.fastq.sam
new_ext=_bwape.sam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step3 - bwa sampe process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step3 - bwa sampe process >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log

#In the next step we will convert each SAM file to a BAM file

echo Starting Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Starting Step4 - samtools SAM to BAM conversion >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log
echo The following SAM files will be converted to BAM files:
for ligne in `ls *.sam`
do                                                                     
echo $ligne
done
echo The following SAM files will be converted to BAM files: >> ../../analysisnotes/Analysis.log
for ligne in `ls *.sam`
do                                                                     
echo $ligne >> ../../analysisnotes/Analysis.log
done
for ligne in `ls *.sam`
do
samtools view -bS -o ../bamfiles/original/$ligne.bam $ligne
done
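The conversion loop above writes `file.sam.bam` and a later find/mv pass renames it to `file.bam`. Stripping the `.sam` suffix at conversion time would make that rename step unnecessary. A sketch, shown as a dry run (same samtools-era `view -bS` syntax as the script):

```shell
#!/bin/sh
# Sketch: name the BAM output directly from the SAM name, avoiding the
# extra rename pass. ${ligne%.sam} drops the .sam suffix.
for ligne in *.sam
do
    out=../bamfiles/original/${ligne%.sam}.bam
    echo "samtools view -bS -o $out $ligne"   # dry run; drop echo to execute
done
```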

#In the next step we will delete the SAM file to save disk space, as the BAM file contains all the data in a binary format

echo Deleting the following SAM Files from ngs/bwape/samfiles:
for ligne in `ls *.sam`
do
echo $ligne
done
echo Deleting the following SAM Files from ngs/bwape/samfiles: >> ../../analysisnotes/Analysis.log
for ligne in `ls *.sam`
do
echo $ligne >> ../../analysisnotes/Analysis.log
done
for ligne in `ls *.sam`
do
rm $ligne
done
echo Deleting SAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting SAM Files Complete >> ../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../analysisnotes/Analysis.log

#In the next step we clean up the Sanger Fastq "Read1" folder so it is ready for the next analysis batch

cd ../inputsequences/sangerfastq/read1
#Current directory = ngs/bwape/inputsequences/sangerfastq/read1
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq:
for ligne in `ls *.fastq`
do
echo $ligne
done
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
mv $ligne ../../../../finaloutputs/sangerfastq/
done
echo Moving Sanger Format Fastq Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa:
for ligne in `ls *.sai`
do
echo $ligne
done
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read1 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.sai`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.sai`
do
mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
done
echo Moving Alignment Results Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we clean up the Sanger Fastq "Read2" folder so it is ready for the next analysis batch

cd ../read2
#Current directory = ngs/bwape/inputsequences/sangerfastq/read2
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq:
for ligne in `ls *.fastq`
do
echo $ligne
done
echo Moving the following Sanger Format Fastq Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/sangerfastq: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.fastq`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.fastq`
do
mv $ligne ../../../../finaloutputs/sangerfastq/
done
echo Moving Sanger Format Fastq Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sanger Format Fastq Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa:
for ligne in `ls *.sai`
do
echo $ligne
done
echo Moving the following Alignment .sai Files from ngs/bwape/inputsequences/sangerfastq/read2 to ngs/finaloutputs/alignmentresults_bwa: >> ../../../../analysisnotes/Analysis.log
for ligne in `ls *.sai`
do
echo $ligne >> ../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.sai`
do
mv $ligne ../../../../finaloutputs/alignmentresults_bwa/
done
echo Moving Alignment Results Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Alignment Results Files Complete >> ../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../analysisnotes/Analysis.log

#In the next step we will rename the BAM files created by the samtools SAM-to-BAM conversion process

cd ../../../bamfiles/original
#Current directory = ngs/bwape/bamfiles/original
old_ext=sam.bam
new_ext=bam
find . -type f -name "*.$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step4 - samtools SAM to BAM conversion
date '+%m/%d/%y %H:%M:%S'
echo Finished Step4 - samtools SAM to BAM conversion >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will sort the BAM file by chromosome coordinate

echo Starting Step5 - samtools BAM sorting process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo The following BAM files will be sorted:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo The following BAM files will be sorted: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
samtools sort $ligne ../sorted/$ligne
done

#In the next step we will delete the original unsorted BAM file to save disk space, as the sorted BAM contains all the needed information

echo Deleting the following BAM Files from ngs/bwape/bamfiles/original:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Deleting the following BAM Files from ngs/bwape/bamfiles/original: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
rm $ligne
done
echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete
date '+%m/%d/%y %H:%M:%S'
echo Deleting BAM Files from ngs/bwape/bamfiles/original Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will rename the sorted BAM files created by the samtools sort process

cd ../sorted
#Current directory = ngs/bwape/bamfiles/sorted
old_ext=.bam.bam
new_ext=_sorted.bam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done 
echo Finished Step5 - samtools BAM sorting process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step5 - samtools BAM sorting process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will index the sorted BAM files for fast access and viewing in the IGV browser

echo Starting Step6 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo The following BAM files will be indexed:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
samtools index $ligne
done
echo Finished Step6 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step6 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will remove the duplicate reads from the sorted bam files

echo Starting Step7 - picard markduplicates process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo Duplicate reads will be removed from the following sorted BAM files:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo Duplicate reads will be removed from the following sorted BAM files: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
java -Xmx2g -jar $HOME/local/bin/MarkDuplicates.jar INPUT=$ligne OUTPUT=../nodups/$ligne METRICS_FILE=../nodups/$ligne.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true VALIDATION_STRINGENCY=SILENT
done
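None of the tool invocations in this pipeline are checked for failure, so a crashed MarkDuplicates run would go unnoticed until later steps misbehave on missing files. A generic guard could abort early with a message; the `die` helper name is my own, not part of the original script.

```shell
#!/bin/sh
# Sketch: abort the pipeline with a message when a step fails.
die() {
    echo "ERROR: $*"
    exit 1
}

# Usage, wrapping the Picard call exactly as it appears in the script:
# java -Xmx2g -jar $HOME/local/bin/MarkDuplicates.jar INPUT=$ligne \
#     OUTPUT=../nodups/$ligne ... || die "MarkDuplicates failed on $ligne"
```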

#In the next step we clean up the Sorted BAM files folder so it is ready for the next analysis batch

echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Moving the following Sorted BAM Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
mv $ligne ../../../finaloutputs/bamfiles/sorted/
done
echo Moving Sorted BAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sorted BAM Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted:
for ligne in `ls *.bai`
do
echo $ligne
done
echo Moving the following BAM Index .bai Files from ngs/bwape/bamfiles/sorted to ngs/finaloutputs/bamfiles/sorted: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bai`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bai`
do
mv $ligne ../../../finaloutputs/bamfiles/sorted/
done
echo Moving Sorted BAM Index Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Sorted BAM Index Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will rename the BAM files and Metrics files created after duplicate removal by picard

cd ../nodups
#Current directory = ngs/bwape/bamfiles/nodups
old_ext=_sorted.bam
new_ext=_sorted_nodups.bam
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
old_ext=_sorted.bam.txt
new_ext=_sorted_nodups_metrics.txt
find . -type f -name "*$old_ext" -print | while read file
do
    mv $file ${file%${old_ext}}${new_ext}
done
echo Finished Step7 - picard markduplicates process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step7 - picard markduplicates process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we will index the nodups BAM files for fast access and viewing in the IGV browser

echo Starting Step8 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Starting Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log
echo The following BAM files will be indexed:
for ligne in `ls *.bam`
do                                                                     
echo $ligne
done
echo The following BAM files will be indexed: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do                                                                     
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do                                                                     
samtools index $ligne
done

#In the next step we clean up the nodups BAM files folder so it is ready for the next analysis batch

echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
for ligne in `ls *.bam`
do
echo $ligne
done
echo Moving the following NoDups BAM Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bam`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bam`
do
mv $ligne ../../../finaloutputs/bamfiles/nodups/
done
echo Moving NoDups BAM Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving NoDups BAM Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Moving the following NoDups BAM Index .bai Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
for ligne in `ls *.bai`
do
echo $ligne
done
echo Moving the following NoDups BAM Index .bai Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.bai`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.bai`
do
mv $ligne ../../../finaloutputs/bamfiles/nodups/
done
echo Moving NoDups BAM Index Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving NoDups BAM Index Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups:
for ligne in `ls *.txt`
do
echo $ligne
done
echo Moving the following MarkDuplicates Metrics Files from ngs/bwape/bamfiles/nodups to ngs/finaloutputs/bamfiles/nodups: >> ../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../finaloutputs/bamfiles/nodups/
done
echo Moving MarkDuplicates Metrics Files Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving MarkDuplicates Metrics Files Complete >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

echo Finished Step8 - samtools BAM indexing process
date '+%m/%d/%y %H:%M:%S'
echo Finished Step8 - samtools BAM indexing process >> ../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../analysisnotes/Analysis.log

#In the next step we return to the launch folder $HOME/Documents/ngs

cd ../../..
#Current directory = ngs/
echo "***Analysis Batch Complete***"
echo "***Analysis Batch Complete***" >> analysisnotes/Analysis.log

Code:
#!/bin/sh

# multi_bwape_analysis_v1.sh
# 
#
# Created by Jonathan Keats on 9/5/10.
# Translational Genomics Research Institute

# This script is designed to allow multiple samples/lanes of paired-end illumina data to be passed into the "BWApe_hg18_v1" pipeline

###############################################################################################################################
## To facilitate its use you must put uniquely named Illumina 1.3+ files in ngs/bwape/inputsequences/hold/lane(X)/read(1-2)   #
## It is essential that these file names are uniquely named or overwriting will occur										  #
## These files MUST have the ".txt" extension characteristic of the Illumina V1.3+ output "s_x_sequences.txt"				  #
###############################################################################################################################

#In this step we check that the script is being launched from the correct location, since it can be invoked from anywhere if it is on your PATH

echo "***Checking Current Directory is Correct***"

#Directory to check
temp1=$HOME/ngs

#Checking if launch location is correct

if [ "`pwd`" != "$temp1" ] 
	then 
	echo " The script must be launched from the NGS directory "
	echo " The script was automatically killed due to a launch error - See Above Error Message" 
	exit 2                              
fi
echo "***Current Directory is Correct***"

#Check if files exist in the lane1 hold folder

#List of directories to check
temp2=$HOME/ngs/bwape/inputsequences/hold/lane1/read1
temp3=$HOME/ngs/bwape/inputsequences/hold/lane1/read2

echo "***Checking Lane1 Hold Folder***"
if [ `ls $temp2 | wc -l` != 1 ]       
	then 
	echo " The Lane1 Read1 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp3 | wc -l` != 1 ]       
	then 
	echo " The Lane1 Read2 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
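The hold-folder check above is repeated verbatim for every lane, with the lane numbers hand-edited each time (one copy further down still says Lane1 while checking lane5). Taking the lane-specific parts as arguments removes that copy-paste hazard; the function name `check_hold_folder` is my own, and it reproduces the same `ls | wc -l` test as the original blocks.

```shell
#!/bin/sh
# Sketch: one reusable check instead of ten copy-pasted if-blocks.
check_hold_folder() {
    lane=$1
    read_dir=$2
    dir=$HOME/ngs/bwape/inputsequences/hold/$lane/$read_dir
    if [ `ls "$dir" | wc -l` != 1 ]
    then
        echo " The $lane $read_dir hold folder does not contain the expected single file "
        echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
        return 2
    fi
}

# Usage:
# check_hold_folder lane1 read1 || exit 2
# check_hold_folder lane1 read2 || exit 2
```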
echo "***Found Expected Files***"
#Current directory=ngs
echo "***Starting The Analysis of Lane1***"
date '+%m/%d/%y %H:%M:%S'
echo "***Starting The Analysis of Lane1***" >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane1" data from ngs/bwape/inputsequences/hold/lane1/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane1/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane1/read1
echo Moving Lane1 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane1 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane1/read2
echo Moving Lane1 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane1 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane1 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo "***Lane1 Analysis Complete***"

# The analysis directory should now be empty and we can now load the sample2/lane2 data into the analysis directories

#Check if files exist in the lane2 hold folder

#List of directories to check
temp4=$HOME/ngs/bwape/inputsequences/hold/lane2/read1
temp5=$HOME/ngs/bwape/inputsequences/hold/lane2/read2

echo "***Checking Lane2 Hold Folder***"
if [ `ls $temp4 | wc -l` != 1 ]       
	then 
	echo " The Lane2 Read1 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp5 | wc -l` != 1 ]       
	then 
	echo " The Lane2 Read2 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "***Found Expected Files***"
echo "***Starting The Analysis of Lane2***"
date '+%m/%d/%y %H:%M:%S'
echo "***Starting The Analysis of Lane2***" >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane2" data from ngs/bwape/inputsequences/hold/lane2/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane2/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane2/read1
echo Moving Lane2 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane2 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane2/read2
echo Moving Lane2 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane2 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane2 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo "***Lane2 Analysis Complete***"

# The analysis directory should now be empty and we can now load the sample3/lane3 data into the analysis directories

#Check if files exist in the lane3 hold folder

#List of directories to check
temp6=$HOME/ngs/bwape/inputsequences/hold/lane3/read1
temp7=$HOME/ngs/bwape/inputsequences/hold/lane3/read2

echo "***Checking Lane3 Hold Folder***"
if [ `ls $temp6 | wc -l` != 1 ]       
	then 
	echo " The Lane3 Read1 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp7 | wc -l` != 1 ]       
	then 
	echo " The Lane3 Read2 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "***Found Expected Files***"
#Current directory=ngs
echo "***Starting The Analysis of Lane3***"
date '+%m/%d/%y %H:%M:%S'
echo "***Starting The Analysis of Lane3***" >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane3" data from ngs/bwape/inputsequences/hold/lane3/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane3/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane3/read1
echo Moving Lane3 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane3 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane3/read2
echo Moving Lane3 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read2/
done
echo Moving Lane3 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane3 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo "***Lane3 Analysis Complete***"

# The analysis directory should now be empty and we can now load the sample4/lane4 data into the analysis directories

#Check if files exist in the lane4 hold folder

#List of directories to check
temp8=$HOME/ngs/bwape/inputsequences/hold/lane4/read1
temp9=$HOME/ngs/bwape/inputsequences/hold/lane4/read2

echo "***Checking Lane4 Hold Folder***"
if [ `ls $temp8 | wc -l` != 1 ]       
	then 
	echo " The Lane4 Read1 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp9 | wc -l` != 1 ]       
	then 
	echo " The Lane4 Read2 hold folder does not contain the expected single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo "***Found Expected Files***"
echo "***Starting The Analysis of Lane4***"
date '+%m/%d/%y %H:%M:%S'
echo "***Starting The Analysis of Lane4***" >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane4" data from ngs/bwape/inputsequences/hold/lane4/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane4/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane4/read1
echo Moving Lane4 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
for ligne in `ls *.txt`
do                                                                     
echo $ligne
done
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in `ls *.txt`
do
echo $ligne >> ../../../../../analysisnotes/Analysis.log
done
for ligne in `ls *.txt`
do
mv $ligne ../../../illumina/read1/
done
echo Moving Lane4 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane4/read2
echo Moving Lane4 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read2/
done
echo Moving Lane4 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane4 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane4 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample5/lane5 data into the analysis directories

#Check if files exist in the lane5 hold folder

#List of directories to check
temp10=$HOME/ngs/bwape/inputsequences/hold/lane5/read1
temp11=$HOME/ngs/bwape/inputsequences/hold/lane5/read2

echo ***Checking Lane5 Hold Folder***
if [ `ls $temp10 | wc -l` != 1 ]       
	then 
	echo " The Lane5 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp11 | wc -l` != 1 ]       
	then 
	echo " The Lane5 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
#Current directory=ngs
echo ***Starting The Analysis of Lane5***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane5*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane5" data from ngs/bwape/inputsequences/hold/lane5/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane5/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane5/read1
echo Moving Lane5 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read1/
done
echo Moving Lane5 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane5/read2
echo Moving Lane5 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read2/
done
echo Moving Lane5 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane5 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane5 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample6/lane6 data into the analysis directories

#Check if files exist in the lane6 hold folder

#List of directories to check
temp12=$HOME/ngs/bwape/inputsequences/hold/lane6/read1
temp13=$HOME/ngs/bwape/inputsequences/hold/lane6/read2

echo ***Checking Lane6 Hold Folder***
if [ `ls $temp12 | wc -l` != 1 ]       
	then 
	echo " The Lane6 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp13 | wc -l` != 1 ]       
	then 
	echo " The Lane6 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
echo ***Starting The Analysis of Lane6***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane6*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane6" data from ngs/bwape/inputsequences/hold/lane6/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane6/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane6/read1
echo Moving Lane6 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read1/
done
echo Moving Lane6 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane6/read2
echo Moving Lane6 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read2/
done
echo Moving Lane6 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane6 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane6 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample7/lane7 data into the analysis directories

#Check if files exist in the lane7 hold folder

#List of directories to check
temp14=$HOME/ngs/bwape/inputsequences/hold/lane7/read1
temp15=$HOME/ngs/bwape/inputsequences/hold/lane7/read2

echo ***Checking Lane7 Hold Folder***
if [ `ls $temp14 | wc -l` != 1 ]       
	then 
	echo " The Lane7 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp15 | wc -l` != 1 ]       
	then 
	echo " The Lane7 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
#Current directory=ngs
echo ***Starting The Analysis of Lane7***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane7*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane7" data from ngs/bwape/inputsequences/hold/lane7/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane7/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane7/read1
echo Moving Lane7 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read1/
done
echo Moving Lane7 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane7/read2
echo Moving Lane7 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read2/
done
echo Moving Lane7 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane7 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane7 Analysis Complete***

# The analysis directory should now be empty and we can now load the sample8/lane8 data into the analysis directories

#Check if files exist in the lane8 hold folder

#List of directories to check
temp16=$HOME/ngs/bwape/inputsequences/hold/lane8/read1
temp17=$HOME/ngs/bwape/inputsequences/hold/lane8/read2

echo ***Checking Lane8 Hold Folder***
if [ `ls $temp16 | wc -l` != 1 ]       
	then 
	echo " The Lane8 Read1 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
if [ `ls $temp17 | wc -l` != 1 ]       
	then 
	echo " The Lane8 Read2 hold folder does not contain the expect single file "
	echo " ERROR - The script was automatically killed due to a launch error - See Above Error Message"
	exit 2                     
fi
echo ***Found Expected Files***
echo ***Starting The Analysis of Lane8***
date '+%m/%d/%y %H:%M:%S'
echo ***Starting The Analysis of Lane8*** >> analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> analysisnotes/Analysis.log

#In the next step we move the "Lane8" data from ngs/bwape/inputsequences/hold/lane8/(read1-2) to ngs/bwape/inputsequences/illumina/(read1-2)

cd bwape/inputsequences/hold/lane8/read1
#Current Directory=ngs/bwape/inputsequences/hold/lane8/read1
echo Moving Lane8 Read1 File to Read1 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read1 File to Read1 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read1/
done
echo Moving Lane8 Read1 File to Read1 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read1 File to Read1 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../read2
#Current Directory=ngs/bwape/inputsequences/hold/lane8/read2
echo Moving Lane8 Read2 File to Read2 Analysis Directory
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read2 File to Read2 Analysis Directory >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log
echo Moving the following files:
echo Moving the following files: >> ../../../../../analysisnotes/Analysis.log
for ligne in *.txt
do
echo $ligne
echo $ligne >> ../../../../../analysisnotes/Analysis.log
mv $ligne ../../../illumina/read2/
done
echo Moving Lane8 Read2 File to Read2 Analysis Directory Complete
date '+%m/%d/%y %H:%M:%S'
echo Moving Lane8 Read2 File to Read2 Analysis Directory Complete >> ../../../../../analysisnotes/Analysis.log
date '+%m/%d/%y %H:%M:%S' >> ../../../../../analysisnotes/Analysis.log

cd ../../../../../
#Current Directory=ngs/

# Now we call the "BWApe_hg18_v1.sh" script to analyze this sample/lane using bwa sampe

BWApe_hg18_v1.sh

echo ***Lane8 Analysis Complete***
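
The per-lane blocks above are near-identical, which invites copy-paste errors when a lane label or path changes. A minimal sketch of one way to collapse them into a single function — `process_lane`, `BASE`, and `LOG` are illustrative names I've introduced here; the directory layout, single-file check, and `BWApe_hg18_v1.sh` call mirror the script above:

```shell
#!/bin/sh
# Sketch only: one function in place of the eight near-identical lane blocks.
BASE=${BASE:-$HOME/ngs/bwape/inputsequences}
LOG=${LOG:-$HOME/ngs/analysisnotes/Analysis.log}

process_lane() {
    lane=$1
    for read in read1 read2
    do
        src=$BASE/hold/$lane/$read
        dst=$BASE/illumina/$read
        # Each hold folder must contain exactly one file, as checked above.
        if [ "$(ls "$src" 2>/dev/null | wc -l)" -ne 1 ]
        then
            echo "The $lane $read hold folder does not contain the expected single file"
            return 2
        fi
        # tee writes the same message to the screen and the log in one step.
        echo "Moving $lane $read files:" | tee -a "$LOG"
        for f in "$src"/*.txt
        do
            echo "$f" | tee -a "$LOG"
            mv "$f" "$dst"/
        done
    done
}

# Usage, per lane:
#   process_lane lane4 && BWApe_hg18_v1.sh
```

With this shape, renaming a directory means editing one variable rather than hunting through hundreds of duplicated lines.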

Last edited by Jon_Keats; 10-04-2010 at 11:09 AM. Reason: Fixed bug in pipeline script
Jon_Keats is offline   Reply With Quote
Old 01-04-2011, 08:24 PM   #40
JBuenrostro
Member
 
Location: Stanford

Join Date: Sep 2009
Posts: 13
Default Best post

Glad I found this; so far it's the best post I've seen here. Thanks for the help!
JBuenrostro is offline   Reply With Quote

Tags
bwa, illumina, newbie, samtools, unix
