
  • #76
    Why is this post not stickied in the bioinformatics forum? This is really great how-to info for noobs like me. It would have saved me a couple of days of googling and trial and error.



    • #77
      Hey Jon,

      We are in total agreement on the definition of those terms, except for what you get out of the Picard insert metrics. I agree that tophat wants the inner distance, meaning the distance between the reads, and I believe that is what you get from Picard as well. Over the weekend I ran a pipeline on Illumina's Body Map skeletal muscle paired-end sample. The Picard inner distance was 178 bp. When Cufflinks was run on the same sample, it calculated the fragment length distribution automatically (for the -m/--frag-len-mean value) and reported 176. I am inclined to believe that -r/--mate-inner-dist for tophat and -m/--frag-len-mean for Cufflinks refer to the same distance, and therefore that the INNER_DISTANCE you get from the Picard insert metrics should be left alone without subtracting the read lengths. What do you think? I'm not trying to prove you wrong, merely trying to clear up any confusion either of us might have.
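
For reference, by the usual definitions the fragment length and the inner distance differ by the length of both reads. The sketch below only works through that arithmetic with made-up numbers (300 bp fragments, 2 x 50 bp reads); it does not settle which of the two values Picard or Cufflinks reported for the Body Map sample.

```shell
# Hypothetical library: 300 bp fragments sequenced as 2 x 50 bp paired-end reads.
fragment_len=300
read_len=50

# Inner distance = fragment length minus the two reads at its ends.
inner_dist=$(( fragment_len - 2 * read_len ))

echo "fragment length: ${fragment_len} bp"
echo "inner distance:  ${inner_dist} bp"
```

If the value in hand is a full fragment length, the read lengths need to come off before it is passed as an inner distance; if it already is an inner distance, it can be used as-is.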



      • #78
        From a BAM file you created using the same reference genome, the following command should do it:

        Code:
        samtools view -H YourFile.bam | cut -f2- | sed 's/SN://g' | sed 's/LN://g' | awk 'NR > 1 {OFS="\t" ; print $1, "1", $2}' > YourChromosomeList.bed
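
The one-liner above skips only the first header line, so it relies on every later line being an @SQ record. Headers that also carry @RG or @PG lines would add stray rows. A stricter variant could look like the sketch below; `header_to_chrom_list` is a hypothetical helper name, and the start column is kept at 1 to match the original command.

```shell
# Turn the @SQ lines of a SAM/BAM header (read from stdin) into
# "name <TAB> 1 <TAB> length" rows, whatever order the SN:/LN: tags are in.
header_to_chrom_list() {
    awk -F '\t' -v OFS='\t' '
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (name != "" && len != "") print name, 1, len
        }'
}

# Usage (YourFile.bam is a placeholder, as in the command above):
#   samtools view -H YourFile.bam | header_to_chrom_list > YourChromosomeList.bed
```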



        • #79
          Hello everyone, it's been a very long time since I updated this thread. Over the last year I've had people in my lab maintain a detailed protocol on how to build up a new machine to do most NGS analysis, or at least the analyses we do in my group. So thanks to David and Teja we have a pretty comprehensive instruction set. Sadly, some parts are already a bit out of date, and I'll try to have a test done on the next machine in our lab and update as needed.

          FULL INSTRUCTION LIST: How To Transform Your Mac Into A Sequencing Analysis Machine

          Introduction

          I’m a newly hired RA from Jonathan Keats’s lab who will be helping with a bunch of new sequencing stuff. I have been working on installing the suite of sequencing programs on our new workstation. Before I started, I knew virtually nothing about Terminal, Unix or manipulating sequencing files. (In my mind, Terminal was where you board trains and Unix was some Talaxian from Star Trek: Voyager.) The learning curve has been steep, obviously, but Jonathan’s previous posts have been invaluable in making the adjustment.

          I wanted to update those posts, however, because (a) some of the instructions have changed as newer versions of applications have appeared; (b) posts could be combined into one gigantic “master-post”; (c) some of the instructions are much more advanced/complicated than others; and (d) some helpful instructions for certain applications weren’t included.

          To make things easier on the next bright-eyed generation of programming-illiterate biologists, I have included specific code instructions at practically every step of the installation process. After a couple times mentioning a particular command, I will stop including it to save space, so if you’re starting from the middle of the instruction set, refer to previous instructions for more information.

          If you find this compilation of instructions frustratingly simplistic, then I suggest you read through the previous posts, if only to read through Jonathan’s wry comments about the entire bioinformatics process. Hopefully, this post will be helpful to extreme sequencing/Unix novices like myself.

          Please let me know if you have any questions, good luck, and happy hunting!

          David K. Edwards V and Jonathan Keats

          Before You Begin: Programs

          Unix

          Before you begin, you should familiarize yourself with Terminal (Applications>Utilities>Terminal). Or better yet, you should invest some time working through the Unix portion of the "Unix and Perl for Biologists" course (http://groups.google.com/group/unix-...for-biologists), made public by Keith Bradnam and Ian Korf at UC Davis. Or preorder their book on Amazon: http://www.amazon.com/UNIX-Perl-Resc...0189572&sr=8-1. Tell your PI it will be the best $50 investment of their career!

          It’s really helpful for beginners understanding non-GUI file manipulations and gives you a good list of important Unix commands. (If you’re completely new to programming, it might be too confusing or complicated, but nobody said this was going to be easy.)

          Download the entire course package: http://korflab.ucdavis.edu/Unix_and_Perl/index.html.

          Here is a general list of helpful Unix commands:
          • To get a manual on any command, type "man command". Type "space" to page down, "b" to back-up, and "q" to quit.
          • To see what folder you are in currently, type "pwd".
          • To see what folders and files exist in the current directory, type "ls".
          • To move into a folder in the current directory, type "cd myfolder". (Note: You can move multiple levels downstream with "cd myfolder/myfolder2".)
          • To go back one directory, type "cd ..". (Note: You can move back multiple levels upstream with "cd ../..".)
          • To copy a file from the current directory to a downstream folder, type "cp myfile myfolder/". (Note: You can copy a file up one directory with "cp myfile ../".)
          • To move a file from the current directory, type "mv" instead of typing “cp”.
          • A folder immediately downstream of the root directory (i.e. absolute top of the tree) is always defined by "command /folder". (This means if you type "cd /something", it looks for the folder "something" downstream of the root directory.)
          • To refer to the current directory, type ".".
          • To change the permissions of the compiled applications, type "chmod 755 myfile". (This makes the file readable and executable by everyone but only writable by you. To allow everybody to do everything to the file, type “chmod 777 myfile”.)
          • To run a particular command as the super user (and become Superman!), put “sudo” in front of it.
          • To decompress a tarball file, type “tar -xvzf file.tar.gz”, where “file.tar.gz” is the compressed file. (For a “.tar.bz2” file, use “tar -xvjf” instead.)
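
To see these commands working together, here is a small practice session. It is only a sketch: the file and folder names are the same placeholders used in the list above, and everything happens inside a throwaway directory (created with `mktemp` and `mkdir`, which are not in the list) so nothing in your real folders is touched.

```shell
# Work in a throwaway directory so nothing in your real folders is touched.
scratch=$(mktemp -d)
cd "$scratch"
pwd                                # show the current folder

mkdir -p myfolder/myfolder2        # create a folder two levels deep
echo "hello" > myfile              # create a small test file
cp myfile myfolder/                # copy the file into a downstream folder
mv myfile myfolder/myfolder2/      # move (not copy) the original

cd myfolder/myfolder2              # move two levels downstream in one step
cd ../..                           # move back two levels upstream
ls myfolder myfolder/myfolder2     # list both folders

chmod 755 myfolder/myfile          # rwx for you, r-x for everyone else

# Bundle the folder into a tarball, then decompress it somewhere else.
tar -czf archive.tar.gz myfolder
mkdir unpacked
tar -xzf archive.tar.gz -C unpacked
ls unpacked/myfolder
```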



          Xcode (http://developer.apple.com/technolog...ols/xcode.html)

          NOTE: For some reason Apple has decided to mess with you and recent versions of Xcode (OS Lion and OS Mountain Lion compatible versions) no longer install some essential command line commands like "make" which you will use extensively to build the applications. However, there is an extremely simple solution to install these applications from within Xcode.

          To install command line tools see (http://slashusr.wordpress.com/2012/0...nd-line-tools/)
          • Launch Xcode
          • Go to Preferences
          • Go to Downloads
          • Click the "Command Line Tools" radio button
          • Follow Prompts


          You need to install Xcode on your computer so you can compile the various applications. If you start writing your own scripts, it is also a nice text editor in our opinion.

          The newest version available on the App store, Xcode 4.3, is only compatible with OSX Mountain Lion (10.8.x). If you have Leopard (10.5.x) or Snow Leopard (10.6.x) or Lion (10.7.x), then you can install the package from your OS installation disks. Insert Mac OS X Install Disc 2, open the “Xcode Tools” folder, and double click “XCodeTools.mpkg”. Otherwise, you need to sign up to be a developer and download it from the website.

          MacPorts (http://www.macports.org/)

          You need to install some packages to run certain applications. There are two programs to install those packages, Fink and MacPorts. There isn’t much difference between the two; in general, Fink is more conservative about upgrading packages than MacPorts, but both are perfectly acceptable. I simply chose MacPorts for this protocol.

          R and Bioconductor

          R: You will need R to perform statistical computations and generate graphs from your data. To install, visit http://www.r-project.org/, then select preferred CRAN mirror and follow the instructions.

          Bioconductor: You will probably need Bioconductor to analyze your high-throughput genomic data. To install Bioconductor, you must have the most recent release version of R. The most common packages you will need to install are affy, simpleaffy, and gcrma.

          To install these packages, starting first with affy, simply start R and type in the following:

          Code:
          source("http://bioconductor.org/biocLite.R")
          biocLite("affy")
          Press enter. R will automatically install the dependencies ‘Biobase’, ‘affyio’, and ‘preprocessCore’ during this installation.

          To install simpleaffy, replace “affy” with “simpleaffy” in the above code and press enter. R will automatically install the dependencies ‘DBI’, ‘RSQLite’, ‘xtable’, ‘IRanges’, ‘AnnotationDbi’, ‘annotate’, ‘Biostrings’, ‘genefilter’, and ‘gcrma’.

          There are three other dependencies you should install:


          (NOTE: The version of cummeRbund that is installed through the current BioConductor development version is 1.0.0. The latest version, version 1.1.3 will be available as part of the Bioconductor development version 2.10, which will be made available in April 2012. For more information, please visit: http://compbio.mit.edu/cummeRbund/index.html.)

          For more installation instructions, visit http://www.bioconductor.org/install/. (For this protocol, the current release version of R is 2.14, and the currently released Bioconductor version is 2.9.)

          Before You Begin: Folders

          You should establish a series of folders to manage your sequencing data and move around after each step is completed. You don’t necessarily have to follow this system of folders and subfolders, but all of our instructions for installing programs are based on this file hierarchy, so if you want to avoid confusion, and jump on our awesome folder-managing bandwagon, then read carefully!

          Here is our system of folders and subfolders:

          We have a main working directory called "ngs" in our $HOME directory (/Users/YourUserName/). This is our home base for data analysis, and all of our steps and scripts will be called from this folder. Here are our subfolders within “ngs”:
          • ngs/{applications,bwa,run_parameters,scripts,temp,tophat,tophat_fusion}
          • ngs/analyzed_read_files/{chipseq,exomes,genomes,matepair,rnaseq}
          • ngs/finaloutputs/{chipseq,exomes,genomes,matepair,rnaseq}
          • ngs/refgenomes/{bfast_indexed,bowtie_indexed,bwa_indexed,downloads}
          • ngs/refgenomes/downloads/{ncbi36_hg18,grch37_hg19}
          • ngs/refgenomes/downloads/ncbi36_hg18/{annotation_files,reference_sequences}
          • ngs/refgenomes/downloads/grch37_hg19/{annotation_files,reference_sequences}
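
If you only want the folders listed above, they can be created in one go. This is a minimal sketch, not the full script referenced in this post: it covers just the folders in the list, and `NGS_ROOT` is a hypothetical variable that defaults to the "ngs" folder in your home directory. It uses plain loops so it runs the same way in any shell.

```shell
# Root of the tree; defaults to $HOME/ngs as in this protocol.
NGS_ROOT="${NGS_ROOT:-$HOME/ngs}"

# Top-level working folders.
for d in applications bwa run_parameters scripts temp tophat tophat_fusion; do
    mkdir -p "$NGS_ROOT/$d"
done

# One folder per experiment type under both result trees.
for parent in analyzed_read_files finaloutputs; do
    for d in chipseq exomes genomes matepair rnaseq; do
        mkdir -p "$NGS_ROOT/$parent/$d"
    done
done

# Indexed references for each aligner.
for d in bfast_indexed bowtie_indexed bwa_indexed; do
    mkdir -p "$NGS_ROOT/refgenomes/$d"
done

# Downloaded reference builds, each with annotation and sequence folders.
for build in ncbi36_hg18 grch37_hg19; do
    for d in annotation_files reference_sequences; do
        mkdir -p "$NGS_ROOT/refgenomes/downloads/$build/$d"
    done
done
```

Because `mkdir -p` skips folders that already exist, running this a second time is harmless.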


          Each of these subfolders has subfolders of its own, so instead of listing everything here, please see the script “create_ngs_directorystructure_v4.sh” (http://seqanswers.com/forums/showthr...?t=4589&page=4, post #61) for more information. (To run the script, simply copy and paste the code included in that post immediately after you start Terminal, or whenever you are in the home directory. The corresponding files and folders will be created.)

          Before You Begin: Picking Genome Files

          [Maq is no longer included in this protocol because of recent improvements to BWA. If you need to install Maq, please see Jon’s preceding post on how to install it.]

          This step is important and can be the source of most issues. You need to pick a single source for all of your genome sequence files and annotations. We prefer Ensembl over UCSC for many reasons. For human genome reference files, we recommend the 1000 Genomes versions. They think about the human genome much more than you do, so give them some credit. Besides, many of the applications you will use are published by those groups, so running them is streamlined and less complicated.

          We will be using BWA to align our sequencing data against the reference genome (see BWA installation instructions under “Installing Programs”). You might think to use ensembl (http://www.ensembl.org/info/data/ftp/index.html) to get your reference genome, but the full human genome file (Homo_sapiens.GRCh37.66.dna_rm.toplevel.fa.gz) exceeds the maximum character length allowed by BWA’s index command.

          Instead, you should use the 1000 Genomes reference genome (ftp://ftp.sanger.ac.uk/pub/1000genom...ect_reference/). You need to save the reference genome onto your computer:
          • Copy the file “human_g1k_v37.fasta.gz” to your “ngs/refgenomes” folder.
          • Decompress the file by double clicking on it.
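
If you would rather decompress from Terminal, something like the sketch below should work. `decompress_reference` is a hypothetical helper name; it writes the decompressed file next to the original and keeps the ".gz" copy, which is an assumption about what you want rather than part of the protocol.

```shell
# Decompress a gzipped reference next to the original, keeping the .gz file.
# Usage: decompress_reference path/to/genome.fasta.gz
decompress_reference() {
    gz_file=$1
    out_file=${gz_file%.gz}            # strip the trailing ".gz"
    if [ -e "$out_file" ]; then
        echo "$out_file already exists; leaving it alone" >&2
        return 0
    fi
    gunzip -c "$gz_file" > "$out_file"
}

# Example, assuming the file was saved as described above:
#   decompress_reference "$HOME/ngs/refgenomes/human_g1k_v37.fasta.gz"
```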


          Installing Applications

          Welcome to the meat-and-potatoes of this somewhat bloated post: program installation. This section has been written in chronological order, meaning that I started with the first program and proceeded onward to the last program. Some of the programs require that you have installed other programs, and unfortunately, unless explicitly mentioned, I don’t know which programs have those requirements.

          Therefore, I recommend you follow the same installation order for your own computer. This will certainly make things simpler for newbies like myself, especially since I included the commonly used programs (e.g. BWA) before the less commonly used programs (e.g. Cairo).

          As mentioned above, if you’re skipping around, I have written next to each application if it requires one of the preceding applications. However, I can’t be sure that this information is correct, so if you encounter a problem during installation, please let us know and we can amend our instructions.

          Final note: The version numbers of programs might be out-of-date, so please change the instructions based on those new version numbers. We will try to update this document periodically to avoid this problem, but you should be forewarned!

          Setting Your Path Directory

          To run many of the applications, you will need to either place the applications in the PATH, define additional PATH locations, or note the location of the application each time you call it. To find the current PATH directories used by Unix, type "echo $PATH". You should see something similar to the following:

          Code:
          /sw/bin:/sw/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/X11R6/bin
          These folders are directly below the root directory and represent the places Unix looks when running an application. If you want to run any of these applications, you must download and compile the application. Before you begin installing any application, do the following (thanks to Nils Homer for the suggestion):

          Create a directory in your home directory for the applications:
          Code:
          mkdir -p $HOME/local/bin
          Edit your .profile file so this directory is in your PATH. First, list all files in your home directory, including hidden ones (you should see a file called “.profile”):
          Code:
          ls -a
          Open with nano by typing:
          Code:
          nano .profile
          Add the following line to your .profile file, but DO NOT remove anything already in it (you don’t need “sudo”):
          Code:
          export PATH=$HOME/local/bin:$PATH
          Save your changes by typing "control O".
          Exit nano by typing "control X".

          Additionally, when you install applications, place the executable files in this directory so they are in a $PATH directory. You can either copy the application to the directory $HOME/local/bin or, for applications that come with a configure script, install with "./configure --prefix=$HOME/local".
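
If you prefer not to edit the file by hand, the same change can be scripted. This is a sketch only: `add_local_bin_to_profile` is a hypothetical helper name, and it appends the line at most once so that running it twice does not clutter your .profile.

```shell
# Append the PATH line to a profile file only if it is not already there.
# Usage: add_local_bin_to_profile [profile_file]   (defaults to ~/.profile)
add_local_bin_to_profile() {
    profile=${1:-$HOME/.profile}
    line='export PATH=$HOME/local/bin:$PATH'
    touch "$profile"
    if grep -qxF "$line" "$profile"; then
        echo "PATH line already present in $profile"
    else
        printf '%s\n' "$line" >> "$profile"
        echo "Added PATH line to $profile"
    fi
}

# Example:
#   mkdir -p "$HOME/local/bin"
#   add_local_bin_to_profile
#   . "$HOME/.profile"      # reload so the current Terminal sees the change
```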

          BWA (http://sourceforge.net/projects/bio-bwa/files/; change naming in instructions based on BWA version)
          NOTE: Reference indexes created with earlier versions do not work with version 0.6, so you need to re-index each reference if you have worked with an earlier version or, more importantly, if you are setting up and someone is providing you with pre-indexed reference files.
          • Click on the link above and download the newest version (called "bwa-0.6.1.tar.bz2").
          • Move the "bwa-0.6.1.tar.bz2" file to your "ngs/applications" folder.
          • Decompress the file by double clicking on it.
          • Open Terminal (if previously open, ensure you are in your home directory).
          • Navigate to the decompressed folder by typing:

          Code:
          cd ngs/applications/bwa-0.6.1
          Compile the application by typing:
          Code:
          make
          Lines of code will start appearing under your command. Make sure that no errors are listed!

          You can confirm that the installation was successful by typing:
          Code:
          ./bwa
          This should print the BWA command options in Terminal. (The first line is “Program: bwa (alignment via Burrows-Wheeler transformation)”.)

          Copy "bwa" to your path directory by typing:
          Code:
          cp bwa $HOME/local/bin
          Now typing "bwa" into Terminal from any folder will launch the bwa program.

          SAMtools (http://sourceforge.net/projects/samtools/; change naming in instructions based on SAMtools version)
          • Click on the link above and download the newest version (called "samtools-0.1.18.tar.bz2").
          • Move the "samtools-0.1.18.tar.bz2" file to your "ngs/applications" folder.
          • Decompress the file by double clicking on it.
          • Open Terminal (if previously open, ensure you are in your home directory).
          • Navigate to the decompressed folder by typing:


          Code:
          cd ngs/applications/samtools-0.1.18
          Compile the application by typing:
          Code:
          make
          Lines of code will start appearing under your command. Make sure that no errors are listed!

          You can confirm that the installation was successful by typing:
          Code:
          ./samtools
          This should print the SAMtools command options in Terminal. (The first line is “Program: samtools (Tools for alignments in the SAM format)”.)

          Copy "samtools" and other valuable applications to your path directory by typing:
          Code:
          cp samtools $HOME/local/bin
          cp bcftools/bcftools $HOME/local/bin
          cp bcftools/vcfutils.pl $HOME/local/bin
          (We are assuming you followed our path directory here. If not, then change “$HOME/local/bin” to your location of choice.)

          Note: To save space, we have reduced the number of specific instructions, so instead of writing the exact lines of code required for commands, we will simply summarize them. This applies to decompressing the file, navigating to the decompressed folder, compiling the application, and copying to your path directory.
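
For reference, the four summarized steps (decompress, navigate, compile, copy) could be wrapped into one helper along these lines. This is a sketch under assumptions: `install_from_tarball`, `BUILD_CMD` and `BIN_DIR` are hypothetical names, the tarball is assumed to unpack into a single top-level folder, and the build is assumed to be a plain "make" that leaves the executables in that folder.

```shell
# Unpack a source tarball, build it, and copy the named executables into
# $HOME/local/bin. BUILD_CMD and BIN_DIR are hypothetical overrides.
# Usage: install_from_tarball TARBALL EXECUTABLE [EXECUTABLE...]
install_from_tarball() {
    tarball=$1
    shift
    bin_dir=${BIN_DIR:-$HOME/local/bin}

    # The first path inside the archive names the folder it unpacks into.
    src_dir=$(tar -tf "$tarball" | head -n 1 | cut -d/ -f1)

    # Unpack next to the tarball.
    tar -xf "$tarball" -C "$(dirname "$tarball")" || return 1

    # Build and copy inside a subshell so the caller's directory is unchanged.
    (
        cd "$(dirname "$tarball")/$src_dir" || exit 1
        ${BUILD_CMD:-make} || exit 1
        mkdir -p "$bin_dir"
        cp "$@" "$bin_dir"/
    )
}

# Example, mirroring the SAMtools steps above:
#   install_from_tarball "$HOME/ngs/applications/samtools-0.1.18.tar.bz2" samtools
```

Applications that need "./configure" first, or that put their executables in subfolders, will not fit this pattern without changes.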

          GATK (ftp://ftp.broadinstitute.org/pub/gsa...latest.tar.bz2)

          According to the website (http://www.broadinstitute.org/gsa/wi...ading_the_GATK, “Outside the Broad Institute”), before you install GATK, you need to install three applications: JVM (Java Virtual Machine), Apache Ant, and Git. GATK requires that your version of JVM is 1.6 or greater, and your version of Apache Ant is 1.7.1 or greater.
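
Checking those minimum versions by eye is easy to get wrong (is 1.6.0_29 at least 1.6?), so here is a small sketch that compares dotted version numbers. `version_ge` is a hypothetical helper name; anything after an underscore in a field, such as the "_29" update number, is ignored.

```shell
# Succeeds when dotted version $1 is at least dotted version $2.
# Usage: version_ge 1.8.2 1.7.1 && echo "new enough"
version_ge() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(need, w, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0    # missing fields count as 0
            b = (i <= m) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0                              # versions are equal
    }'
}

# Example: pull the version numbers out of the tools' own output.
# (java prints its version to stderr, e.g.: java version "1.6.0_29")
#   java_ver=$(java -version 2>&1 | awk -F '"' 'NR == 1 {print $2}')
#   ant_ver=$(ant -version | awk '{print $4}')
#   version_ge "$java_ver" 1.6   || echo "Java is too old for GATK"
#   version_ge "$ant_ver"  1.7.1 || echo "Ant is too old for GATK"
```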

          JVM (Java Virtual Machine)

          You should have JVM already installed on your computer. To confirm this, open Terminal and type:
          Code:
          java -version
          Three lines of code should appear, starting with java version “1.6.0_29”. To update Java, search “Java” on the Apple website and find the most recent version that corresponds to your operating system.

          Ant (http://ant.apache.org/)

          You should already have Apache Ant installed on your computer. To confirm this, open Terminal and type:
          Code:
          ant -version
          You should see something like this: “Apache Ant(TM) version 1.8.2 compiled on October 14 2011”. If that doesn’t work, here’s how to install Ant manually:

          Click on the link above and download the latest version. (This version will probably be “apache-ant-1.8.2-bin.tar.bz2”.)
          Move the “apache-ant-1.8.2-bin.tar.bz2” file to your “ngs/applications” folder.
          Decompress the file.
          Follow the somewhat complex instructions in the manual. To access the manual, click on the decompressed folder and look under docs/manual/install.html.

          Git (http://git-scm.com/download)

          Click on the link above and download the latest version. (This version will probably be “git-1.7.9.1-intel-universal-snow-leopard.dmg”.)
          Install like any ordinary Mac application. (You thought it would be more complicated, right? You’re welcome!)

          Now, onto installing GATK:

          Click on the link above.
          Move the “GenomeAnalysisTK-latest.tar.bz2” file to your "ngs/applications" folder.
          Decompress the file and navigate to it.

          To confirm the installation, type in “java -jar GenomeAnalysisTK.jar --help”. (Type this by hand rather than copying it: forum software and word processors can turn the hyphen into a dash, which Java will not accept.)

          You should see a message like: The Genome Analysis Toolkit (GATK) v1.4-30-gf2ef8d1, Compiled 2012/02/17 20:18:04.


          Bowtie (http://sourceforge.net/projects/bowtie-bio/files/bowtie)

          Click on the link above and download the latest version. (This version will probably be “bowtie-0.12.7-src.zip”.)
          Move the “bowtie-0.12.7-src.zip” file to your “ngs/applications” folder.
          Decompress the file and navigate to it.
          Compile the application (“make”).
          Copy "bowtie", "bowtie-build", and "bowtie-inspect" to your path directory.

          To test the installation, navigate to the bowtie folder and type:
          Code:
          bowtie indexes/e_coli reads/e_coli_1000.fq
          You should see a bunch of information stream onto the screen, and at the bottom, you should see:

          Code:
          # reads processed: 1000
          # reads with at least one reported alignment: 699 (69.90%)
          # reads that failed to align: 301 (30.10%)
          Reported 699 alignments to 1 output stream(s)

          Boost (http://www.boost.org/) [Prerequisites: SAMtools, $PATH configuration.]

          WARNING: Do not download the newest version of Boost (i.e., version 1.48.0)! This version will not natively work with this protocol. Instead, install an earlier version of Boost (we recommend version 1.47.0) and follow the instructions below. (For more information, and instructions on how to modify the latest version of Boost, please visit: http://seqanswers.com/forums/showthread.php?t=16637.)

          Click on the link above and download version 1.47.0. (MAKE SURE THE FILE IS “boost_1_47_0.tar.bz2” OR AN EARLIER VERSION.)
          Move the “boost_1_47_0.tar.bz2” file to your “ngs/applications” folder.
          Decompress the file and navigate to it.
          Build/bootstrap the package by typing:
          Code:
          ./bootstrap.sh
          Type in the following command:
          Code:
          ./bjam --prefix=$HOME/local --toolset=darwin architecture=x86 address-model=32_64 link=static runtime-link=static --layout=versioned stage install
          This command will take awhile, so take your coworkers out for cappuccinos or something while you wait. Once it’s finished, the command will create “include” and “lib” subfolders in $HOME/local. You might get some error messages for which targets failed or were skipped, but ignore that because it won’t affect your other applications.
          In the new "include" folder, create a subfolder "bam".
          Using Terminal, navigate to the SAMtools folder within ngs/applications.
          Copy the "libbam.a" file in the SAMtools folder to $HOME/local/lib by typing:
          Code:
          cp libbam.a $HOME/local/lib
          Copy the header files (files ending in .h) in the SAMtools folder to $HOME/local/include/bam by typing:
          Code:
          cp *.h $HOME/local/include/bam
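
The last three steps (create the "bam" folder, copy "libbam.a", copy the headers) can also be done in one go. This is a sketch only; `stage_samtools_libs` is a hypothetical helper name, and it assumes you have already run "make" in the SAMtools folder so that "libbam.a" exists.

```shell
# Copy the SAMtools library and headers to where TopHat and Cufflinks
# are pointed with --with-bam. Both arguments are folders.
# Usage: stage_samtools_libs SAMTOOLS_SRC_DIR PREFIX
stage_samtools_libs() {
    src=$1
    prefix=$2
    if [ ! -f "$src/libbam.a" ]; then
        echo "libbam.a not found in $src; run make in the SAMtools folder first" >&2
        return 1
    fi
    mkdir -p "$prefix/lib" "$prefix/include/bam"
    cp "$src/libbam.a" "$prefix/lib/"
    cp "$src"/*.h "$prefix/include/bam/"
}

# Example, using the folders from this protocol:
#   stage_samtools_libs "$HOME/ngs/applications/samtools-0.1.18" "$HOME/local"
```
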

          Tophat (http://tophat.cbcb.umd.edu/) [Prerequisites: Bowtie, SAMtools.]


          Click on the link above and download the latest version. (This version will probably be “tophat-1.4.1.tar.gz”. Click on the option that says “Source Code.”)
          Move the “tophat-1.4.1.tar.gz” file to your “ngs/applications” folder.
          Decompress the file and navigate to it.
          Build the package by typing
          Code:
          ./configure --prefix=$HOME/local --with-bam=$HOME/local

          Compile the application (by typing “make”).
          Make the executable available in your $PATH directory by typing:
          Code:
          make install
          To test the Tophat installation, please visit the download website (http://tophat.cbcb.umd.edu/tutorial.html; search under “Testing the installation”) and follow these instructions:

          Click on the link above and download the file. (This file will probably be “test_data.tar.gz”.)
          Decompress the folder and navigate to it.
          To process the data, type:
          Code:
          tophat -r 20 test_ref reads_1.fq reads_2.fq
          You should see lines of code after your command, beginning with something like the following:
          Code:
          [Mon May  4 11:07:23 2009] Beginning TopHat run (v1.1.1)
          -----------------------------------------------

          Cufflinks (http://cufflinks.cbcb.umd.edu/tutorial.html) [Prerequisites: Boost (SAMtools).]

          Click on the link above and download the latest version. (This version will probably be “cufflinks-1.3.0.tar.gz”. Click on the option that says “Source Code.”)
          Move the “cufflinks-1.3.0.tar.gz” file to your “ngs/applications” folder.
          Decompress the file and navigate to it.
          Build the package (with Boost, so different from Tophat instructions!) by typing
          Code:
          ./configure --prefix=$HOME/local --with-boost=$HOME/local --with-bam=$HOME/local

          Compile the application (by typing “make”).
          Make the executable available in your $PATH directory by typing:
          Code:
          make install
          To test the installation, you will need to download the Cufflinks test data (http://cufflinks.cbcb.umd.edu/tutorial.html#ref; look under “Testing the installation”). You can download the test text file anywhere (e.g. within your username folder) and navigate to that folder.

          Process the test data by typing:
          Code:
          cufflinks test_data.sam
          You should see the following at the beginning of your output:
          Code:
          You are using Cufflinks v1.3.0, which is the most recent release.
          [bam_header_read] EOF marker is absent. The input is probably truncated.

          VarScan (http://varscan.sourceforge.net/) (Prerequisites: Samtools?)

          Click on the link above and download the latest version. (This version will probably be “VarScan.v2.2.8.jar”.)
          Move the “VarScan.v2.2.8.jar” file to your “ngs/applications” folder.
          Navigate to your “applications” folder.

          To test the installation, type:

          Code:
          java -jar VarScan.v2.2.8.jar
          You should see the following at the beginning of your output:

          Code:
          VarScan v2.2
          
          USAGE: java net.sf.varscan.VarScan [COMMAND] [OPTIONS]

          Picard (http://picard.sourceforge.net/)

          Click on the link above and download the latest version. (This version will probably be “picard-tools-1.62.zip”.)
          Move the “picard-tools-1.62.zip” file to your “ngs/applications” folder.
          Decompress the file and navigate to it.
          Copy all .jar applications to your $PATH directory by typing:

          Code:
          cp *.jar $HOME/local/bin
          While this step isn’t required, it makes things easier and the pipelines we provide use this concept.

          snpEff (http://snpeff.sourceforge.net/download.html)

          To install snpEff, you must install both the program and the corresponding reference genome. These instructions include installing the most recent human genome from Ensembl (which is provided on their website). If you use a different genome, make sure that your genome version matches your snpEff version. (In other words, in this example, the genome version is for “v2_0_5” and the snpEff version is for “v2_0_5d”.)

          Click on the link above and download the latest version of snpEff. (This version will probably be “snpEff_v2_0_5d_core.zip”.)
          Move the “snpEff_v2_0_5d_core.zip” file to your “ngs/applications” folder.
          Decompress the file.
          In the link above, download the latest version of the reference genome. (This version will probably be “snpEff_v2_0_5_GRCh37.65.zip”.)
          Move the “snpEff_v2_0_5_GRCh37.65.zip” file to your “ngs/applications” folder.
          Decompress the file.


          At this point, you’re probably feeling comfortable with these instructions, maybe even patting yourself on the back for understanding them. Well, prepare for more confusion, because we’re entering the wonderful seafaring world of ports!

          For the following applications, you will need to install additional ports on your computer. There are two websites you can use to install them: MacPorts (http://www.macports.org/) and fink (http://www.finkproject.org/). The difference between them is that, in general, fink is more conservative about upgrading packages than MacPorts, so while the MacPorts version will be newer, the fink version might be more stable. We selected MacPorts for installing our packages, so our instructions will be tailored toward that program.

          MacPorts (http://www.macports.org/install.php) [Prerequisites: XCode.]

          To install MacPorts, please visit that website. Choose your operating system under the “Mac OS X Package (.pkg) Installer” section. Install like any ordinary software application.

          To test the installation, close Terminal, meaning completely quit the application, and restart to run MacPorts. To begin the program, type in “sudo port”. You should see:

          Code:
          MacPorts 2.0.3
          Entering interactive mode... ("help" for help, "quit" to quit)
          To install any port, type:

          Code:
          install program
          where “program” is name of port you’re installing. This is the method for installing any of the ports used by the subsequent applications. As the program indicates, to exit MacPorts, type “quit” and press enter.

          FastX (http://hannonlab.cshl.edu/fastx_toolkit/download.html)

          MacPorts: Install “pkgconfig”. (The program is called “pkgconfig 0.26”, found on page 171 of the MacPorts website.)

          [ol]
          [li]Click on the link above and download libgtextutils. (This version will probably be “libgtextutils-0.6.tar.bz2”.)[/li]
          [li]Move the “libgtextutils-0.6.tar.bz2” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to it.[/li]
          [li]To install the program, like installing previous programs, type in “./configure” and press enter.[/li]
          [li]To compile the application completely, type in “make” and press enter, then type in “sudo make install” and press enter.[/li]
          [li]Make sure the program can identify “gtextutils” by typing:

          Code:
          export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH
          [/li]
          [li]Once that command is processed, type:

          Code:
          pkg-config --cflags gtextutils-0.1
          You should see the following response:

          Code:
          -I/usr/local/include/gtextutils-0.1/
          (If you have any questions about this step, or have any troubleshooting concerns about installing this application, please visit: http://hannonlab.cshl.edu/fastx_tool...nfig_email.txt.)[/li]

          [li]Click on the link above and download the latest version of FastX. (This version will probably be “fastx_toolkit-0.0.13.tar.bz2”.)[/li]
          [li]Move the “fastx_toolkit-0.0.13.tar.bz2” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to it.[/li]
          [li]To install the program, like installing previous programs, type in “./configure”, then “make”, then “sudo make install”.[/li]
          [/ol]
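The configure / make / install sequence used for both libgtextutils and FastX above can be wrapped in a small function. This is a sketch under assumptions, not part of the FastX instructions: "run" and "build_from_source" are hypothetical helper names, and DRY_RUN is a made-up switch that prints the commands instead of executing them so you can check the sequence first.

```shell
#!/bin/sh
# Sketch only: "run" and "build_from_source" are hypothetical helper names.

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_from_source DIR: configure, compile, and install the source tree in DIR.
build_from_source() {
    (
        cd "$1" || exit 1
        run ./configure &&
        run make &&
        run sudo make install
    )
}

# Example (assumes the tarball was already decompressed in ngs/applications):
#   DRY_RUN=1 build_from_source libgtextutils-0.6
```

The PKG_CONFIG_PATH export still has to happen between the two builds, otherwise the FastX configure step may not find gtextutils.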
          Circos (http://mkweb.bcgsc.ca/circos/software/download/)
          Before installing Circos, you will need to update your perl distribution to install all of Circos’s required packages. To install the packages, type the following in Terminal:

          Code:
          sudo perl -MCPAN -e shell
          When it asks if you would like the program to configure things automatically, and choose the best CPAN mirror sites, type “yes”.

          To install any package, type:

          Code:
          install program
          where “program” is the name of the package you’re installing. Before installing these packages, however, you will need to install GD. (I know, it’s like Inception, with a program installation within a program installation within a program….)

          GD (http://code.google.com/p/google-desk...tar.gz&can=2&q)

          [ol]
          [li]Click on the link above and download the latest version. (This version will probably be “gd-2.0.35.tar.gz”.)[/li]
          [li]Move the “gd-2.0.35.tar.gz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to it.[/li]
          [li]To install the program, like installing previous programs, type in “./configure”, then “make”, then “sudo make install”.[/li]
          [/ol]

          Here is a list of the packages you will need to install (please install them in the following order because some of the packages require other packages):

          YAML
          Config::General (v2.50 or later)
          GD::Polyline (requires YAML)
          List::MoreUtils
          Math::Bezier
          Math::Round
          Math::VecStat
          Params::Validate
          Readonly
          Regexp::Common
          Set::IntSpan (v1.16 or later)
          Clone
          Text::Format
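To see which of the modules in the list above still need installing, you can ask perl to load each one. A minimal sketch: "missing_perl_modules" is a hypothetical helper name, and the check only tests that a module loads, not that it meets the minimum versions noted above.

```shell
#!/bin/sh
# missing_perl_modules is a hypothetical helper: it prints each module that
# perl cannot load, one per line, in the order given.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}

# Module names and order are taken from the list above.
missing_perl_modules YAML Config::General GD::Polyline List::MoreUtils \
    Math::Bezier Math::Round Math::VecStat Params::Validate Readonly \
    Regexp::Common Set::IntSpan Clone Text::Format
```

An empty result means every module loaded. Anything printed is what still needs an "install" at the CPAN prompt.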

          Also, if you get the message that says something like this:

          Code:
          New CPAN.pm version (v1.9800) available.
            [Currently running version is v1.9456]
          then type “install CPAN”, then “reload CPAN”, to update to the latest CPAN version. (This process takes a couple minutes.)

          All right, here are the instructions for installing Circos:

          [ol]
          [li]Click on the link above and download the bug-fix version. (This version will be something like “circos-0.56-1.tgz”.)[/li]
          [li]Move the “circos-0.56-1.tgz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file.[/li]
          [li]Click on the link above and download the latest version. (This version will probably be “circos-0.56.tgz”.)[/li]
          [li]Move the “circos-0.56.tgz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file.[/li]
          [li]Drag the decompressed files from inside the bug-fix folder into the folder of the latest Circos version. When prompted, choose “replace file”.[/li]
          [/ol]
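The drag-and-replace step above can also be done in Terminal. A sketch under assumptions: "overlay_dir" is a hypothetical helper name, and the directory names in the example are guesses at what the two archives unpack to, so check them with "ls" first.

```shell
#!/bin/sh
# overlay_dir SRC DST: copy everything inside SRC into DST, replacing files
# of the same name and leaving other files in DST alone. Hypothetical helper.
overlay_dir() {
    cp -R "$1"/. "$2"/
}

# Example (directory names are assumptions; check what the archives unpack to):
#   cd ngs/applications
#   tar -xzf circos-0.56.tgz
#   tar -xzf circos-0.56-1.tgz
#   overlay_dir circos-0.56-1 circos-0.56
```

Copying this way merges the bug-fix files into the existing folders, so only files with matching names are overwritten.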

          To test the Circos installation, please visit this website (http://circos.ca/software/download/tutorials/) and follow these instructions:

          [ol]
          [li]Click on the link above and download the tutorial file. (This version will be something like “circos-tutorials-0.56.tgz”.)[/li]
          [li]Move the “circos-tutorials-0.56.tgz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file.[/li]
          [li]Drag the decompressed tutorial folder into the folder of the latest Circos version. When prompted, choose “replace file”.[/li]
          [li]Navigate to the “circos-0.56” folder.[/li]
          [li]Access the tutorial by typing:

          Code:
          cd tutorials/2/2
          [/li]
          [li]Test the tutorial by typing:

          Code:
          ../../../bin/circos -conf ./circos.conf
          [/li]
          [/ol]

          You should see a series of commands flash onto the screen, eventually ending with:

          Code:
          debuggroup summary,output 4.85s created PNG image ./circos.png (839 kb)
          debuggroup summary,output 4.86s created SVG image ./circos.svg (356 kb)
          If you navigate to that folder manually (“circos-0.56/tutorials/2/2”) and click on the “circos.png” file, you should see a circular graph of each human chromosome in different colors.

          Finally, copy the binary and library files to your PATH directory so you can just type "circos" instead of "bin/circos" each time you run the program. If you follow our folder hierarchy, then type the following commands in sequential order:

          Code:
          cd ngs/applications/circos-0.56/bin
          cp circos $HOME/local/bin
          cd ../lib
          cp circos.pm $HOME/local/lib
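Copying into $HOME/local/bin only helps if that folder is actually listed in your PATH. A quick way to check, as a minimal sketch ("on_path" is a hypothetical helper name):

```shell
#!/bin/sh
# on_path DIR: hypothetical helper, succeeds if DIR is an entry in $PATH.
on_path() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

target="$HOME/local/bin"
if on_path "$target"; then
    echo "$target is on PATH"
else
    echo "$target is not on PATH; add it in your shell profile"
fi
```

The same check applies to the BEDTools and pairoscope binaries copied to that folder later on.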
          Also, within the circos folder, to create a couple directories for your personal use, type the following commands in sequential order:

          Code:
          cd ngs/applications/circos-0.56
          mkdir my_plots
          mkdir my_reference_files
          mkdir my_config_files
          mkdir my_data_files
          Once you’ve created those directories, you need to populate your reference files. (For more information, please visit: http://circos.ca/tutorials/.) When you visit that website, you can download the hg19 karyotype, decompress the corresponding file, and drag it into your newly created “my_reference_files” folder.
          BEDTools (http://code.google.com/p/bedtools/)

          [ol]
          [li]Click on the link above and download the latest version. (This version will probably be “BEDTools.v2.15.0.tar.gz”.)[/li]
          [li]Move the “BEDTools.v2.15.0.tar.gz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to it. (NOTE: The file will be renamed to something like “BEDTools-Version-2.15.0”.)[/li]
          [li]To install the program, type in “make clean”, then “make all”. You should see a series of commands being processed.[/li]
          [li]To list the available binaries and confirm that they installed, type “ls bin”. You should see columns of files beginning with “annotateBed” in the upper lefthand corner and ending with “windowMaker” in the lower righthand corner.[/li]
          [li]Copy the binaries to your PATH directory by typing:
          Code:
          cp bin/* $HOME/local/bin
          [/li]
          [/ol]
          Pairoscope (http://pairoscope.sourceforge.net/) [Prerequisite: SAMTools]
          Truthfully, installing this program is difficult, so brace yourselves, folks. Or as Samuel Jackson says in Jurassic Park, “hold onto your butts.”

          Before installing pairoscope, you need to install Cairo. To install Cairo, type:

          Code:
          sudo port install cairo
          You should get the following response:

          Code:
           --->  Computing dependencies for cairo
          --->  Cleaning cairo
          Also, before installing pairoscope, you need to install CMake (http://www.cmake.org/cmake/help/install.html). To install the program, click on the link above and download the latest version. (This version will probably be “cmake-2.8.7-Darwin64-universal.dmg”.) Simply install like you would a normal application. (Oh, and when the bouncing colorful triangle appears on your Dock, click to “install command line links”.)

          Finally, here are the instructions to install pairoscope:

          [ol]
          [li]Click on the link above and download the latest version. (This version will probably be “pairoscope-0.2.tgz”.)[/li]
          [li]Move the “pairoscope-0.2.tgz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to the applications folder.[/li]
          [/ol]

          To install pairoscope, type:

          Code:
          ccmake pairoscope-0.2
          The screen will transform and you will see a series of capitalized instructions on the left and corresponding answers written in white text on the right. To toggle advanced mode, type “t”.

          Scroll all the way down with the arrow keys until you reach “Page 2 of 2”. (NOTE: The following series of instructions are based on our folder architecture, and assume that you followed our instructions for installing SAMTools. If your folder architecture is different, please point ccmake to your corresponding SAMTools directories.)

          To edit the Samtools include and library locations, follow these instructions:

          [ul]
          [li]Under “Samtools_INCLUDE_DIR”, type “/-----/local/include/bam”.[/li]
          [li]Under “Samtools_LIBRARY”, type “/-----/local/lib/libbam.a”.[/li]
          [/ul]

          where “-----” is the exact folder hierarchy of your computer. (To find that exact hierarchy, type “cd” on the command line and press enter, then type “pwd” and press enter. Paste the resulting path in place of the “-----” described above.)

          To configure, type “c”. You should see a warning appear that starts with:

          Code:
          CMake Warning (dev) in CMakeLists.txt:
          You can ignore this warning, so type “e”. To generate and exit, type “g”.

          Now, pairoscope is ready. To make pairoscope, navigate to the “applications” folder and type:

          Code:
          cmake pairoscope-0.2
          You should see a series of commands ending with:

          Code:
           -- Build files have been written to: /-----/ngs/applications
          where the “-----“ is the same prefix described above.

          A new folder called “CMakeFiles” has been created in the “applications” folder. To make, navigate to the “applications” folder and type “make”. You will see a bunch of purple and green commands beginning with:

          Code:
          Scanning dependencies of target pairoscope
          Copy the newly-created pairoscope program to your $PATH by typing:

          Code:
          cp pairoscope $HOME/local/bin
          To test the installation, type “pairoscope”. You should see a series of commands beginning with the following:

          Code:
          Usage:   pairoscope [options] <align.bam> <chr> <start> <end> <align2.bam> <chr2> <start2> <end2>
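The two Samtools paths entered in ccmake above can be generated rather than typed by hand. This is a sketch under assumptions: "samtools_cmake_args" is a hypothetical helper name, and passing the values to cmake with -D instead of editing them in ccmake is something we have not tested with pairoscope.

```shell
#!/bin/sh
# samtools_cmake_args PREFIX: hypothetical helper that prints the two cache
# settings entered in ccmake, with PREFIX substituted for "-----".
samtools_cmake_args() {
    printf '%s\n' \
        "-DSamtools_INCLUDE_DIR=$1/local/include/bam" \
        "-DSamtools_LIBRARY=$1/local/lib/libbam.a"
}

# The prefix is whatever "cd" followed by "pwd" prints, i.e. your home folder.
samtools_cmake_args "$HOME"

# Untested assumption: the same values could be passed non-interactively, e.g.
#   cmake $(samtools_cmake_args "$HOME") pairoscope-0.2
```

Note that the commented cmake line would break if your home folder path contains spaces.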
          FastQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/)
          [ol]
          [li]Click on the link above and download the latest version. (This version will probably be “Source Code for FastQC v0.10.0 (zip file)”. Please download the Source Code version.)[/li]
          [li]Move the “fastqc_v0.10.0_source.zip” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to it. (NOTE: The file will be renamed “FastQC”.)[/li]
          [/ol]

          And that’s it! (Seriously! According to the installation files: “Once unzipped it's ready to go.”)
          HTSeq (http://pypi.python.org/pypi/HTSeq)
          [ol]
          [li]Click on the link above and download the latest version. (This version will probably be “HTSeq-0.5.3p3.tar.gz”.)[/li]
          [li]Move the “HTSeq-0.5.3p3.tar.gz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to it.[/li]
          [li]Install the program by typing:
          Code:
          sudo python setup.py install
          [/li]
          [/ol]

          You should see a series of commands being processed and ending with:
          Code:
          Finished processing dependencies for HTSeq==0.5.3p3
          (For more information about program installation, please visit: http://www-huber.embl.de/users/ander.../overview.html.)
          chimerascan (http://code.google.com/p/chimerascan/)
          [ol]
          [li]Click on the link above and download the latest version. (This version will probably be “chimerascan-0.4.5a.tar.gz”.)[/li]
          [li]Move the “chimerascan-0.4.5a.tar.gz” file to your “ngs/applications” folder.[/li]
          [li]Decompress the file and navigate to it.[/li]
          [li]Build the program by typing:
          Code:
          python setup.py build
          [/li]
          [li]Install the program by typing:
          Code:
          sudo python setup.py install
          [/li]
          [/ol]

          To test the installation, you need to access python. To do that, leave the directory (you can type “cd ../” to move into the “applications” folder) and type:

          Code:
          python
          You should see something like:

          Code:
          Python 2.6.1
          Type "help", "copyright", "credits" or "license" for more information.
          To test that the chimerascan libraries are in your PYTHONPATH, type “import chimerascan”, then “chimerascan.__version__”. (Just in case that last command is obscured, you should type in “chimerascan” followed by a period, followed by two underscores, then “version”, then two underscores.) You should see the following:

          Code:
          '0.4.5'
          Success! To exit python, type:

          Code:
          exit()
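The interactive check above can also be run in one line from the shell, which is handy for testing other Python packages such as HTSeq. A minimal sketch: "py_module_version" is a hypothetical helper name, and PYTHON is an assumed override variable that defaults to the "python" command used above.

```shell
#!/bin/sh
# py_module_version MODULE: hypothetical helper that prints MODULE.__version__
# and fails if the module cannot be imported.
# PYTHON is an assumed override variable; it defaults to "python" as in the text.
py_module_version() {
    "${PYTHON:-python}" -c "import $1; print($1.__version__)" 2>/dev/null
}

py_module_version chimerascan || echo "chimerascan is not importable"
```

If the fallback message appears, the libraries are probably not in your PYTHONPATH and the install step is worth repeating.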
          Congratulations! You now have a working computer that can handle just about any sequencing data you throw into it!
          If you have any problems during the installation process, I recommend that you search online for the error message you received. That’s how I managed to resolve many of the difficulties I encountered during this whole process.
          Additionally, you should read the README files when you have problems (you can do so by typing “less README” when you are in the program’s directory), because they might give you helpful information about what’s going wrong with that program.
          Finally, please remember that this document is a work in progress. Right now, we have created a system that can manage the installation of the current application versions, but these versions often change, and with those changes come new program requirements or permissions. If you encounter any problems with future versions, please respond to this thread (preferably with a solution!) and we will make the corresponding updates.
          (This document was made with help from Venkata Yellapantula.)
          Last updated: March 7, 2012
          Last edited by Jon_Keats; 09-02-2012, 12:01 PM.

          Comment


          • #80
            Hi Jon,
            Have you ever used cummeRbund? I am new to R, Linux, and everything that is not Windows... I like your posts and was wondering if you could share some useful commands...

            best,
            irene

            Comment


            • #81
              Somatic Variant Calling

              This is a bit of a shameless plug for one of our institutional tools developed for somatic variant calling. In our experience it is one of the best tools for paired tumor-normal/constitutional identification of somatic variants.

              Check it out:
              Access Google Sites with a personal Google account or Google Workspace account (for business use).


              On other notes, we have been doing a decent bit of testing comparing TopHat and STAR for RNA alignments. We should be posting more in the future, but here are my takeaways:
              1) Using updated genomes and GTFs dramatically improves the TopHat alignment rates (we moved to the new 1000G reference hs37d5 from the initial version, and from Ensembl 64 to Ensembl 70)
              2) STAR is smoking fast
              3) STAR does a much better job of aligning indels, particularly large indels (this might be a TopHat settings issue; it doesn't seem that we really leverage the Bowtie2 gapped alignments in TopHat... I could be wrong)
              4) Expression estimates of the same samples aligned with STAR or TopHat are nearly identical (r2>0.97)

              Comment


              • #82
                Create a separate post too; otherwise it may not be noticed by people who are looking for a tool like this. Someone had a recent post with just this question.

                Comment


                • #83
                  It's been a very long time since I've updated this thread. I've recently added a couple of new Mac computers that I use for development and tertiary analysis, and I wanted an updated build protocol to install all the different applications we have outlined in earlier versions of this thread. If you want to see the updated build instructions, they are available on my lab website (http://www.keatslab.org/computation/...ure-a-machine1)

                  Comment


                  • #84
                    For those people following this thread.

                    We are recruiting Post-Doctoral Fellows for one of our large cancer genomics programs. If you are interested in working in a cutting-edge research institute focused on translational genomics and in leading a major project, please send a cover letter and CV to Dr Keats. (www.keatslab.org)(www.tgen.org)

                    Comment
