
  • #16
    It seems a pity that you pulled Hadoop out of Goby. Was there a compelling reason?

    Also, does the aligner portion support only BWA so far?
    http://kevin-gattaca.blogspot.com/



    • #17
      Aligners support in Goby

      In addition to BWA, Goby also supports the Last aligner. Last can be downloaded from http://last.cbrc.jp/archive/last-96.zip (see http://last.cbrc.jp/ for information).
      Last is slower than bwa for short reads, but is very useful for aligning longer reads (i.e., several hundred bases or more). Last is also useful if you need very sensitive searches (e.g., to align reads that match over only a third of their length, but still map to a single location in the genome).

      In version 1.4, we added the ability to run a released version of last (we tested version 96). You should be able to switch the aligner by replacing bwa with last in the command below:
      java -jar goby.jar --mode align --aligner bwa --search

      This requires that you modify the configuration file config/goby.properties and specify where you have installed last on your system. Uncomment and change this line:
      #executables.path.last = /usr/local/last-96/src

      Please note that the developers of last have recently changed some of the file formats they generate. Goby has not yet been adapted to these changes, so for instance Last version 99 is known not to work with Goby 1.4.



      • #18
        Does Goby use the default bwa settings for the alignment? Thanks again.



        • #19
          Hi Alex,

          For the most part, the default bwa options are used. The exceptions are the seed length, which is set to 35 (-l 35), and the bwa color space option (-c), which is set depending on whether the --color-space flag is present in the AlignMode options. Options can be passed directly to bwa by specifying the "--options" flag on the goby command line. This flag takes a comma-separated list of options that are passed on to the native aligner. For example, to pass "-l 30 -d 8 -k 2" to bwa, you would add "--options l=30,d=8,k=2". The same can be done for the last aligner.

          If you were to add these options to the sample listed at http://icbtools.med.cornell.edu/goby/, the command would look as follows:

          $ java -Xmx3g -jar goby.jar --mode align --aligner bwa --search --database-name chr1-index --reference data/reference/mm9/chr1.compact-reads --database-directory data/reference-index/mm9 --reads data/reads/goby-mouse-reads-sample.compact-reads --basename goby-sample --options l=30,d=8,k=2

          To verify how bwa was actually executed, look in the goby console log for a line that lists all the options passed to bwa, something like:

          INFO BWAAligner - About to execute nice /home/marko/bwa-0.5.5/bwa aln -l 30 -d 8 -k 2 chr1-index goby-mouse-reads-sample.fasta
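
The mapping from the "--options" value to the flags that appear in the log line can be sketched in a few lines of Python. This is only an illustration of the translation described above, not Goby's actual code: the function name goby_options_to_flags is hypothetical, and the handling of an option given without "=value" is an assumption, since the thread does not say how Goby treats that case.

```python
from __future__ import annotations


def goby_options_to_flags(options: str) -> list[str]:
    """Turn a string such as 'l=30,d=8,k=2' into ['-l', '30', '-d', '8', '-k', '2']."""
    flags: list[str] = []
    for item in options.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        name, sep, value = item.partition("=")
        flags.append("-" + name)
        if sep:
            # Assumption: an option without '=value' is passed as a bare flag.
            flags.append(value)
    return flags


if __name__ == "__main__":
    print(" ".join(goby_options_to_flags("l=30,d=8,k=2")))
```

For the example in this post, the sketch produces "-l 30 -d 8 -k 2", which matches the bwa arguments in the log line above.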

          Please let me know if you have any other questions.
          -- Marko



          • #20
            Actually, the "issue" has more to do with differences in the naming conventions for chromosomes in the reference datasets used. The mouse reference file included with Goby was created from the UCSC MM9 dataset ("chr1", "chr2", "chr3", etc.), whereas the biomart annotation file uses the NCBI/ENSEMBL naming ("1", "2", "3", etc.). The alignment-to-annotations mode doesn't output any results because of the naming differences. As Fabien mentioned, prepending "chr" to the chromosome names should fix most of the entries; however, this is not appropriate for all the name mappings.

            For example, 1 --> chr1, 2 --> chr2, X --> chrX, but MT --> chrM and as far as I can tell NT_166323 doesn't map anywhere. A more appropriate way to create the annotation file from the NCBI file would be:

            Code:
            sed -e 's/^\([0-9]\+\t\|X\t\|Y\t\)/chr\1/' -e 's/^MT\t/chrM\t/' data/biomart-mouse-exons-ensembl55-genes-NCBIM37.txt >data/biomart-mouse-exons-ensembl55-genes-NCBIM37-chr-fix.txt
            In the next release of Goby we will make this more clear in the documentation and add appropriate warning messages if no data is output from that mode.
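
For anyone without GNU sed, the same renaming can be sketched in Python. This is a minimal sketch under the same assumptions as the sed command above: the chromosome name is the first tab-separated column, only numeric names, X, Y and MT are renamed, and anything else (the header line, NT_166323 and similar) is left untouched. The function names are made up for illustration.

```python
import re


def ucsc_chromosome_name(name):
    """Return the UCSC-style name for an NCBI/ENSEMBL name, or None if no mapping is known."""
    if name == "MT":
        return "chrM"
    if re.fullmatch(r"[0-9]+|X|Y", name):
        return "chr" + name
    return None  # e.g. NT_166323 or a header field


def fix_annotation_line(line):
    """Rename the chromosome in the first tab-separated column of one annotation line."""
    chrom, sep, rest = line.partition("\t")
    if not sep:
        return line  # no tab on this line: leave it alone, as the sed command does
    mapped = ucsc_chromosome_name(chrom)
    if mapped is None:
        return line
    return mapped + sep + rest


def fix_annotation_file(source_path, target_path):
    """Write a copy of source_path with UCSC-style chromosome names."""
    with open(source_path) as source, open(target_path, "w") as target:
        for line in source:
            target.write(fix_annotation_line(line))
```

Calling fix_annotation_file with the input and output paths from the sed command should give the same result for these name patterns.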

            Originally posted by Fabien Campagne:
            Thanks for the detailed log. I was able to reproduce the problem with Goby version 1.4 and the files we distribute as examples. The problem is caused by an issue we fixed after 1.4.

            You can work around this issue by inserting the string "chr" in front of the chromosome id in the annotation file. The following awk script does the trick:

            awk '{print "chr"$0} ' data/biomart-mouse-exons-ensembl55-genes-NCBIM37.txt >data/biomart-mouse-exons-ensembl55-genes-NCBIM37-chr-fix.txt

            java -Xmx3g -jar goby.jar --mode alignment-to-annotation-counts goby-sample.entries --annotation data/biomart-mouse-exons-ensembl55-genes-NCBIM37-chr-fix.txt --include-annotation-types gene


            This command should then result in a file such as:

            head goby-sample.ann-counts.tsv
            basename main-id secondary-id type chro strand length start end in-count over-count RPKM log2(RPKM+1) expression num-exons
            goby-sample ENSMUSG00000073741 gene chr1 -1 681 6204693 6205373 2 2 39.0966 5.32541 2 1
            goby-sample ENSMUSG00000047021 gene chr1 -1 33520 74948654 74982173 3 3 5.50402 2.70133 1 41
            goby-sample ENSMUSG00000050625 gene chr1 -1 390 183440545 183440934 0 0 0.00000 0.00000 0 1
            goby-sample ENSMUSG00000064612 gene chr1 1 78 63225251 63225328 0 0 0.00000 0.00000 0 1
            goby-sample ENSMUSG00000049690 gene chr1 -1 916996 127810214 128727209 33 33 30.0156 4.95492 4 34
            goby-sample ENSMUSG00000047053 gene chr1 1 1267 155738922 155740188 0 0 0.00000 0.00000 0 1
            goby-sample ENSMUSG00000047067 gene chr1 1 1440 94803566 94805005 5 5 51.2409 5.70711 5 2
            goby-sample ENSMUSG00000047539 gene chr1 -1 28505 184243233 184271737 47 47 127.352 7.00397 39 5
            goby-sample ENSMUSG00000025774 gene chr1 -1 30712 18105272 18135983 0 0 0.00000 0.00000 0 32


            Please let us know if this work-around does not work with GRCh37 (I tested only NCBIM37). Goby 1.5 will work directly with annotation files as described previously. Sorry for the inconvenience.



            • #21
              Hi,

              First I wanted to say thanks - this looks like a great tool! I'm trying to use Goby to filter out redundant reads in a very large data set. I just downloaded the package (tried 1.4 and 1.4.1), but am getting this error message for any method I try to run - Is there something I'm missing or also need to install? Machine is LINUX based and Java works for everything else I'm doing...

              Thanks!

              Command Line: java -jar goby.jar --help
              Exception in thread "main" java.lang.ClassFormatError: edu.cornell.med.icb.goby.modes.GobyDriver (unrecognized class file version)
              at java.lang.VMClassLoader.defineClass(libgcj.so.7rh)
              at java.lang.ClassLoader.defineClass(libgcj.so.7rh)
              at java.security.SecureClassLoader.defineClass(libgcj.so.7rh)
              at java.net.URLClassLoader.findClass(libgcj.so.7rh)
              at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
              at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
              at gnu.java.lang.MainThread.run(libgcj.so.7rh)



              • #22
                Gcj?

                What version of Java are you running? Run

                java -version

                and report back.

                If you are using gcj (GNU Compiler for Java), I would recommend trying Sun's Java instead. GCJ isn't a complete Java implementation: it doesn't fully implement JDK 1.4 and implements only parts of JDK 1.5. We target JDK 1.5 and develop with Sun's JDK 1.6.
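
If you want to script this check, here is a rough Python sketch that inspects the text printed by java -version. It is a heuristic only: it assumes that a GCJ-based runtime identifies itself with "gij" or "libgcj" in that text, and the function name describe_java is made up for this example.

```python
import re


def describe_java(version_output):
    """Return (reported version or None, True if the text looks like GCJ/gij)."""
    match = re.search(r'version "([^"]+)"', version_output)
    version = match.group(1) if match else None
    lowered = version_output.lower()
    # Heuristic assumption: GCJ runtimes mention "gij" or "libgcj" in this output.
    looks_like_gcj = "gij" in lowered or "libgcj" in lowered
    return version, looks_like_gcj


if __name__ == "__main__":
    sample = (
        'java version "1.4.2"\n'
        "gij (GNU libgcj) version 4.1.1 20070105 (Red Hat 4.1.1-52)\n"
    )
    version, looks_like_gcj = describe_java(sample)
    print(version, looks_like_gcj)
```

The sample text is the kind of output a Red Hat gij installation prints; paste your own java -version output in its place.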



                • #23
                  You are correct - I will update my machine accordingly. Thanks for the quick reply!

                  java -version
                  java version "1.4.2"
                  gij (GNU libgcj) version 4.1.1 20070105 (Red Hat 4.1.1-52)



                  • #24
                    Just to clarify what Kevin stated - Goby does in fact require version 1.6+ of the Sun JDK/JRE




                    • #25
                      Goby 1.5

                      We've recently released Goby 1.5. New features include parsing and storage of sequence variations in the Goby file formats, unambiguous storage of quality scores in the reads format, and support for the Bullard et al. 2010 normalization method (with a plugin infrastructure for adding new normalization methods). The detailed change log is reproduced below.

                      Please note that the project is now hosted at http://goby.campagnelab.org. This new web site provides documentation for some of these new features.

                      * Added a mode to calculate counts and perform differential expression analysis for transcript
                      runs (alignment-to-transcript-counts). Transcript runs are performed against a cDNA library. They
                      find matches through exon-exon junctions represented in the input cDNA library. They are
                      a faster alternative to mapping the genome and exon-exon boundaries separately. The disadvantage is that
                      these searches will only map to transcripts represented in the input library.

                      * Changes to fasta-to-compact mode:
                      - Add parallel processing in fasta-to-compact mode. Use the --parallel flag to activate.
                      Will now only regenerate compact-reads that do not exist, or are older than the input file.

                      * Changes to CompactAlignmentToAnnotationCountsMode
                      - added new option --write-annotation-counts boolean, defaults to true. If set to
                      false the annotation counts intermediate files will not be written.
                      - Lines where "average count group *" values are ALL NaN or <= 0 will not be written
                      This makes it so lines that don't add anything to the output are just omitted.
                      - added new option --omit-non-informative-columns, defaults to false. If set to
                      true, columns in which all of the data is non-informative (values are ALL NaN or <= 0)
                      will be omitted.
                      - support for alternative global normalization methods. We currently provide an implementation of the
                      upper quartile normalization method by Bullard et al (BUQ) and the normalization method
                      provided in Goby 1.4 (CAC, normalize by the number of alignment records in a sample).
                      See the --normalization-methods argument. New normalization methods can be used with Goby by
                      creating an implementation of the edu.cornell.med.icb.goby.stats.NormalizationMethod interface,
                      and adding a jar on the classpath that defines a ServiceProvider (see build.xml goby-jar target
                      for an example of how this is done). When several normalization methods are given as an argument
                      to --normalization-methods Goby will produce derived statistics for each normalization method and
                      append them as new columns in the summary stats output. This makes it easy to compare alternative
                      normalization methods on the same dataset.

                      * Added support for sequence variations:
                      - changed the compact alignment format to support recording sequence variations.
                      - the new mode display-sequence-variations provides text output of sequence variations in several formats.
                      - the new mode sequence-variation-stats will print statistics about sequence variations found in a set of alignments.

                      * Added support for quality scores in reads format:

                      - Changed fasta-to-compact and compact-to-fasta to read and write with the Sanger or Illumina quality encoding.
                      - Modified aligners to indicate which format they require (bwa needs fastq format, lastag fasta format, lastal fastq format). This will need extensive testing as some of these changes can affect gobyweb.
                      We use the FASTQ-SANGER encoding to communicate with lastal.
                      We don't yet support the Solexa quality score encoding (it is a bit obsolete anyway).

                      Please note that the output format in compact-to-fasta now defaults to Fasta format.
                      This format has no quality scores, and consequently, we now never write quality scores
                      when Fasta is requested. The aligners that need quality scores must request FastaQ format
                      explicitly.
                      See also:


                      http://last.cbrc.jp/last/doc/last-manual.txt (look for FASTQ-SANGER)

                      * Other changes to the Compact formats:
                      - Store target/reference sequence lengths in the alignment header. This information is helpful when calculating
                      statistics such as RPKMs (transcript-level searches).
                      - Store constant query lengths as one integer. Goby 1.4.1 and earlier stored one length for each read. This can become very
                      memory consuming when the number of reads is very large. This change saves memory and storage.
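
As background to the Sanger and Illumina quality encodings mentioned in the change log above, here is a small Python sketch of the difference between the two. It is not taken from Goby. It assumes the commonly used conventions in which Sanger FASTQ stores Phred scores as ASCII characters offset by 33 and Illumina (pipeline versions 1.3 to 1.7) offsets them by 64; the function names are illustrative.

```python
SANGER_OFFSET = 33    # assumption: Phred+33, the usual Sanger FASTQ convention
ILLUMINA_OFFSET = 64  # assumption: Phred+64, used by Illumina pipelines 1.3 to 1.7


def decode_quality(quality_string, offset):
    """Convert an ASCII-encoded quality string into a list of Phred scores."""
    scores = [ord(character) - offset for character in quality_string]
    if any(score < 0 for score in scores):
        raise ValueError("quality string is not valid for offset %d" % offset)
    return scores


def illumina_to_sanger(quality_string):
    """Re-encode an Illumina (Phred+64) quality string as Sanger (Phred+33)."""
    scores = decode_quality(quality_string, ILLUMINA_OFFSET)
    return "".join(chr(score + SANGER_OFFSET) for score in scores)


if __name__ == "__main__":
    print(illumina_to_sanger("hhh"))
```

Because the two encodings overlap in the characters they use, a file cannot always be identified by inspection alone, which is one reason a tool needs to be told which encoding to read and write.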



                      • #26
                        Goby 1.6 (python API)

                        We've just released Goby 1.6. Of note is a Python API for parsing the Goby file formats (reads and alignments), with examples of use. See the complete change log at http://campagnelab.org/software/goby/change-log/
