  • #16
    InterOp parsing in R and perl

    Hi all,

    I'm from Illumina and over the past few months, I've built a package in R and some scripts in perl to accurately parse the binary InterOp files.

    I'm not sure if you've fixed the C# DateTime problem specified by emartin (in fact, I remember receiving a question from tech support about this, and I think those words in red came directly from my email response back!), but this is done correctly in the R and perl scripts.

    These are unsupported, which means tech support will not be able to help you with them, but we've tested these internally and I've received approval from my manager to share them with those that ask. So please PM me if these scripts would be useful for you.
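For readers curious about the DateTime issue mentioned above, here is a minimal Python sketch of decoding a 64-bit value written by C#'s `DateTime.ToBinary()`. It assumes the usual .NET layout (the top 2 bits carry the `Kind` flag, the lower 62 bits count 100 ns ticks since 0001-01-01) and assumes the field is stored little-endian. The function name is hypothetical, and this is not the code from the R or perl scripts.

```python
import struct
from datetime import datetime, timedelta

TICKS_MASK = (1 << 62) - 1       # lower 62 bits: 100 ns ticks since 0001-01-01
DOTNET_EPOCH = datetime(1, 1, 1)


def decode_dotnet_datetime(raw):
    """Decode 8 little-endian bytes written by DateTime.ToBinary() (assumed layout)."""
    (value,) = struct.unpack("<Q", raw)
    kind = value >> 62           # 0 = Unspecified, 1 = Utc, 2 = Local
    ticks = value & TICKS_MASK
    # Python datetimes resolve to microseconds, so the last tick digit is dropped.
    return DOTNET_EPOCH + timedelta(microseconds=ticks // 10), kind
```

Treating the whole 64-bit value as ticks without masking off the `Kind` bits is one way such a parser can go wrong. Local-kind values are stored with an offset adjustment and would need extra handling that this sketch does not attempt.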

    Cheers,
    mchen1



    • #17
      Hi Mchen1,

      Many thanks. As you might imagine, parsing the datetime isn't the most important metric, but it would be nice to get it in there eventually.
      I haven't looked at that metric for quite a while, but I'll get around to it relatively soon.
      First, I will integrate Metrix with our open-source academic LIMS, GNomEx.

      Cheers,
      Bernd



      • #18
        The CPAN module Bio::IlluminaSAV is a parser for Perl.



        • #19
          Structural update

          Dear all,

          As the CommandProcessor module has been fully integrated, the parsing of the commands is much easier now.

          Several other things have changed.

          Added:
          • Generating and obtaining the QScore Distribution and TileMetrics can be done now by issuing a 'METRIC'-type command - command.setType("METRIC"). This can be output in three formats: Tab separated, XML or a POJO.
          • Detailed exceptions are now thrown and logged for both server and client, e.g. EmptyResultSetCollection, invalid credentials, missing command details and use of unimplemented commands.
          • FIX: Fixed an error when printing and passing on values of the type MutableLong. A reference was passed on instead of the actual value.
          • toXML, toString methods for several objects added.
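The MutableLong fix listed above is an instance of a general aliasing pitfall. As a rough illustration only (in Python rather than Metrix's Java, with a hypothetical stand-in class), handing out the mutable object lets later updates leak into values that were meant to be fixed, whereas copying the number itself does not:

```python
class MutableLong:
    """Hypothetical stand-in for a mutable counter object."""

    def __init__(self, value=0):
        self.value = value

    def add(self, amount):
        self.value += amount


def snapshot(counter):
    """Return the current number itself, not the mutable holder."""
    return int(counter.value)


clusters = MutableLong(5)
leaked = clusters              # reference: keeps tracking later changes
frozen = snapshot(clusters)    # value: fixed at the time of the call
clusters.add(10)
```

After the last line, `leaked.value` has followed the counter while `frozen` still holds the value from when it was taken.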


          Todo:
          • METRIC: Intensity distribution
          • METRIC: Error Rate distribution
          • METRIC: Phasing and Prephasing


          Requested:
          • As several core facilities would like to access their metrics from outside their institute or workplace, a layer of security is needed before this can be opened up to the internet. I will start with a basic API key integration*, but SSL with signed certificates is not out of the question for future commits, depending on demand.


          *Preparations for API integrations have been made in the CommandProcessor (https://github.com/NKI-GCF/Metrix/bl...Processor.java)
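To make the API key idea concrete, here is a minimal sketch of one common way to check keys on the server side. It is an assumption about how such a layer could look, not the CommandProcessor implementation; all function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_api_key():
    """Generate a random key to hand to a client (hypothetical helper)."""
    return secrets.token_hex(32)


def hash_key(api_key):
    """Keep only a SHA-256 digest of each key on the server side."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def is_authorized(presented_key, stored_digests):
    """Compare the presented key against stored digests in constant time."""
    presented = hash_key(presented_key)
    return any(hmac.compare_digest(presented, known) for known in stored_digests)
```

An API key on its own does not protect data in transit, which is why a key check like this is usually paired with SSL once the service is exposed to the internet.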

          If you would like to see features added, please do let me know.

          Bernd
          Last edited by Rhizosis; 06-18-2013, 01:20 AM.



          • #20
            A quick update:

            Added:
            • METRIC: Corrected Intensity (+ Distribution)
            • METRIC: Prephasing / Phasing (+ Distribution)
            • METRIC: Index Metrics - Project, sample and index information combined with clusters per sample.
            • SERVER: Clients are now updated on the progress of their command. Mainly useful for large queries (retrieving multiple summaries or a specific state). The progress values and subject can also be obtained from the 'Update' object.


            Todo:
            • API Integration (+SSL)
            • METRIC: Error Rate distribution
            • METRIC: FWHM / Base intensity scores
            • PARSING: Introduce MetricFilter for advanced searching.
            • SERVER: Optimize storage / retrieval methods


            As always, do let me know if you would like to see a specific feature.

            Bernd



            • #21
              FYI: an open-source Python package that does something similar, written by a colleague of mine at InVitae:


              ########
              Greetings all,

              I work at InVitae and we just publicly released a library called Illuminate.



              The purpose of Illuminate is to emulate the stats you see when you load a run data folder within Illumina SAV, providing programmatic access to these metrics for whatever purposes you may have -- data storage, analysis, automated machine monitoring, and so on.

              This is completely free, open source software (MIT License) written in Python with the intent to be used, tested, and improved upon by the bioinformatics community.

              Features:
              Simple command-line tool you can use to quickly inspect a run.
              Built to be easily integrated into other code.
              Easily extensible even if you think you are "not much of a programmer".
              Results standardized to pandas DataFrame objects (so if you know how to work in R, you can probably get up to speed quickly with this)

              Here's an example of the smallest python script you could get away with using this tool.

              Code:
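              import illuminate
              # Load the InterOp data from a run folder.
              myDataset = illuminate.InteropDataset('path/to/rundata/')
              # print() with a single argument works under both Python 2 and Python 3.
              print(myDataset.meta)
              print(myDataset.IndexMetrics())
              print(myDataset.TileMetrics())
              print(myDataset.QualityMetrics())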
              And here's an example of how you would use the command-line reporter to do the same thing:

              Code:
              python illuminate --meta --index --tile --quality /path/to/rundata
              You can even have illuminate open up in an interactive iPython shell, where the dataset will be loaded up into an InteropDataset object for you:

              Code:
              python illuminate -i /path/to/rundata
              Not all of the metrics objects are fully fleshed out yet, although all of the binary parsers are "feature complete" in that you can produce a data dictionary and a DataFrame from them.

              I'm hoping that some of you fine folks can pipe up and let me know what might be useful to you -- or better, submit contributions, bug reports, and so on that will help Illuminate become as full-featured as it needs to be.

              This library has been in our production pipeline for several months now, reporting on cluster density, quality, and yield so we can keep tabs on sequencing run quality in an automated fashion.
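As a rough idea of what such automated monitoring could look like, the sketch below flags lanes whose summary values fall under a minimum. The metric names, thresholds and input dictionary are all hypothetical; they are not Illuminate's API or InVitae's actual pipeline settings.

```python
# Illustrative minimums only; real cut-offs depend on instrument and chemistry.
THRESHOLDS = {"pct_q30": 80.0, "cluster_density_k_mm2": 600.0}


def flag_lanes(lanes, thresholds=THRESHOLDS):
    """Return {lane: [metric names below their minimum]} for failing lanes."""
    failures = {}
    for lane, metrics in sorted(lanes.items()):
        low = [name for name, minimum in thresholds.items()
               if metrics.get(name, 0.0) < minimum]
        if low:
            failures[lane] = low
    return failures


# Hypothetical per-lane values, as they might be derived from parsed metrics.
lanes = {
    1: {"pct_q30": 92.1, "cluster_density_k_mm2": 850.0},
    2: {"pct_q30": 71.4, "cluster_density_k_mm2": 900.0},
}
```

A missing metric is treated as 0.0 here, so an incomplete lane is flagged rather than silently passed.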

              If you use it, or you have questions about it, please comment here and let me know!

              Cheers,
              Naomi



              • #22
                Hi Naomi,

                Even though I applaud and appreciate every application made by the bioinformatics community, would you be so kind as to create a separate thread for the application you created?
                Just so I can keep this thread clean. It's not a problem at all if you post a link to your thread here so people can find it.

                Thanks so much in advance

                Bernd



                • #23
                  Hey I didn't post that, take it up with iamh20



                  • #24
                    Originally posted by nthmost
                    Hey I didn't post that, take it up with iamh20
                    My apologies. I read the message on my mobile and saw your name signed under it. Now I see that it was a forwarded message.



                    • #25
                      An update

                      Added:
                      • [ADDED]: Standalone application (Metrix.java) which generates a summary output of a selected run.
                      • [ADDED]: Support for MSSQL databases.
                      • [ADDED]: Error rate distribution
                      • [FIX]: (#METR-1): State assignment of paired end runs was fixed.
                      • [IMPROVEMENT]: Timed forced checking of inactive runs such as: paired end runs, timed out runs, network inactivity, error handling.
                      • [IMPROVEMENT]: Reworked logging to work via a wrapper.
                      • [IMPROVEMENT]: Parsers are better able to handle deprecated formats of RunInfo.xml files.
                      • [IMPROVEMENT]: Better handling of symlinked paths.
                      • [IMPROVEMENT]: Introduced a generic illumina parser class for greater flexibility.
                      • [IMPROVEMENT]: Added a property in metrix.properties to control the level of logging.
                      • [IMPROVEMENT]: Added a property to run Metrix as a daemon, only writing output to the log file.
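On the symlink item in the list above: one common way to handle symlinked run folders is to resolve every path to its canonical location before comparing or storing it, so the same run is not picked up twice. The Python sketch below shows that general approach with a hypothetical helper; it is not taken from the Metrix source.

```python
import os


def canonical_run_dirs(paths):
    """Resolve symlinks and drop duplicates, preserving first-seen order."""
    seen = set()
    unique = []
    for path in paths:
        real = os.path.realpath(path)   # follows symlinks to the real location
        if real not in seen and os.path.isdir(real):
            seen.add(real)
            unique.append(real)
    return unique
```

Passing both a run directory and a symlink pointing at it returns the real directory once.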


                      Coming soon:
                      • Live sequencing run analysis:
                        - Predicts whether a run will complete successfully or has (a) lane(s) containing errors.
                        - Predicts HiSeq / MiSeq sequencer behaviour.


                      Todo:
                      • API Integration (+SSL)
                      • METRIC: FWHM / Base intensity scores
                      • PARSING: Introduce MetricFilter for advanced searching.
                      • SERVER: Optimize storage / retrieval methods


                      As always, do let me know if you would like to see a specific feature.

                      Bernd
                      Last edited by Rhizosis; 10-01-2013, 12:47 AM.



                      • #26
                        Update

                        Added:
                        • [ADDED]: Preparations have been made to support automated post processing.
                        • [FIX]: Closed remaining open file handles, which were causing an exception.
                        • [IMPROVEMENT]: Reworked logging to output single line descriptors.


                        The github wiki will be updated soon.

                        Coming soon:
                        • Live sequencing run analysis:
                          - Predicts whether a run will complete successfully or has (a) lane(s) containing errors.
                          - Predicts HiSeq / MiSeq sequencer behaviour.


                        Todo:
                        • METRIC: FWHM / Base intensity scores
                        • PARSING: Introduce MetricFilter for advanced searching.
                        • SERVER: Optimize storage / retrieval methods


                        As always, do let me know if you would like to see a specific feature.

                        Bernd
                        Last edited by Rhizosis; 10-24-2013, 08:01 AM.



                        • #27
                          Hi Bernd,

                          The server is running fine ... I can use the client or query the database directly.
                          As I am currently in the process of testing some OS LIMS for our group, I'd be interested to know whether some GNomEx integration is already under way.

                          regards,
                          Sven



                          • #28
                            Hi Sven,

                            Good to hear. I read your message on the GNomEx discussion board. I hope you managed to get the EncryptionFactory working now.

                            Regarding integration with GNomEx: I have made a dashboard in GNomEx which fetches active runs (or all of them) in the form of summaries and displays the actual progress of the active sequencers.

                            My plan was to integrate all of Metrix into the reporting framework within GNomEx. Sadly, my employer took me off the project, so I will no longer be continuing my part of the feature development within GNomEx.

                            I wrote Metrix outside of that role, so I will still be able to support you in this matter.

                            I know that one member of the team that developed GNomEx is continuing the integration of the dashboard into GNomEx; this should be released in the near future.
                            However, one thing I do not know is to what degree Metrix will be used for reporting within GNomEx in the future.

                            So, for now... you will be able to find the dashboard in the branches ('bernd' and 'integrate') here:


                            Keep in mind that these branches are no longer up to date with the trunk.

                            If you would like a more detailed answer, you will have to ask the team in Utah via the discussion forums.

                            I hope this helps.

                            Regards,
                            Bernd



                            • #29
                              Hi Bernd,

                              thanks for the informative answer.

                              regards,
                              Sven



                              • #30
                                Mavenization of Metrix.

                                Hi everybody.

                                Lately I, with the help of a fellow bioinformatician from the UK, have spent time converting the Metrix project to a Maven project for easier accessibility and deployment.

                                The master branch of the repository has been replaced with the maven branch and should be fairly easy to install.

                                The README will be updated with the following:

                                Code:
                                Step 1. Clone the Metrix repository:
                                > git clone https://github.com/NKI-GCF/Metrix.git
                                
                                Step 2. Install Maven for your platform
                                
                                Step 3. Go to the Metrix directory and run:
                                > mvn install
                                
                                Step 4. Insert the SQL table you find in the ./target/classes folder into your database.
                                
                                Step 5. Change the metrix.properties file located in the ./target/classes folder to reflect your environment.
                                -----------

                                >> To run Metrix as a server (MetrixDaemon) that continuously monitors and parses new run directories execute the following:

                                Code:
                                1. Go to the target/ folder.
                                2. 
                                > java -Done-jar.silent=true -Dproperties=classes/metrix.properties -jar MetrixDaemon.jar&
                                -----------
                                >> To parse a single Illumina run and output a summary overview, execute the following:
                                Code:
                                1. Go to the target/ folder.
                                2. 
                                > java -Done-jar.silent=true -Dproperties=classes/metrix.properties -jar Metrix.jar {INSERT FLOWCELLID or RUNNAME}
                                -----------
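If you want to call these jars from a pipeline script rather than by hand, the commands above can be assembled as an argument list. This is a minimal Python sketch: the flags and jar names are copied from the commands in this post, while the helper name and the example run name are hypothetical.

```python
import shlex


def metrix_command(jar, run_id=None, properties="classes/metrix.properties"):
    """Assemble the java invocation shown above as an argument list."""
    argv = [
        "java",
        "-Done-jar.silent=true",
        "-Dproperties=" + properties,
        "-jar",
        jar,
    ]
    if run_id is not None:
        argv.append(run_id)      # flowcell ID or run name
    return argv


# Hypothetical run name, for illustration only.
print(shlex.join(metrix_command("Metrix.jar", "EXAMPLE_RUN")))
```

The list can be passed to `subprocess.run(argv, cwd="target")`, which avoids shell quoting problems with unusual run names. `shlex.join` requires Python 3.8 or later.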

                                If you decide to use the MetrixDaemon, the Metrix suite gives you a bit more flexibility in interfacing run data with other applications or pipelines.

                                To prepare the MetrixDaemon for future queries from clients, we need to initialize the database with as much preprocessed data as possible.
                                Each sequencing run contains a lot of information, and Metrix can generate the most commonly used distributions for viewing or analysis purposes.
                                If you decide not to initialize Metrix, future multi-run queries (queries that retrieve runs by state) might take a long time to process.

                                Keep in mind that, depending on how many runs the Illumina run folder contains, the initialization process might take several hours (for more than 500 runs).

                                To initialize the Metrix database run the following in the target directory:
                                Code:
                                > java -Done-jar.silent=true -Dproperties=classes/metrix.properties -jar MetrixInitialize.jar
                                On the MetrixDaemon (server) side you should see:
                                > [nki.parsers.metrix.CommandProcessor] : Initialization command received.
                                > [nki.io.DataStore] : Fetching all summaries.
                                ...
                                ...
                                ...

                                Once this is finished, you can query Metrix using MetrixGCFdb.
                                MetrixGCFdb.jar, for example, is an application that contacts the configured MetrixDaemon instance (as set in the metrix.properties file) and asks it to parse and return the basic summary information of a run, together with basic distributions such as >Q30%, cluster densities, intensities, error rates and index distributions.

                                You can run it using:
                                Code:
                                > java -Done-jar.silent=true -Dproperties=classes/metrix.properties -jar MetrixGCFdb.jar {INSERT (PART OF) FLOWCELLID or RUNNAME}

                                Once Metrix is initialized it is ideal for integration with any LIMS system that allows external application calls and accepts JSON or XML as input.
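For anyone wiring this into a LIMS, here is a small Python sketch of what handing over a run summary as JSON or XML might look like on the receiving side. The field names and values are hypothetical and do not reflect Metrix's actual output schema.

```python
import json
import xml.etree.ElementTree as ET


def summary_to_json(summary):
    """Serialize a flat run summary to a JSON string."""
    return json.dumps(summary, sort_keys=True)


def summary_to_xml(summary, root_tag="summary"):
    """Serialize a flat run summary to XML, one child element per field."""
    root = ET.Element(root_tag)
    for key, value in sorted(summary.items()):
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical summary fields, for illustration only.
example = {"flowcellId": "EXAMPLE", "state": "finished", "currentCycle": 202}
```

Sorting the keys keeps both outputs stable between calls, which makes them easier to diff or cache on the LIMS side.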
                                I will update this guide soon.
                                Please let me know if it works and helps you out.

                                HTH,

                                Bernd
