  • iamh2o
    Junior Member
    • Mar 2009
    • 6

    Parser For Illumina InterOp Binary Metric Files

    Hi All-

    I'm trying to locate code that will extract the various helpful QC metrics locked in the Illumina binary InterOp files.

    Specifically:
    ControlMetrics.bin ExtractionMetrics.bin QMetrics.bin
    ControlMetricsOut.bin ExtractionMetricsOut.bin QMetricsOut.bin
    CorrectedIntMetrics.bin IndexMetrics.bin TileMetrics.bin
    CorrectedIntMetricsOut.bin IndexMetricsOut.bin TileMetricsOut.bin

    I've taken a look at the picard libraries, but they seem to skip the metrics files.

    Any help would be appreciated!

    John
  • Rhizosis
    Member
    • Mar 2012
    • 41

    #2
    Hi John,

    I am currently working on this. Have you made any progress in the meantime?
    What language are you writing the parser in?

    Regards,

    Bernd

    • SeqOps
      Junior Member
      • Dec 2012
      • 2

      #3
      Interop Parser

      Either of you have any luck with this? Looking for the same thing myself.

      • Rhizosis
        Member
        • Mar 2012
        • 41

        #4
        I have managed to build a fully functional client-server Illumina metrics parser which interprets data written by HiSeqs and MiSeqs in real time.

        However, I don't know yet if I am allowed to release it to the public. I need to talk to my supervisor about this first.

        Most likely it will be integrated with the GNomEx LIMS system in Q1 2013.
        Let me know if you need pointers.

        • ECO
          --Site Admin--
          • Oct 2007
          • 1360

          #5
          I am also interested in this. It blows my mind that ILMN (or someone) hasn't released this code yet.

          • Rhizosis
            Member
            • Mar 2012
            • 41

            #6
            I have talked to my supervisor and I've been given the green light to release this software under the GPLv3 license. I need to put it through some beta tests first, but it shouldn't be too long.

            In about a week or two I will create a separate thread for this in the bioinformatics section for feedback, support and bug reports.

            • behavin
              Member
              • Oct 2011
              • 11

              #7
              This would be quite helpful for us as well, I'd love to be able to parse all the metrics without resorting to SAV.

              • Rhizosis
                Member
                • Mar 2012
                • 41

                #8
                Hi everybody.

                As promised, here is the initial release of Metrix.
                For feedback, ideas, bug reports and future communication, please see the Metrix thread.

                I hope this helps.

                Bernd

                • iamh2o
                  Junior Member
                  • Mar 2009
                  • 6

                  #9
                  Thanks Bernd-

                  I'll reply with our experience, and contribute back to the project where I can.

                  j

                  • ECO
                    --Site Admin--
                    • Oct 2007
                    • 1360

                    #10
                    I was so annoyed with not being able to get cluster density easily (Illumina, if you're reading this, you're retarded), that I wrote the code below to get it:

                    Code:
                    import pandas as pd
                    from bitstring import BitStream  # BitString is the older name for this class


                    class TileMetrics:
                        def __init__(self, filename):
                            self.f = filename

                            # load the whole file into a bit stream
                            with open(self.f, 'rb') as fh:
                                a = BitStream(bytes=fh.read())
                            self.filever = a.read('uintle:8')    # version number == 2
                            self.recordlen = a.read('uintle:8')  # length of each record in bytes == 10 (for TileMetrics)
                            a.pos = 16  # records start right after the two header bytes

                            # set up data
                            self.data = {'lane': [], 'tile': [], 'met': [], 'value': []}

                            # read records bytewise per specs in technote_rta_theory_operations.pdf from ILMN
                            # integer division: recordlen * 8 == 80 bits per record
                            for i in range((a.len - 16) // (self.recordlen * 8)):
                                self.data['lane'].append(a.read('uintle:16'))    # lane number
                                self.data['tile'].append(a.read('uintle:16'))    # tile number
                                self.data['met'].append(a.read('uintle:16'))     # metric code
                                self.data['value'].append(a.read('floatle:32'))  # metric value

                            # build a DataFrame from the parsed records
                            self.df = pd.DataFrame(self.data)

                            # pull out cluster density (code 100) and PF cluster density (code 101)
                            self.cdens = self.df[self.df.met == 100].reset_index(drop=True)
                            self.pfcdens = self.df[self.df.met == 101].reset_index(drop=True)


                    if __name__ == '__main__':
                        tm = TileMetrics('TileMetricsOut.bin')
                        print('###############')
                        print('filename %s' % tm.f)
                        print('###############')
                        print('average clusterdensity == %.2f' % tm.cdens.value.mean())
                        print('average perc pf clusters == %.2f' % (100 * tm.pfcdens.value.mean() / tm.cdens.value.mean()))
                    This is easily extensible to other metrics files, and I'm working on it (slowly). Metrix looks awesome but is beyond my skills to implement efficiently.
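For anyone who would rather avoid the bitstring dependency, the same record layout can probably be read with Python's standard struct module. This is only a minimal sketch, assuming the layout used in the code above: a 2-byte header (file version, record length in bytes) followed by 10-byte little-endian records of lane, tile and metric code as uint16 plus a float32 value. The names parse_tile_metrics and mean_metric are hypothetical and not part of any library.

```python
import struct

HEADER = struct.Struct('<BB')    # file version, record length in bytes
RECORD = struct.Struct('<HHHf')  # lane, tile, metric code, metric value


def parse_tile_metrics(raw):
    """Parse the raw bytes of a tile metrics file into (version, records).

    Assumes the 2-byte header and 10-byte records described above.
    """
    version, record_len = HEADER.unpack_from(raw, 0)
    if record_len != RECORD.size:
        raise ValueError('unexpected record length: %d' % record_len)
    # Ignore any trailing partial record
    n_records = (len(raw) - HEADER.size) // record_len
    records = [RECORD.unpack_from(raw, HEADER.size + i * record_len)
               for i in range(n_records)]
    return version, records


def mean_metric(records, code):
    """Average the value of every record carrying the given metric code."""
    values = [value for _, _, met, value in records if met == code]
    return sum(values) / len(values) if values else float('nan')
```

Reading the file with open('TileMetricsOut.bin', 'rb').read(), passing the bytes to parse_tile_metrics, and then calling mean_metric(records, 100) should give the same average cluster density as the class above, without needing pandas.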

                    • earonesty
                      Member
                      • Mar 2011
                      • 52

                      #11
                      Module Bio::IlluminaSAV at CPAN is available for these files. Version 1.0, release 7013 works well (prev release was missing files). You can install using:

                      sudo cpanm -i Bio::IlluminaSAV
                      Last edited by earonesty; 05-23-2013, 05:23 PM. Reason: Add install info

                      • mchen1
                        Member
                        • May 2013
                        • 10

                        #12
                        InterOp parsers in R and perl

                        Hi,

                        I'm from Illumina and over the past few months, I've built a package in R and some scripts in Perl to accurately parse the binary InterOp files.

                        These are unsupported, which means tech support will not be able to help you with them, but we've tested these internally and I've received approval from my manager to share them with those who ask. So please PM me if these scripts would be useful for you.

                        Cheers,
                        mchen1

                        • earonesty
                          Member
                          • Mar 2011
                          • 52

                          #13
                          Originally posted by earonesty
                          Module Bio::IlluminaSAV at CPAN is available for these files. Version 1.0, release 7013 works well (prev release was missing files). You can install using:

                          sudo cpanm -i Bio::IlluminaSAV
                          The latest version passes all the relevant CPAN test reports for MiSeq and GAII data. The failed reports are on test systems that can't handle the volume of data. HiSeq passes as well, but is too large to upload to CPAN at all.

                          • ophir
                            Junior Member
                            • Nov 2013
                            • 1

                            #14
                            A new tool for the job is illuminate - https://bitbucket.org/invitae/illuminate
                            Installation is pretty straightforward; I was able to get it up and running in a few minutes.

                            • ploverso-pgdx
                              Junior Member
                              • Feb 2016
                              • 3

                              #15
                              Illuminate is not up to date; it only supports InterOp files up through v5. There is a package called savR in Bioconductor which supports up through v6, for RTA versions 2.7 and up (for the HiSeq 4k, etc).
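Since parser support depends on the file version, it may help to check which version a given InterOp file is before picking a tool. A minimal sketch, assuming (as in the TileMetrics code earlier in this thread) that the first byte of the file holds the version number; whether that holds for every InterOp file type is an assumption here, and the function names are hypothetical.

```python
def interop_version(raw):
    """Return the version number assumed to sit in the first byte of the file."""
    if len(raw) < 1:
        raise ValueError('file is empty')
    return raw[0]


def interop_version_from_path(path):
    """Read only the first byte of the file at path and return its version."""
    with open(path, 'rb') as fh:
        return interop_version(fh.read(1))
```

Comparing the result of interop_version_from_path on a file from the run's InterOp folder against the versions each parser claims to support should show quickly whether a tool is worth trying.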

