  • Parser For Illumina InterOp Binary Metric Files

    Hi All-

    I'm trying to locate code that will extract the various helpful QC metrics locked in the Illumina binary InterOp files.

    Specifically:
    ControlMetrics.bin ExtractionMetrics.bin QMetrics.bin
    ControlMetricsOut.bin ExtractionMetricsOut.bin QMetricsOut.bin
    CorrectedIntMetrics.bin IndexMetrics.bin TileMetrics.bin
    CorrectedIntMetricsOut.bin IndexMetricsOut.bin TileMetricsOut.bin

    I've taken a look at the Picard libraries, but they seem to skip the metrics files.

    Any help would be appreciated!

    John

  • #2
    Hi John,

    I am currently working on this. Have you made any progress in the meantime?
    What language are you writing the parser in?

    Regards,

    Bernd

    • #3
      Interop Parser

      Either of you have any luck with this? Looking for the same thing myself.

      • #4
        I have managed to build a fully functional client-server Illumina metrics parser which interprets data written by HiSeqs and MiSeqs in real time.

        However, I don't know yet whether I am allowed to release it to the public. I need to talk to my supervisor about this first.

        Most likely it will be integrated with the GNomEx LIMS system in Q1 2013.
        Let me know if you need pointers.

        • #5
          I am also interested in this. It blows my mind that ILMN (or someone) hasn't released this code yet.

          • #6
            I have talked to my supervisor and I've been given the green light to release this software under the GPLv3 license. I need to put it through some beta tests first, but it shouldn't be too long.

            In about a week or two I will create a separate thread for this in the bioinformatics section for feedback, support and bug reports.

            • #7
              This would be quite helpful for us as well. I'd love to be able to parse all the metrics without resorting to SAV.

              • #8
                Hi everybody.

                As promised, here is the initial release of Metrix.
                For feedback, ideas, bug reports and future communication please see the Metrix thread.

                I hope this helps.

                Bernd

                • #9
                  Thanks Bernd-

                  I'll reply with our experience, and contribute back to the project where I can.

                  j

                  • #10
                    I was so annoyed at not being able to get cluster density easily that I wrote the code below to get it:

                    Code:
                    import pandas as pd
                    from bitstring import BitStream  # older bitstring releases called this class BitString


                    class TileMetrics:
                        def __init__(self, filename):
                            self.f = filename

                            with open(self.f, 'rb') as fh:
                                a = BitStream(bytes=fh.read())
                            self.filever = a.read('uintle:8')    # version number == 2
                            self.recordlen = a.read('uintle:8')  # length of each record in bytes == 10 (for TileMetrics)
                            a.pos = 16  # first record starts right after the two header bytes

                            # set up data
                            self.data = {'lane': [], 'tile': [], 'met': [], 'value': []}

                            # read records per specs in technote_rta_theory_operations.pdf from ILMN
                            # integer division: each record is recordlen * 8 == 80 bits
                            nrecords = (a.len - 16) // (self.recordlen * 8)
                            for _ in range(nrecords):
                                self.data['lane'].append(a.read('uintle:16'))    # lane number
                                self.data['tile'].append(a.read('uintle:16'))    # tile number
                                self.data['met'].append(a.read('uintle:16'))     # metric code
                                self.data['value'].append(a.read('floatle:32'))  # metric value

                            # collect everything in a DataFrame
                            self.df = pd.DataFrame(self.data)

                            # metric code 100 == cluster density, 101 == cluster density passing filter
                            self.cdens = self.df[self.df.met == 100].reset_index(drop=True)
                            self.pfcdens = self.df[self.df.met == 101].reset_index(drop=True)


                    if __name__ == '__main__':
                        tm = TileMetrics('TileMetricsOut.bin')
                        print('###############')
                        print('filename %s' % tm.f)
                        print('###############')
                        print('average clusterdensity == %.2f' % tm.cdens.value.mean())
                        print('average perc pf clusters == %.2f' % (100 * tm.pfcdens.value.mean() / tm.cdens.value.mean()))
                    This is easily extensible to other metrics files, and I'm working on it (slowly). Metrix looks awesome but is beyond my skills to implement efficiently.
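
The record layout used in the code above can also be read with only the Python standard library. What follows is a minimal sketch, assuming the same layout as that code (a two-byte header, then 10-byte little-endian records of lane, tile, metric code and value). The names `parse_tile_metrics` and `mean_by_code` are hypothetical helpers, and the bytes in the example are synthetic, not real run data.

```python
import struct

# Layout as described in the TileMetrics code above:
# header = version (uint8) + record length in bytes (uint8),
# record = lane (uint16), tile (uint16), metric code (uint16), value (float32).
HEADER = struct.Struct('<BB')
RECORD = struct.Struct('<HHHf')  # '<' means little-endian with no padding, so 10 bytes


def parse_tile_metrics(raw):
    """Return (version, [(lane, tile, code, value), ...]) from raw file bytes."""
    version, record_len = HEADER.unpack_from(raw, 0)
    if record_len != RECORD.size:
        raise ValueError('unexpected record length: %d' % record_len)
    body = raw[HEADER.size:]
    usable = len(body) - len(body) % RECORD.size  # ignore a trailing partial record
    return version, list(RECORD.iter_unpack(body[:usable]))


def mean_by_code(records, code):
    """Average the values of all records carrying the given metric code."""
    values = [value for _, _, c, value in records if c == code]
    return sum(values) / len(values) if values else float('nan')


if __name__ == '__main__':
    # Synthetic two-tile example, made up purely for illustration.
    raw = HEADER.pack(2, 10)
    raw += RECORD.pack(1, 1101, 100, 800.0) + RECORD.pack(1, 1101, 101, 600.0)
    raw += RECORD.pack(1, 1102, 100, 1000.0) + RECORD.pack(1, 1102, 101, 900.0)
    version, records = parse_tile_metrics(raw)
    print('version', version, 'records', len(records))
    print('mean cluster density', mean_by_code(records, 100))
    print('mean PF cluster density', mean_by_code(records, 101))
```

For a real run, the bytes would come from something like `open('TileMetricsOut.bin', 'rb').read()`. Other metrics files would need their own record format strings, which this sketch does not attempt to cover.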

                    • #11
                      Module Bio::IlluminaSAV at CPAN is available for these files. Version 1.0, release 7013 works well (prev release was missing files). You can install using:

                      sudo cpanm -i Bio::IlluminaSAV
                      Last edited by earonesty; 05-23-2013, 05:23 PM. Reason: Add install info

                      • #12
                        InterOp parsers in R and perl

                        Hi,

                        I'm from Illumina, and over the past few months I've built a package in R and some scripts in Perl to accurately parse the binary InterOp files.

                        These are unsupported, which means tech support will not be able to help you with them, but we've tested them internally and I've received approval from my manager to share them with those who ask. So please PM me if these scripts would be useful for you.

                        Cheers,
                        mchen1

                        • #13
                          Originally posted by earonesty:
                          Module Bio::IlluminaSAV at CPAN is available for these files. Version 1.0, release 7013 works well (prev release was missing files). You can install using:

                          sudo cpanm -i Bio::IlluminaSAV
                          The latest version passes all the relevant CPAN test reports for MiSeq and GAII data. The failed reports are on test systems that can't handle the volume of data. HiSeq passes as well, but the data set is too large to upload to CPAN at all.

                          • #14
                            A new tool for the job is illuminate - https://bitbucket.org/invitae/illuminate
                            Installation is pretty straightforward; I was able to get it up and running in a few minutes.

                            • #15
                              Illuminate is not up to date; it only supports InterOp files up through v5. There is a package called savR in Bioconductor which supports up through v6, for RTA versions 2.7 and up (for the HiSeq 4k, etc.).
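
A version limit like this could be checked before handing a file to a parser. The following is a rough sketch, assuming the first byte of an InterOp file holds its format version, as in the TileMetrics layout from post #10. `interop_version` and `check_supported` are hypothetical names and are not part of illuminate or savR.

```python
import struct


def interop_version(raw):
    """Return the format version, assumed to be stored in the first byte."""
    if not raw:
        raise ValueError('empty InterOp file')
    (version,) = struct.unpack_from('<B', raw, 0)
    return version


def check_supported(raw, max_version):
    """Return the version, or raise if it is newer than the parser handles."""
    version = interop_version(raw)
    if version > max_version:
        raise NotImplementedError(
            'file is v%d, parser handles up to v%d' % (version, max_version))
    return version


if __name__ == '__main__':
    # Two synthetic header bytes (version 6, record length 10), for illustration only.
    header = bytes([6, 10])
    print(check_supported(header, max_version=6))
```

Failing early this way gives a clear error instead of misread records when a newer file format is passed to an older parser.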
