SEQanswers

Old 08-13-2012, 04:25 PM   #1
iamh2o
Junior Member
 
Location: San Francisco

Join Date: Mar 2009
Posts: 6
Parser For Illumina InterOp Binary Metric Files

Hi All-

I'm trying to locate code that will extract the various helpful QC metrics locked in the Illumina binary InterOp files.

Specifically:
ControlMetrics.bin ExtractionMetrics.bin QMetrics.bin
ControlMetricsOut.bin ExtractionMetricsOut.bin QMetricsOut.bin
CorrectedIntMetrics.bin IndexMetrics.bin TileMetrics.bin
CorrectedIntMetricsOut.bin IndexMetricsOut.bin TileMetricsOut.bin

I've taken a look at the Picard libraries, but they seem to skip the metrics files.

Any help would be appreciated!

John
Old 09-03-2012, 06:23 AM   #2
Rhizosis
Member
 
Location: Amsterdam

Join Date: Mar 2012
Posts: 41

Hi John,

I am currently working on this. Have you made any progress in the meantime?
What language are you writing the parser in?

Regards,

Bernd
Old 12-20-2012, 06:23 AM   #3
SeqOps
Junior Member
 
Location: Huntsville, AL

Join Date: Dec 2012
Posts: 2
Interop Parser

Either of you have any luck with this? Looking for the same thing myself.
Old 12-20-2012, 06:28 AM   #4
Rhizosis
Member
 
Location: Amsterdam

Join Date: Mar 2012
Posts: 41

I have managed to build a fully functional client-server Illumina metrics parser that interprets data written by HiSeqs and MiSeqs in real time.

However, I don't know yet if I am allowed to release it to the public. I need to talk to my supervisor about this first.

Most likely it will be integrated with the GNomEx LIMS system in Q1 2013.
Let me know if you need pointers.
Old 12-20-2012, 07:51 PM   #5
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,355

I am also interested in this. It blows my mind that ILMN (or someone) hasn't released this code yet.
Old 01-31-2013, 03:36 AM   #6
Rhizosis
Member
 
Location: Amsterdam

Join Date: Mar 2012
Posts: 41

I have talked to my supervisor and I've been given the green light to release this software under the GPLv3 license. I need to put it through some beta tests first, but it shouldn't be too long.

In about a week or two I will create a separate thread for this in the bioinformatics section for feedback, support and bug reports.
Old 02-07-2013, 11:58 AM   #7
behavin
Member
 
Location: TN

Join Date: Oct 2011
Posts: 11

This would be quite helpful for us as well. I'd love to be able to parse all the metrics without resorting to SAV.
Old 02-11-2013, 04:39 AM   #8
Rhizosis
Member
 
Location: Amsterdam

Join Date: Mar 2012
Posts: 41

Hi everybody.

As promised, here is the initial release of Metrix.
For feedback, ideas, bug reports and future communication, please see the Metrix thread.

I hope this helps.

Bernd
Old 02-25-2013, 01:25 PM   #9
iamh2o
Junior Member
 
Location: San Francisco

Join Date: Mar 2009
Posts: 6

Thanks Bernd-

I'll reply with our experience, and contribute back to the project where I can.

j
Old 02-27-2013, 02:00 PM   #10
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,355

I was so annoyed at not being able to get cluster density easily (Illumina, if you're reading this, this should not be so hard) that I wrote the code below to get it:

Code:
import pandas as pd
from bitstring import ConstBitStream


class TileMetrics:
    """Read a TileMetricsOut.bin file (version 2) into a pandas DataFrame."""

    def __init__(self, filename):
        self.f = filename

        with open(self.f, 'rb') as fh:
            a = ConstBitStream(bytes=fh.read())
        self.filever = a.read('uintle:8')    # version number == 2
        self.recordlen = a.read('uintle:8')  # length of each record in bytes == 10 (for TileMetrics)
        a.pos = 16  # records start right after the two header bytes read above

        # set up data
        self.data = {'lane': [], 'tile': [], 'met': [], 'value': []}

        # read records bytewise per specs in technote_rta_theory_operations.pdf from ILMN
        # 80 == record length in bits; integer division ignores a trailing partial record
        for i in range((a.len - 16) // (self.recordlen * 8)):
            self.data['lane'].append(a.read('uintle:16'))    # lane number
            self.data['tile'].append(a.read('uintle:16'))    # tile number
            self.data['met'].append(a.read('uintle:16'))     # metric code
            self.data['value'].append(a.read('floatle:32'))  # metric value

        # one row per (lane, tile, metric code) record
        self.df = pd.DataFrame(self.data)

        # metric code 100 == cluster density, 101 == cluster density passing filter
        self.cdens = self.df[self.df.met == 100].reset_index(drop=True)
        self.pfcdens = self.df[self.df.met == 101].reset_index(drop=True)


if __name__ == '__main__':
    tm = TileMetrics('TileMetricsOut.bin')
    print('###############')
    print('filename %s' % tm.f)
    print('###############')
    print('average clusterdensity == %.2f' % tm.cdens['value'].mean())
    print('average perc pf clusters == %.2f' % (100 * tm.pfcdens['value'].mean() / tm.cdens['value'].mean()))
This is easily extensible to other metrics files; I'm working on it (slowly). Metrix looks awesome, but it is beyond my skills to implement efficiently.
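For anyone who would rather avoid the bitstring and pandas dependencies, the same record layout can be read with the standard-library struct module. This is only a minimal sketch under the assumptions stated in the post above (a 2-byte header, then 10-byte little-endian records of lane, tile, metric code and a float value); the names read_tile_metrics and mean_metric are made up for illustration and are not part of any released tool.

```python
import struct

# Assumed layout, taken from the post above: lane, tile and metric code as
# little-endian uint16, followed by a little-endian float32 value (10 bytes).
RECORD = struct.Struct('<HHHf')


def read_tile_metrics(raw):
    """Return (version, records) parsed from the bytes of a TileMetricsOut.bin file."""
    version, record_len = struct.unpack_from('<BB', raw, 0)
    if record_len != RECORD.size:
        raise ValueError('unexpected record length: %d' % record_len)
    count = (len(raw) - 2) // record_len  # a trailing partial record is ignored
    records = [RECORD.unpack_from(raw, 2 + i * record_len) for i in range(count)]
    return version, records


def mean_metric(records, code):
    """Average the values of all records carrying the given metric code."""
    values = [value for _lane, _tile, met, value in records if met == code]
    return sum(values) / len(values) if values else float('nan')
```

To use it, read the file in binary mode, pass the bytes to read_tile_metrics, and call mean_metric(records, 100) for cluster density or mean_metric(records, 101) for the passing-filter density.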
Old 04-29-2013, 06:24 AM   #11
earonesty
Member
 
Location: United States of America

Join Date: Mar 2011
Posts: 52

The Bio::IlluminaSAV module on CPAN handles these files. Version 1.0, release 7013 works well (the previous release was missing files). You can install it using:

sudo cpanm -i Bio::IlluminaSAV

Last edited by earonesty; 05-23-2013 at 05:23 PM. Reason: Add install info
Old 05-24-2013, 08:50 AM   #12
mchen1
Member
 
Location: San Diego, CA

Join Date: May 2013
Posts: 10
InterOp parsers in R and perl

Hi,

I'm from Illumina and over the past few months, I've built a package in R and some scripts in perl to accurately parse the binary InterOp files.

These are unsupported, which means tech support will not be able to help you with them, but we've tested these internally and I've received approval from my manager to share them with those that ask. So please PM me if these scripts would be useful for you.

Cheers,
mchen1
Old 06-04-2013, 09:54 AM   #13
earonesty
Member
 
Location: United States of America

Join Date: Mar 2011
Posts: 52

Quote:
Originally Posted by earonesty View Post
Module Bio::IlluminaSAV at CPAN is available for these files. Version 1.0, release 7013 works well (prev release was missing files). You can install using:

sudo cpanm -i Bio::IlluminaSAV
The latest version passes all the relevant CPAN tests for MiSeq and GAII data. The failed reports are from test systems that can't handle the volume of data. HiSeq data passes as well, but is too large to upload to CPAN at all.
Old 11-12-2013, 11:37 PM   #14
ophir
Junior Member
 
Location: Israel

Join Date: Nov 2013
Posts: 1

A new tool for the job is illuminate - https://bitbucket.org/invitae/illuminate
Installation is pretty straightforward; I was able to get it up and running in a few minutes.
Old 02-09-2016, 07:41 AM   #15
ploverso-pgdx
Junior Member
 
Location: Baltimore

Join Date: Feb 2016
Posts: 3

Illuminate is not up to date; it only supports InterOp files up through v5. There is a Bioconductor package called savR that supports up through v6, for RTA versions 2.7 and up (for the HiSeq 4000, etc.).

https://www.bioconductor.org/package...html/savR.html
Old 09-05-2016, 10:52 PM   #16
alito
Junior Member
 
Location: Melbourne

Join Date: May 2015
Posts: 1

The code in savR doesn't seem to support version 6 from what I can see (https://github.com/Bioconductor-mirr...savR-methods.R) although it seems to support v5.

FWIW, here's the format for V6 (very similar to v5 as described in https://tracker.tgac.ac.uk/browse/MISO-138):


byte 0: file version number (6)
byte 1: length of each record
byte 2: quality score binning flag (1 if binning was on)
If byte 2 == 1 (quality score binning on):
    byte 3: number of quality score bins, B
    bytes 4 - (4+B-1): lower boundary of quality score bins
    bytes (4+B) - (4+2*B-1): upper boundary of quality score bins
    bytes (4+2*B) - (4+3*B-1): remapped scores of quality score bins (QRemapped)
The remaining bytes are for the records, with each record in this format:
    2 bytes: lane number (uint16)
    2 bytes: tile number (uint16)
    2 bytes: cycle number (uint16)
    4 x B bytes: number of clusters assigned each remapped score (B x uint32) (QRemapped)


For the MiniSeq, B = 7.
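As a rough illustration of how the layout above could be read, here is a minimal Python sketch using only the standard-library struct module. Several things in it are assumptions rather than facts from the post: that all multi-byte fields are little-endian (as in the TileMetrics code earlier in this thread), that the record length in byte 1 equals 6 + 4*B, and that the layout applies to the QMetrics file, which the post does not name. The function name parse_binned_qmetrics is hypothetical, and the binning-off case is rejected because the post does not describe it.

```python
import struct


def parse_binned_qmetrics(raw):
    """Parse the v6 layout described above from a bytes object."""
    version, record_len, binning = struct.unpack_from('<BBB', raw, 0)
    if version != 6:
        raise ValueError('expected version 6, got %d' % version)
    if binning != 1:
        # The post only describes the header that follows when binning is on.
        raise ValueError('binning is off; that layout is not covered here')

    nbins = raw[3]  # B, the number of quality score bins
    off = 4
    lower = list(raw[off:off + nbins])
    upper = list(raw[off + nbins:off + 2 * nbins])
    remapped = list(raw[off + 2 * nbins:off + 3 * nbins])
    off += 3 * nbins

    # Assumption: lane, tile, cycle as little-endian uint16, then B uint32 counts.
    record = struct.Struct('<HHH%dI' % nbins)
    if record.size != record_len:
        raise ValueError('record length %d does not match 6 + 4*B' % record_len)

    records = []
    while off + record.size <= len(raw):
        lane, tile, cycle, *counts = record.unpack_from(raw, off)
        records.append({'lane': lane, 'tile': tile, 'cycle': cycle, 'counts': counts})
        off += record.size

    return {'lower': lower, 'upper': upper, 'remapped': remapped, 'records': records}
```

Each entry in counts lines up with the bin at the same position in remapped, so counts[i] is the number of clusters assigned the score remapped[i] for that lane, tile and cycle.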
Old 09-06-2016, 01:27 AM   #17
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,479

Here's the original GitHub repository, where people can file issues and make PRs.
Old 09-06-2016, 05:40 AM   #18
ploverso-pgdx
Junior Member
 
Location: Baltimore

Join Date: Feb 2016
Posts: 3

Quote:
Originally Posted by alito View Post
The code in savR doesn't seem to support version 6 from what I can see (https://github.com/Bioconductor-mirr...savR-methods.R) although it seems to support v5.

savR definitely supports version 6; I'm the one who wrote the code to support it. Here's the commit that added support: https://github.com/bcalder/savR/comm...506336250cec1c
All times are GMT -8.

