#1 |
Junior Member
Location: San Francisco Join Date: Mar 2009
Posts: 6
|
![]()
Hi All-
I'm trying to locate code that will extract the various helpful QC metrics locked in the Illumina binary InterOp files. Specifically:
ControlMetrics.bin
ExtractionMetrics.bin
QMetrics.bin
ControlMetricsOut.bin
ExtractionMetricsOut.bin
QMetricsOut.bin
CorrectedIntMetrics.bin
IndexMetrics.bin
TileMetrics.bin
CorrectedIntMetricsOut.bin
IndexMetricsOut.bin
TileMetricsOut.bin
I've taken a look at the Picard libraries, but they seem to skip the metrics files. Any help would be appreciated! John |
#2 |
Member
Location: Amsterdam Join Date: Mar 2012
Posts: 41
|
![]()
Hi John,
I am currently working on this. Have you made any progress in the meantime? What language are you writing the parser in? Regards, Bernd |
#3 |
Junior Member
Location: Huntsville, AL Join Date: Dec 2012
Posts: 2
|
![]()
Either of you have any luck with this? Looking for the same thing myself.
|
#4 |
Member
Location: Amsterdam Join Date: Mar 2012
Posts: 41
|
![]()
I have managed to build a fully functional client-server Illumina metrics parser that interprets data written by HiSeqs and MiSeqs in real time.
However, I don't yet know whether I am allowed to release it to the public; I need to talk to my supervisor about this first. Most likely it will be integrated with the GNomEx LIMS in Q1 2013. Let me know if you need pointers. |
#5 |
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
|
![]()
I am also interested in this. It blows my mind that ILMN (or someone) hasn't released this code yet.
|
#6 |
Member
Location: Amsterdam Join Date: Mar 2012
Posts: 41
|
![]()
I have talked to my supervisor and I've been given the green light to release this software under the GPLv3 license. I need to put it through some beta tests first, but it shouldn't be too long.
In about a week or two I will create a separate thread for this in the bioinformatics section for feedback, support and bug reports. |
#7 |
Member
Location: TN Join Date: Oct 2011
Posts: 11
|
![]()
This would be quite helpful for us as well; I'd love to be able to parse all the metrics without resorting to SAV.
|
#9 |
Junior Member
Location: San Francisco Join Date: Mar 2009
Posts: 6
|
![]()
Thanks Bernd-
I'll reply with our experience, and contribute back to the project where I can. j |
#10 |
--Site Admin--
Location: SF Bay Area, CA, USA Join Date: Oct 2007
Posts: 1,358
|
![]()
I was so annoyed with not being able to do get cluster density easily (Illumina, if you're reading this, you're retarded
Code:
```python
import pandas as pd
# ConstBitStream stands in for the BitString name used in the original post,
# which may not be available in newer bitstring releases.
from bitstring import ConstBitStream


class TileMetrics:
    def __init__(self, filename):
        self.f = filename
        with open(self.f, 'rb') as fh:
            a = ConstBitStream(bytes=fh.read())

        self.filever = a.read('uintle:8')    # version number == 2
        self.recordlen = a.read('uintle:8')  # length of each record == 10 (for TileMetrics)
        a.pos = 16  # skip the above bytes, which are invariant for this file

        # set up data
        self.data = {'lane': [], 'tile': [], 'met': [], 'value': []}

        # read records bytewise per specs in technote_rta_theory_operations.pdf from ILMN
        # (integer division so range() gets an int; 80 == record length in bits)
        for i in range((a.len - 16) // (self.recordlen * 8)):
            self.data['lane'].append(a.read('uintle:16'))    # lane number
            self.data['tile'].append(a.read('uintle:16'))    # tile number
            self.data['met'].append(a.read('uintle:16'))     # metric code
            self.data['value'].append(a.read('floatle:32'))  # metric value

        # make it fuzzy
        self.df = pd.DataFrame(self.data)

        # get some stuff
        self.cdens = self.df[self.df.met == 100].reset_index(drop=True)
        self.pfcdens = self.df[self.df.met == 101].reset_index(drop=True)


if __name__ == '__main__':
    tm = TileMetrics('TileMetricsOut.bin')
    print('###############')
    print('filename %s' % tm.f)
    print('###############')
    print('average clusterdensity == %.2f' % tm.cdens.value.mean())
    print('average perc pf clusters == %.2f'
          % (100 * tm.pfcdens.value.mean() / tm.cdens.value.mean()))
```
|
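For anyone who would rather avoid the bitstring and pandas dependencies, the same 10-byte record layout (three little-endian uint16 fields followed by a float32) can probably be read with the standard library's struct module alone. This is only a minimal sketch, assuming the two-byte header and the metric codes 100 and 101 used in the post above; the names `read_tile_metrics` and `mean_by_code` are made up for illustration and do not come from any library.

```python
import struct
from collections import defaultdict

# lane (uint16), tile (uint16), metric code (uint16), value (float32), little-endian
RECORD = struct.Struct('<HHHf')


def read_tile_metrics(buf):
    """Return (file_version, records) from the raw bytes of a TileMetrics file.

    Assumes a 2-byte header: byte 0 = file version, byte 1 = record length.
    """
    version, record_len = buf[0], buf[1]
    if record_len != RECORD.size:
        raise ValueError('unexpected record length %d' % record_len)
    body = buf[2:]
    # ignore any trailing partial record
    usable = len(body) - len(body) % RECORD.size
    return version, list(RECORD.iter_unpack(body[:usable]))


def mean_by_code(records):
    """Average the metric value per metric code across all lanes and tiles."""
    totals = defaultdict(lambda: [0.0, 0])
    for _lane, _tile, code, value in records:
        totals[code][0] += value
        totals[code][1] += 1
    return {code: total / count for code, (total, count) in totals.items()}
```

Usage would look something like `version, recs = read_tile_metrics(open('TileMetricsOut.bin', 'rb').read())`, after which `mean_by_code(recs)[100]` should correspond to the average cluster density printed by the script above.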
#11 |
Member
Location: United States of America Join Date: Mar 2011
Posts: 52
|
![]()
The module Bio::IlluminaSAV on CPAN is available for these files. Version 1.0, release 7013 works well (the previous release was missing files). You can install it using:
sudo cpanm -i Bio::IlluminaSAV
Last edited by earonesty; 05-23-2013 at 06:23 PM. Reason: Add install info |
#12 |
Member
Location: San Diego, CA Join Date: May 2013
Posts: 10
|
![]()
Hi,
I'm from Illumina, and over the past few months I've built a package in R and some scripts in Perl to accurately parse the binary InterOp files. These are unsupported, which means tech support will not be able to help you with them, but we've tested them internally and I've received approval from my manager to share them with those who ask. So please PM me if these scripts would be useful for you. Cheers, mchen1 |
#13 |
Member
Location: United States of America Join Date: Mar 2011
Posts: 52
|
![]()
The latest version passes all the relevant CPAN test reports for MiSeq and GAII data. The failed reports are on test systems that can't handle the volume of data. HiSeq passes as well, but the data set is too large to upload to CPAN at all.
|
#14 |
Junior Member
Location: Israel Join Date: Nov 2013
Posts: 1
|
![]()
A new tool for the job is Illuminate - https://bitbucket.org/invitae/illuminate
Installation is pretty straightforward; I was able to get it up and running in a few minutes. |
#15 |
Junior Member
Location: Baltimore Join Date: Feb 2016
Posts: 3
|
![]()
Illuminate is not up to date; it only supports InterOp files up through v5. There is a package called savR in Bioconductor which supports up through v6, for RTA versions 2.7 and up (for the HiSeq 4k, etc.).
https://www.bioconductor.org/package...html/savR.html |
#16 |
Junior Member
Location: Melbourne Join Date: May 2015
Posts: 1
|
![]()
The code in savR doesn't seem to support version 6 from what I can see (https://github.com/Bioconductor-mirr...savR-methods.R) although it seems to support v5.
FWIW, here's the format for V6 (very similar to v5 as described in https://tracker.tgac.ac.uk/browse/MISO-138):
byte 0: file version number (6)
byte 1: length of each record
byte 2: quality score binning (byte flag representing if binning was on)
if (byte 2 == 1) // quality score binning on
    byte 3: number of quality score bins, B
    bytes 4 - (4+B-1): lower boundary of quality score bins
    bytes (4+B) - (4+2*B-1): upper boundary of quality score bins
    bytes (4+2*B) - (4+3*B-1): remapped scores of quality score bins (QRemapped)
The remaining bytes are for the records, with each record in this format:
2 bytes: lane number (uint16)
2 bytes: tile number (uint16)
2 bytes: cycle number (uint16)
4 x B bytes: number of clusters assigned score (uint32) (QRemapped)
For the MiniSeq, B = 7. |
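The V6 layout above could be read along these lines. This is a rough sketch rather than a tested parser: it assumes little-endian integers (as in the TileMetrics code earlier in the thread), it only handles the case where binning is on because the layout without binning is not described here, and the function name `parse_qmetrics_v6` is hypothetical.

```python
import struct


def parse_qmetrics_v6(buf):
    """Parse the raw bytes of a v6 QMetrics file with quality score binning on.

    Assumptions: little-endian fields; binning flag == 1 (other layouts are
    rejected because they are not described in the post above).
    """
    version, record_len, binning = struct.unpack_from('<BBB', buf, 0)
    if version != 6:
        raise ValueError('expected file version 6, got %d' % version)
    if binning != 1:
        raise ValueError('layout without quality score binning is not handled')

    nbins = buf[3]  # B, the number of quality score bins
    lower = list(buf[4:4 + nbins])
    upper = list(buf[4 + nbins:4 + 2 * nbins])
    remapped = list(buf[4 + 2 * nbins:4 + 3 * nbins])

    # each record: lane, tile, cycle (uint16 each) then B cluster counts (uint32 each)
    fmt = '<HHH%dI' % nbins
    size = struct.calcsize(fmt)
    if record_len != size:
        raise ValueError('record length %d does not match 6 + 4*B = %d'
                         % (record_len, size))

    records = []
    offset = 4 + 3 * nbins
    while offset + size <= len(buf):
        lane, tile, cycle, *counts = struct.unpack_from(fmt, buf, offset)
        records.append({'lane': lane, 'tile': tile, 'cycle': cycle,
                        'counts': counts})
        offset += size

    return {'lower': lower, 'upper': upper, 'remapped': remapped,
            'records': records}
```

Each entry in `counts` lines up with the bin at the same position in `remapped`, so for a MiniSeq run every record would be expected to carry 7 counts.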
#17 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
|
![]()
Here's the original GitHub repository that people can make PRs against and open issues for.
|
#18 | |
Junior Member
Location: Baltimore Join Date: Feb 2016
Posts: 3
|
![]() Quote:
|
|