SEQanswers

Old 03-05-2018, 06:53 AM   #1
lac302
Member
 
Location: DE

Join Date: Dec 2012
Posts: 64
parser needed for bcl2fastq2 output

I'm looking for a parser to compile basic stats from the various HTML/XML/text output files that bcl2fastq2 writes for NextSeq runs. I'd like a text file to archive with the FASTQ files that includes sample name, sample ID, indexes, and total clusters per sample (not per lane, since NextSeq lanes are not independent). Thanks.
Old 03-05-2018, 08:14 AM   #2
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,764

Take a look at MultiQC if you want to summarize many runs.

The following applies to a local bcl2fastq analysis. I am not familiar with BaseSpace, but I suppose a similar structure can be found there as well.

Otherwise, the index.html file in the (FCID/Unaligned/Reports/html) directory has all the stats. If you prefer a standalone file, use (FCID/Unaligned/Reports/html/FC_BARCODE/all/all/all/laneBarcode.html).

JSON-format results are in (FCID/Unaligned/Stats/Stats.json).

Last edited by GenoMax; 03-05-2018 at 10:22 AM.
Old 03-05-2018, 09:54 AM   #3
lac302
Member
 
Location: DE

Join Date: Dec 2012
Posts: 64

Thanks GenoMax. We are running our NextSeq standalone.

The HTML file displays the metrics per lane per sample. I'd like the information for each sample combined into one row, e.g. Lane 1 Sample A, Lane 2 Sample A, Lane 3 Sample A, Lane 4 Sample A -> Sample A.

If you know of an HTML-to-CSV or JSON-to-CSV script, I could handle the rest in Excel.
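
For the HTML-to-CSV route, a minimal Python sketch using only the standard library might look like the following. It collects the cell text of every table in a report such as laneBarcode.html and writes one table out as CSV. The names TableExtractor, extract_tables and rows_to_csv are made up for this illustration, and the sketch assumes plain, non-nested table markup, which may not hold for every bcl2fastq2 report.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <table> (hypothetical helper).

    Assumption: tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def _end_cell(self):
        # Store the finished cell with its whitespace collapsed.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._end_row()
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._end_row()          # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._end_cell()         # tolerate a missing </td> or </th>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html_text):
    """Return every table in html_text as a list of rows of cell strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables


def rows_to_csv(rows):
    """Serialize one table (a list of rows) as CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

To use it, read the report into a string, call extract_tables on it, and pass whichever table holds the per-lane rows to rows_to_csv. Counts in the report may be formatted with thousands separators (for example "1,000"), so they arrive as quoted text and need cleaning before any arithmetic.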
Old 03-05-2018, 11:45 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,764

While there appear to be many online tools for this, the following may be safer.

1. Download the Atom programmer's editor.
2. After installing Atom, find the "Settings" tab and click on +Install.
3. Search for a package called "json-converter" and install it.
4. Download the Stats.json file and open it in Atom.
5. From the Packages menu drop-down, find "Json Converter" and select "Json to csv".
6. Write the converted data out to a file and then do what you need to in Excel.
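
If a script is preferred over an editor plugin, the same conversion can be sketched in Python. The example below sums each sample's clusters across lanes, which also gives one row per sample. The key names (ConversionResults, DemuxResults, SampleId, SampleName, NumberReads, IndexMetrics, IndexSequence) reflect an assumed Stats.json layout and should be checked against your own file; the function names are hypothetical.

```python
import csv
import io
import json


def load_stats(path):
    """Read a Stats.json file into a Python dictionary."""
    with open(path) as handle:
        return json.load(handle)


def per_sample_totals(stats):
    """Sum NumberReads across lanes for each sample.

    Assumption: stats["ConversionResults"] holds one entry per lane, each
    with a "DemuxResults" list of per-sample records.
    """
    totals = {}
    for lane in stats.get("ConversionResults", []):
        for demux in lane.get("DemuxResults", []):
            entry = totals.setdefault(demux["SampleId"], {
                "SampleId": demux["SampleId"],
                "SampleName": demux.get("SampleName", ""),
                "Indexes": set(),
                "Clusters": 0,
            })
            entry["Clusters"] += demux.get("NumberReads", 0)
            for index in demux.get("IndexMetrics", []):
                entry["Indexes"].add(index["IndexSequence"])
    return totals


def totals_to_csv(totals):
    """Write one CSV row per sample; multiple indexes are joined with ';'."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["SampleId", "SampleName", "Indexes", "Clusters"])
    for entry in totals.values():
        writer.writerow([
            entry["SampleId"],
            entry["SampleName"],
            ";".join(sorted(entry["Indexes"])),
            entry["Clusters"],
        ])
    return out.getvalue()
```

Calling totals_to_csv(per_sample_totals(load_stats(path))) with the path to a run's Stats.json returns text that can be saved next to the FASTQ files and opened in Excel.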
Old 03-05-2018, 12:29 PM   #5
moatman
Junior Member
 
Location: Maryland

Join Date: Dec 2016
Posts: 5

As a side note, when running bcl2fastq you can use the option "--no-lane-splitting" to create FASTQ files that are not split by lane. This avoids having to combine them downstream.
Old 03-06-2018, 07:15 AM   #6
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,147

Quote:
Originally Posted by lac302
I'm looking for a parser to compile basic stats from the various html/xml/text output files from NextSeq runs. Looking for a text file to archive with fastq files that includes sample name, sample ID, indexes, total clusters per sample not per lane since NextSeq lanes are not independent. Thanks.
Hello lac302,

I have attached a Perl script I use for this purpose; note, however, that for NextSeq run data its stats are still divided by lane. The script reads the DemultiplexingStats.xml and ConversionStats.xml files in the Stats/ directory created by bcl2fastq2. It requires the Perl modules Getopt::Long and XML::LibXML. It takes one mandatory argument, the path to the Stats/ directory, and one optional flag for parsing data from single-end runs.

Code:
# parseBcl2FastqStatsXml.pl [-s] -i <path>/Stats/ > output.txt

The argument for -i must end in /Stats/
-s is optional for single read runs
The output is a tab delimited text file with relevant per sample, per lane stats.
Attached Files
File Type: pl parseBcl2Fastq2StatsXml.pl (3.9 KB, 2 views)
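
Since the attached script reports stats per lane, here is a rough Python sketch of how the per-lane counts in DemultiplexingStats.xml could be summed per sample. It is not a port of the Perl script. It assumes a Project > Sample > Barcode > Lane > BarcodeCount nesting with aggregate entries named "all", which should be verified against a real file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def barcode_counts_per_sample(root):
    """Sum BarcodeCount over all lanes for each (project, sample, barcode).

    Assumption: the XML nests Project > Sample > Barcode > Lane, and the
    entries named "all" are aggregates that would double count if included.
    """
    totals = defaultdict(int)
    for project in root.iter("Project"):
        if project.get("name") == "all":
            continue
        for sample in project.findall("Sample"):
            if sample.get("name") == "all":
                continue
            for barcode in sample.findall("Barcode"):
                if barcode.get("name") == "all":
                    continue
                key = (project.get("name"), sample.get("name"), barcode.get("name"))
                for lane in barcode.findall("Lane"):
                    count = lane.findtext("BarcodeCount")
                    if count is not None:
                        totals[key] += int(count)
    return dict(totals)
```

For a file on disk, pass in the root element returned by ET.parse(path).getroot(), where path points at the DemultiplexingStats.xml inside the Stats/ directory.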

Last edited by kmcarr; 03-06-2018 at 08:06 AM.
Old 03-06-2018, 07:47 AM   #7
lac302
Member
 
Location: DE

Join Date: Dec 2012
Posts: 64

Guess what? You can copy and paste from the HTML report straight into Excel.

I was making it harder than it needed to be.