SEQanswers

Old 11-08-2015, 06:41 AM   #1
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26
MultiQC - a new tool to create summary reports from any analysis output

Hi everyone,

I've recently released a new tool called MultiQC - it's a simple and quick command line tool that you point at a directory containing output from your analysis. It runs through all of the files and finds output that it recognises, building a single HTML report to summarise this information.

I wrote it because of the difficulty we have when generating hundreds of FastQC reports - it's impossible to go through all of these, so MultiQC makes one report with all of the samples plotted together in one place. This makes it very easy to spot any outliers and see trends.

This approach is generalised in MultiQC, with 12 programs currently supported (see below). Output from any of these modules will be combined into a single report, meaning that you can follow the progress of each sample through your analysis pipeline.

You can find MultiQC here:

If you have Python & pip on your system, you can install MultiQC as follows:

Code:
# install MultiQC
pip install multiqc
# run MultiQC
multiqc <analysis_dir>
See the documentation for more in-depth instructions.


You can see a mini-poster that I made about the tool below. Any questions / requests / feedback welcome!

Phil



Currently supported tools:
  1. FastQC
  2. FastQ Screen
  3. Cutadapt
  4. Bismark
  5. STAR
  6. Tophat
  7. Bowtie
  8. Bowtie 2
  9. Subread featureCounts
  10. Picard MarkDuplicates
  11. Preseq
  12. Qualimap
Attached Files
File Type: pdf MultiQC_poster.pdf (3.07 MB, 18 views)
Old 11-08-2015, 06:14 PM   #2
N311V
Member
 
Location: Australia

Join Date: Jul 2013
Posts: 34

Looks great! I have a heap of FastQC and Tophat files I can test it out on.
Old 11-09-2015, 12:11 AM   #3
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Great - let me know how you get on!

fkrueger redefined what I thought "a lot of reports" meant last night by running MultiQC on over 3000 samples in one go. Note that the HTML report will probably fall over if you use sample numbers of this order of magnitude, but MultiQC also creates tab-delimited text files which you can open in Excel or use downstream.
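
As a rough illustration of the "use downstream" route for the tab-delimited text files, the sketch below parses such a table and flags samples that sit far from the rest. The column names (`Sample`, `percent_duplicates`), the example values and the 2-standard-deviation cutoff are all assumptions made up for this example; check the header of the files your MultiQC version actually writes before reusing it.

```python
import csv
import io
import statistics


def load_table(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def flag_outliers(rows, column, sample_key="Sample", threshold=2.0):
    """Return sample names whose value in `column` lies more than
    `threshold` population standard deviations from the mean."""
    values = [float(row[column]) for row in rows]
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all samples identical, nothing to flag
    return [
        row[sample_key]
        for row, value in zip(rows, values)
        if abs(value - mean) / sd > threshold
    ]


# Hypothetical table contents; real files will have different columns.
EXAMPLE_TSV = (
    "Sample\tpercent_duplicates\n"
    "s1\t10\n"
    "s2\t10\n"
    "s3\t10\n"
    "s4\t10\n"
    "s5\t10\n"
    "s6\t50\n"
)

rows = load_table(EXAMPLE_TSV)
print(flag_outliers(rows, "percent_duplicates"))  # ['s6']
```

To use it on a real file, read the file's text with `open(path).read()` and pass it to `load_table`.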
Old 11-09-2015, 06:44 AM   #4
NeilPearson
Junior Member
 
Location: UK

Join Date: Feb 2014
Posts: 1

Tried it out earlier. Very impressed!
Old 11-09-2015, 12:08 PM   #5
N311V
Member
 
Location: Australia

Join Date: Jul 2013
Posts: 34

Quote:
Originally Posted by tallphil
Great - let me know how you get on!

Note that fkrueger redefined what I thought "a lot of reports" meant last night by running MultiQC on over 3000 samples in one go.
That redefines what I mean by "a lot" too - I was talking about 142 samples! I'll hopefully get a chance to run it today.
Old 11-12-2015, 03:30 AM   #6
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Quote:
Originally Posted by NeilPearson
Tried it out earlier. Very impressed!
Great, glad you liked it!

N311V - 142 samples is still quite a lot (by default it will probably not render all of the plots, to avoid locking up the browser), but I think it should work fine.

Note that I'm hoping to make a new template at some point in the future that has flat pre-rendered plots instead of interactive JavaScript plots. This should mean that reports with huge numbers of samples will work and won't have huge file sizes. See the GitHub issue about this here.
Old 12-08-2015, 07:33 PM   #7
N311V
Member
 
Location: Australia

Join Date: Jul 2013
Posts: 34

Hi ewels,

I've finally got time to run MultiQC, but I'm having trouble providing a list of directories to search. Do I need to put my FastQC and Tophat results all in the same directory to produce a single report for both?

P.S. It's actually 144 samples of paired-end data. Each end is a separate file, so MultiQC had to compile 288 FastQC result files. Everything looks great (love it!) and MultiQC does not appear to have had any trouble with that many files. I'm using Chrome on a laptop with 16 GB of RAM, so that is likely helping to stop the browser from crashing.

Last edited by N311V; 12-08-2015 at 08:13 PM. Reason: added P.S.
Old 12-08-2015, 09:58 PM   #8
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Great that it's working, and even better that you like it!

You can either supply MultiQC with a parent directory that contains all files (it searches recursively through child directories), or give it multiple paths:

Code:
multiqc fastqc_dir tophat_dir
You can even give it a massive list of files if you want to:

Code:
multiqc *fastqc.zip *_tophat
Hope that helps.

Phil
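
For readers who want to preview which files a given set of paths covers, here is a minimal sketch of "directories are searched recursively, individual files are taken as given". It is not MultiQC's own code; the function name `collect_files` is invented for this example, and it only lists files without trying to recognise them.

```python
import os


def collect_files(paths):
    """Return a sorted list of every file reachable from `paths`.

    Directories are walked recursively; plain files are kept as-is;
    paths that do not exist are silently skipped.
    """
    found = []
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                found.extend(os.path.join(root, name) for name in files)
        elif os.path.isfile(path):
            found.append(path)
    return sorted(found)
```

Calling `collect_files(["fastqc_dir", "tophat_dir"])` mirrors the two-directory command above, while `collect_files(["."])` mirrors pointing at a single parent directory.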
Old 12-08-2015, 11:15 PM   #9
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 778

Great idea, thanks. Any chance you could add in kallisto and kraken support?
Old 12-08-2015, 11:20 PM   #10
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48

Nice job, I've starred it already.
Old 12-08-2015, 11:39 PM   #11
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Quote:
Originally Posted by gringer
Great idea, thanks. Any chance you could add in kallisto and kraken support?
Hi gringer, I can do yeah - I've noted these down as GitHub issues here and here.

If you have some typical log files that you could add, that would really help. Saves me from having to set up and run the programs myself (though I've been meaning to try them both out anyway).
Old 12-09-2015, 12:03 AM   #12
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48

Hi ewels, I ran MultiQC on my 6 Tophat output folders. All of the log files were parsed, and it also gives a Bowtie 2 plot. So does MultiQC check the log files first and then parse keywords like "bowtie" and "tophat" to determine the type of output? If so, I can modify the file names manually and get exactly what I want (one item per sample in the plot).
Old 12-09-2015, 12:10 AM   #13
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Hi zinky,

The strategy for parsing files varies for each module. Unfortunately, bowtie has no consistent file name structure and its output is very generic. Also, because many other programs use bowtie, its output often crops up inside the output of those programs. I can't think of any way to tell the difference between log files generated by bowtie and those generated by programs that use bowtie. If you have any suggestions I'd love to hear them!

Anyway - the easiest fix for you is to just stop the bowtie modules from running. You can do this with the -e / --exclude parameter:

Code:
multiqc -e bowtie1 -e bowtie2 .
Let me know if you have any problems with this.

Phil
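
To make the ambiguity concrete, here is a toy sketch of content-based detection with an exclude list. It is not how MultiQC is implemented: the `MARKERS` table, the marker strings and the `detect_modules` function are assumptions invented for illustration. The point is only that a generic marker matches wherever that text appears, so the same log line is attributed to bowtie whether bowtie was run directly or by another program, and excluding the module by name is the simplest way to suppress it.

```python
# Hypothetical detection rules: each module name maps to marker strings
# looked for in a file's contents. Illustrative only; these are not
# MultiQC's real search patterns.
MARKERS = {
    "bowtie1": ["reads with at least one reported alignment"],
    "bowtie2": ["overall alignment rate"],
}


def detect_modules(contents, exclude=()):
    """Return the sorted names of modules whose markers occur in
    `contents`, skipping any module named in `exclude`."""
    hits = []
    for module, markers in MARKERS.items():
        if module in exclude:
            continue
        if any(marker in contents for marker in markers):
            hits.append(module)
    return sorted(hits)


# A made-up log fragment, as might be found inside another tool's output folder.
log_text = "10000 reads; of these:\n95.00% overall alignment rate\n"

print(detect_modules(log_text))                                 # ['bowtie2']
print(detect_modules(log_text, exclude=("bowtie1", "bowtie2")))  # []
```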
Old 12-09-2015, 12:34 AM   #14
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48

Hi Phil, thanks for your quick reply. Your suggestion is a good choice for me. Technically, parsing the output of these tools to generate a summary report is a lightweight job, but it effectively puts a request on the software developers: it means asking them (including me) to generate logs with an ID and some other markers, and that's not easy. So why not suggest that users run MultiQC, which would provide an interface to call the third-party tools internally, rather than running the original tools directly? I have been working on workflow building for years, and I think this kind of experience could be better for users.
Old 12-11-2015, 06:40 AM   #15
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Hi Zinky,

I think you're suggesting that I make MultiQC into a workflow / pipeline tool of some sort? I'd prefer to keep it focussed and as simple as possible, so that it can be easily added to the end of any workflow and used with data generated in any manner from these tools.

I've written and use a workflow tool called Cluster Flow, so MultiQC should inevitably work pretty well with that. But it should work well with everything.

Phil
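
As an illustration of "added to the end of any workflow", the sketch below shows one way a Python-driven pipeline might call MultiQC as its final step. The helper names `multiqc_command` and `run_multiqc` are invented for this example; the only command-line features used are the ones shown earlier in this thread (positional paths and `-e` to exclude a module), and `multiqc` is assumed to be installed and on the `PATH`.

```python
import subprocess


def multiqc_command(paths, exclude=()):
    """Build the argument list for a MultiQC run over `paths`,
    adding one `-e <module>` pair per excluded module."""
    command = ["multiqc"]
    for module in exclude:
        command += ["-e", module]
    command += list(paths)
    return command


def run_multiqc(paths, exclude=()):
    """Run MultiQC as the last workflow step.

    Raises subprocess.CalledProcessError if MultiQC exits with an error,
    and FileNotFoundError if the `multiqc` executable is not installed.
    """
    subprocess.run(multiqc_command(paths, exclude), check=True)


print(multiqc_command(["."], exclude=["bowtie1", "bowtie2"]))
# ['multiqc', '-e', 'bowtie1', '-e', 'bowtie2', '.']
```

Keeping the command builder separate from the call that executes it lets the workflow log or dry-run the exact command before anything is launched.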
Old 06-17-2016, 04:58 AM   #16
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

MultiQC has been published! You can find the manuscript about it in Bioinformatics, DOI 10.1093/bioinformatics/btw354.

Version v0.6 now available through PyPI / Bioconda / GitHub / at http://multiqc.info - v0.7 due to be released soon.
Old 06-17-2016, 05:03 AM   #17
zinky
Member
 
Location: china

Join Date: Dec 2011
Posts: 48

Excellent work. I can see many new modules added in your published version. I like this tool.
Old 06-17-2016, 08:02 AM   #18
r.rosati
Member
 
Location: Brazil

Join Date: Aug 2015
Posts: 33

Congratulations on the great work!
Can I ask a naive question - how does publishing work in cases like this one, where the software is made widely available for use (i.e. published to users) before the "official" publication? Are there any specific caveats? Copyright issues? Prior publishing issues?
Thank you!
Old 06-17-2016, 08:05 AM   #19
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Hi r.rosati

Thanks! As long as there is no published manuscript describing the software, then I think it's fine. In fact, I think having the software already open source and covered by an open-source licence at the time of submission is recommended (I usually request this when reviewing other people's papers).

Basically - the journal doesn't own any copyright over the software itself, only the manuscript.

Phil
Old 07-04-2016, 05:55 AM   #20
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 26

Hi everyone,

Version 0.7 of MultiQC has just been released! There's a new module for Kallisto, plus a lot of tweaking, tidying, bugfixing and new features. See the release for the full changelog.

You can now get MultiQC in Galaxy too (work by @devengineson, @yvanlebras & @cmonjeau).

MultiQC v0.7 is now available through PyPI / Bioconda / GitHub and http://multiqc.info

Tags
bismark, fastqc, multiqc, quality control, reporting

All times are GMT -8.

