  • MultiQC - a new tool to create summary reports from any analysis output

    Hi everyone,

    I've recently released a new tool called MultiQC - it's a simple and quick command line tool that you point at a directory containing output from your analysis. It runs through all of the files and finds output that it recognises, building a single HTML report to summarise this information.

    I wrote it because of the difficulty we have when generating hundreds of FastQC reports - it's impossible to go through all of these, so MultiQC makes one report with all of the samples plotted together in one place. This makes it very easy to spot any outliers and see trends.

    This approach is generalised in MultiQC, with 12 programs currently supported (see below). Output from any of these modules will be combined into a single report, meaning that you can follow the progress of each sample through your analysis pipeline.

    You can find MultiQC here:


    If you have Python & pip on your system, you can install MultiQC as follows:

    Code:
    # install MultiQC
    pip install multiqc
    # run MultiQC
    multiqc <analysis_dir>
    See the documentation for more in-depth instructions.


    You can see a mini-poster that I made about the tool below. Any questions / requests / feedback welcome!

    Phil



    Currently supported tools:
    1. FastQC
    2. FastQ Screen
    3. Cutadapt
    4. Bismark
    5. STAR
    6. Tophat
    7. Bowtie
    8. Bowtie 2
    9. Subread featureCounts
    10. Picard MarkDuplicates
    11. Preseq
    12. Qualimap

  • #2
    Looks great! I have a heap of fastQC and Tophat files I can test it out on.



    • #3
      Great - let me know how you get on!

      Note that fkrueger redefined what I thought "a lot of reports" meant last night by running MultiQC on over 3000 samples in one go. The HTML report will probably fall over with sample numbers of this order of magnitude, but MultiQC also creates tab-delimited text files which you can open in Excel or use downstream.
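When a report is too big for the browser, the tab-delimited files can be filtered on the command line instead. The following is only a rough sketch under stated assumptions: the file name `general_stats.txt` and the `Sample` and `percent_duplicates` columns are made up for illustration and are not taken from MultiQC's real output, so check the header line of your own files first.

```shell
#!/bin/sh
set -eu

# flag_high FILE COLUMN THRESHOLD
# Print the first column (the sample name) of every row whose value in the
# named column is greater than THRESHOLD. The column is looked up by its
# header name, so the column order in the file does not matter.
flag_high() {
    awk -F'\t' -v col="$2" -v max="$3" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        ($idx + 0) > (max + 0) { print $1 }
    ' "$1"
}

# Toy data standing in for a real tab-delimited stats file (hypothetical).
demo_dir=$(mktemp -d)
printf 'Sample\tpercent_duplicates\n'  > "$demo_dir/general_stats.txt"
printf 'sample_A\t12.5\n'             >> "$demo_dir/general_stats.txt"
printf 'sample_B\t71.3\n'             >> "$demo_dir/general_stats.txt"
printf 'sample_C\t9.8\n'              >> "$demo_dir/general_stats.txt"

# List the samples with more than 50% duplicates.
flag_high "$demo_dir/general_stats.txt" percent_duplicates 50
```

With the toy data above, only `sample_B` exceeds the threshold. Looking the column up by name rather than by position keeps the filter working if the column order differs between files.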



      • #4
        Tried it out earlier. Very impressed!



        • #5
          Originally posted by tallphil
          Great - let me know how you get on!

          Note that fkrueger redefined what I thought "a lot of reports" meant last night by running MultiQC on over 3000 samples in one go.
          That redefines what I mean by a lot also, I was talking about 142 samples! I'll hopefully get a chance to run it today.



          • #6
            Originally posted by NeilPearson
            Tried it out earlier. Very impressed!
            Great, glad you liked it!

            N311V - 142 samples is still quite a bit (it will probably not render all of the plots by default to prevent locking up the browser), but it should work fine I think

            Note that I'm hoping to make a new template at some point in the future that has flat pre-rendered plots instead of interactive JavaScript plots. This should mean that reports with huge numbers of samples will work and won't have huge file sizes. See the GitHub issue about this here.



            • #7
              Hi ewels,

              I've finally got time to run MultiQC but I'm having trouble providing a list of directories to search. Do I need to put my FastQC and Tophat results all in the same directory to produce a single report for both?

              P.S. It's actually 144 samples of paired-end data. Each end is a separate file, so MultiQC had to compile 288 FastQC result files. Everything looks great (love it!) and MultiQC does not appear to have had any trouble with that many files. I'm using Chrome on a laptop with 16 GB of RAM, so that is likely helping to stop the browser from crashing.
              Last edited by N311V; 12-08-2015, 09:13 PM. Reason: added P.S.



              • #8
                Great that it's working, and even better that you like it

                You can either supply MultiQC with a parent directory that contains all files (it searches recursively through child directories), or give it multiple paths:

                Code:
                multiqc fastqc_dir tophat_dir
                You can even give it a massive list of files if you want to:

                Code:
                multiqc *fastqc.zip *_tophat
                Hope that helps..

                Phil
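Since a shell glob such as `*fastqc.zip` only matches files in the current directory, it can help to preview what a recursive search would pick up before building a long file list. The sketch below is an illustration only: the directory layout and file names are invented, and it uses `find` to do the listing rather than MultiQC's own search.

```shell
#!/bin/sh
set -eu

# list_fastqc_zips DIR
# Recursively list files under DIR whose names end in "fastqc.zip",
# sorted so that the output order is stable.
list_fastqc_zips() {
    find "$1" -type f -name '*fastqc.zip' | sort
}

# Toy directory tree standing in for a real analysis directory (hypothetical).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/run1/qc" "$demo_dir/run2"
: > "$demo_dir/run1/qc/s1_fastqc.zip"
: > "$demo_dir/run2/s2_fastqc.zip"
: > "$demo_dir/run2/notes.txt"

# Show which files would be selected.
list_fastqc_zips "$demo_dir"
```

Once the listing looks right, the same selection could be handed to MultiQC as a list of paths, for example with `find <analysis_dir> -type f -name '*fastqc.zip' -exec multiqc {} +`, although pointing MultiQC at the parent directory is usually simpler.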



                • #9
                  Great idea, thanks. Any chance you could add in kallisto and kraken support?



                  • #10
                    Nice job, I've starred it already.



                    • #11
                      Originally posted by gringer
                      Great idea, thanks. Any chance you could add in kallisto and kraken support?
                      Hi gringer, I can do yeah - I've noted these down as GitHub issues here and here.

                      If you have some typical log files that you could add, that would really help. Saves me from having to set up and run the programs myself (though I've been meaning to try them both out anyway).



                      • #12
                        Hi ewels, I ran MultiQC on my 6 Tophat output folders. All of the log files were parsed, and it also gives a Bowtie 2 plot. So does MultiQC check the log files first and then parse keywords like "bowtie" and "tophat" to determine the type of output? If so, I can modify the file names manually and get exactly what I want (one item per sample in the plot).



                        • #13
                          Hi zinky,

                          The strategy for parsing files varies for each module. Unfortunately, bowtie has no consistent file name structure and its output is very generic. Also, because many other programs use it, its output often crops up inside the logs of other programs. I can't think of any way to know the difference between log files generated by bowtie and those generated by programs that use bowtie. If you have any suggestions I'd love to hear them!

                          Anyway - the easiest fix for you is to just stop the bowtie modules from running. You can do this with the -e / --exclude parameter:

                          Code:
                          multiqc -e bowtie1 -e bowtie2 .
                          Let me know if you have any problems with this.

                          Phil
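If the list of modules to skip changes between runs, the repeated `-e` options can be generated from a list of names. This is a minimal sketch, assuming module names are single words without spaces; the helper `exclude_flags` is a made-up name for illustration, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
set -eu

# exclude_flags MODULE...
# Print "-e MODULE" for each argument, separated by spaces, on one line.
exclude_flags() {
    flags=""
    for module in "$@"; do
        # Append with a separating space only when flags is already non-empty.
        flags="${flags:+$flags }-e $module"
    done
    printf '%s\n' "$flags"
}

# Build the command shown above from a list of module names.
cmd="multiqc $(exclude_flags bowtie1 bowtie2) ."
echo "$cmd"
```

This prints `multiqc -e bowtie1 -e bowtie2 .`, which can then be pasted into a terminal or a pipeline script.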



                          • #14
                            Hi Phil, thanks for your quick reply. Your suggestion is a good choice for me. Technically, parsing the output of those tools to generate a summary report is a lightweight job, but doing it reliably puts a request on the software developers: it means asking them (me included) to generate logs with an ID and some other markers, and that's not easy. So why not suggest that users run MultiQC as an interface that calls the third-party tools internally, rather than running the original tools directly? I have been working on workflow building for years, and I think this kind of experience could be better for users.



                            • #15
                              Hi Zinky,

                              I think you're suggesting that I make MultiQC into a workflow / pipeline tool of some sort? I'd prefer to keep it focussed and as simple as possible I think, so that it can be easily added to the end of any workflow and used with data generated in any manner from these tools.

                              I've written and use a workflow tool called Cluster Flow, so MultiQC should inevitably work pretty well with that. But it should work well with everything.

                              Phil

