  • FastQC: A quality control application for FastQ data

    I have just put up on our website the first release of an application we have developed to perform QC checks on high-throughput sequence data.

    FastQC runs a series of tests and will flag up any potential problems with your data.

    The program can either be run as an interactive GUI application or it can run in an unattended offline mode where it generates HTML versions of its reports.

    We've been using this on some of our data for a few weeks and have found it really useful for looking at aspects of your data which the standard instrument QC checks may miss.

    FastQC is free software under the GPLv3. You can download it from:

    http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

    ..where there is also a sample report which you can look at.

    [Please note that the rather aggressive BBSRC cache may show you old versions of some pages - if you can't see FastQC on some of our pages please press shift+refresh in your browser to force an update which bypasses the cache].

    We are keen to get feedback from other sites - in particular we'd like to know:
    • Are there other tests you think would be useful?
    • Are the criteria we're using to warn about potentially bad data any good (and can you suggest improvements)?


    I hope this proves useful to some people here.

    Simon.

  • #2
    Hi Simon,

    I would really like to use FastQC for my project but am getting the following error message when I try to run it non-interactively on our Linux cluster:

    $ java -Xmx250m -cp ~/bin/fastqc/FastQC uk.ac.bbsrc.babraham.FastQC.FastQCApplication testFastQC.fastq
    Exception in thread "main" java.awt.HeadlessException:
    No X11 DISPLAY variable was set, but this program performed an operation which requires it.
    at java.awt.GraphicsEnvironment.checkHeadless(GraphicsEnvironment.java:159)
    at java.awt.Window.<init>(Window.java:431)
    at java.awt.Frame.<init>(Frame.java:403)
    at java.awt.Frame.<init>(Frame.java:368)
    at javax.swing.JFrame.<init>(JFrame.java:158)
    at uk.ac.bbsrc.babraham.FastQC.FastQCApplication.<init>(FastQCApplication.java:197)
    at uk.ac.bbsrc.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:63)

    These are the details for our java installation:

    [sensh@saturn FastQC]$ java -version
    java version "1.6.0_17"
    Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
    Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

    Any pointers?

    Thanks,

    Shurjo

    • #3
      Your cluster setup is blocking X11 (the graphical interface), or, if it does allow X11, you are not exporting the display to your local machine.

      • #4
        You might want to try setting a DISPLAY environment variable (export DISPLAY=:0.0) even if you're running on a headless system. If you specify a filename when launching FastQC then no windows should open, but since the program uses some Swing classes behind the scenes, Java might be getting itself confused.

        I'll try to replicate this on one of our servers and see if I can trigger the same problem.

        • #5
          I also tried putting -Djava.awt.headless=true with the java command but it didn't work. The export DISPLAY works though.
          Last edited by lletourn; 04-27-2010, 06:47 AM.

          • #6
            OK, it turns out there are two problems here.

            One, as lletourn pointed out is that if you're running a headless server you need to add -Djava.awt.headless=true

            There is also a change I need to make internally to FastQC to stop it trying to set up a graphical window (which it never displays) if it's running non-interactively.

            I'll try to get an update out tomorrow with a fix for the internal problem and better instructions.
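
As a rough illustration of the kind of internal change described above (a minimal sketch only, not FastQC's actual code), the window can be created only when no files were given on the command line and a display is available. The class name `HeadlessCheckSketch` and the helper `shouldOpenWindow` are hypothetical; `GraphicsEnvironment.isHeadless()` is the standard AWT call.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical sketch: decide up front whether any window should be built.
class HeadlessCheckSketch {

    /** A window is only wanted when no files were passed and a display exists. */
    static boolean shouldOpenWindow(String[] args, boolean headless) {
        return args.length == 0 && !headless;
    }

    public static void main(String[] args) {
        // True when java.awt.headless=true is set or no display is available.
        boolean headless = GraphicsEnvironment.isHeadless();

        if (shouldOpenWindow(args, headless)) {
            // Only this branch would ever construct a JFrame.
            System.out.println("Interactive mode: a window would be created here.");
        } else if (args.length == 0) {
            System.err.println("No files given and no display available; nothing to do.");
        } else {
            // No Swing window is touched on this path, so no HeadlessException.
            System.out.println("Offline mode: processing " + args.length + " file(s) without a window.");
        }
    }
}
```

The point of the design is that the decision is made before any `JFrame` constructor runs, which is where the stack trace in post #2 shows the exception being thrown.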

            • #7
              the error message


              C:\download\FastQC>java -Xmx250m -classpath . uk.ac.bbsrc.babraham.FastQC.FastQCApplication

              Exception in thread "Thread-5" java.lang.IllegalArgumentException: No knonwn encodings with chars < 33
              at uk.ac.bbsrc.babraham.FastQC.Sequence.PhredEncoding.getFastQEncodingOffset(PhredEncoding.java:30)
              at uk.ac.bbsrc.babraham.FastQC.Modules.PerBaseQualityScores.getPercentages(PerBaseQualityScores.java:65)
              at uk.ac.bbsrc.babraham.FastQC.Modules.PerBaseQualityScores.getResultsPanel(PerBaseQualityScores.java:56)
              at uk.ac.bbsrc.babraham.FastQC.Results.ResultsPanel.analysisComplete(ResultsPanel.java:117)
              at uk.ac.bbsrc.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:84)
              at java.lang.Thread.run(Unknown Source)

              • #8
                Originally posted by simonandrews
                OK, it turns out there are two problems here.

                One, as lletourn pointed out is that if you're running a headless server you need to add -Djava.awt.headless=true

                There is also a change I need to make internally to FastQC to stop it trying to set up a graphical window (which it never displays) if it's running non-interactively.

                I'll try to get an update out tomorrow with a fix for the internal problem and better instructions.
                Thanks! I will wait eagerly.

                • #9
                  Originally posted by cadlag
                  the error message
                  java.lang.IllegalArgumentException: No knonwn encodings with chars < 33
                  That's interesting. What is the source of the FastQ file which failed? According to Wikipedia (so it must be true), there aren't any quality encoding variants which use characters lower than 33.

                  If you'd be happy to let me have a copy of the FastQ file which is failing I'll take a look - contact me off list ([email protected]). If not I'll add some more debugging to the next release so it will still fail but might give more of a clue as to the parameters it's seeing.
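
To make the "chars < 33" check concrete, here is a minimal Java sketch of guessing a quality offset from the lowest character seen in the quality lines. It is not FastQC's actual logic: the class `PhredOffsetSketch`, its method names, and the cut-off of 59 are assumptions for illustration. The cut-off relies on the usual conventions that Sanger-style files use offset 33 and that offset-64 files go no lower than `;` (code 59, a Solexa score of -5). The error message also reports the character code it saw, in the spirit of the extra debugging mentioned above.

```java
// Hypothetical sketch of offset guessing from the lowest quality character.
class PhredOffsetSketch {

    /** Returns the lowest character found across all quality lines. */
    static char lowestChar(String... qualityLines) {
        // Stays at MAX_VALUE if no characters are supplied at all.
        char lowest = Character.MAX_VALUE;
        for (String line : qualityLines) {
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) < lowest) {
                    lowest = line.charAt(i);
                }
            }
        }
        return lowest;
    }

    /** Guesses the ASCII offset (33 or 64) implied by the lowest character. */
    static int guessOffset(char lowest) {
        if (lowest < 33) {
            // No common FastQ variant encodes qualities below '!' (33).
            throw new IllegalArgumentException(
                "No known encodings with chars < 33 (lowest char code seen: " + (int) lowest + ")");
        }
        if (lowest < 59) {
            return 33; // too low for any offset-64 variant, so assume offset 33
        }
        return 64; // assumption: treat everything from ';' upwards as offset 64
    }

    public static void main(String[] args) {
        char lowest = lowestChar("IIIIHHG#", "IIIIIIII");
        System.out.println("Lowest char code: " + (int) lowest
            + ", guessed offset: " + guessOffset(lowest));
    }
}
```

Note that a file whose lowest character is 59 or above is genuinely ambiguous (it could be high-quality offset-33 data), so the last branch is a heuristic rather than a proof.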

                  • #10
                    FastQC v0.1.1 is now up on our website. This contains a fix for the problem with headless operation. You should just be able to run the program as described in the original install document (no need to add extra property settings) and it should now work as long as you specify the file(s) to process on the command line.

                    [If you can't see the update on our site please press shift+refresh in your browser to force it to update the cache]

                    Please let me know if this fixes things.

                    • #11
                      Looks great, Simon. I'm suddenly aware of a skewed base composition at the beginning of the sequences in some of our runs; it levels off and becomes basically uniform at 25% from around base 15 onwards. The runs are otherwise fine. Has anyone seen similar results/artefacts?
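
For anyone who wants to check this kind of skew outside FastQC, the per-position base composition is just a count of each base at each read position, converted to a percentage. The following is a minimal Java sketch of that arithmetic, not FastQC's implementation; the class `BaseCompositionSketch` and its method are hypothetical, and bases other than A, C, G and T (such as N) are simply ignored here.

```java
// Hypothetical sketch: percentage of A, C, G, T at each read position.
class BaseCompositionSketch {

    /** Returns pct[position][base], with base order A, C, G, T. */
    static double[][] perBasePercentages(String[] reads) {
        int maxLen = 0;
        for (String read : reads) {
            maxLen = Math.max(maxLen, read.length());
        }

        long[][] counts = new long[maxLen][4];
        for (String read : reads) {
            for (int i = 0; i < read.length(); i++) {
                int base = "ACGT".indexOf(Character.toUpperCase(read.charAt(i)));
                if (base >= 0) {
                    counts[i][base]++; // anything else (e.g. N) is skipped
                }
            }
        }

        double[][] pct = new double[maxLen][4];
        for (int i = 0; i < maxLen; i++) {
            long total = counts[i][0] + counts[i][1] + counts[i][2] + counts[i][3];
            for (int b = 0; b < 4; b++) {
                pct[i][b] = total == 0 ? 0.0 : 100.0 * counts[i][b] / total;
            }
        }
        return pct;
    }

    public static void main(String[] args) {
        String[] reads = {"ACGT", "AAGT", "ACGT", "ATGT"};
        double[][] pct = perBasePercentages(reads);
        for (int i = 0; i < pct.length; i++) {
            System.out.printf("pos %d: A=%.1f C=%.1f G=%.1f T=%.1f%n",
                i + 1, pct[i][0], pct[i][1], pct[i][2], pct[i][3]);
        }
    }
}
```

With unbiased reads from a genome of balanced composition, each column would sit near 25% at every position; the effect described above would show up as the first dozen or so positions deviating from that.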

                      • #12
                        We've seen similarly odd biases, both in sequence composition and unusually low qualities at the start of some runs, and I know of other groups who've also seen this. Normally it's only a minor effect, but in samples which are of generally poorer quality it can be really noticeable.

                        I don't know of an explanation for this. If it affects qualities as well as base calls I'd guess it would be a bias in the sequencing chemistry or the cluster calling?

                        • #13
                          The qualities look fine, so it's not an issue of bad base calling. I think you could be right that the cluster calling and/or sequencing chemistry could explain some of it. It could perhaps also explain why certain sequences in the genome are less likely to be sequenced; we often see peaks and valleys in exons in our RNA-seq runs, which are most likely explained by sequencing artefacts.

                          • #14
                            Hi! Thanks for sharing this program - I like the idea of getting a summary look at the FASTQ files.

                            I tried from 3 different Linux boxes, but get the same error:


                            java -Xmx250m -classpath . uk.ac.bbsrc.babraham.FastQC.FastQCApplication

                            Exception in thread "main" java.lang.NoClassDefFoundError: uk/ac/bbsrc/babraham/FastQC/FastQCApplication
                            Caused by: java.lang.ClassNotFoundException: uk.ac.bbsrc.babraham.FastQC.FastQCApplication
                            at java.net.URLClassLoader$1.run(Unknown Source)
                            at java.security.AccessController.doPrivileged(Native Method)
                            at java.net.URLClassLoader.findClass(Unknown Source)
                            at java.lang.ClassLoader.loadClass(Unknown Source)
                            at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
                            at java.lang.ClassLoader.loadClass(Unknown Source)
                            Could not find the main class: uk.ac.bbsrc.babraham.FastQC.FastQCApplication. Program will exit.

                            Is there something obvious I'm missing here? Sorry - it's been ages since I've programmed in Java.

                            • #15
                              Originally posted by simonandrews
                              FastQC v0.1.1 is now up on our website. This contains a fix for the problem with headless operation. You should just be able to run the program as described in the original install document (no need to add extra property settings) and it should now work as long as you specify the file(s) to process on the command line.

                              [If you can't see the update on our site please press shift+refresh in your browser to force it to update the cache]

                              Please let me know if this fixes things.
                              Hi Simon,

                              The download site still shows FastQC v0.1 even after clearing my cache. Am I missing something here?

                              Thanks,

                              Shurjo
