SEQanswers

Old 07-16-2010, 08:42 AM   #81
sowmyai
Member
 
Location: America

Join Date: Jan 2010
Posts: 27
Default

I have sent you an email with the reports. Thank you for taking the time to explain.
Old 07-19-2010, 08:08 AM   #82
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28
Lightbulb Clearer icon

Quote:
Originally Posted by simonandrews View Post
... the per-sequence quality check won't actually issue a warning or a fail - it just shows you the results and lets you decide. There are a couple of tests like this (the GC plot I think is another one). ...
One suggestion would be to change the icon for tests that do not issue a warning or a fail to a blue "i" icon for info instead of the green check mark. That would make it obvious to people that there is no check for this test.

Great software, btw. Thanks for solid work.
Old 07-19-2010, 10:14 AM   #83
Greg
Member
 
Location: British Columbia

Join Date: Oct 2009
Posts: 31
Default

This is a really awesome program. I really like how easy it is to use and how clearly it summarizes everything.

One thing: I can't seem to copy and paste out the overrepresented sequences. This would be very useful, as the first thing I want to do is figure out where the reads come from.
Old 07-19-2010, 11:21 PM   #84
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by lparsons View Post
One suggestion would be to change the icon for tests that do not issue a warning or a fail to a blue "i" icon for info instead of the green check mark. That would make it obvious to people that there is no check for this test.
I'm actually looking at adding in checks for all tests in the next version. It's just a matter of finding the right measures and cutoffs. Suggestions welcome!
Old 07-22-2010, 05:38 AM   #85
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by sowmyai View Post
How can most reads have an average quality of 34 while the individual base qualities are very poor ?
Having looked into this, it's a bug. The per base quality plot was using the lowest observed quality as an offset instead of the offset determined from the analysis of the encoding used.

In effect this means that the scale on the left of the plot was shifted downwards by whatever the difference in these values was. For Illumina fastq files they were off by 2 Phred units, but it could have been more in other formats.

This will be fixed in the next (impending) release.
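
To make the effect concrete, here is a minimal Python sketch of how decoding with the wrong offset shifts every value. It is not FastQC's actual code. It assumes an Illumina-style encoding where the Phred score is the ASCII code minus 64, and a made-up quality string whose lowest character is 'B'; the `decode` helper is hypothetical.

```python
ILLUMINA_OFFSET = 64  # assumption: Illumina 1.3+ style encoding (Phred+64)

def decode(quality_string, offset):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]

# Hypothetical quality string; 'B' (ASCII 66) is the lowest character present.
quals = "hhhgfdBB"

# Correct: use the offset implied by the encoding.
correct = decode(quals, ILLUMINA_OFFSET)

# The behaviour described above: use the lowest observed value as the offset.
buggy = decode(quals, min(ord(ch) for ch in quals))

shift = correct[0] - buggy[0]
print("correct:", correct)
print("buggy:  ", buggy)
print("shift:  ", shift)
```

With this input the two decodings differ by a constant, which is why the whole y-axis appeared shifted rather than individual points being wrong.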
Old 07-26-2010, 12:55 AM   #86
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78
Default

Dear Simon,

How would you like to be cited? It is for a report that will (most likely) not be published.

Thanks,
Wil
Old 07-26-2010, 02:01 AM   #87
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by Bruins View Post
How would you like to be cited? It is for a report that will (most likely) not be published.
There isn't a paper to cite for FastQC as yet, so it's probably best to cite the project website:

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc
Old 07-26-2010, 03:18 AM   #88
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default FastQC v0.4.2 released

I've just put FastQC v0.4.2 up on our website.

This fixes the per-base quality plot bug which caused the y-axis to show an offset scale. It also adds stricter parsing of FastQ files to spot incorrectly formatted files, and more cleanly distinguishes base-called and colorspace files.

I've also now added fail / warn checks to all of the QC modules and improved some of the existing checks which would fail for libraries with unusual GC contents. As part of this I've added a modelled distribution into the per-sequence GC plot so you can see how well your observed distribution fits.

Finally, I've changed the scaling on the graphs in the HTML reports so that wider graphs will be generated for libraries with long reads so you don't get squashed graphs.

You can get the new version from:

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

[If you don't see the new version of any page hit shift+refresh to force our cache to update]

Last edited by simonandrews; 07-26-2010 at 03:43 AM.
Old 07-29-2010, 06:27 AM   #89
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,165
Default

Quote:
Originally Posted by simonandrews View Post
I've just put FastQC v0.4.2 up on our website.
Simon,

There appears to be a bug introduced in v0.4.2 related to the "Total Sequences" count reported in the Basic Statistics. The new version consistently under-reports the number of reads in the file. Previous versions correctly reported the count.

Looking at the documentation I see that there was a planned feature for sampling just a subset of reads in a file and then reporting an estimate of the total number of reads. Could this have something to do with it?
Old 07-29-2010, 11:58 PM   #90
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by kmcarr View Post
There appears to be a bug introduced in v0.4.2 related to the "Total Sequences" count reported in the Basic Statistics. The new version consistently under-reports the number of reads in the file. Previous versions correctly reported the count.
You're right - depending on your file the total sequence count may be off by a few percent (either up or down). It looks like this bug has been in place since v0.2 though!


Quote:
Originally Posted by kmcarr View Post
Looking at the documentation I see that there was a planned feature for sampling just a subset of reads in a file and then reporting an estimate of the total number of reads. Could this have something to do with it?
Yes - that was the basic cause. In an early version we had the option to sample only the first x bases and extrapolate from them. We therefore make an estimate of the total number of sequences based on the record size and the filesize as well as taking a proper count. These days the estimate should only be used for calculating the %complete progress, but I was incorrectly using the estimated rather than the real value in the basic stats.

Thanks for spotting this. I'll put out v0.4.3 later today with a fix for this problem.
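
For anyone wondering why a size-based estimate drifts from the true count, here is a rough Python sketch. It is only an illustration of the idea, not the FastQC implementation: the `count_and_estimate` function, the use of the first record as the "typical" record size, and the sample data are all assumptions.

```python
def count_and_estimate(fastq_bytes):
    """Return (exact_count, estimated_count) for 4-line FASTQ data held in memory."""
    lines = fastq_bytes.splitlines(keepends=True)

    # Exact count: one record per four lines.
    exact = len(lines) // 4

    # Estimate: total size divided by the size of the first record.
    first_record_size = sum(len(line) for line in lines[:4])
    estimated = round(len(fastq_bytes) / first_record_size)
    return exact, estimated

# Hypothetical data: read names get longer, so records are not all the same size.
records = [
    f"@read{i}\nACGTACGT\n+\nIIIIIIII\n"
    for i in (1, 22, 333, 4444, 55555, 666666)
]
data = "".join(records).encode("ascii")

exact, estimated = count_and_estimate(data)
print("exact:", exact, "estimated:", estimated)
```

Because record sizes vary (read names, read lengths), the estimate can land above or below the true value, which is fine for a progress bar but not for the reported statistics.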
Old 07-30-2010, 01:39 AM   #91
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

I've just put up v0.4.3 on our website which fixes the sequence count problem.
Old 08-11-2010, 09:41 PM   #92
flower6991
Junior Member
 
Location: china

Join Date: Jan 2010
Posts: 1
Default find contaminant sequence

hello Simon,

I use FastQC to evaluate my sequence data.
The last part is the contaminant (overrepresented sequences) section.

Total Sequences 9265299
Sequence length 42

It looks like this:
{
>>Overrepresented sequences fail
#Sequence Count Percentage Possible Source
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTAGATCGGAAG 119288 1.2874705932317998 Illumina Single End Apapter 2 (96% over 32bp)
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAAAAAAAA 112538 1.2146181143209733 Illumina Single End Apapter 2 (100% over 33bp)
AATTCGAATATCTGCCGAATGCCGTGTGGACGTAAGCGTGAA 29127 0.3143665412200945 No Hit
GATCGGAAGAGCTGTATGCCGTCTTCTGCTTAGATCGGAAGA 24460 0.2639957976531572 No Hit
AATTCACAGGTGTTCTCCCGTATTGTTGACATGCCAGCGGGT 20305 0.21915104952360417 No Hit
AATTCCCCTTGATTGCAAGGGGAACGAAATAGACAGATCGCT 17190 0.18553097962623763 No Hit
}

How can I find these contaminant sequences in the full data set?
Should I use FastQC, a BioPerl module, or some other algorithm?

Is the quality of this data too poor for us to use it for analysis?


Thank you very much
Old 08-11-2010, 11:49 PM   #93
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by flower6991 View Post
hello Simon,

I use FastQC to evaluate my sequence data.
The last part is the contaminant (overrepresented sequences) section.

Total Sequences 9265299
Sequence length 42

It looks like this:
{
>>Overrepresented sequences fail
#Sequence Count Percentage Possible Source
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTAGATCGGAAG 119288 1.2874705932317998 Illumina Single End Apapter 2 (96% over 32bp)
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAAAAAAAA 112538 1.2146181143209733 Illumina Single End Apapter 2 (100% over 33bp)
AATTCGAATATCTGCCGAATGCCGTGTGGACGTAAGCGTGAA 29127 0.3143665412200945 No Hit
GATCGGAAGAGCTGTATGCCGTCTTCTGCTTAGATCGGAAGA 24460 0.2639957976531572 No Hit
AATTCACAGGTGTTCTCCCGTATTGTTGACATGCCAGCGGGT 20305 0.21915104952360417 No Hit
AATTCCCCTTGATTGCAAGGGGAACGAAATAGACAGATCGCT 17190 0.18553097962623763 No Hit
}
So this is saying that you have some adapter contamination in your sample. You've probably lost 5-10% of your sequences to this contamination, but there's no reason to think that the rest of it won't be usable.

Quote:
Originally Posted by flower6991 View Post
How can I find these contaminant sequences in the full data set?
Should I use FastQC, a BioPerl module, or some other algorithm?
FastQC is not intended to be a filter - it only reports on the state of your data. There are plenty of other tools out there which you can use to remove these contaminants if you need to do that before running the rest of your analyses.
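
As a rough illustration of what such a filtering step might look like, here is a Python sketch that splits reads by whether they contain the start of the adapter reported above. This is not part of FastQC, and dedicated trimming tools handle partial and mismatched adapters far better. The `split_by_adapter` function, the 12 bp exact-match prefix and the in-memory records are assumptions made for the example.

```python
# Assumption: first 12 bases shared by the adapter-like sequences in the report.
ADAPTER_PREFIX = "GATCGGAAGAGC"

def split_by_adapter(records, adapter=ADAPTER_PREFIX):
    """Split (name, sequence, quality) tuples into (clean, contaminated) lists."""
    clean, contaminated = [], []
    for record in records:
        sequence = record[1]
        if adapter in sequence:
            contaminated.append(record)
        else:
            clean.append(record)
    return clean, contaminated

# Three of the overrepresented sequences from the report, with dummy qualities.
reads = [
    ("r1", "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTAGATCGGAAG", "I" * 42),
    ("r2", "AATTCGAATATCTGCCGAATGCCGTGTGGACGTAAGCGTGAA", "I" * 42),
    ("r3", "GATCGGAAGAGCTGTATGCCGTCTTCTGCTTAGATCGGAAGA", "I" * 42),
]

clean, contaminated = split_by_adapter(reads)
print("clean:", [r[0] for r in clean])
print("contaminated:", [r[0] for r in contaminated])
```

A short exact-match prefix was chosen here so that the fourth sequence in the report, which differs from the adapter by a deleted base further along, is still caught; a real tool would use approximate matching instead.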

Quote:
Originally Posted by flower6991 View Post
Is the quality of this data too poor for us to use it for analysis?
There's nothing in this result to suggest that - it simply shows that the data is contaminated. You need to look at the rest of the results as well to assess the overall quality of your data.

FastQC output shouldn't be taken too literally. Just because you get a red cross against one or more tests doesn't necessarily mean that you should throw your data away. I can think of legitimate reasons why some data sets would fail every single one of the tests - and that's OK. What the program aims to do is to point things out to you ("Did you know that 3 sequences make up 50% of your data?" etc). Beyond that it's really up to you to decide whether the data is too poor to use, whether to go ahead but bear the FastQC results in mind in your interpretation, or whether the warning is spurious for the type of data you're analysing.

For example - every one of our PhiX control lanes now fails QC as assessed by FastQC because the degree of sequence duplication is ridiculously high. This is both a correct and irrelevant result. In a supposedly diverse library this would indicate a real problem, but in a PhiX lane we expect that. You have to judge the results based on your knowledge of the experiment.
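
A back-of-the-envelope calculation shows why duplication in a PhiX lane is unavoidable. The sketch below is plain arithmetic in Python, assuming a fixed read length, error-free reads and a hypothetical lane yield of 10 million reads; the PhiX174 genome size of 5386 bp is the only real number in it.

```python
PHIX_GENOME_LENGTH = 5386    # PhiX174 genome size in bp
reads_in_lane = 10_000_000   # hypothetical lane yield, chosen for illustration

# Circular genome, two strands: each start position on either strand gives
# one distinct error-free read of a given length.
max_distinct_reads = 2 * PHIX_GENOME_LENGTH

# Everything beyond the distinct reads has to be a duplicate of something.
min_duplicate_fraction = 1 - max_distinct_reads / reads_in_lane

print(f"{max_distinct_reads} distinct reads possible")
print(f"at least {min_duplicate_fraction:.2%} of reads must be duplicates")
```

Under these assumptions almost every read is a duplicate by construction, so the failing duplication test says nothing about the quality of the run.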
Old 09-23-2010, 06:16 AM   #94
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86
Default

Fastqc: Version 0.5.0

When I run fastqc from its install directory, ~/bin/FastQC, I get this error:

java -Xmx250m -classpath . uk.ac.bbsrc.babraham.FastQC.FastQCApplication


Exception in thread "main" java.lang.NoClassDefFoundError: uk/ac/bbsrc/babraham/FastQC/FastQCApplication

java version "1.5.0_17"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_17-b04, mixed mode)
Old 09-23-2010, 07:26 AM   #95
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by fabrice View Post
Fastqc: Version 0.5.0

When I run fastqc from its install directory, ~/bin/FastQC, I get this error:

java -Xmx250m -classpath . uk.ac.bbsrc.babraham.FastQC.FastQCApplication


Exception in thread "main" java.lang.NoClassDefFoundError: uk/ac/bbsrc/babraham/FastQC/FastQCApplication
This will be because you have an existing classpath defined and you need to add the new directory to it, rather than replacing it.

If you're running fastqc on a unix system from the command line it's much easier to use the wrapper script which is included in the distribution.

In your case you'd initially need to do:

chmod 755 ~/bin/FastQC/fastqc

..then in future you can do:

~/bin/FastQC/fastqc [your list of files]
Old 09-23-2010, 07:38 AM   #96
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86
Default

The fastqc script does not work from the command line either.
On mac:
java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02-279-10M3065)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01-279, mixed mode)

./fastqc aa.txt
Exception in thread "main" java.lang.NoClassDefFoundError: uk/ac/bbsrc/babraham/FastQC/FastQCApplication
Caused by: java.lang.ClassNotFoundException: uk.ac.bbsrc.babraham.FastQC.FastQCApplication
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)

On debian:

java -version
java version "1.5.0"
gij (GNU libgcj) version 4.3.2

Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

./fastqc a.txt
Exception in thread "main" java.lang.NoClassDefFoundError: uk.ac.bbsrc.babraham.FastQC.FastQCApplication
at gnu.java.lang.MainThread.run(libgcj.so.90)
Caused by: java.lang.ClassNotFoundException: uk.ac.bbsrc.babraham.FastQC.FastQCApplication not found in gnu.gcj.runtime.SystemClassLoader{urls=[file:./,file:~/bin/FastQC/,file:~/bin/FastQC/], parent=gnu.gcj.runtime.ExtensionClassLoader{urls=[], parent=null}}
at java.net.URLClassLoader.findClass(libgcj.so.90)
at java.lang.ClassLoader.loadClass(libgcj.so.90)
at java.lang.ClassLoader.loadClass(libgcj.so.90)
at gnu.java.lang.MainThread.run(libgcj.so.90)


Quote:
Originally Posted by simonandrews View Post
This will be because you have an existing classpath defined and you need to add the new directory to it, rather than replacing it.

If you're running fastqc on a unix system from the command line it's much easier to use the wrapper script which is included in the distribution.

In your case you'd initially need to do:

chmod 755 ~/bin/FastQC/fastqc

..then in future you can do:

~/bin/FastQC/fastqc [your list of files]
Old 09-23-2010, 07:40 AM   #97
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86
Default

On Ubuntu:

java -version
java version "1.6.0_18"
Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)

Exception in thread "main" java.lang.NoClassDefFoundError: uk/ac/bbsrc/babraham/FastQC/FastQCApplication
Caused by: java.lang.ClassNotFoundException: uk.ac.bbsrc.babraham.FastQC.FastQCApplication
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: uk.ac.bbsrc.babraham.FastQC.FastQCApplication. Program will exit.
Old 09-23-2010, 11:08 AM   #98
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by fabrice View Post
Exception in thread "main" java.lang.NoClassDefFoundError: uk/ac/bbsrc/babraham/FastQC/FastQCApplication
Could you by any chance have downloaded the source distribution instead of the compiled version? The errors are all saying that java can't find the initial class file, which it should be able to if the classpath is set correctly.

Can you look in uk/ac/bbsrc/babraham/FastQC/ and check whether there is a file called FastQCApplication.class? If you see a file called FastQCApplication.java instead, then you've got the source files rather than the binaries.
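
If you would rather script that check, something along these lines should do it. This is just a sketch: the `distribution_kind` function is hypothetical, and the demonstration builds a throwaway directory that mimics a source download rather than touching a real install.

```python
from pathlib import Path
import tempfile

def distribution_kind(install_dir):
    """Report whether install_dir looks like compiled binaries or source code."""
    package_dir = Path(install_dir) / "uk" / "ac" / "bbsrc" / "babraham" / "FastQC"
    if (package_dir / "FastQCApplication.class").is_file():
        return "compiled"
    if (package_dir / "FastQCApplication.java").is_file():
        return "source"
    return "unknown"

# Demonstration on a temporary directory laid out like a source download.
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "uk" / "ac" / "bbsrc" / "babraham" / "FastQC"
    pkg.mkdir(parents=True)
    (pkg / "FastQCApplication.java").touch()
    kind = distribution_kind(tmp)

print(kind)
```

For a real install you would call `distribution_kind` with the directory you unpacked FastQC into, for example the ~/bin/FastQC directory mentioned earlier in the thread (expanded to a full path).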
Old 09-23-2010, 12:15 PM   #99
fabrice
Member
 
Location: paris

Join Date: Oct 2009
Posts: 86
Default

The files are:

Analysis FastQCApplication.java Graphs Modules Resources Sequence
Dialogs FastQCMenuBar.java Help Report Results Statistics

Quote:
Originally Posted by simonandrews View Post
Could you by any chance have downloaded the source distribution instead of the compiled version? The errors are all saying that java can't find the initial class file, which it should be able to if the classpath is set correctly.

Can you look in uk/ac/bbsrc/babraham/FastQC/ and check whether there is a file called FastQCApplication.class? If you see a file called FastQCApplication.java instead, then you've got the source files rather than the binaries.
Old 09-23-2010, 11:02 PM   #100
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by fabrice View Post
The files are:
FastQCApplication.java
Those are the source code files (which is why it won't run). You need to download the compiled version, either the generic zip file or the Mac application bundle.
Tags
fastq, quality, report
