SEQanswers

Old 05-28-2010, 08:45 AM   #41
Bruins
Member
 
Location: Groningen

Join Date: Feb 2010
Posts: 78
Default

</snip>

And suddenly posts appeared... I'll need to use f5 more... My shell-script would have been similar to Simon's perl. No alias if you wish to pass arguments. Stick with the perl.

Wil

ps Simon thank you for all your very quick replies!
Old 06-01-2010, 05:25 PM   #42
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 246
Default

I'd be interested in seeing some reports from other people's runs. I'm always keen to compare our instrument's data output, even in a very summarised format such as this, with other instruments around the world. Is anyone interested in sharing reports? Perhaps even as an anonymous format?
Old 06-01-2010, 08:27 PM   #43
ECO
--Site Admin--
 
Location: SF Bay Area, CA, USA

Join Date: Oct 2007
Posts: 1,352
Default

Quote:
Originally Posted by ScottC View Post
I'd be interested in seeing some reports from other people's runs. I'm always keen to compare our instrument's data output, even in a very summarised format such as this, with other instruments around the world. Is anyone interested in sharing reports? Perhaps even as an anonymous format?
That would be an excellent resource for group-troubleshooting. I'll download it and play with it myself and see if we can't whip up some sort of wiki-compilation of outputs.

edit: What would also be great is if this could be adapted to display multiple reports over time...so one can track performance and even compare directly against old runs.
Old 06-02-2010, 12:58 AM   #44
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

I'd be really interested to see some more reports from other sites. Apart from anything else, I'd like to know whether the criteria I've set to call a test as a warn or fail are sensible on other people's data.

The idea of having some larger integrated reporting system for QC reports is tempting but I'm wary of stretching the original idea of FastQC too far. I kind of like the idea of connecting an install of the program to a reporting server so that the data from every report generated goes into a central system, but I'm not sure I'm going to find time to do that in the near future.

A quicker solution would be to have a site where you could upload the report zip files and make a browsable site where you could view these reports split by technology / chemistry / sample source etc. Would people be interested in this? More importantly would anyone be willing to put in some of their own data (anonymously of course)?
Old 06-02-2010, 04:04 AM   #45
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

I think this is an interesting idea, although I think the novelty of comparing plots might wear off pretty soon. For example, people don't collect chromatograms from Sanger sequencing any more.

I agree, though, that it would be interesting to see things such as barcoded samples, extreme GC content, different machines, etc.
Old 06-02-2010, 04:35 PM   #46
ScottC
Senior Member
 
Location: Monash University, Melbourne, Australia.

Join Date: Jan 2008
Posts: 246
Default

I'd be interested in that, and I'd be prepared to submit some data.

NGSfan: I'm mainly interested in benchmarking our system against others. Because I'm running the machine, I'm always interested in how well it's performing in comparison to other sequencing service providers!
Old 06-03-2010, 09:23 AM   #47
lparsons
Member
 
Location: NJ

Join Date: Nov 2008
Posts: 28
Default

Excellent utility Simon. Thank you.

I'm running into what looks like an old bug, however. I'm using FASTQC version 0.3.1 on a SunOS 5.10 server and I'm getting a HeadlessException. Any tips on solving this?

Code:
Exception in thread "main" java.awt.HeadlessException: 
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
        at sun.java2d.HeadlessGraphicsEnvironment.getDefaultScreenDevice(HeadlessGraphicsEnvironment.java:65)
        at javax.swing.RepaintManager.getVolatileOffscreenBuffer(RepaintManager.java:583)
        at javax.swing.JComponent.paintDoubleBuffered(JComponent.java:4911)
        at javax.swing.JComponent.paint(JComponent.java:996)
        at uk.ac.bbsrc.babraham.FastQC.Graphs.QualityBoxPlot.paint(QualityBoxPlot.java:81)
        at uk.ac.bbsrc.babraham.FastQC.Graphs.QualityBoxPlot.paint(QualityBoxPlot.java:75)
        at uk.ac.bbsrc.babraham.FastQC.Modules.PerBaseQualityScores.makeReport(PerBaseQualityScores.java:184)
        at uk.ac.bbsrc.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:63)
        at uk.ac.bbsrc.babraham.FastQC.Analysis.OfflineRunner.processFile(OfflineRunner.java:82)
        at uk.ac.bbsrc.babraham.FastQC.Analysis.OfflineRunner.<init>(OfflineRunner.java:28)
        at uk.ac.bbsrc.babraham.FastQC.FastQCApplication.main(FastQCApplication.java:71)

Last edited by lparsons; 06-03-2010 at 09:30 AM.
Old 06-03-2010, 12:16 PM   #48
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

Quote:
Originally Posted by lparsons View Post
I'm running into what looks like an old bug, however. I'm using FASTQC version 0.3.1 on a SunOS 5.10 server and I'm getting a HeadlessException. Any tips on solving this?

Code:
Exception in thread "main" java.awt.HeadlessException: 
No X11 DISPLAY variable was set, but this program performed an operation which requires it.
        at sun.java2d.HeadlessGraphicsEnvironment.getDefaultScreenDevice(HeadlessGraphicsEnvironment.java:65)
That's really strange. It's throwing a HeadlessException from within the HeadlessGraphicsEnvironment! That means that the headless environment is being set correctly (failing to set it was the original bug, which was fixed in an earlier revision).

At first glance this looks like it has to be a bug in the core Java classes, especially as it seems to be SunOS specific.

As a test, can you try setting a DISPLAY environment variable and see if it then works? It may be a redundant check for something which isn't actually required.
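For anyone launching FastQC from a script, the DISPLAY test suggested above could be sketched roughly as follows. The helper names and the value ":0" are assumptions for illustration only; the display value has to match whatever X server is actually reachable on the machine.

```python
import os
import subprocess

def env_with_display(display=":0", base_env=None):
    """Return a copy of the environment with DISPLAY set.

    display=":0" is an assumed value, not something FastQC requires.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DISPLAY"] = display
    return env

def run_with_display(command, display=":0"):
    """Run a command (given as a list of arguments) with DISPLAY set."""
    return subprocess.run(command, env=env_with_display(display), check=False)
```

The calling environment is copied rather than modified, so the test does not affect other jobs started from the same script.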
Old 06-04-2010, 06:33 AM   #49
colindaven
Senior Member
 
Location: Germany

Join Date: Oct 2008
Posts: 392
Default

Nice work Simon, this is a simple and easy to use package.
Old 06-07-2010, 08:07 AM   #50
antoniou
Junior Member
 
Location: Long Island, NY

Join Date: Oct 2008
Posts: 7
Default

Quote:
Originally Posted by Thomas Doktor View Post
The qualities look fine so it's not an issue of bad base calling. I think you could be right that the cluster calling and/or sequencing chemistry could explain some of it. Could perhaps explain why certain sequences in the genome are less likely to be sequenced, we often see peaks and valleys in exons in our RNA-seq runs which are most likely explained by sequencing artefacts.
I have seen the same phenomenon, but only with our mRNA-Seq libraries. Our genomic libraries do not show any biases. Has anyone else experienced this? Could it be an artifact of the Illumina library preparation protocol, maybe at the fragmentation step?
Old 06-07-2010, 08:26 AM   #51
lletourn
Member
 
Location: Montreal

Join Date: Oct 2009
Posts: 63
Default

The Illumina RNA protocol uses random hexamers to prime cDNA synthesis from the RNA. The thing is, they are not 100% random, so the beginning of the reads looks skewed for base composition, but that's because of the priming.

For mapping it's no problem. For assembly it might confuse some assemblers. (I would trim the 5' end of RNA reads when assembling, but not for mapping.)
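A minimal sketch of how one might check for this skew and trim it before assembly. The function names, the 12-position window and the default trim of 6 bases (one hexamer) are assumptions for illustration, not part of FastQC or of the Illumina protocol.

```python
from collections import Counter

def base_composition(reads, n_positions=12):
    """Fraction of A/C/G/T at each of the first n_positions across reads."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions].upper()):
            counts[i][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())  # the total also includes N or other symbols
        fractions.append({b: c[b] / total for b in "ACGT"} if total else {})
    return fractions

def trim_five_prime(sequence, quality, n=6):
    """Drop the first n bases and their quality characters together."""
    return sequence[n:], quality[n:]
```

In an unbiased library the four fractions should be roughly constant along the read, so a skew confined to the first few positions points at priming rather than at the sample.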
Old 06-07-2010, 09:01 AM   #52
antoniou
Junior Member
 
Location: Long Island, NY

Join Date: Oct 2008
Posts: 7
Default

I just came across a reference to the following article in a different thread.
http://www.ncbi.nlm.nih.gov/pubmed/2...?dopt=Citation
It also attributes the biases to random priming.


Eric
Old 06-18-2010, 03:46 AM   #53
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
FastQC v0.4 released

I've just put FastQC v0.4 up on our website.

FastQC v0.4 introduces a new analysis module, an easier way to launch the program from the command line and a new output file, as well as fixing a few minor bugs.

The new analysis module is the sequence duplication level module. This is a complement to the existing overrepresented sequences module in that it looks at sequences which occur more than once in your data. The new module takes a more global view and says what proportion of all of your sequences occur once, twice, three times etc. In a diverse library most sequences should occur only once. A highly enriched library may have some duplication, but higher levels of duplication may indicate a problem, such as PCR overamplification.
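The idea of a duplication-level summary can be illustrated with a short sketch. This is not FastQC's implementation: the function name, the cap at 10 and the choice to report fractions of distinct sequences (rather than of total reads) are all assumptions made here.

```python
from collections import Counter

def duplication_levels(sequences, max_level=10):
    """Map duplication level -> fraction of distinct sequences at that level.

    Levels above max_level are pooled into max_level (an assumed cap).
    """
    seq_counts = Counter(sequences)      # sequence -> number of occurrences
    level_counts = Counter()
    for n in seq_counts.values():
        level_counts[min(n, max_level)] += 1
    distinct = len(seq_counts)
    return {level: level_counts[level] / distinct for level in sorted(level_counts)}
```

For a diverse library almost all of the weight would sit at level 1; a large share at higher levels is the pattern the post describes as a possible sign of overamplification.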

In response to several requests we've also now introduced a new output file into the report. This is a text-based, tab-delimited file which includes all of the data shown in the graphs in the graphical report. This would allow people running pipelines to store the data generated by FastQC and analyse it systematically rather than just taking the pass/fail/warn summary, or reviewing the reports manually.
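A pipeline could read such a file with a few lines of code. The sketch below assumes the layout used by later FastQC releases (module blocks opened by a ">>Name<TAB>status" line, column headers on lines starting with "#", and blocks closed by ">>END_MODULE"); the post does not describe the v0.4 layout, so treat that layout and the function name as assumptions.

```python
def parse_fastqc_data(text):
    """Parse tab-delimited module blocks into a dict keyed by module name.

    Assumed layout: '>>Name<TAB>status', '#col<TAB>col', data rows,
    '>>END_MODULE'. If a module has several '#' lines, the last one wins.
    """
    modules = {}
    current = None
    for line in text.splitlines():
        if line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            name, _, status = line[2:].partition("\t")
            current = {"status": status, "header": None, "rows": []}
            modules[name] = current
        elif current is not None and line.startswith("#"):
            current["header"] = line[1:].split("\t")
        elif current is not None and line:
            current["rows"].append(line.split("\t"))
    return modules
```

Storing the parsed rows per run would allow the kind of comparison over time that was asked for earlier in the thread.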

Finally, if you're running fastqc from the command line we've now included a 'fastqc' wrapper script which you can launch directly rather than having to construct a java launch command. You can still pass -Dxxx options through to the program, but for simple analyses you can now simply run:

fastqc [some files]

..once you have included the FastQC install directory into your path. More details are in the install document.

You can get the new version from:

http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

[If you don't see the new version of any page hit control+refresh to force our cache to update]
Old 06-18-2010, 11:16 AM   #54
NGSfan
Senior Member
 
Location: Austria

Join Date: Apr 2009
Posts: 181
Default

Fantastic! I really like the command line ability - really good for pipelines.

Also nice that you display the Quality score type (Illumina v#/Sanger) in your output - helps to sort out confusion quickly when going through older data, especially after all of Illumina's schizophrenic quality score changes.
Old 06-20-2010, 03:09 AM   #55
agc
Member
 
Location: Jerusalem

Join Date: May 2010
Posts: 26
Default

I'd like to run FastQC on SOLiD reads. I saw that someone did this using solid2fastq. Is it possible to do it without running solid2fastq? IE, would it work with only the SOLiD 'quals' file?

EDIT: After running FastQC on SOLiD files converted to fastq files via solid2fastq, the results file says (under basic statistics):
File type Conventional base calls

Should it have recognized it as colorspace?

Thanks!

Last edited by agc; 06-20-2010 at 04:59 AM.
Old 06-20-2010, 06:47 PM   #56
mard
Member
 
Location: Melbourne

Join Date: Jan 2010
Posts: 21
Default

Hi Simon,

Thanks for the new features in FastQC v0.4.
I just installed v0.4 but got the error below when running it on a fastq file (I had previously run v0.3 on this file with no issues.)

Processing sequence.fastq
Approx 5% complete for sequence.fastq
Exception in thread "AWT-AppKit" Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
Old 06-21-2010, 12:07 AM   #57
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

Quote:
Originally Posted by mard View Post
Processing sequence.fastq
Approx 5% complete for sequence.fastq
Exception in thread "AWT-AppKit" Exception in thread "Thread-3" java.lang.OutOfMemoryError: Java heap space
The error is because the program ran out of memory. The new version will use a bit more memory than the previous version since it looks at more sequences for the overrepresented sequence module. I've tested it with up to four 20million+ files open at the same time though and it was OK.

Can you let me know the exact command you are using to launch the program. If you're using the full java command you need to ensure that you add the -Xmx250m option to allocate a larger than default memory block to the program. If you use the fastqc wrapper then this should be added automatically.
Old 06-21-2010, 12:16 AM   #58
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

Quote:
Originally Posted by agc View Post
I'd like to run FastQC on SOLiD reads. I saw that someone did this using solid2fastq. Is it possible to do it without running solid2fastq? IE, would it work with only the SOLiD 'quals' file?
It will work with colorspace fastq files - you don't need to convert to base calls. I don't work with SOLiD data directly so I'm not sure whether this is produced directly by the pipeline or not. I'm happy to look at other alternatives for SOLiD data, but the program is fairly tied to fastq format (i.e. it needs to work with a sequence and an encoded quality string).

Quote:
Originally Posted by agc View Post
EDIT: After running FastQC on SOLiD files converted to fastq files via solid2fastq, the results file says (under basic statistics):
File type Conventional base calls

Should it have recognized it as colorspace?
It depends on the conversion. If you look in the file you'll either see conventional base calls (something like GATCTCTAGATCTCT) or colorspace calls (something like G1324132431432434312). If you see colorspace calls and the report says conventional calls then can you send me the top few lines of the file and I can see why it's going wrong. It may be that your conversion program converted to base calls already though.

If FastQC gets the file type wrong it's normally pretty obvious since most of the graphs will show very weird results.
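To check a converted file by eye or in a script, a simple heuristic along the lines described above might look like this. It is only a sketch and not how FastQC itself decides the file type; the function name and the exact rule (one leading base followed only by digits or ".") are assumptions.

```python
def looks_like_colorspace(sequence):
    """Heuristic: a leading base followed only by digits or '.'.

    Any digit is accepted so that the example string quoted in the
    thread passes; this is an assumed rule, not FastQC's own check.
    """
    if len(sequence) < 2 or sequence[0].upper() not in "ACGTN":
        return False
    return all(ch.isdigit() or ch == "." for ch in sequence[1:])
```

Applied to the two example strings from the post, it returns True for G1324132431432434312 and False for GATCTCTAGATCTCT.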
Old 06-21-2010, 12:20 AM   #59
mard
Member
 
Location: Melbourne

Join Date: Jan 2010
Posts: 21
Default

Quote:
Originally Posted by simonandrews View Post
The error is because the program ran out of memory. The new version will use a bit more memory than the previous version since it looks at more sequences for the overrepresented sequence module. I've tested it with up to four 20million+ files open at the same time though and it was OK.

Can you let me know the exact command you are using to launch the program. If you're using the full java command you need to ensure that you add the -Xmx250m option to allocate a larger than default memory block to the program. If you use the fastqc wrapper then this should be added automatically.

Thanks for the quick reply Simon.

The command I'm using is:

Code:
java -Xmx250m -classpath /Tools/FastQC/ uk.ac.bbsrc.babraham.FastQC.FastQCApplication sequence.fastq
and the sequence.fastq file I'm running it on is 2.9Gb (~17million 75bp reads)
Old 06-21-2010, 12:31 AM   #60
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 869
Default

Quote:
Originally Posted by mard View Post
Thanks for the quick reply Simon.

The command I'm using is:

Code:
java -Xmx250m -classpath /Tools/FastQC/ uk.ac.bbsrc.babraham.FastQC.FastQCApplication sequence.fastq
and the sequence.fastq file I'm running it on is 2.9Gb (~17million 75bp reads)
Maybe it's the longer sequence length which is causing the problem. Can you try changing the -Xmx250m to -Xmx500m and see if that works?
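For pipelines that use the full java command rather than the wrapper, the heap size can be made a parameter. The sketch below rebuilds the command quoted above; the function name and its arguments are assumptions, while the -Xmx and -classpath options and the main class name are taken from the posts in this thread.

```python
# Main class name as it appears in the command and stack trace in this thread.
MAIN_CLASS = "uk.ac.bbsrc.babraham.FastQC.FastQCApplication"

def fastqc_java_command(fastq_files, install_dir, heap_mb=250):
    """Build the java launch command as an argument list.

    heap_mb sets the -Xmx value; 250 matches the figure quoted in the thread.
    """
    return [
        "java",
        f"-Xmx{heap_mb}m",
        "-classpath", install_dir,
        MAIN_CLASS,
        *fastq_files,
    ]
```

The returned list can be passed to subprocess.run, so retrying a failed file with a larger heap is just a second call with heap_mb=500.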
Tags
fastq, quality, report

All times are GMT -8.