SEQanswers

Old 08-07-2015, 10:12 AM   #341
StefKaes
Junior Member
 
Location: Germany

Join Date: Aug 2015
Posts: 2
Default

Hi everyone,

I'm using the latest version of FastQC to examine simulated RNA-Seq data. Because the data is simulated, there is no tile position in the read headers, and I get either a warning or an error, depending on the read length.

I know there should be an option to set "tile" to ignore in the limits.txt file. But no matter how I supply my adjusted limits file (for example, editing the file in the original folder, or pointing to an adjusted copy elsewhere via the "-l" argument), I still get the same output: a warning about the per-tile qualities for short reads, and an out-of-memory error for longer reads.
Quote:
Too many tiles (>500) so giving up trying to do per-tile qualities since we're probably parsing the file wrongly
Quote:
Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.io.BufferedReader.readLine(BufferedReader.java:349)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:175)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:125)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Thread.java:744)
I interpret this as the per-tile quality module still being executed (at least until FastQC realises that it won't work).


I am using v0.11.3 and have no other version on my systems. Nevertheless, I tried setting "adapter" to ignore instead of "tile", to check whether the parameters were still being mixed up:
Quote:
Aaargh - I'd forgotten that one of the other pending fixes for the next release was that the disable didn't work for the per-tile module (it will actually disable it if you turn off the adapter module as it was reading the wrong parameter)
That worked as it should: the adapter module was turned off, and only the adapter module. So now I have to ask: is it possible that in the latest version FastQC no longer reads the adapter parameter for the tile module, but still somehow does not read the tile parameter either?

As my data is simulated and not uploaded anywhere, I cannot post a link. But since this is a matter of whether the tile module is executed or not, it should be reproducible with any FASTQ data.

I would be very glad if you could either confirm this observation about turning off the tile module, or explain how to use the limits file correctly if my incorrect use is causing the problem.
Thanks in advance!
Old 08-11-2015, 03:06 AM   #342
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

I've had a look at this and can kind of confirm what you're seeing. Turning on the ignore flag for the per-tile module does now exclude that module from the report. However, it wasn't stopping the statistics from being collected, which is why you were seeing the same problem even after disabling it.

I've modified the code so that the module shouldn't collect any stats which should fix your problem. Can you please try out the development snapshot below and see if that does what you need (let me know if you need an OSX version).

http://www.bioinformatics.babraham.a...11.4_devel.zip

Cheers

Simon.
Old 08-11-2015, 05:15 AM   #343
StefKaes
Junior Member
 
Location: Germany

Join Date: Aug 2015
Posts: 2
Default

Thanks for having a look!

OK, I wouldn't have known whether the tile module was included in the report or not, because with my headers I always got a warning and no tile report.

As far as I can see by now, the snapshot you posted seems to be doing exactly what it's supposed to do.
I'm getting neither a warning nor an error when I run it with "tile" set to ignore, and all the other modules are still in the report.

(Sorry, in my last post I forgot to mention I'm running FastQC on a Linux system, so no, I don't need the OSX version.)
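
For later readers, the "tile set to ignore" step discussed above can be scripted. This is only a minimal Python sketch, assuming the limits file uses whitespace-separated lines of the form "tile ignore 0" (module name, the word ignore, then 0 or 1). The function name set_module_ignore and the sample text are made up for illustration and are not part of FastQC.

```python
import re


def set_module_ignore(limits_text: str, module: str, ignore: bool = True) -> str:
    """Return limits_text with the '<module> ignore <0|1>' line set as requested.

    Assumes one setting per line, fields separated by spaces or tabs.
    Comment lines (starting with '#') never match and are kept unchanged.
    """
    flag = "1" if ignore else "0"
    pattern = re.compile(rf"^(\s*{re.escape(module)}\s+ignore\s+)\S+(\s*)$")
    out_lines = []
    found = False
    for line in limits_text.splitlines():
        match = pattern.match(line)
        if match:
            # Keep the original spacing, replace only the flag value.
            line = f"{match.group(1)}{flag}{match.group(2)}"
            found = True
        out_lines.append(line)
    if not found:
        # No existing entry for this module: append one.
        out_lines.append(f"{module}\tignore\t{flag}")
    return "\n".join(out_lines) + "\n"


# Hypothetical sample content, not a copy of the real limits.txt.
example = "# module  ignore  flag\nadapter\tignore\t0\ntile\tignore\t0\n"
print(set_module_ignore(example, "tile"))
```

The result can be written to a copy of the limits file and passed to FastQC with the "-l" argument mentioned earlier in the thread. As discussed above, in v0.11.3 this setting alone did not stop the per-tile statistics from being collected.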
Old 09-28-2015, 06:40 AM   #344
sridharacharya
Member
 
Location: Institute, WV

Join Date: May 2010
Posts: 24
Default

Hi Simon,

Is there a way in FastQC to turn on reporting for every position rather than the default 5 bp window for some analyses, like the "Per base sequence quality"?

Thanks,
Sridhar
Old 09-28-2015, 06:41 AM   #345
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Quote:
Originally Posted by sridharacharya View Post
Hi Simon,

Is there a way in FastQC to turn on reporting for every position rather than the default 5 bp window for some analyses, like the "Per base sequence quality"?

Thanks,
Sridhar
Yes, add the option "--nogroup" to your command line.
Old 09-28-2015, 06:47 AM   #346
sridharacharya
Member
 
Location: Institute, WV

Join Date: May 2010
Posts: 24
Default

Thanks kmcarr!
Old 11-27-2015, 03:39 AM   #347
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

@frymor's question is also posted (and possibly answered) on Biostars: https://www.biostars.org/p/167555/
Old 11-27-2015, 04:03 AM   #348
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

@GenoMax, good catch, I missed that it was posted here too.
Old 02-09-2016, 12:10 AM   #349
daanum
Member
 
Location: nz

Join Date: Nov 2015
Posts: 24
Default

Hi ,

I am using a GNU/Linux server. How can I view the .html files generated by FastQC on the Linux server?
Thank you.
Old 02-09-2016, 12:51 AM   #350
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

Quote:
Originally Posted by daanum View Post
Hi ,

I am using a GNU/Linux server. How can I view the .html files generated by FastQC on the Linux server?
Thank you.
You don't have to view them on the server. The files are self-contained and can be transferred to a local desktop PC/Mac for stand-alone examination. That said, you can use any browser available on the server to view them. On some clusters/servers the admins insist on not installing browsers, so downloading the files locally may be the best option.
Old 02-09-2016, 01:05 AM   #351
dpryan
Devon Ryan
 
Location: Freiburg, Germany

Join Date: Jul 2011
Posts: 3,480
Default

Quote:
Originally Posted by GenoMax View Post
You don't have to view them on the server. The files are self-contained and can be transferred to a local desktop PC/Mac for stand-alone examination.
Indeed, probably the single nicest feature about FastQC is that the html files have the images embedded, making it really convenient to do this.
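
If you want to double-check that a report really is self-contained before copying it off the server, something like the following would do. It is a rough Python sketch that assumes the embedded images appear as "data:" URIs in the img tags; the function name external_images and the sample HTML are invented for illustration.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def external_images(html_text: str) -> list:
    """Return image sources that are NOT embedded as data: URIs."""
    collector = ImageSourceCollector()
    collector.feed(html_text)
    return [src for src in collector.sources if not src.startswith("data:")]


# Hypothetical snippet: one embedded image and one that points to a file.
sample = '<img src="data:image/png;base64,AAAA"><img src="Images/plot.png"/>'
print(external_images(sample))
```

An empty list means every image is embedded, so the single .html file is all you need to transfer.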
Old 04-07-2016, 06:40 AM   #352
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Hello Simon & all,

First of all, I'd like to thank you for a fantastic tool - this is truly the most important tool for the most important step in the whole NGS process, and it does its job fantastically well.

Second, I would like to ask whether there is a collection of people's failed (or peculiar) results, obtained with FastQC and later explained. I've already seen https://sequencing.qcfail.com and I'm studying it right now, but it also seems that everybody would benefit from a wiki-like resource where everybody can contribute to (and discuss) the results. What do you think?

Finally, I was curious about running FastQC on Ion Torrent results. It concerns me that the reads are all of different lengths, and I never quite took the time to understand the exact math used in the various FastQC metrics (such as k-mer and overrepresented sequence evaluation). So if anything comes to mind about what to expect, which extra options to use, or what to do differently, I would greatly appreciate it.
Old 04-08-2016, 01:21 AM   #353
simonandrews
Simon Andrews
 
Location: Babraham Inst, Cambridge, UK

Join Date: May 2009
Posts: 871
Default

Quote:
Originally Posted by apredeus View Post
Hello Simon & all,

First of all, I'd like to thank you for a fantastic tool - this is truly the most important tool for the most important step in the whole NGS process, and it does its job fantastically well.

Second, I would like to ask whether there is a collection of people's failed (or peculiar) results, obtained with FastQC and later explained. I've already seen https://sequencing.qcfail.com and I'm studying it right now, but it also seems that everybody would benefit from a wiki-like resource where everybody can contribute to (and discuss) the results. What do you think?

Finally, I was curious about running FastQC on Ion Torrent results. It concerns me that the reads are all of different lengths, and I never quite took the time to understand the exact math used in the various FastQC metrics (such as k-mer and overrepresented sequence evaluation). So if anything comes to mind about what to expect, which extra options to use, or what to do differently, I would greatly appreciate it.
QCFail is our attempt to provide some kind of collation of the types of failures we see in sequencing experiments, not just stuff you could spot with FastQC, but all through the analysis pipeline. We discussed a lot of different options for how to handle this (wikis, databases and various automated systems), but in the end, since the number of failure modes is relatively limited, we decided that a tagged blog format plus search was the most useful starting point. Once we've got the system populated, we might look for ways to direct people to relevant articles more easily, so the information is more readily accessible when you see your problematic data, but that's something to work out in the future (unless anyone else fancies having a go!).

For your Ion Torrent data you shouldn't need to do anything different to work with that. For the overrepresented sequences we only take up to the first 50bp of each read anyway as we're using an exact matching strategy to count duplicates, so if you allow longer lengths then your results get increasingly messed up by mis-calls, and 50bp is normally enough to establish that it's the same sequence.
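
To make the 50bp point concrete, here is a small Python sketch of exact-match duplicate counting on truncated reads. It only illustrates the idea described above and is not FastQC's actual implementation; the function names and the 0.1% reporting threshold are assumptions chosen for the example.

```python
from collections import Counter


def count_truncated(reads, max_len=50):
    """Count exact matches after truncating each read to its first max_len bases."""
    return Counter(read[:max_len] for read in reads)


def overrepresented(reads, max_len=50, min_fraction=0.001):
    """Return (sequence, count, fraction) for sequences above min_fraction.

    min_fraction is an illustrative threshold, expressed as a fraction
    of the total number of reads.
    """
    counts = count_truncated(reads, max_len)
    total = sum(counts.values())
    return [
        (seq, n, n / total)
        for seq, n in counts.most_common()
        if n / total > min_fraction
    ]


# Two long reads that differ only after position 60 (e.g. a late mis-call),
# plus one short read that is counted as-is.
reads = ["A" * 60 + "C", "A" * 60 + "G", "T" * 30]
for seq, n, frac in overrepresented(reads):
    print(len(seq), n, round(frac, 3))
```

The two long reads collapse into one entry because they agree over the first 50 bases, which is why variable read lengths and late mis-calls matter less for this particular metric.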
Old 04-08-2016, 04:39 AM   #354
apredeus
Senior Member
 
Location: Bioinformatics Institute, SPb

Join Date: Jul 2012
Posts: 151
Default

Simon, thank you for your answer.

I will try to popularize the QCFail among our sequencing and bioinformatics community, so they would consider contributing some interesting cases as well.
Old 06-02-2017, 04:38 AM   #355
Roxana
Junior Member
 
Location: Leicester

Join Date: May 2017
Posts: 3
Default fastQC

Quote:
Originally Posted by simonandrews View Post
Aaargh - I'd forgotten that one of the other pending fixes for the next release was that the disable didn't work for the per-tile module (it will actually disable it if you turn off the adapter module as it was reading the wrong parameter).

I've just put up a development snapshot at http://www.bioinformatics.babraham.a...11.3_devel.zip which contains the fix for both of these issues. You should be able to use that to process these files.
Dear Simon, I have the same problem with my FASTQ file (from nanopore sequencing). Could you please provide the download link for FastQC version 0.11.3 for Mac? Many thanks!!!
Old 06-02-2017, 05:02 AM   #356
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 7,077
Default

Quote:
Originally Posted by Roxana View Post
Dear Simon, I have the same problem with my FASTQ file (from nanopore sequencing). Could you please provide the download link for FastQC version 0.11.3 for Mac? Many thanks!!!
The currently available version is 0.11.5 (newer than this one), so it should include the fix. Can you try downloading that?
Old 06-02-2017, 05:40 AM   #357
Roxana
Junior Member
 
Location: Leicester

Join Date: May 2017
Posts: 3
Default

Dear GenoMax,
I tried running version 0.11.5, but it does not work with my FASTQ file (from nanopore sequencing). Please see the error below:
Code:
[rzz1@spectre14 ~]$ Picked up JAVA_TOOL_OPTIONS: -XX:MaxHeapSize=2048m
Exception in thread "Thread-1" java.lang.OutOfMemoryError: Java heap space
	at uk.ac.babraham.FastQC.Utilities.QualityCount.<init>(QualityCount.java:33)
	at uk.ac.babraham.FastQC.Modules.PerBaseQualityScores.processSequence(PerBaseQualityScores.java:141)
	at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:88)
	at java.lang.Thread.run(Thread.java:748)
My FASTQ file has the following structure:
Code:
@9157abfa-c2e8-4d54-a323-11162850dcbf runid=4243c40e101f8b1c6aa3337f2ef28b72eef6091c read=5 ch=78 start_time=2017-05-19T15:31:04Z
ATTTATGTTCTTGGCCCCCACACATTGTGGCCCCCATTGTTGTGTGTGTGTTATTTGACCCTTGTATTTGTATTGTTATTGTGTTATTGTTGGCCCCATTATTGTGTGTGTATTATTGTTATTTGACCCATTGTTGTATTGTGTGTGACTTGTGTGTGTGTATTATTGTTATTGTGTATTATTTGTGTGTGTGTGTGTGTGCTTGTTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTTCTCTATACTCTATATATATATACCTATATACTCTTCTTCTTCTTCTCTTCTTTCTCTCTTATGTCTTTCTCTTCTCTTCTTCTTATACCACTTCTTCTTCTATTCTAATAAATTAGGATGGGAGGAGGATGGATGGGTTCAAGGATAGTTCAATGAATACAAGGAGAGGAAAAAGGATC
+
$''$#&$&'$''""$$$$$#$#%%)+&)&%&(())'..')(%)&'$*&)#&+$(*('$%(')*%*"')+#'#*,$,*$*)%&%()#,(&1+('()('(+4'/*()$'#%#'#'+(./'.+$('(((,,,)08*,(#+',,(*'*'+*')0.)''*')%)%'"&&%(*#&'%*'%%#)#()).>-)++,*&(,,,+))'(%$&--&+.)+**)*+,,-++*-./22+,(,(-)*'.-.*)(&%,(%$),'($%%)%&%&)"*#'$*&&%%#$#%%%$'$%''*)&(,%('%(+&'$$&"$#%%&%&$(%$'"($&%*$'$''$'$'(%(+%($&%%&%%%&($'-$(+%&$%&#%%%%$&%##$%%$#$%$#$#$$##$####$$$#$##$$#$%$"$&#%$#"%&#%#%#$$&%&$%%%%&&%&$'#
This FASTQ file is a bit different from Illumina output, and I suspect that it does not work with the latest version of FastQC.
Many thanks for any advice.
Roxana

Last edited by GenoMax; 06-02-2017 at 06:11 AM. Reason: Added [CODE] tags
Old 06-02-2017, 05:59 AM   #358
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 32
Default

Hi Roxana,

We've been debugging the same error earlier this afternoon, also with nanopore data. The error is because FastQC is running out of memory due to the long reads (I presume).

You can allocate more memory for FastQC by increasing the number of threads - it gets 250 MB of memory per thread. Our data worked when we ran with four threads:

Code:
fastqc -t 4 input.fastq
Phil
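
As a rough guide to picking a thread count, the arithmetic can be sketched in Python. This takes the 250 MB per thread figure from the post above at face value; the helper names are hypothetical, and the command is only built as an argument list, not executed.

```python
import math

MB_PER_THREAD = 250  # figure quoted in the post above


def heap_for_threads(threads: int) -> int:
    """Total memory in MB that a given thread count would be allocated."""
    return threads * MB_PER_THREAD


def threads_for_heap(target_mb: int) -> int:
    """Smallest thread count whose allocation reaches target_mb."""
    return math.ceil(target_mb / MB_PER_THREAD)


def fastqc_command(fastq_path: str, threads: int) -> list:
    """Build the argument list for the command shown above."""
    return ["fastqc", "-t", str(threads), fastq_path]


print(heap_for_threads(4))           # four threads, as in the example above
print(threads_for_heap(2048))        # threads needed to reach about 2 GB
print(fastqc_command("input.fastq", 4))
```

The list returned by fastqc_command can be handed to subprocess.run on a machine where FastQC is installed. How much memory a given nanopore file actually needs depends on the read lengths, so expect some trial and error.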
Old 06-02-2017, 07:04 AM   #359
Roxana
Junior Member
 
Location: Leicester

Join Date: May 2017
Posts: 3
Default

Dear Ewels,
Many thanks for your advice. I used the following command and it works very well. I increased the number of threads to 25.

Code:
fastqc -t 25 input.fastq

Roxana
Old 06-02-2017, 07:05 AM   #360
ewels
Phil Ewels
 
Location: SciLifeLab, Stockholm, Sweden

Join Date: Mar 2011
Posts: 32
Default

Great, glad it worked!

Phil

Tags
fastq, quality, report
