You're the man! Thank you so much. The command line and --threads 8 really help when running multiple samples, and are so much faster in both setup and run time than clicking through the interactive mode.
-
Hello,
I get the following error trying to run FastQC (v0.11.2) on some of my files:
Code:
fastqc --outdir Fastqc/ --noextract ctcf.cont.fq
Started analysis of ctcf.cont.fq
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
at uk.ac.babraham.FastQC.Utilities.QualityCount.<init>(QualityCount.java:13)
at uk.ac.babraham.FastQC.Modules.PerTileQualityScores.processSequence(PerTileQualityScores.java:258)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:88)
at java.lang.Thread.run(Thread.java:662)
I had this problem with v0.11.1 and thought updating would fix it, as memory issues were mentioned in the release notes, but I'm still getting the same problem. The files are not unusually large (around 2GB gzipped), and other files of similar size have been fine. Any ideas?
I can't figure out how to run FastQC so that I can specify the memory (I don't really know anything about Java). I've tried various things I found in the thread archives, such as the command below, but get errors along the lines of "Could not find the main class":
Code:
java -Xmx500m -cp /path/to/FastQC
-
The most likely cause of this, unless your sequence file is really odd, is that for some reason the program is trying to read the whole file as a single line. We've seen this happen when a fastq file with Mac line endings (\r) is read on a Linux host. The Linux host doesn't recognise the end of line, reads everything in at once and dies. If this is the case then messing around with memory settings won't help. The only immediate fix would be to uncompress the file and run mac2unix [filename] to fix the line endings.
I guess odd things could also happen if you had some really long sequences, but they would have to be *very* long to cause problems.
Could the line endings thing be what's happening in your case?
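One way to check for this before reaching for mac2unix is to count the line-ending styles in the first chunk of the uncompressed file. The following is a minimal sketch, not part of FastQC; the function name `count_line_endings` and the 1 MB sample size are assumptions chosen for illustration.

```python
from collections import Counter


def count_line_endings(path, sample_bytes=1_000_000):
    """Count CRLF, bare LF and bare CR endings in the first sample_bytes of a file.

    A CRLF pair split exactly across the sample boundary would be counted
    as a bare CR; for a quick sanity check that is negligible.
    """
    with open(path, "rb") as handle:
        chunk = handle.read(sample_bytes)
    crlf = chunk.count(b"\r\n")
    return Counter({
        "crlf": crlf,                       # Windows-style endings
        "lf": chunk.count(b"\n") - crlf,    # Unix-style endings
        "cr": chunk.count(b"\r") - crlf,    # old Mac-style endings
    })


# Example use (the file name is a placeholder):
# print(count_line_endings("ctcf.cont.fq"))
```

If the "cr" count is large and the "lf" count is zero, the file probably has the Mac line endings described above.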
-
I just tried unzipping a couple of the files and converting the line endings using mac2unix, and I get the same error for one of them. The other gives a different but presumably related error:
Code:
fastqc --outdir Fastqc/ --noextract ctcf.chip.fq
Started analysis of ctcf.chip.fq
Exception in thread "Thread-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.String.toCharArray(String.java:2725)
This is data from a published paper and other fastq files from the same paper have worked fine...
I have just noticed that for these two files, at least at the top of the file, the records have quality scores that are all "B". I checked another file that did work, and that has more varied quality scores. This suggests to me there might be another problem with the files themselves.
Edit: Update: my colleague tried with v0.10.1 and it finished! There are a lot of poor-quality reads... So I guess I can use an older version, but ideally I'd like to get this working.
I also tried with a subset of the reads: with the head/tail 100,000 reads it runs fine; with 1 million it crashes ~20% of the way in. With 200,000 it says "Analysis complete for test.fq" but then also prints errors:
Code:
Approx 95% complete for test.fq
Analysis complete for test.fq
Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
at java.lang.StringCoding.encode(StringCoding.java:272)
at java.lang.StringCoding.encode(StringCoding.java:284)
at java.lang.String.getBytes(String.java:986)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:144)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:163)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
at java.lang.Thread.run(Thread.java:662)
Last edited by liz_is; 10-01-2014, 06:33 AM.
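For anyone wanting to repeat the subsetting and the all-"B" quality check, here is a rough sketch. It assumes plain four-line FASTQ records (no wrapped sequence lines); the function names `head_fastq` and `fraction_uniform_quality` are made up for this example and are not FastQC features.

```python
from itertools import islice


def head_fastq(in_path, out_path, n_records):
    """Copy the first n_records FASTQ records (4 lines each) to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(islice(src, 4 * n_records))


def fraction_uniform_quality(path, n_records, symbol="B"):
    """Fraction of the first n_records whose quality string is all `symbol`."""
    total = uniform = 0
    with open(path) as src:
        for i, line in enumerate(islice(src, 4 * n_records)):
            if i % 4 == 3:  # the 4th line of each record holds the qualities
                qual = line.rstrip("\r\n")
                total += 1
                if qual and set(qual) == {symbol}:
                    uniform += 1
    return uniform / total if total else 0.0


# Example use (file names are placeholders):
# head_fastq("ctcf.cont.fq", "test.fq", 200_000)
# print(fraction_uniform_quality("test.fq", 200_000))
```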
-
The errors being different isn't really a surprise: it's running out of memory, and the exact operation which triggers that might be different in different cases. If it's happening with 100k reads then something really weird is going on.
Could you possibly put a file which triggers this somewhere I can see it? If I can have a look at the data which causes this I stand a better chance of getting to the bottom of it. If you don't have a site you can upload to then drop me a mail to [email protected] and I'll send you login details for an FTP server you can push to.
-
The data is available on ENA here: http://www.ebi.ac.uk/ena/data/view/PRJEB3073
The first couple of files (which are the CTCF chip and input) are examples of files which are giving these errors. Some of the other files in this dataset work fine though, e.g. the scc2 chip.
Thanks!
-
That's great - I managed to download that and could reproduce the error on our cluster.
I'll have a look now to see if I can find anything obvious, but unfortunately I'm away from the office for the rest of this week so I might not get to the bottom of this until next week when I can do some proper profiling to figure out what's going wrong on this data.
-
Hi Simon,
Could you please explain the FastQC tile report in more detail?
I found this page:
I am not able to understand the meaning of
"This module will issue a warning if any tile shows a mean Phred score more than 2 less than the mean for that base across all tiles"
What is the meaning of "mean Phred score more than 2 less than the mean for that base across all tiles"?
Kindly help me out.
Thanks
-
It means that it's looking for cases where one tile looks much worse than the other tiles on the flowcell lane for a given sequencing chemistry cycle. If you had a cycle where the average Phred score across the whole flowcell was 20, but on one particular tile the average Phred score was only 17, then this tile would be flagged up.
The idea is that it shouldn't matter if the whole flowcell is good or bad, but all of the tiles should look roughly the same. If one is worse than the rest then this indicates that there is a specific problem which might need to be looked at.
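To make the rule concrete, here is a small sketch of the comparison for a single cycle. It is only an illustration of the idea described above, not FastQC's code: the function name `flag_tiles` is invented, and taking the overall mean as the unweighted mean of the per-tile means is a simplifying assumption.

```python
def flag_tiles(tile_means, threshold=2.0):
    """Return the tiles whose mean Phred score for one cycle is more than
    `threshold` below the mean across all tiles.

    tile_means maps tile id -> mean Phred score at this cycle.
    """
    overall = sum(tile_means.values()) / len(tile_means)
    return sorted(tile for tile, mean in tile_means.items()
                  if overall - mean > threshold)


# Four tiles whose overall mean is 20; tile 1104 averages 17, so it is
# 3 below the overall mean and gets flagged.
cycle = {1101: 21.0, 1102: 21.0, 1103: 21.0, 1104: 17.0}
print(flag_tiles(cycle))  # [1104]
```

Note that the comparison is relative: if every tile averaged 17, nothing would be flagged, because no tile would stand out from the rest.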
-
Hi Liz - Sorry for taking a while to have a proper look at this, other things have been getting in the way. I've tracked down the problem and it's the per-tile quality module which was causing the runaway memory usage (which is why it worked in the old version, since that module wasn't there).
The problem seems to be that these files use a variant of the Illumina header format which is close enough to the ones we've seen before that the program tries to parse it, but then the field it extracts for the tile number is wrong and it predicts an enormous number of tiles, which makes everything die!
The formats we've seen before are either:
Code:
@HWI-1KL136:211:D1LGAACXX:1:1101:18518:48851 3:N:0:ATGTCA
..where the 4th colon-separated field (counting from zero) is the tile, or
Code:
@HWUSI-EAS493_0001:2:1:1000:16900#0/1
..where the second field (again counting from zero) is the tile.
The ids in the file you found looked like:
Code:
@HWI-EAS212_1:8:1:4130:3711:0:1
..where the format should be like my second example, except that the # and / have been replaced by :, which makes FastQC treat it like the first variant and pull out the wrong field.
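The mix-up can be reproduced with a few lines of Python. This is only a sketch of the kind of heuristic described above, not FastQC's actual source: the function name `guess_tile_field` and the field-count cut-offs (7 or more fields for the first format, 5 or more for the second) are assumptions made for the illustration.

```python
def guess_tile_field(read_id):
    """Pick the tile from a colon-separated read id, counting fields from zero.

    Assumed rule: 7+ fields -> field 4 is the tile; 5+ fields -> field 2.
    """
    fields = read_id.split(":")
    if len(fields) >= 7:
        return fields[4]
    if len(fields) >= 5:
        return fields[2]
    return None  # header not recognised


print(guess_tile_field("@HWI-1KL136:211:D1LGAACXX:1:1101:18518:48851 3:N:0:ATGTCA"))  # 1101
print(guess_tile_field("@HWUSI-EAS493_0001:2:1:1000:16900#0/1"))                      # 1
print(guess_tile_field("@HWI-EAS212_1:8:1:4130:3711:0:1"))                            # 3711
```

For the third id the rule returns 3711, which is really a coordinate rather than the tile (the tile is the 1 in field 2). Since coordinates take a huge range of values, every new value looks like a new tile, which matches the runaway memory use.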
The quick fix is to edit the limits.txt file in your FastQC installation (in the Configuration directory) to turn off the per-tile quality module, and you should then be able to process these files.
Does anyone here know if this format is something which is actually generated by an Illumina sequencer, or is it something an individual or maybe the ENA have done to the file? I can add a quick fix to just abandon the module if too many tiles are predicted, but if this is a format which might be more generally about then I should try to cope with this properly.
Cheers
Simon.
Last edited by simonandrews; 10-09-2014, 04:49 AM. Reason: Added code tags to remove smilies from illumina ids!
-
Thanks for the reply.
I've tried what you suggested but it doesn't help! I've tried both specifying a limits file using --limits and editing 'limits.txt' in the Configuration directory of the installed FastQC to include the line
Code:
tile ignore 1
I think that the change in the configuration isn't working to stop the per-tile module being used, as the error message still makes reference to it:
Code:
Started analysis of ctcf.cont.fq
Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
at uk.ac.babraham.FastQC.Modules.PerTileQualityScores.processSequence(PerTileQualityScores.java:258)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:88)
at java.lang.Thread.run(Thread.java:745)
-
Aaargh - I'd forgotten that one of the other pending fixes for the next release was that the disable didn't work for the per-tile module (it will actually disable it if you turn off the adapter module, as it was reading the wrong parameter).
I've just put up a development snapshot at http://www.bioinformatics.babraham.a...11.3_devel.zip which contains the fix for both of these issues. You should be able to use that to process these files.
-
Kmer overrepresentation and per base sequence content in Nextera XT libraries
Hi all,
After reading around on the forums and elsewhere on the internet, it seems like seeing weird results for Kmer overrepresentation and per base sequence content after running FastQC on Nextera XT libraries is common.
The data I have here are sequencing data (MiSeq V3, 300 bp reads) of mitochondrial genomes from wheat. The Nextera XT libraries were prepared from purified organellar DNA (~450 kb genome) so the coverage is really high (~400X after trimming).
The files with the no_trim_prefix are the raw data. You can see that the "per base sequence content" looks weird for the first few bases. Also, the Kmer content is high in the first few bases. I have tried blasting these sequences and get no hits. The "Sequence Duplication Levels" are high most likely because of the high coverage of a small genome. I suspect this because another library I sequenced has only 60X coverage and the duplication levels are fine.
The files with the trim_prefix are the trimmed data. The data were quality and length trimmed (min. length 250 bp) with Trimmomatic. Unfortunately the trimming did not make a difference in the per base content or the Kmer overrepresentation.
My question is, will this matter for mapping and assembly? I plan on mapping these reads to already available mitochondrial genomes, as well as performing de novo assembly with Geneious.
Thanks in advance for any suggestions you all may have!