#101
Junior Member
Location: Ames, IA
Join Date: Dec 2014
Posts: 9
I resubmitted the job on a node with 40 processors and 1 TB of memory, but I received two very similar exceptions and the job is hanging again.
Exception in thread "Thread-147" java.lang.AssertionError
    at clump.KmerSort3$FetchThread3.fetchNext_inner(KmerSort3.java:706)
    at clump.KmerSort3$FetchThread3.fetchNext(KmerSort3.java:655)
    at clump.KmerSort3$FetchThread3.run(KmerSort3.java:577)
--
Exception in thread "Thread-146" java.lang.AssertionError
    at clump.KmerSort3$FetchThread3.fetchNext_inner(KmerSort3.java:706)
    at clump.KmerSort3$FetchThread3.fetchNext(KmerSort3.java:655)
    at clump.KmerSort3$FetchThread3.run(KmerSort3.java:577)

#102
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,081
Can you provide the exact command line you are using? Is this being submitted via a job scheduler?

#103
Junior Member
Location: Ames, IA
Join Date: Dec 2014
Posts: 9
It is submitted to a SLURM queue via the attached script.
These reads are a collection of concatenated, interleaved paired-end libraries. The same script worked well on the individual libraries, but I wanted to do an assembly with all of the reads together, so I concatenated them all with
Code:
cat *fq.gz > ALL.fq.gz
Code:
clumpify.sh in=ALL_temp.fq.gz out=ALL.eccc.fq.gz ecc passes=4 reorder
bbmerge plows through these reads with no complaints just prior to clumpify
Code:
bbmerge.sh in=ALL_temp.fq.gz out=ALL.ecco.fq.gz ecco mix vstrict ordered ihist=ALL_ihist_merge1.txt

#104
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,081
I think you should follow the order of tools that Brian has in his script example: do the clumpify job first. Since you are merging the reads first, I speculate that clumpify is unable to identify duplicates properly. If your data is not from a patterned flowcell, you could remove the "optical" flag for clumpify.

#105
Junior Member
Location: Ames, IA
Join Date: Dec 2014
Posts: 9
Thank you for the quick advice. I had attempted to merge many samples together at the front end of the pipeline so that I could do all the QC and error correction at once. My problem was fixed when I did QC and error correction on each sample individually and then merged for a co-assembly.
Thanks again.
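
The approach that worked here (error-correct each library on its own, then concatenate for the co-assembly) could be scripted roughly as follows. This is only a sketch: the clumpify.sh options are the ones quoted earlier in the thread, while the function name `correct_then_merge`, the `CLUMPIFY` variable, and the `*.eccc.fq.gz` output naming are assumptions made for illustration.

```shell
# Path to clumpify.sh; override CLUMPIFY to point at a specific BBTools install.
CLUMPIFY="${CLUMPIFY:-clumpify.sh}"

# correct_then_merge MERGED_OUT LIB1.fq.gz [LIB2.fq.gz ...]
# Hypothetical helper: runs error correction per library, then concatenates the results.
# Note: the loop variables are plain shell globals, kept simple for portability.
correct_then_merge() {
  merged="$1"; shift
  [ "$#" -ge 1 ] || return 1            # need at least one input library
  : > "$merged"                         # start with an empty merged file
  for f in "$@"; do
    out="${f%.fq.gz}.eccc.fq.gz"        # assumed naming for the corrected library
    # Same options as the command quoted in the thread, applied to one library.
    "$CLUMPIFY" in="$f" out="$out" ecc passes=4 reorder || return 1
    # Concatenated gzip members form a valid gzip stream, so appending is enough.
    cat "$out" >> "$merged"
  done
}
```

A call might look like `correct_then_merge ALL.eccc.fq.gz lib1.fq.gz lib2.fq.gz`, where the library names are placeholders. Stopping at the first failing library makes it easier to see which input triggers a problem.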

#106
Junior Member
Location: Europe
Join Date: Feb 2019
Posts: 4
Hi all,
I was wondering why the default for spantiles is set to false. If a read has, for instance, coordinates (1000,1000) and dupedist is set to 2500 (see the attached sketch), there is a possible overlap with 3 other tiles. So even on a HiSeq4000 rather than a NextSeq, where there are no tile-edge duplicates, there is still a possibility that optical duplicates end up on neighboring tiles (or even further away). Can anyone shed light on this? Thanks in advance!
Attachment: The dot represents the "original read"; the circle represents the distance of 2500 around the "original read". Rectangles represent tiles.
Last edited by DCZ; 05-23-2019 at 08:27 AM.
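
To make the geometry in this question concrete, here is a minimal sketch that counts how many neighboring tiles a circle of radius dupedist around a read could reach. It assumes rectangular tiles that abut without gaps and share one coordinate system, and it only considers the nearest edge in each direction. The function name `tiles_reached` and the tile dimensions in the usage note are hypothetical, not Illumina specifications or anything clumpify itself computes.

```shell
# tiles_reached X Y DIST TILE_WIDTH TILE_HEIGHT
# Prints the number of neighboring tiles (0 to 3) that a circle of radius DIST
# around (X, Y) overlaps, under the assumptions stated above.
tiles_reached() {
  x=$1; y=$2; d=$3; w=$4; h=$5
  n=0
  # Distance from the read to the nearest vertical tile edge.
  dx=$x
  [ $(( w - x )) -lt "$dx" ] && dx=$(( w - x ))
  # Distance from the read to the nearest horizontal tile edge.
  dy=$y
  [ $(( h - y )) -lt "$dy" ] && dy=$(( h - y ))
  [ "$dx" -lt "$d" ] && n=$(( n + 1 ))                        # side neighbor
  [ "$dy" -lt "$d" ] && n=$(( n + 1 ))                        # top or bottom neighbor
  [ $(( dx*dx + dy*dy )) -lt $(( d*d )) ] && n=$(( n + 1 ))   # diagonal neighbor
  echo "$n"
}
```

With the values from the post and an assumed tile of 20000 by 100000 units, `tiles_reached 1000 1000 2500 20000 100000` prints 3, matching the three tiles in the sketch described above.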

#107
Senior Member
Location: East Coast USA
Join Date: Feb 2008
Posts: 7,081
Illumina's software pre-processing takes care of clusters that may be showing mixed signals, etc., so they may never pass that step. spantiles=t is mainly for NextSeq, where the clusters are (relatively) huge and as a result there is a chance they will cross tiles. I believe this default was based on empirical observations Brian made when he was developing clumpify.

#108
Junior Member
Location: Europe
Join Date: Feb 2019
Posts: 4
Thanks for your reply. I'm still confused, though. Just as there can be empty wells on the same tile, there can also be empty wells on neighboring tiles (correct me if I'm wrong). I suppose these wells would not show a mixed signal but would simply get filled with a duplicate, in the same way that optical duplicates form on the same tile.

#109
Junior Member
Location: Nebraska
Join Date: May 2017
Posts: 5
Hi, I've been using clumpify for some time now. Thanks!
I seem to have encountered a strange and unexpected result.
pigz -dc test.fna.gz | grep "^>" | wc -l
#4149
~/bbmap/clumpify.sh in=test.fna.gz out=test_dd.fna.gz dedupe subs=0
#Version 38.51
#Read Estimate: 352386
...
#Reads In: 2
#Clumps Formed: 2
#Duplicates Found: 0
#Reads Out: 2
...
pigz -dc test_dd.fna.gz | grep "^>" | wc -l
#2
Any idea what might have happened?

#110
Junior Member
Location: Nebraska
Join Date: May 2017
Posts: 5
Looks like everything went fine after I 'unwrapped' the input fasta.
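
The post does not say how the input was unwrapped. One common way to put each sequence on a single line is a small awk filter, sketched below. The helper name `unwrap_fasta` and the output file name in the usage note are assumptions for illustration.

```shell
# unwrap_fasta: reads FASTA on stdin and writes it to stdout with every record
# as one header line followed by one sequence line. Hypothetical helper name.
unwrap_fasta() {
  awk '
    /^>/ { if (seq != "") print seq; print; seq = ""; next }  # header: flush previous sequence
         { seq = seq $0 }                                     # sequence line: append to current record
    END  { if (seq != "") print seq }                         # flush the last record
  '
}
```

For the file in the post above, this could be used as `pigz -dc test.fna.gz | unwrap_fasta | pigz > test_unwrapped.fna.gz`. Counting the `>` lines before and after, as in that post, is a quick check that no records were lost.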

Tags: bbduk, bbmap, bbmerge, clumpify, compression, pigz, reformat, tadpole