  • Chief_Lazy_Bison
    Junior Member
    • Dec 2014
    • 9

    Thank you for the quick advice. I had tried to merge many samples at the front end of the pipeline so that I could do all the QC and error correction at once. The problem was fixed when I did QC and error correction on each sample individually and then merged the samples for a co-assembly.

    Thanks again.

    • DCZ
      Junior Member
      • Feb 2019
      • 4

      Hi all,

      I was wondering why the default for spantiles is set to false. If a read has, for instance, coordinates (1000,1000) and dupedist is set to 2500 (see attached sketch), there's a possible overlap with 3 other tiles. So even on an instrument that is not a NextSeq, such as a HiSeq 4000, where there are no tile-edge duplicates, there is still a possibility that optical duplicates end up on neighboring tiles (or even farther away). Can anyone elucidate on this?

      Thanks in advance!

      Attachment: the dot represents the "original read", the circle marks a distance of 2500 around the "original read", and the rectangles represent tiles.
      Last edited by DCZ; 05-23-2019, 07:27 AM.
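The geometry in DCZ's sketch can be checked in a few lines. This is a minimal sketch, assuming equal, edge-to-edge rectangular tiles and Euclidean distance as drawn in the sketch; the 20000 x 20000 tile size is a hypothetical value, not a real instrument dimension, and `tiles_within_distance` is an illustrative helper, not Clumpify's own duplicate test. It only checks the eight immediately neighboring tiles, so it assumes dupedist is smaller than a tile.

```python
import math


def tiles_within_distance(x, y, radius, tile_w, tile_h):
    """Return (dx, dy) offsets of neighboring tiles that come within `radius`
    of a cluster at tile-local coordinates (x, y).

    Assumes a hypothetical grid of equal, edge-to-edge rectangular tiles with
    (0, 0) at one corner of the cluster's own tile. Only the eight immediate
    neighbors are checked, so `radius` should be smaller than a tile.
    """
    hits = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # skip the cluster's own tile
            # Bounds of the neighboring tile in the same coordinate frame.
            x0, x1 = dx * tile_w, (dx + 1) * tile_w
            y0, y1 = dy * tile_h, (dy + 1) * tile_h
            # Closest point of that rectangle to the cluster.
            cx = min(max(x, x0), x1)
            cy = min(max(y, y0), y1)
            if math.hypot(x - cx, y - cy) <= radius:
                hits.append((dx, dy))
    return hits


print(tiles_within_distance(1000, 1000, 2500, tile_w=20000, tile_h=20000))
# → [(-1, -1), (-1, 0), (0, -1)]
```

With the read 1000 units from two edges and a radius of 2500, the circle reaches the two edge-adjacent tiles and the diagonal one (about 1414 units away at the corner), which matches the "3 other tiles" in the question.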

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        Illumina's software pre-processing takes care of clusters that show mixed signals, etc., so those clusters may never pass that step. spantiles=t is mainly for NextSeq, where the clusters are (relatively) huge and, as a result, there is a chance they will cross tiles. I believe this was based on empirical observations Brian made when he was developing Clumpify.

        • DCZ
          Junior Member
          • Feb 2019
          • 4

          Thanks for your reply. I'm still confused, though. Just as there can be empty wells on the same tile, there can also be empty wells on neighboring tiles (correct me if I'm wrong). I suppose these wells would not show a mixed signal but would simply be filled with a duplicate, in the same way that optical duplicates form on the same tile.

          • phylloxera
            Junior Member
            • May 2017
            • 5

            Hi, I've been using Clumpify for some time now. Thanks!
            I seem to have encountered a strange and unexpected result:
            pigz -dc test.fna.gz | grep "^>" | wc -l #4149
            ~/bbmap/clumpify.sh in=test.fna.gz out=test_dd.fna.gz dedupe subs=0
            #Version 38.51
            #Read Estimate: 352386
            ...
            #Reads In: 2
            #Clumps Formed: 2
            #Duplicates Found: 0
            #Reads Out: 2
            ...
            pigz -dc test_dd.fna.gz | grep "^>" | wc -l #2

            Any idea what might have happened?

            • phylloxera
              Junior Member
              • May 2017
              • 5

              It looks like everything went fine after I 'unwrapped' the input FASTA (put each sequence on a single line).
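The post does not say how the file was unwrapped. The gap between 4149 headers and "Reads In: 2" suggests the wrapped FASTA was not read as intended, though the thread does not explain why. Below is a minimal Python sketch of one way to unwrap a gzipped FASTA; `unwrap_fasta` and `unwrap_fasta_file` are hypothetical helper names, not part of BBTools.

```python
import gzip


def unwrap_fasta(lines):
    """Yield (header, sequence) pairs with each sequence joined onto one line.

    Blank lines are skipped; sequence lines before the first header are ignored.
    """
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def unwrap_fasta_file(src, dst):
    """Rewrite a gzipped FASTA so every sequence occupies exactly one line."""
    with gzip.open(src, "rt") as fin, gzip.open(dst, "wt") as fout:
        for header, seq in unwrap_fasta(fin):
            fout.write(f"{header}\n{seq}\n")
```

For example, `unwrap_fasta_file("test.fna.gz", "test_unwrapped.fna.gz")` writes a single-line-per-sequence copy that can then be passed to clumpify.sh as `in=`.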

              • stevekm
                Junior Member
                • Nov 2015
                • 1

                Is there any way to run Clumpify directly from within another program, such as a library that could be imported? I saw that the main Clumpify program is written in Java, but I am not a Java programmer. I'm not sure what other options there might be if I want my own custom program, which outputs FASTQ data, to pass its output directly to Clumpify, especially considering the handling of paired-end files.
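One option that avoids Java entirely is to call the clumpify.sh wrapper as an external command. Below is a minimal Python sketch. It assumes clumpify.sh is on PATH and uses only the in1/in2/out1/out2 and dedupe flags that appear elsewhere in this thread. The program writes its paired reads to two gzipped FASTQ files and then runs Clumpify on them, so it does not stream data directly into Clumpify. All function names are hypothetical.

```python
import gzip
import subprocess


def write_fastq_gz(path, records):
    """Write (name, seq, qual) tuples to a gzipped FASTQ file."""
    with gzip.open(path, "wt") as fh:
        for name, seq, qual in records:
            fh.write(f"@{name}\n{seq}\n+\n{qual}\n")


def clumpify_command(r1, r2, out1, out2, extra=("dedupe",)):
    """Build the clumpify.sh argument list for a paired-end run."""
    return ["clumpify.sh", f"in1={r1}", f"in2={r2}",
            f"out1={out1}", f"out2={out2}", *extra]


def run_clumpify(r1, r2, out1, out2, extra=("dedupe",)):
    """Run clumpify.sh (assumed to be on PATH); raise if it exits non-zero."""
    cmd = clumpify_command(r1, r2, out1, out2, extra)
    # Capture stdout/stderr so the BBTools log can be inspected afterwards.
    return subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Write the R1 and R2 records in the same order so that mates stay paired line-for-line. Then call, for example, `run_clumpify("sample_R1.fq.gz", "sample_R2.fq.gz", "sample_dedup_R1.fq.gz", "sample_dedup_R2.fq.gz")`.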

                • duartemolha
                  Junior Member
                  • Mar 2011
                  • 2

                  I am having problems using Clumpify with my FASTQ files, and I believe it is related to the UMI in the header of the FASTQ reads.

                  Here is a read from my read1 fastq:

                  @VL00773:6:AAFVNLMM5:1:1101:21412:1000:CTGGTGGTT 1:N:0:ACTCTCGA+CTGTACCA
                  GTGGGCACTAGCATACTTCCCAAGCTTGGGGTAGGGCAATATAGGCAAGTCGATCAAGCTTGCAGCTGACTCCCTTTGGGATCTTGGGCTTAACCTCCTTGGGCTTTACGAGGGCCTCGATAGCCTTGGCACGTGCACTCATGGCCTTGGC
                  +
                  CCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC;CCCC;CCCCCCCCCCCCCCCCCCCCCCCC

                  If I remove the :CTGGTGGTT from the end of the header, I can use Clumpify,

                  but with it there, it just fails:


                  clumpify.sh in1=sample1_R1_001.fastq.gz in2=sample1_R2_001.fastq.gz out1=sample1_dedup_R1_001.fastq.gz out2=sample1_dedup_R2_001.fastq.gz dedupe=t optical=t dupedist=40 spany=t t=1 -Xmx100g -Xms100g

                  openjdk version "1.8.0_112"
                  OpenJDK Runtime Environment (Zulu 8.19.0.1-linux64) (build 1.8.0_112-b16)
                  OpenJDK 64-Bit Server VM (Zulu 8.19.0.1-linux64) (build 25.112-b16, mixed mode)
                  java -ea -Xmx100g -Xms100g -cp .../bbtools/lib/current/ clump.Clumpify in1=sample1_R1_001.fastq.gz in2=sample1_R2_001.fastq.gz out1=sample1_dedup_R1_001.fastq.gz out out2=sample1_dedup_R2_001.fastq.gz out dedupe=t optical=t dupedist=40 spany=t t=1 -Xmx100g -Xms100g
                  Executing clump.Clumpify [in1=sample1_R1_001.fastq.gz, in2=sample1_R2_001.fastq.gz, out1=sample1_dedup_R1_001.fastq.gz, out2=sample1_dedup_R2_001.fastq.gz, dedupe=t, optical=t, dupedist=40, spany=t, t=1, -Xmx100g, -Xms100g]


                  Clumpify version 37.62
                  Read Estimate: 21805466
                  Memory Estimate: 16636 MB
                  Memory Available: 80430 MB
                  Set groups to 1
                  Executing clump.KmerSort [in1=sample1_R1_001.fastq.gz, in2=sample1_R2_001.fastq.gz, out1=sample1_dedup_R1_001.fastq.gz, out2=sample1_dedup_R2_001.fastq.gz, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, dedupe=t, t=1, -Xmx100g, -Xms100g]

                  Set threads to 1
                  Making comparator.
                  Made a comparator with k=31, seed=1, border=1, hashes=4
                  Starting cris 0.
                  Fetching reads.
                  Making fetch threads.
                  Starting threads.
                  Waiting for threads.
                  Exception in thread "Thread-3" java.lang.AssertionError: VL00773:7:AAFYLV7M5:1:1101:18648:1000:TAACCCATC 1:N:0:ACTCCATC+GATCAAGG
                  at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:92)
                  at clump.ReadKey.<init>(ReadKey.java:46)
                  at clump.ReadKey.<init>(ReadKey.java:33)
                  at clump.ReadKey.makeKey(ReadKey.java:23)
                  at clump.KmerComparator.hash(KmerComparator.java:73)
                  at clump.KmerComparator.hash(KmerComparator.java:66)
                  at clump.KmerSort$FetchThread.run(KmerSort.java:816)
                  Fetch time: 0.076 seconds.
                  Closing input stream.
                  Combining thread output.
                  Combine time: 0.000 seconds.
                  Exception in thread "main" java.lang.AssertionError: 0, 400, true
                  at clump.KmerSort.fetchReads(KmerSort.java:718)
                  at clump.KmerSort.processInner(KmerSort.java:400)
                  at clump.KmerSort.process(KmerSort.java:320)
                  at clump.KmerSort.main(KmerSort.java:51)
                  at clump.Clumpify.process(Clumpify.java:247)
                  at clump.Clumpify.main(Clumpify.java:37)




                  Does anyone have a solution that makes this work without losing all my UMI information?
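The assertion in the trace is raised in hiseq.FlowcellCoordinate.setFrom, which parses flowcell coordinates out of the read name. Presumably it is reached because optical=t works from tile and x/y positions. For reference, here is a minimal Python sketch of the Illumina (CASAVA 1.8+) read-name layout, including the optional eighth colon-separated field where bcl2fastq places a UMI. `parse_illumina_name` is a hypothetical helper. This is not how Clumpify parses names, and it does not show that the UMI causes the failure.

```python
def parse_illumina_name(name):
    """Split an Illumina (CASAVA 1.8+) read name into its fields.

    Expected layout:
      instrument:run:flowcell:lane:tile:x:y[:UMI] read:filtered:control:index
    The optional 8th colon-separated field holds the UMI, if present.
    """
    ident, _, comment = name.lstrip("@").partition(" ")
    fields = ident.split(":")
    if len(fields) not in (7, 8):
        raise ValueError(f"unexpected read name layout: {name!r}")
    instrument, run, flowcell, lane, tile, x, y = fields[:7]
    return {
        "instrument": instrument,
        "run": int(run),
        "flowcell": flowcell,
        "lane": int(lane),
        "tile": int(tile),
        "x": int(x),
        "y": int(y),
        "umi": fields[7] if len(fields) == 8 else None,
        "comment": comment,
    }
```

Applied to the header above, this yields tile 1101, x 21412, y 1000 and UMI CTGGTGGTT. A parser that took the last colon field as the y coordinate would instead see CTGGTGGTT.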

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    Originally posted by duartemolha
                    I am having problems using clumpify with my fastqs and I believe it is related to the UMI on the header of the fastq reads
                    I did a brief test with the sample read you included above and had no issue using Clumpify on a couple of reads. So the issue likely lies somewhere else and not in the UMI.
