Thread | Thread Starter | Forum | Replies | Last Post |
Introducing Reformat, a fast read format converter | Brian Bushnell | Bioinformatics | 39 | 02-03-2021 11:35 AM |
Introducing BBNorm, a read normalization and error-correction tool | Brian Bushnell | Bioinformatics | 53 | 08-12-2020 12:51 PM |
Introducing BBMerge: A paired-end read merger | Brian Bushnell | Bioinformatics | 132 | 06-19-2020 03:15 AM |
Introducing BBMap, a new short-read aligner for DNA and RNA | Brian Bushnell | Bioinformatics | 24 | 07-07-2014 09:37 AM |
#81 |
Member
Location: Louisiana Join Date: Nov 2013
Posts: 38
|
#82 |
Super Moderator
Location: Walnut Creek, CA Join Date: Jan 2014
Posts: 2,707
|
Hi Gopo,
I don't particularly recommend Tadpole for diploid (or higher-ploidy) genomes, as it has absolutely no capability of dealing with heterozygous sites. However, it's really fast, so even with a huge genome 72 hours would be unusual (though possible; that one is pretty large after all) unless something went wrong. Are you certain that it did not crash? Typically, if it crashed (due to running out of memory, for example) it would indicate that in the stderr output.

You may find it helpful to perform error-correction with K=31 and add the flag "prefilter=2" to get rid of erroneous kmers and conserve memory with a Bloom filter. But as for finishing a massive assembly in 72 hours, I don't think that will help. Tadpole does not support checkpointing.

I don't know what the best diploid eukaryotic assembler for Illumina reads is currently, but it's safe to bet that it's not Tadpole (unless all you care about is avoiding misassemblies and very low continuity is acceptable). There are some assemblers, though, like Ray and HipMer, that can run distributed on a cluster to reduce the overall time as well as per-node memory requirements. Those might be worth trying in this case to fit into the 72-hour window.

If your read pairs are mostly overlapping, you can also merge them first with BBMerge to reduce your data volume somewhat and increase quality, which will reduce both time and memory usage. Ray, for example, appears to benefit from merged reads, and I've been told by one of the developers that HipMer does as well.

Last edited by Brian Bushnell; 10-13-2017 at 10:17 AM. |
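For readers who want to try the two suggestions above, here is a minimal sketch of what the commands might look like. The file names are hypothetical, the two steps are shown independently rather than as a fixed pipeline, and the wrapper only prints the commands unless DRY_RUN=0 is set, so nothing runs by accident.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical input names (assumption); substitute your own files.
R1=reads_1.fq.gz
R2=reads_2.fq.gz

# Suggestion 1: error-correct with K=31 and a prefilter to save memory.
correct_pairs() {
    run tadpole.sh in="$R1" in2="$R2" out=ecc_1.fq.gz out2=ecc_2.fq.gz \
        mode=correct k=31 prefilter=2
}

# Suggestion 2: merge overlapping pairs to reduce data volume.
# Pairs that do not merge are written to the outu1/outu2 files.
merge_pairs() {
    run bbmerge.sh in1="$R1" in2="$R2" out=merged.fq.gz \
        outu1=unmerged_1.fq.gz outu2=unmerged_2.fq.gz
}

correct_pairs
merge_pairs
```

Since Tadpole does not support checkpointing, it may be worth capturing stderr to a file when running for real, so a crash can be told apart from a job that simply hit the time limit.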
#83 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,091
|
The axolotl genome paper used SOAPdenovo2.
|
#84 | |
Member
Location: Louisiana Join Date: Nov 2013
Posts: 38
|
![]() Quote:
@GenoMax - Thank you. Unfortunately, they were unable to finish the assembly with SOAPdenovo2 (see https://images.nature.com/original/n...ep16413-s1.pdf).
From http://www.ambystoma.org/: "This assembly represents a single individual from the AGSC and was generated using 600 Gb of HiSeq paired end reads and 640Gb of HiSeq mate pair reads. Reads were assembled using a modified version of SparseAssembler [Ye C, et al. 2012]."
I might give SparseAssembler a try. |
|
#85 |
Member
Location: Louisiana Join Date: Nov 2013
Posts: 38
|
Hi Brian,
I used bbmerge and am now trying to error-correct my paired and merged reads with tadpole at the same time, but I can't seem to get the right syntax for the input. I tried the following, like the example:
Code:
~/bin/bbmap-37.56/tadpole.sh in=SRR2027504_1.fq.gz,SRR2027504_merged.fq.gz in2=SRR2027504_2.fq.gz,null out=ecc_SRR2027504_1.fq.gz,ecc_SRR2027504_merged.fq.gz out2=ecc_SRR2027504_2.fq.gz,null mode=correct
Code:
Tadpole version 37.56
Exception in thread "main" java.lang.RuntimeException: Can't read file 'null'
    at shared.Tools.testInputFiles(Tools.java:628)
    at shared.Tools.testInputFiles(Tools.java:605)
    at assemble.Tadpole.<init>(Tadpole.java:624)
    at assemble.Tadpole1.<init>(Tadpole1.java:68)
    at assemble.Tadpole.makeTadpole(Tadpole.java:77)
    at assemble.Tadpole.main(Tadpole.java:64)
|
#86 |
Senior Member
Location: East Coast USA Join Date: Feb 2008
Posts: 7,091
|
It appears that Tadpole is not able to take single-end (SE) and paired-end (PE) reads at the same time.
|
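One possible workaround, sketched here with the file names from the post above, is to correct the paired and merged reads in two separate runs and pass the other set through extra=, so that both runs still see the kmers from all the data. That extra= takes a comma-separated list is an assumption on my part and has not been checked against version 37.56. The wrapper only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

S=SRR2027504   # sample prefix taken from the post above

# Run 1: correct the pairs; merged reads contribute kmers only.
correct_pairs() {
    run tadpole.sh in="${S}_1.fq.gz" in2="${S}_2.fq.gz" \
        out="ecc_${S}_1.fq.gz" out2="ecc_${S}_2.fq.gz" \
        extra="${S}_merged.fq.gz" mode=correct
}

# Run 2: correct the merged reads; the pairs contribute kmers only.
# ASSUMPTION: extra= accepts a comma-separated list of files.
correct_merged() {
    run tadpole.sh in="${S}_merged.fq.gz" out="ecc_${S}_merged.fq.gz" \
        extra="${S}_1.fq.gz,${S}_2.fq.gz" mode=correct
}

correct_pairs
correct_merged
```

The cost of this approach is that the kmer table is built twice, so total run time roughly doubles compared with a single pass.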
#87 |
Junior Member
Location: Sitzerland Join Date: Oct 2017
Posts: 7
|
Hello,
I have a question. Are the unmerged pairs trimmed or not? Do I have to do quality trimming again on the unmerged pairs? The same question applies when using adapter removal during merging. |
#88 |
Member
Location: Louisiana Join Date: Nov 2013
Posts: 38
|
Hi Brian,
I recreated the paired-end FASTQ files, performed adapter and quality trimming with bbduk, then used tadpole for error correction, and finally used tadpole in contig mode to de novo assemble the contigs within 72 hours. I used prefilter=2 and evaluated various values of K. Thank you for your help. This was the only de novo assembler that I tried that could finish within 72 hours, and yes it needed up to 1.4TB RAM and 48 cores. |
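The workflow described above (bbduk trimming, Tadpole error correction, then Tadpole in contig mode with prefilter=2 over several values of K) might look roughly like the sketch below. The exact commands were not posted, so every file name, the bbduk trimming settings, and the K values 31, 62 and 93 are assumptions for illustration only. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical input names (assumption).
R1=reads_1.fq.gz
R2=reads_2.fq.gz

# Step 1: adapter and quality trimming (settings are illustrative).
trim_reads() {
    run bbduk.sh in1="$R1" in2="$R2" out1=trim_1.fq.gz out2=trim_2.fq.gz \
        ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=rl trimq=10
}

# Step 2: error correction.
correct_reads() {
    run tadpole.sh in=trim_1.fq.gz in2=trim_2.fq.gz \
        out=ecc_1.fq.gz out2=ecc_2.fq.gz mode=correct prefilter=2
}

# Step 3: contig assembly for one kmer length ($1).
assemble_contigs() {
    run tadpole.sh in=ecc_1.fq.gz in2=ecc_2.fq.gz \
        out="contigs_k$1.fa" mode=contig k="$1" prefilter=2
}

trim_reads
correct_reads
for k in 31 62 93; do   # K values are assumptions, not the ones used above
    assemble_contigs "$k"
done
```

Writing each K to its own output file makes it easy to compare the assemblies afterwards.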
#89 |
Member
Location: Vienna Join Date: Feb 2016
Posts: 35
|
Hi Brian,
After reading this whole thread, I still have some doubts about how the mode=extend works in tadpole. My understanding is: kmers of size k are extracted from the reads and, upon overlap, reads are extended. My expectation was that they were also merged together. Instead, I get the same number of reads in input and output, just extended. I am ok with the output but I would like some rationale to justify it in my workflow. Such extended reads should only be used in the context of assembly, right? Because I won't try to extract a positional coverage from them, as they are an extended version of themselves. |
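The observation above, that extend mode returns the same number of reads but longer, can be checked with a small helper like the one below. It is only a sketch: it assumes uncompressed FASTQ with exactly four lines per record, and the function names are made up for illustration.

```shell
#!/bin/sh
# Count records in an uncompressed FASTQ file.
# ASSUMPTION: exactly four lines per record (no wrapped sequence lines).
count_fastq_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Succeeds (exit status 0) when both files hold the same number of records.
same_read_count() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}
```

For example, `same_read_count reads.fq extended.fq && echo "reads were extended, not merged"` would confirm that no reads were collapsed, with both file names being placeholders.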
#90 |
Senior Member
Location: oceania Join Date: Feb 2014
Posts: 115
|
Hi,
I am getting a memory error with the following:
tadpole.sh in=../fasta/2.fasta out=tad2.fa k=96 merge=t overwrite=t
The fasta is interleaved and has 57767162 reads. This is a metagenome file. Reads are 150 bp paired Illumina NovaSeq, QC'd and clipped. I have tried -Xmx50g but it made no difference. I have 64GB RAM (about 62GB available, on Ubuntu 16.04) and 64GB swap, but the program does not seem to use the swap at all.
Thanks for any help.
s.
output:
Executing assemble.Tadpole2 [in=../fasta/2.fasta, out=tad2.fa, k=96, merge=t, overwrite=t, -Xmx50g]
Version 37.88 [in=../fasta/2.fasta, out=tad2.fa, k=96, merge=t, overwrite=t, -Xmx50g]
Using 8 threads.
Executing ukmer.KmerTableSetU [in=../fasta/2.fasta, out=tad2.fa, k=96, merge=t, overwrite=t, -Xmx50g]
Initial:
Ways=31, initialSize=128000, prefilter=f, prealloc=f
Memory: max=51450m, free=50913m, used=537m
Initialization Time: 0.032 seconds.
Loading kmers.
Estimated kmer capacity: 585441055
After table allocation:
Memory: max=51450m, free=50376m, used=1074m
java.lang.OutOfMemoryError: Java heap space
    at shared.KillSwitch.allocLong2D(KillSwitch.java:234)
    at ukmer.AbstractKmerTableU.allocLong2D(AbstractKmerTableU.java:196)
    at ukmer.HashArrayU1D.resize(HashArrayU1D.java:187)
    at ukmer.HashArrayU1D.incrementAndReturnNumCreated(HashArrayU1D.java:90)
    at ukmer.HashBufferU.dumpBuffer_inner(HashBufferU.java:196)
    at ukmer.HashBufferU.dumpBuffer(HashBufferU.java:168)
    at ukmer.HashBufferU.incrementAndReturnNumCreated(HashBufferU.java:57)
    at ukmer.KmerTableSetU$LoadThread.addKmersToTable(KmerTableSetU.java:574)
    at ukmer.KmerTableSetU$LoadThread.run(KmerTableSetU.java:499)
This program ran out of memory.
Try increasing the -Xmx flag and using tool-specific memory-related parameters. |
#91 |
Senior Member
Location: oceania Join Date: Feb 2014
Posts: 115
|
My mistake - I should have increased Xmx, not decreased it. It still seems to be stalling, though: RAM stays full but CPU activity drops to zero, and the assembly never finishes.
S. |
#92 |
Registered Vendor
Location: Eugene, OR Join Date: May 2013
Posts: 523
|
Do you need to use k=96? That is a large kmer size and will increase memory demand. A metagenome plus sequence errors will create many 96-mers. What happens with the default k=31... does it run with that? I guess tadpole is estimating memory for k=96 and it should work but I would try a smaller k and see what happens.
__________________
Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com |
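As a concrete version of the suggestion above, the original command could be retried at the default k=31 before going back up. In this sketch the per-K output name and the added prefilter=2 (the memory-saving flag mentioned earlier in the thread) are my assumptions, not something the poster ran. The wrapper prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Assemble the metagenome at a given kmer length ($1).
# The output name and prefilter=2 are assumptions added for illustration.
assemble_with_k() {
    run tadpole.sh in=../fasta/2.fasta out="tad2_k$1.fa" k="$1" \
        merge=t overwrite=t prefilter=2 -Xmx50g
}

assemble_with_k 31
```

If k=31 finishes comfortably, intermediate values can be tried the same way, for example `assemble_with_k 62`, to see where memory becomes the limit.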
#93 | |
Senior Member
Location: oceania Join Date: Feb 2014
Posts: 115
|
![]() Quote:
Yes, I get your point. I have reduced it now. Will update on progress. S. |
|
#94 |
Senior Member
Location: oceania Join Date: Feb 2014
Posts: 115
|
#95 |
Member
Location: Thessaloniki, Greece Join Date: Jul 2018
Posts: 12
|
Hello!
Does RQCFilter have a parameter to normalise and error-correct? Or should I do it afterwards? |
#96 | |
Junior Member
Location: Turkey Join Date: Jun 2014
Posts: 2
|
![]() Quote:
|
|
#97 |
David Eccles (gringer)
Location: Wellington, New Zealand Join Date: May 2011
Posts: 835
|
Wow! Joined in 2014, only one post, and you chose to make your first post something about mutant clouds in response to a 3-year-old post of mine from the first page of this thread. I'm not sure if I should be pleased, or concerned. If you want a more recent update on that, try here in this thread (where I got a 3.6kb sequence with crazy-high coverage):
http://seqanswers.com/forums/showthr...561#post182561
But to be frank, it was so long ago that I've forgotten the virus and host that I was looking at. Feel free to expand on what you meant (preferably with a reference to a preprint or other form of publication), but bear in mind that it's probably better in the future to try to respond only to the most recent posts in a thread. |
#98 | |
Junior Member
Location: Turkey Join Date: Jun 2014
Posts: 2
|
![]() Quote:
But thank you: you have given me an interesting idea. You have been a mirror for me. While responding to you now I found a solution for myself, so thank you countless times. |
|
#99 |
Junior Member
Location: singapore Join Date: May 2016
Posts: 9
|
Hello there,
I am running into an error whenever I try to run Tadpole:
Performing error-correction with Tadpole
java -ea -Xmx117184m -Xms117184m -cp /scratch/shiming/tools/bbmap-v36.92/current/ assemble.Tadpole in1=CFC280618_S3_R1_001.fastq.gz in2=CFC280618_S3_R2_001.fastq.gz out1=tadpole/CFC280618_R1.tadpole.fastq.gz out2=tadpole/CFC280618_R2.tadpole.fastq.gz mode=correct ziplevel=9
Picked up _JAVA_OPTIONS: -Xmx1g
Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size
java -ea -Xmx117184m -Xms117184m -cp /scratch/shiming/tools/bbmap-v36.92/current/ assemble.Tadpole in1=MFC280618_S2_R1_001.fastq.gz in2=MFC280618_S2_R2_001.fastq.gz out1=tadpole/MFC280618_R1.tadpole.fastq.gz out2=tadpole/MFC280618_R2.tadpole.fastq.gz mode=correct ziplevel=9
Picked up _JAVA_OPTIONS: -Xmx1g
Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size
java -ea -Xmx117184m -Xms117184m -cp /scratch/shiming/tools/bbmap-v36.92/current/ assemble.Tadpole in1=SBW280618_S1_R1_001.fastq.gz in2=SBW280618_S1_R2_001.fastq.gz out1=tadpole/SBW280618_R1.tadpole.fastq.gz out2=tadpole/SBW280618_R2.tadpole.fastq.gz mode=correct ziplevel=9
Picked up _JAVA_OPTIONS: -Xmx1g
Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size
(END)
This was the command that I ran:
echo "Performing error-correction with Tadpole"
echo
for x in *_R1_001.fastq.gz
do
tadpole.sh in1=$x in2=${x%_R1_001*}_R2_001.fastq.gz out1=tadpole/${x%_S*_R1_001.*}_R1.tadpole.fastq.gz out2=tadpole/${x%_S*_R1_001.*}_R2.tadpole.fastq.gz mode=correct ziplev$
done
Is there something wrong that I am doing here? Thanks |
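A note on the log above: the line "Picked up _JAVA_OPTIONS: -Xmx1g" suggests that an environment variable is capping the maximum heap at 1 GB, which then conflicts with the -Xms117184m that the launcher requests. A minimal sketch of a fix, assuming it is acceptable to clear that variable for this shell session, is shown below. The helper names are made up, and the loop only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Same suffix stripping as in the posted loop, wrapped in helpers.
mate_of()   { printf '%s_R2_001.fastq.gz\n' "${1%_R1_001*}"; }
sample_of() { printf '%s\n' "${1%_S*_R1_001.*}"; }

# The JVM reads _JAVA_OPTIONS at startup; its -Xmx1g ends up below -Xms,
# so clear it for this session before launching Tadpole.
unset _JAVA_OPTIONS

mkdir -p tadpole
for x in *_R1_001.fastq.gz; do
    [ -e "$x" ] || continue   # skip when the glob matches nothing
    s=$(sample_of "$x")
    run tadpole.sh in1="$x" in2="$(mate_of "$x")" \
        out1="tadpole/${s}_R1.tadpole.fastq.gz" \
        out2="tadpole/${s}_R2.tadpole.fastq.gz" \
        mode=correct ziplevel=9
done
```

If the variable is set by a cluster profile for a reason, such as a login-node memory limit, it may be better to run the job on a compute node than to override it.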
#100 | |
Junior Member
Location: Perú Join Date: Jul 2019
Posts: 6
|
![]() Quote:
HTML Code:
Extending contigs with reads could be done like this:
tadpole.sh in=contigs.fa out=extended.fa el=100 er=100 mode=extend extra=reads.fq k=62
How should I pass in my two sets of reads (mit_1.fastq and mit_2.fastq)? Perhaps:
extra=mit_1.fastq,mit_2.fastq ?
or
extra1=mit_1.fastq extra2=mit_2.fastq ?
Or should I interleave my paired-end fastq files?
edit: I interleaved my two fastq files and executed:
> tadpole.sh in=contigs.fasta out=extended.fa el=1000 er=1000 mode=extend extra=all.mt.interleaved.fq k=62
It gave me this error:
Exception in thread "main" java.lang.RuntimeException: Can't read file 'all.mt.interleaved.fq'
    at shared.Tools.testInputFiles(Tools.java:1121)
    at shared.Tools.testInputFiles(Tools.java:1089)
    at ukmer.KmerTableSetU.<init>(KmerTableSetU.java:345)
    at assemble.Tadpole2.<init>(Tadpole2.java:70)
    at assemble.Tadpole.makeTadpole(Tadpole.java:74)
    at assemble.Tadpole.main(Tadpole.java:57)
Thank you in advance for your help.
Last edited by silverfox; 07-26-2019 at 09:55 PM. |
|
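The "Can't read file" exception above is raised when Tadpole tests its input files, so a likely first thing to check is whether the interleaved file exists and is readable from the directory where the command is run. The sketch below adds such a check before the extension step. The helper names are hypothetical, and the commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fail early, naming the first file that cannot be read from here.
require_readable() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f (current directory: $(pwd))" >&2
            return 1
        fi
    done
}

# Interleave two paired FASTQ files ($1, $2) into one ($3).
interleave() {
    run reformat.sh in1="$1" in2="$2" out="$3"
}

# Extend contigs using the interleaved reads ($1) as extra kmer data.
extend_contigs() {
    require_readable contigs.fasta "$1" || return 1
    run tadpole.sh in=contigs.fasta out=extended.fa el=1000 er=1000 \
        mode=extend extra="$1" k=62
}
```

Typical use would be `interleave mit_1.fastq mit_2.fastq all.mt.interleaved.fq` followed by `extend_contigs all.mt.interleaved.fq`. If the second call reports an unreadable file, the path or working directory is the problem rather than the Tadpole syntax.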
Tags |
assembler, bbmap, bbmerge, bbnorm, bbtools, error correction, tadpole |