SEQanswers

Old 10-13-2017, 10:49 AM   #81
Gopo
Member
 
Location: Louisiana

Join Date: Nov 2013
Posts: 15

Quote:
Originally Posted by GenoMax View Post
Have you tried other large genome assemblers? ALLPATHS-LG?
No, I haven't, but based on the manual I am out of luck, as ALLPATHS-LG requires both shotgun and mate-pair libraries.
Old 10-13-2017, 11:14 AM   #82
Brian Bushnell
Super Moderator
 
Location: Walnut Creek, CA

Join Date: Jan 2014
Posts: 2,695

Hi Gopo,

I don't particularly recommend Tadpole for diploid (or higher) genomes, as it has absolutely no capability of dealing with heterozygous sites. However, it's really fast, so even with a huge genome 72 hours would be unusual (though possible; that one is pretty large after all) unless something went wrong. Are you certain that it did not crash? Typically, if it crashed (due to running out of memory, for example) it would indicate that in the stderr output.

You may find it helpful to perform error-correction with K=31 and add the flag "prefilter=2" to get rid of erroneous kmers and conserve memory with a Bloom filter. But as for finishing a massive assembly in 72 hours, I don't think that will help. Tadpole does not support checkpointing. I don't know what the best diploid eukaryotic assembler for Illumina reads is currently, but it's safe to bet that it's not Tadpole (unless all you care about is avoiding misassemblies and very low continuity is acceptable). There are some assemblers, though, like Ray and HipMer, that can run distributed on a cluster to reduce the overall time as well as per-node memory requirements. Those might be worth trying in this case to fit into the 72-hour window.
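
A minimal sketch of the error-correction suggestion above, assuming paired reads in two hypothetical files (`reads_1.fq.gz` and `reads_2.fq.gz`) and that `tadpole.sh` is on the PATH. The command is only built and printed here so it can be checked before committing to a long run.

```shell
# Hypothetical input names; replace with your own files.
R1=reads_1.fq.gz
R2=reads_2.fq.gz

# mode=correct runs error correction; k=31 and prefilter=2 are the values suggested above.
ECC_CMD="tadpole.sh in=$R1 in2=$R2 out=ecc_$R1 out2=ecc_$R2 mode=correct k=31 prefilter=2"

# Print the command for inspection; run it afterwards with: eval "$ECC_CMD"
echo "$ECC_CMD"
```

Memory and thread settings are left at their defaults in this sketch; on a shared cluster they would probably need to be set to match the job allocation.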

If your read pairs are mostly overlapping, you can also merge them first with BBMerge to reduce your data volume somewhat and increase quality, which will reduce both time and memory usage. Ray, for example, appears to benefit from merged reads, and I've been told by one of the developers that HipMer does as well.
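
A rough sketch of that merging step, assuming the same hypothetical file names as inputs and that `bbmerge.sh` is on the PATH. The output names are placeholders, and the command is printed, not executed.

```shell
# Hypothetical input names; replace with your own files.
R1=reads_1.fq.gz
R2=reads_2.fq.gz

# out= receives merged reads; outu=/outu2= receive the pairs that could not be merged.
MERGE_CMD="bbmerge.sh in=$R1 in2=$R2 out=merged.fq.gz outu=unmerged_1.fq.gz outu2=unmerged_2.fq.gz"

# Print the command for inspection; run it afterwards with: eval "$MERGE_CMD"
echo "$MERGE_CMD"
```

This leaves three files to hand to the assembler: one of merged (now single-end) reads and two of still-paired unmerged reads.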

Last edited by Brian Bushnell; 10-13-2017 at 11:17 AM.
Old 10-13-2017, 11:15 AM   #83
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550

The axolotl genome paper used SOAPdenovo2.
Old 10-13-2017, 11:40 AM   #84
Gopo
Member
 
Location: Louisiana

Join Date: Nov 2013
Posts: 15

Quote:
Originally Posted by Brian Bushnell View Post
Hi Gopo,
Are you certain that it did not crash? Typically, if it crashed (due to running out of memory, for example) it would indicate that in the stderr output.
Hi Brian, no, it did not crash; unfortunately, the job exceeded the allowed walltime. I'll try what you suggested for Tadpole first.

@GenoMax - Thank you. Unfortunately, they were unable to finish the assembly with SOAPdenovo2 (see https://images.nature.com/original/n...ep16413-s1.pdf).

From "http://www.ambystoma.org/":
This assembly represents a single individual from the AGSC and was generated using 600 Gb of HiSeq paired end reads and 640Gb of HiSeq mate pair reads. Reads were assembled using a modified version of SparseAssembler [Ye C, et al. 2012].

I might give SparseAssembler a try.
Old 10-13-2017, 01:49 PM   #85
Gopo
Member
 
Location: Louisiana

Join Date: Nov 2013
Posts: 15

Hi Brian,

I used bbmerge and am now trying to error-correct my paired and merged reads with tadpole at the same time, but I can't seem to get the right syntax for the input.

I tried the following, modeled on the example:

Code:
 ~/bin/bbmap-37.56/tadpole.sh in=SRR2027504_1.fq.gz,SRR2027504_merged.fq.gz in2=SRR2027504_2.fq.gz,null out=ecc_SRR2027504_1.fq.gz,ecc_SRR2027504_merged.fq.gz out2=ecc_SRR2027504_2.fq.gz,null mode=correct
but I get:

Code:
Tadpole version 37.56
Exception in thread "main" java.lang.RuntimeException: Can't read file 'null'
        at shared.Tools.testInputFiles(Tools.java:628)
        at shared.Tools.testInputFiles(Tools.java:605)
        at assemble.Tadpole.<init>(Tadpole.java:624)
        at assemble.Tadpole1.<init>(Tadpole1.java:68)
        at assemble.Tadpole.makeTadpole(Tadpole.java:77)
        at assemble.Tadpole.main(Tadpole.java:64)
What am I doing wrong?
Old 10-14-2017, 05:54 AM   #86
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,550

It appears that Tadpole is not able to take single-end (SE) and paired-end (PE) reads at the same time.
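
One possible workaround, sketched here as an assumption and not as a confirmed recommendation from the Tadpole author, is to correct the paired and merged reads in two separate runs. The file names are the ones from the failing command above; the commands are printed, not executed.

```shell
# Sample prefix taken from the command posted above.
PREFIX=SRR2027504

# Run 1: paired reads only (in/in2 with matching out/out2).
PAIRED_CMD="tadpole.sh in=${PREFIX}_1.fq.gz in2=${PREFIX}_2.fq.gz out=ecc_${PREFIX}_1.fq.gz out2=ecc_${PREFIX}_2.fq.gz mode=correct"

# Run 2: merged (single-end) reads only, so no in2/out2 and no 'null' placeholder.
MERGED_CMD="tadpole.sh in=${PREFIX}_merged.fq.gz out=ecc_${PREFIX}_merged.fq.gz mode=correct"

# Print both commands for inspection; run each afterwards with eval.
echo "$PAIRED_CMD"
echo "$MERGED_CMD"
```

A likely drawback is that each run would count kmers from its own input only, so correction may be weaker than with all reads pooled.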
Old 10-22-2017, 11:52 PM   #87
silask
Junior Member
 
Location: Switzerland

Join Date: Oct 2017
Posts: 1

Hello,
I have a question.

Quote:
Originally Posted by Brian Bushnell View Post
bbmerge-auto.sh in=reads.fq out=merged.fq outu=unmerged.fq ihist=ihist.txt extend2=20 iterations=10 k=31 ecct qtrim2=r trimq=12 strict
Are the unmerged pairs trimmed or not? Do I have to quality-trim the unmerged pairs again? I have the same question about adapter removal during merging.
Old 11-02-2017, 07:30 AM   #88
Gopo
Member
 
Location: Louisiana

Join Date: Nov 2013
Posts: 15

Hi Brian,

I recreated the paired-end FASTQ files, performed adapter and quality trimming with bbduk, then used tadpole for error correction, and finally used tadpole in contig mode to de novo assemble the contigs within 72 hours. I used prefilter=2 and evaluated various values of K.

Thank you for your help. This was the only de novo assembler I tried that could finish within 72 hours, and yes, it needed up to 1.4 TB of RAM and 48 cores.
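
For anyone wanting to reproduce a similar workflow, the three steps could be sketched roughly as follows. The exact parameters used are not given in this post, so every file name, the adapter reference `adapters.fa`, the BBDuk trimming settings, and the k values are assumptions for illustration only. The function prints the commands and does not run them.

```shell
# Print a three-step trim / correct / assemble pipeline for one assembly kmer length.
# All file names and parameter values here are hypothetical placeholders.
print_pipeline() {
    k="$1"
    # Step 1: adapter trimming (ktrim=r against adapters.fa) and quality trimming with BBDuk.
    echo "bbduk.sh in=raw_1.fq.gz in2=raw_2.fq.gz out=trim_1.fq.gz out2=trim_2.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo qtrim=rl trimq=10"
    # Step 2: error correction with Tadpole.
    echo "tadpole.sh in=trim_1.fq.gz in2=trim_2.fq.gz out=ecc_1.fq.gz out2=ecc_2.fq.gz mode=correct k=31 prefilter=2"
    # Step 3: contig assembly with Tadpole at the requested kmer length.
    echo "tadpole.sh in=ecc_1.fq.gz in2=ecc_2.fq.gz out=contigs_k${k}.fa mode=contig k=${k} prefilter=2"
}

# Example: show the commands for an assembly at k=31.
print_pipeline 31
```

Calling `print_pipeline` with different values gives one assembly command per k, which makes it easy to compare several kmer lengths.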

Tags
assembler, bbmap, bbmerge, bbnorm, bbtools, error correction, tadpole
