SEQanswers

Old 07-27-2019, 06:53 AM   #101
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Quote:
mit_1.fastq and mit_2.fastq (I'm assembling a mitogenome) and I want to use them to extend my contig

Is this data in addition to the contigs file, or are these the reads that were used to make the contig assembly?

Take a look at this tadpole usage guide to see if it helps.
Old 07-27-2019, 07:45 AM   #102
silverfox
Junior Member
 
Location: Perú

Join Date: Jul 2019
Posts: 4

I have a lot more reads. The ones used in the SPAdes assembly were just a subsample (chosen to give 60x coverage of my reference genome, which is 569630 bp).
The read data that I gave Tadpole is the whole set of reads (interleaved), including the ones used in the assembly but also new ones.
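
For anyone following along, the subsampling arithmetic behind a figure like "60x of a 569630 bp reference" can be sketched as below. This is only a rough illustration: the 150 bp read length is an assumption (the thread does not state it), and `reads_for_coverage` is a made-up helper name, not part of any tool mentioned here.

```python
import math

def reads_for_coverage(genome_size_bp, target_coverage, read_length_bp):
    """Smallest read count with reads * read_length / genome_size >= target."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

GENOME_SIZE = 569630  # bp, reference size given in the post
READ_LENGTH = 150     # bp, ASSUMED; not stated anywhere in the thread

n_reads = reads_for_coverage(GENOME_SIZE, 60, READ_LENGTH)
print(n_reads, "reads, i.e.", math.ceil(n_reads / 2), "pairs")
```

With a different read length the count scales inversely, so it is worth plugging in the real value before subsampling.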

Last edited by silverfox; 07-27-2019 at 07:55 AM.
Old 07-27-2019, 08:47 AM   #103
silverfox
Junior Member
 
Location: Perú

Join Date: Jul 2019
Posts: 4

Quote:
Originally Posted by Brian Bushnell View Post
In default mode, Tadpole assembles reads and produces contigs. In "extend" or "correct" mode, it will extend or correct input sequences - which can be reads or contigs, but it's designed for reads. When I use Tadpole for assembly, I often first correct the reads, then assemble them, which takes two passes. Tadpole will build contigs unless you explicitly add the flag "mode=extend" or "mode=correct", regardless of whether you have 1 or 2 inputs. In extend or correct mode, it will modify the input reads, and not make contigs.

I'm glad to hear that you've achieved more contiguous assemblies after extending the reads and assembling them with longer kmers - that was my goal in designing that mode (and particularly, to allow merging of non-overlapping reads), but unfortunately I've been too busy to test it thoroughly. You've given me a second data point about it being beneficial, though, so thanks!

"shave" and "rinse" are what some assemblers call "tip removal" and "bubble removal". But, they are implemented a bit differently and occur before the graph is built, rather than as graph simplification routines. As such, they pose virtually no risk of causing misassemblies, and reduce the risk of misassemblies due to single chimeric reads. But unfortunately, in my experience, they also only provide very minor improvements in continuity or error-correction. Sometimes they make subsequent operations faster, though. By default, adding the flag "shave" will remove dead-end kmer paths of depth no more than 1 and length no more than 150 that branch out of a path with substantially greater depth. "rinse", similarly, only removes short paths of depth no more than 1 in which each end terminates in a branch node of substantially greater depth. Because these operations are so conservative, they seem to have little impact. Assemblers like Velvet and AllPaths-LG can collapse bubbles with a 50-50 split as to the path depth, which greatly increases the continuity (particularly with diploid organisms), but poses the risk of misassemblies when there are repeat elements. Tadpole always errs on the side of caution, preferring lower continuity to possible misassemblies.

Tadpole is still not pair-aware and does not perform scaffolding, though that's certainly my next goal, when I get a chance. When you generate contigs, Tadpole automatically runs AssemblyStats (which you can run as standalone using stats.sh). This mentions scaffolds in various places, because it's designed for assemblies that are potentially scaffolded, but you'll note that for Tadpole the scaffold statistics and contig statistics are identical.

Don't feel like you have to use all aspects of Tadpole in order to use it effectively! I am currently using it for mitochondrial assembly also, because it's easy to set a specific depth band to assemble, and thus pull out the mito without the main genome after identifying it on a kmer frequency histogram (in fact, I wrote a script to do this automatically). But in that case I don't actually use the error-correction or extension capabilities, as they are not usually necessary as the coverage is already incredibly high and low-depth kmers are being ignored. I use those more for single-cell work, which has lots of very-low-depth regions.
Hi! I'm assembling a chloroplast and a mitochondrial genome.

I was using the reads that mapped against a closely related reference for both, and doing de novo assembly. I plan to extend my reads and perform an assembly with a longer kmer length, as suggested here.
A question, please: what coverage of the mitochondrial reference genome do you suggest?
I have a total coverage of 1400x but my advisor told me to use 90x.



Also, reading your comments, can you point me to a guide on how to do what you describe here: "Don't feel like you have to use all aspects of Tadpole in order to use it effectively! I am currently using it for mitochondrial assembly also, because it's easy to set a specific depth band to assemble, and thus pull out the mito without the main genome after identifying it on a kmer frequency histogram"?

Also, what do you mean by "depth band to assemble"?

Sorry, I'm very new to this field.

Thank you very much in advance
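
One possible reading of "depth band" in the quoted post: a range of kmer depths (how many times each kmer occurs in the reads) that brackets the organelle peak on a kmer frequency histogram, so that only kmers inside that range are assembled. The sketch below is just an illustration of that idea. The histogram numbers are invented toy data, the 0.5x to 1.5x band width is an arbitrary assumption, and `depth_band` is a hypothetical helper; it is not the script mentioned in the quote.

```python
def depth_band(hist, min_depth, low=0.5, high=1.5):
    """Find the most common kmer depth at or above min_depth and return
    (band_start, band_end, peak_depth), with the band set to low..high times the peak."""
    candidates = {d: c for d, c in hist.items() if d >= min_depth}
    if not candidates:
        raise ValueError("no histogram entries at or above min_depth")
    peak = max(candidates, key=candidates.get)
    return int(peak * low), int(peak * high), peak

# Toy histogram (depth -> number of distinct kmers); values are made up.
# Depth 1-2: mostly sequencing errors; ~30: nuclear genome; ~1400: organelle.
toy_hist = {1: 900000, 2: 50000, 30: 40000, 31: 42000, 32: 39000,
            1400: 500, 1410: 620, 1420: 480}

lo, hi, peak = depth_band(toy_hist, min_depth=500)
print(f"peak depth {peak}, assemble kmers with depth in [{lo}, {hi}]")
```

The `min_depth` cutoff is what keeps the nuclear peak out of the search; in practice it would be chosen by looking at the real histogram.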

Last edited by silverfox; 07-27-2019 at 09:52 AM.
Old 07-27-2019, 01:15 PM   #104
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,978

Instead of trying to extend the reads, use tadpole in assembly mode. I linked a guide for how to use "tadpole" and its modes in my last comment.

You may have too much coverage for small genomes such as these (if the libraries were made from isolated mitochondria/chloroplast DNA). You may have to use "bbnorm.sh" to normalize your data. A guide to using BBNorm is available at this link.
Old 07-27-2019, 02:57 PM   #105
silverfox
Junior Member
 
Location: Perú

Join Date: Jul 2019
Posts: 4


Hi GenoMax! Thank you for your reply!

My libraries were not made from isolated mitochondrial and chloroplast DNA. We did whole-genome sequencing of purple maize. Then I mapped all the reads against the reference genome (10 chromosomes + mitochondria + chloroplast) using bowtie2. Then, using samtools, I extracted the alignments (BAMs) for only the mitochondria and chloroplast, and with samtools fastq I extracted the reads that mapped against the reference mitochondria and chloroplast. I used repair.sh to sort the reads by name and to get the same number of reads in each fastq file.

I want to use these reads to do de novo assembly.

For chloroplast, I have a total coverage of 5600x. Subsampling to 60x or 90x coverage and using kmers of 37, 47, 57, 67 or similar, I get a highly fragmented assembly.

For mitochondria, I have a total coverage of 1400x. Subsampling to 60x coverage and using kmers of 47, 57, 67, 77, I got an assembly of 46 contigs; using kmers of 45, 65, 85, 95, I got 31 contigs.

I was thinking of using tadpole for assembly but I wasn't sure if I should extend my reads and what would be the correct coverage and kmer length to use in that case...

Thank you so much in advance for your advice
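
To make the subsampling in this post concrete, here is a small sketch of the fraction of reads that would be kept for each total/target coverage pair mentioned above. It assumes plain random subsampling in which expected coverage scales linearly with the fraction of reads retained; `sampling_fraction` is a hypothetical helper name, not a flag or function of any tool in this thread.

```python
def sampling_fraction(total_coverage, target_coverage):
    """Fraction of reads to keep so that expected coverage drops to the target."""
    if target_coverage >= total_coverage:
        return 1.0  # already at or below the target; keep everything
    return target_coverage / total_coverage

# Coverage figures quoted in the post above.
cases = [
    ("chloroplast", 5600, 60),
    ("chloroplast", 5600, 90),
    ("mitochondria", 1400, 60),
    ("mitochondria", 1400, 90),
]
for name, total, target in cases:
    frac = sampling_fraction(total, target)
    print(f"{name}: {total}x -> {target}x keeps {frac:.4f} of the reads")
```

Note how small these fractions are: at 60x out of 5600x, only about 1 read in 93 is used, so different random subsamples can give noticeably different assemblies.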

Last edited by silverfox; 07-27-2019 at 04:05 PM.
Tags
assembler, bbmap, bbmerge, bbnorm, bbtools, error correction, tadpole
