SEQanswers

Old 05-11-2018, 04:02 AM   #1
iltisanni
Member
 
Location: Mainz, Germany

Join Date: Mar 2017
Posts: 20
Nanopore - circular Assembly

Hi,

We successfully sequenced a DNA sample and assembled the genome with Canu.
We got one circular contig, which is perfect, but the ends of the contig overlap.

Since Nanopore reads are very long, some reads at the end of the contig have the same sequence as reads at the beginning, and vice versa.
Canu also reports that the contig is circular.

Now we want to trim the overlap between the beginning and the end of the contig to get one linear contig without overlaps.

I cannot find any software to help us with this except "circlator". I guess it's the minimus2 function we have to use here, but this function depends on AMOS, which seems to be impossible to install on Ubuntu 18.04.

Can anyone help us here? Is there perhaps an alternative to circlator?


Of course we could always trim the contig manually by finding where the end of the contig matches the beginning and trimming at that position, or use a script that does the same... but the best code is still the one that has already been written by someone else :-)

Last edited by iltisanni; 05-11-2018 at 04:04 AM.
Old 05-11-2018, 04:19 AM   #2
Markiyan
Senior Member
 
Location: Cambridge

Join Date: Sep 2010
Posts: 111
You can also use BLAST or MUMmer to detect the overlapping ends...

First you need to detect by how much the ends overlap.
Then you can save the non-overlapping portion plus a single copy of the overlapping sequence to a file.

You can detect the overlapping ends of the contig(s) with standalone BLAST, using the master-slave alignment output format (blast the sequence against itself). Lower the expect value to 1e-50 or less (-e 1e-50) and crank up the word size to 16 - 64 bp (e.g. -W 32). These are the legacy blastall flags; in BLAST+ the equivalents are -evalue and -word_size.
A dotplot/MUMmer alignment of the sequence against itself may also be useful.

Using the above info you can decide which base range to keep, so that you get non-overlapping ends.
Then open your sequence in Artemis or a similar editor, do Select->Base Range,
and save the selected base range to a FASTA file: File->Write->Bases of selection->Fasta format.
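
For a quick first look before running BLAST or MUMmer, the "detect by how much the ends overlap" step can be sketched in Python as an exact seed search: take the first few bases of the contig and look for where they reappear in the second half. This is only a rough illustration. The function names and the default seed length are assumptions, and because Nanopore contigs contain errors an exact seed may simply not match, which is why an alignment-based check as described above is the more robust route.

```python
def find_end_overlap(seq, seed_len=32):
    """Return the 0-based position where the contig's first `seed_len`
    bases reappear in the second half of the contig, or None.

    Exact matching only: sequencing errors inside the seed will hide
    a real overlap, so treat None as "not found", not "no overlap".
    """
    seed = seq[:seed_len]
    # Only search the second half so the seed cannot match itself at 0.
    search_from = max(seed_len, len(seq) // 2)
    pos = seq.rfind(seed, search_from)
    return None if pos == -1 else pos


def trim_overlap(seq, seed_len=32):
    """Keep everything before the point where the start of the contig
    reappears, i.e. the non-overlapping part plus one copy of the overlap."""
    pos = find_end_overlap(seq, seed_len)
    return seq if pos is None else seq[:pos]
```

If a position is reported, it is still worth confirming it against a self-alignment before trimming the real assembly.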
Old 05-14-2018, 12:25 AM   #3
iltisanni
Member
 
Location: Mainz, Germany

Join Date: Mar 2017
Posts: 20

Thank you, that helped a lot; your suggestion to align the sequence against itself was right. I found the trimming point and trimmed the FASTA with a simple cat X.fasta | cut -c 1-XXX > trimmed.fasta, after first deleting the header line and then re-inserting it into trimmed.fasta afterwards.
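
Note that cut -c counts characters per line, so the command above trims as intended only when the whole sequence sits on a single line. As a hedged alternative, here is a minimal Python sketch that trims a single-record FASTA to a 1-based inclusive base range regardless of line wrapping and keeps the header in place. The function names and the 60-column output width are assumptions for illustration, not part of Canu or any other tool.

```python
def read_single_fasta(handle):
    """Read one FASTA record from an open text handle.

    Returns (header_line, sequence). Raises ValueError if the input
    has no header or more than one record.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected exactly one FASTA record")
            header = line
        elif line:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def trim_fasta_record(handle, start, end, width=60):
    """Return FASTA text containing bases start..end (1-based, inclusive)."""
    header, seq = read_single_fasta(handle)
    if not 1 <= start <= end <= len(seq):
        raise ValueError("base range lies outside the sequence")
    trimmed = seq[start - 1:end]
    lines = [header]
    # Re-wrap the trimmed sequence at a fixed width.
    lines.extend(trimmed[i:i + width] for i in range(0, len(trimmed), width))
    return "\n".join(lines) + "\n"
```

With a file opened via open("X.fasta"), calling trim_fasta_record(handle, 1, XXX) and writing the result to trimmed.fasta would replace the delete-header, cut, re-insert-header steps.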

I found this information directly in the canu documentation:
http://canu.readthedocs.io/en/latest...ed-has-overlap

--->

An alternative is to run MUMmer to get self-alignments on the contig and use those trim points. For example, assuming the circular element is in tig00000099.fa. Run:

nucmer -maxmatch -nosimplify tig00000099.fa tig00000099.fa
show-coords -lrcTH out.delta


to find the end overlaps in the tig. The output would be something like:

1 1895 48502 50400 1895 1899 99.37 50400 50400 3.76 3.77 tig00000001 tig00000001
48502 50400 1 1895 1899 1895 99.37 50400 50400 3.77 3.76 tig00000001 tig00000001

means trim to 1 to 48502. There is also an alternate writeup.

<---
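
Reading the trim point off the show-coords output can also be scripted. The sketch below assumes the column layout produced by the -lrcTH flags as in the example above (S1, E1, S2, E2, LEN1, LEN2, %IDY, LENR, LENQ, COVR, COVQ, followed by the two sequence names), and it follows the quoted documentation in reporting the start of the end-overlap (48502 in the example) as the last base to keep. The function names are hypothetical.

```python
def parse_coords_line(line):
    """Parse one line of `show-coords -lrcTH` output into a dict."""
    f = line.split()
    return {
        "s1": int(f[0]), "e1": int(f[1]),
        "s2": int(f[2]), "e2": int(f[3]),
        "idy": float(f[6]),
        "len_r": int(f[7]), "len_q": int(f[8]),
        "ref": f[11], "qry": f[12],
    }


def find_trim_end(coords_text):
    """Return the position to trim to (keep bases 1..position), or None.

    Looks for a self-alignment in which the start of the contig
    (S1 == 1) matches a region that runs to the very end of the
    contig (E2 == contig length).
    """
    for line in coords_text.splitlines():
        if not line.strip():
            continue
        hit = parse_coords_line(line)
        if hit["ref"] != hit["qry"]:
            continue  # not a self-alignment
        if hit["s1"] == 1 and hit["e1"] == hit["len_r"]:
            continue  # trivial full-length hit of the contig on itself
        if hit["s1"] == 1 and hit["e2"] == hit["len_q"] and hit["s2"] > hit["e1"]:
            return hit["s2"]
    return None
```

Applied to the two example lines quoted above, this picks the first line and reports 48502, matching the "trim to 1 to 48502" reading.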

Last edited by iltisanni; 05-14-2018 at 12:28 AM.
Old 05-14-2018, 02:52 AM   #4
Ali May
Member
 
Location: Netherlands

Join Date: Aug 2016
Posts: 12

Quote:
Originally Posted by iltisanni
Hi,

I cannot find any software to help us there except "circlator". I guess it's the minimus2 function we have to use here, but this function has dependencies to the software AMOS which seems to be impossible to install on Ubuntu 18.04.

Hi, I use Circlator in similar scenarios. I think you can just use the 'normal' Circlator function and not specifically 'minimus2', which indeed is a hassle as far as I remember.

Code:
circlator all <assembly.fasta> <corrected_longreads_from_canu.fasta> <output_folder> --threads <nr_of_threads>
Then I check the output of

Code:
04.merge.circularise_details.log
in the output folder to hopefully see a line like

Code:
[merge circularise_details]	scaffold1|size4159270|arrow	Circularized: yes
Then the file
Code:
06.fixstart.fasta
is the final output file which should have fixed coordinates without overlaps etc. Let me know if this helps.
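
When there are many contigs, the check of 04.merge.circularise_details.log can be scripted. The following is a minimal Python sketch that assumes the log lines look like the single tab-separated example shown above (a bracketed prefix, the contig name, then "Circularized: yes" or "Circularized: no"); the real log may contain other lines, which the sketch simply skips. The function name is hypothetical.

```python
def circularised_contigs(log_text):
    """Map contig name -> True/False from 'Circularized: yes/no' lines.

    Assumes tab-separated lines of the form
    [prefix] <TAB> contig_name <TAB> Circularized: yes
    Lines that do not fit this shape are ignored.
    """
    status = {}
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue
        last = fields[-1].strip()
        if not last.startswith("Circularized:"):
            continue
        answer = last.split(":", 1)[1].strip().lower()
        status[fields[1]] = (answer == "yes")
    return status
```

Reading the log file into a string and passing it to circularised_contigs gives a quick overview of which contigs were reported as circularised.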
Old 05-14-2018, 04:29 AM   #5
iltisanni
Member
 
Location: Mainz, Germany

Join Date: Mar 2017
Posts: 20

Quote:
Originally Posted by Ali May
Hi, I use Circlator in similar scenarios. I think you can just use the 'normal' Circlator function and not specifically 'minimus2', which indeed is a hassle as far as I remember.

I'm not sure about the "fixstart" option. We want exactly what is described for the "minimus2" option, but "fixstart" just sets a new starting point at the first dnaA gene it finds. It does not circularize contigs by merging overlapping contigs, if I'm not mistaken...
Old 05-14-2018, 04:45 AM   #6
Ali May
Member
 
Location: Netherlands

Join Date: Aug 2016
Posts: 12

Quote:
Originally Posted by iltisanni
I'm not sure about the "fixstart" option. We want exactly what is described for the "minimus2" option, but "fixstart" just sets a new starting point at the first dnaA gene it finds. It does not circularize contigs by merging overlapping contigs, if I'm not mistaken...

I see, although the option I suggested was 'all', which does include circularisation (https://github.com/sanger-pathogens/...wiki/Task:-all). However, it's true that it also includes the 'fixstart' option, so as I understand it, it is not ideal in your case.
Old 05-14-2018, 04:52 AM   #7
iltisanni
Member
 
Location: Mainz, Germany

Join Date: Mar 2017
Posts: 20

Quote:
Originally Posted by Ali May
I see, although the option I suggested was 'all', which does include circularisation (https://github.com/sanger-pathogens/...wiki/Task:-all). However, it's true that it also includes the 'fixstart' option, so as I understand it, it is not ideal in your case.

Oh hey... I just noticed the "merge" function, which is included in the "all" option.

I guess this does what I want... I will try it. All the other functions that come with the "all" option are not needed in my case.

The only thing I don't get is whether the "merge" function uses SPAdes for anything, and if so, for what.
My assembler is Canu because it seems to be the best right now for Nanopore reads, so nothing to do with SPAdes...

Last edited by iltisanni; 05-14-2018 at 05:25 AM.

All times are GMT -8.