SEQanswers

Old 08-02-2010, 12:21 AM   #1
mwatson
Member
 
Location: Roslin, UK

Join Date: Aug 2010
Posts: 13
Combine de novo and reference assembly

Hi

I'd be interested in anyone who can tell me about software that will/can combine de novo assembled contigs with a reference assembly.

What I have are bacterial genomes and between 36 and 72bp reads.

When I align to the reference, large parts of the genome align perfectly, but then I find gaps. If I do a de novo assembly, I can see that some of the contigs span the gaps, but I am checking this by eye using MUMmer, IGV and a few other bits and bobs.

It seems to me obvious that someone would have written this, but I can't find anything....

Mick
Old 08-02-2010, 01:14 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

What assembler are you using for your de novo and reference guided assemblies?

Have you tried MIRA3?
http://chevreux.org/projects_mira.html
Old 08-02-2010, 03:11 AM   #3
mwatson
Member
 
Location: Roslin, UK

Join Date: Aug 2010
Posts: 13

OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
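The check described above (find the gaps, then see which de novo contigs bridge them) could be scripted roughly like this. It is only a sketch under assumptions: read alignments and contig-to-reference matches have already been reduced to 0-based, half-open (start, end) pairs on the reference, for example from the SAM file and from a MUMmer coordinate table. The function names and the `flank` parameter are made up for illustration.

```python
def uncovered_intervals(ref_length, alignments):
    """Return reference intervals (0-based, half-open) with no read coverage.

    alignments: iterable of (start, end) pairs on the reference.
    """
    gaps = []
    covered_to = 0
    for start, end in sorted(alignments):
        if start > covered_to:
            gaps.append((covered_to, start))
        covered_to = max(covered_to, end)
    if covered_to < ref_length:
        gaps.append((covered_to, ref_length))
    return gaps


def contigs_spanning(gap, contig_hits, flank=50):
    """Return names of contigs with matches on both sides of a gap.

    contig_hits: dict mapping contig name -> list of (ref_start, ref_end)
    match blocks (hypothetical structure, e.g. parsed from MUMmer output).
    A contig counts as spanning if one of its blocks overlaps the `flank`
    bases left of the gap and one overlaps the `flank` bases right of it.
    """
    gap_start, gap_end = gap
    spanning = []
    for name, blocks in sorted(contig_hits.items()):
        left = any(s < gap_start and e > gap_start - flank for s, e in blocks)
        right = any(s < gap_end + flank and e > gap_end for s, e in blocks)
        if left and right:
            spanning.append(name)
    return spanning


if __name__ == "__main__":
    reads = [(0, 30), (20, 50), (70, 100)]
    hits = {"contig_1": [(35, 50), (70, 90)], "contig_2": [(0, 30)]}
    for gap in uncovered_intervals(100, reads):
        print(gap, contigs_spanning(gap, hits, flank=20))
```

A contig that anchors on both flanks suggests the region differs between the sample and the reference rather than being an alignment artefact, but that still needs checking by eye.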
Old 08-02-2010, 11:07 PM   #4
Geneious
Registered Vendor
 
Location: New Zealand

Join Date: Jul 2010
Posts: 22

You can also try the 14-day free trial of Geneious Pro.
www.geneious.com/download
Old 08-03-2010, 12:18 AM   #5
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,543

Quote:
Originally Posted by mwatson View Post
OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
I don't quite understand what you are describing. Are you saying you have "gaps" meaning areas where nothing maps to the reference, yet there is a de novo contig that could be used to span this "gap"? If so, to me it sounds like a region of divergence between the reference genome and your new genome (not a missing section in the new genome). Perhaps you can try adjusting the setting of your guided assembler? Or try including selected de novo contigs in the guided assembly as well?
Old 08-03-2010, 11:05 AM   #6
tplsmith
Junior Member
 
Location: Nebraska

Join Date: Aug 2008
Posts: 7
MIRA-based pipeline to view scaffolds

I'm not sure that what you really want is to "combine" a reference and your de novo stuff. Researchers I have been assisting with the same kinds of studies have had good success using the MIRA assembler mentioned earlier, then using bambus to generate a .bnk file. Viewing the results in hawkeye provides a lot of information of the kind you appear to be looking for. It is especially useful when you have a lot of paired-end information and want to see how the scaffolds fit together and where problems may lie. It definitely helps to see whether a piece of the reference that is "missing" in your data is due to assembly quality issues, because you can get an idea of read depth at the ends of the contigs.

You can then compare the pertinent scaffolds of the assembly to a reference in a variety of ways. A good one that you can easily edit is in the Geneious package advertised in one of the replies, especially because in that single package you can call a variety of aligners (Geneious, ClustalW2, MUSCLE) to see how the choice affects the results.

Instructions for the MIRA/bambus/hawkeye pipeline can be obtained at the MIRA website; some were written by my colleague, whose contact information is also on the site. This should help you decide if your strain has sequence not found in the reference or vice versa. There is also a MIRA discussion group where you can ask specific questions if you have problems.
Old 09-24-2010, 12:17 PM   #7
Adjuvant
Member
 
Location: Chicago, IL

Join Date: Sep 2010
Posts: 13

Apparently mwatson and I are interested in the same things.

I'm also doing bacterial sequencing. I used novoalign to align my reads to several reference sequences, extracted the unaligned reads and performed a velvet assembly on those. Blasting the resulting contigs shows quite a few that have sequence corresponding to my reference sequences at the ends of the contigs, but novel sequence in the middle. So in an effort to combine my alignments and my de novo assemblies, I did a pileup of my novoalign alignments, dumped the consensus to fastq, then stripped the quality data to yield several consensus fasta sequences (one for each of the reference genomes).
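For the "extract the unaligned reads" step, a minimal sketch in plain Python might look like the following. It assumes the alignments are available as SAM text and relies only on the SAM convention that bit 0x4 of the FLAG field marks an unmapped read. The function name is made up for illustration, and it does not try to keep mates together or undo any strand flipping, which matters before feeding paired reads to an assembler.

```python
def unmapped_reads_as_fastq(sam_lines):
    """Yield FASTQ records for SAM lines whose FLAG has bit 0x4 (unmapped) set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            name, seq, qual = fields[0], fields[9], fields[10]
            yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0",
        "r1\t0\tref\t1\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tJJJJ",
    ]
    for record in unmapped_reads_as_fastq(example):
        print(record, end="")
```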

Here's where I get stuck: the pileup fills gaps in the alignment with N's. When I look at my alignment in Tablet, however, I can see that not all gaps are equal. Many are clearly spanned by a lot of paired end reads, whereas others have no spanning pairs and so might be the sites where some of my de novo assembled contigs might fit. They would also be sites where I'd first like to start designing outward directed primers for Sanger sequencing.

My question is: Is there a way to separate my alignment consensus sequences into contigs separated by these unspanned gaps? The way I'm doing it now is scanning through my alignment in Tablet looking for such gaps, then looking for those gaps in my consensus sequence and manually deleting the N's. It seems like there should be a better way.

Thanks.
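One way the splitting step might be automated, as a rough sketch only: treat every run of N's in the consensus as a candidate gap, count the read pairs whose outer coordinates bracket it, and cut the consensus only at gaps with too few spanning pairs. It assumes the pair coordinates have already been pulled out of the alignment as 0-based, half-open (fragment_start, fragment_end) tuples on the consensus. The function names and the `min_pairs` threshold are made up for illustration.

```python
import re


def n_runs(seq):
    """Return (start, end) of every run of N's in seq (0-based, half-open)."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def split_at_unspanned_gaps(seq, fragments, min_pairs=1):
    """Split a consensus at N-runs not bracketed by enough read pairs.

    fragments: list of (fragment_start, fragment_end) outer coordinates of
    properly paired reads. Gaps spanned by at least `min_pairs` fragments
    are left in place as N's.
    """
    pieces = []
    piece_start = 0
    for gap_start, gap_end in n_runs(seq):
        spanning = sum(1 for fs, fe in fragments if fs < gap_start and fe > gap_end)
        if spanning < min_pairs:
            if gap_start > piece_start:
                pieces.append(seq[piece_start:gap_start])
            piece_start = gap_end
    if piece_start < len(seq):
        pieces.append(seq[piece_start:])
    return pieces


if __name__ == "__main__":
    consensus = "ACGTNNNNACGTNNNNACGT"
    pairs = [(1, 11)]  # one pair bracketing the first gap only
    for i, piece in enumerate(split_at_unspanned_gaps(consensus, pairs), 1):
        print(">piece_%d\n%s" % (i, piece))
```

The ends of the resulting pieces would then be the candidate sites for placing de novo contigs or designing outward-directed primers.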