
**maubp** · 08-02-2010, 01:14 AM

What assembler are you using for your de novo and reference guided assemblies?

Have you tried MIRA3?

**mwatson** · 08-02-2010, 03:11 AM

OK, so I use either SOAPdenovo or Velvet for the de novo stuff, and I currently use Novoalign for the reference assembly.

The problem I am trying to solve is that from the reference assembly, I have a SAM/BAM file that I can view in IGV, and if I find a gap, I want to know why there is a gap - and many times, one of the de novo contigs will span the gap, indicating that the gap in the reference assembly is due to a real gap in the genome, not just an alignment issue.
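Instead of scrolling through IGV to find gaps, you can list the zero-coverage regions directly. Here is a minimal Python sketch. It assumes per-base depth in the three-column format printed by `samtools depth -a`: reference name, 1-based position, and depth. The `-a` option is needed so that zero-depth positions are reported at all. The function name `zero_coverage_gaps` and the `min_len` cutoff are illustrative and not part of any tool:

```python
from typing import Iterable, List, Tuple

def zero_coverage_gaps(depth_lines: Iterable[str],
                       min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Return (reference, start, end) runs where depth == 0.

    Coordinates are 1-based and inclusive. Assumes every position is
    present in the input (as with `samtools depth -a`), sorted by
    reference and position.
    """
    gaps = []
    cur_ref, start, prev = None, None, None
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        ref, pos_s, depth_s = line.rstrip("\n").split("\t")[:3]
        pos, depth = int(pos_s), int(depth_s)
        # Close an open zero-depth run when the reference changes or coverage resumes
        if start is not None and (ref != cur_ref or depth > 0):
            if prev - start + 1 >= min_len:
                gaps.append((cur_ref, start, prev))
            start = None
        # Open a new run at the first zero-depth position
        if depth == 0 and start is None:
            start = pos
        cur_ref, prev = ref, pos
    # A run may still be open at the end of the input
    if start is not None and prev - start + 1 >= min_len:
        gaps.append((cur_ref, start, prev))
    return gaps
```

You can pass an open file handle directly, since iterating over it yields lines. Raising `min_len` hides the single-base dips that are usually not worth checking against the de novo contigs.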

**Geneious** · 08-02-2010, 11:07 PM

You can also try the 14-day free trial of Geneious Pro.

https://www.geneious.com/download

**maubp** · 08-03-2010, 12:18 AM

I don't quite understand what you are describing. Are you saying you have "gaps", meaning areas where nothing maps to the reference, yet there is a de novo contig that could be used to span this "gap"? If so, to me it sounds like a region of divergence between the reference genome and your new genome (not a missing section in the new genome). Perhaps you could try adjusting the settings of your guided assembler? Or try including selected de novo contigs in the guided assembly as well?

**tplsmith** · 08-03-2010, 11:05 AM

MIRA-based pipeline to view scaffolds

I'm not sure that what you really want is to "combine" a reference and your de novo assembly. Researchers I have been assisting with the same kinds of studies have had good success using the MIRA assembler mentioned earlier, then using bambus to generate a .bnk file. Viewing the results in hawkeye provides a lot of the information you appear to be looking for. It is especially useful when you have a lot of paired-end information and want to see how the scaffolds fit together and where problems may lie. It is certainly useful for checking whether a piece of the reference that is "missing" from your data is due to assembly quality issues, because you can get an idea of the read depth at the ends of the contigs.

You can then compare the pertinent scaffolds of the assembly to a reference in a variety of ways. One that is easy to edit is the Geneious package advertised in one of the replies. It is especially useful because that single package lets you call a variety of aligners (Geneious, ClustalW2, MUSCLE) to see how each affects the results. Instructions for the MIRA/bambus/hawkeye pipeline are available on the MIRA website; some of them were written here by my colleague, whose contact information is also on the site. This should help you decide whether your strain has sequence not found in the reference or vice versa. There is also a MIRA discussion group you can send specific questions to if you run into problems.

**Adjuvant** · 09-24-2010, 12:17 PM

Apparently mwatson and I are interested in the same things.

I'm also doing bacterial sequencing. I used novoalign to align my reads to several reference sequences, extracted the unaligned reads and performed a velvet assembly on those. BLASTing the resulting contigs shows quite a few that have sequence corresponding to my reference sequences at the ends of the contigs, but novel sequence in the middle. So, in an effort to combine my alignments and my de novo assemblies, I did a pileup of my novoalign alignments, dumped the consensus to FASTQ, then stripped out the quality data to yield several consensus FASTA sequences (one for each reference genome).
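Two of the text-munging steps described here can be scripted without special tools. The Python sketch below works on plain-text SAM, for example a BAM converted with `samtools view`. It selects reads whose FLAG has bit 0x4 ("segment unmapped") set, which is the same filter `samtools view -f 4` applies. It also converts a FASTQ consensus, possibly line-wrapped, to FASTA by dropping the quality lines. The function names are hypothetical:

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped

def unmapped_fastq(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one FASTQ record per SAM read whose FLAG has 0x4 set."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]  # SEQ and QUAL columns
        if flag & UNMAPPED:
            yield f"@{qname}\n{seq}\n+\n{qual}\n"

def fastq_to_fasta(text: str) -> str:
    """Convert FASTQ (sequence and quality may be wrapped) to FASTA."""
    lines = [l for l in text.splitlines() if l.strip()]
    out = []
    i = 0
    while i < len(lines):
        name = lines[i][1:]  # drop the leading '@'
        i += 1
        seq_parts = []
        # Sequence lines run until the '+' separator line
        while not lines[i].startswith("+"):
            seq_parts.append(lines[i])
            i += 1
        i += 1  # skip the '+' line
        seq = "".join(seq_parts)
        # Consume quality lines by length, because quality strings can
        # themselves start with '@' or '+'
        qual_len = 0
        while qual_len < len(seq):
            qual_len += len(lines[i])
            i += 1
        out.append(f">{name}\n{seq}\n")
    return "".join(out)
```

The quality lines are consumed by length rather than by looking for the next `@`, because `@` is a legal quality character and would otherwise be mistaken for a new record.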

Here's where I get stuck: the pileup fills gaps in the alignment with N's. When I look at my alignment in Tablet, however, I can see that not all gaps are equal. Many are clearly spanned by a lot of paired-end reads, whereas others have no spanning pairs and so might be the sites where some of my de novo assembled contigs fit. They would also be the sites where I'd like to start designing outward-directed primers for Sanger sequencing.
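One way to make "spanned" versus "unspanned" precise is this: a pair spans a gap when one mate ends before the gap starts and the other mate begins after the gap ends. The Python sketch below assumes each pair has already been reduced to a tuple of (end of left mate, start of right mate) in 1-based reference coordinates, taken from the alignment. That representation, the `min_pairs` threshold and the function names are all hypothetical:

```python
from typing import List, Tuple

Gap = Tuple[int, int]   # (start, end) of an N run, 1-based inclusive
Pair = Tuple[int, int]  # (end of left mate, start of right mate)

def spanning_pair_counts(gaps: List[Gap], pairs: List[Pair]) -> List[int]:
    """For each gap, count pairs whose mates sit on opposite sides of it."""
    counts = []
    for g_start, g_end in gaps:
        n = sum(1 for left_end, right_start in pairs
                if left_end < g_start and right_start > g_end)
        counts.append(n)
    return counts

def unspanned_gaps(gaps: List[Gap], pairs: List[Pair],
                   min_pairs: int = 1) -> List[Gap]:
    """Gaps supported by fewer than `min_pairs` spanning pairs."""
    counts = spanning_pair_counts(gaps, pairs)
    return [g for g, n in zip(gaps, counts) if n < min_pairs]
```

The nested loop is fine for a bacterial genome's worth of gaps. For much larger inputs, sorting the pairs first would avoid rescanning them for every gap.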

My question is: Is there a way to separate my alignment consensus sequences into contigs separated by these unspanned gaps? The way I'm doing it now is scanning through my alignment in Tablet looking for such gaps, then looking for those gaps in my consensus sequence and manually deleting the N's. It seems like there should be a better way.
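The manual step could be automated roughly as follows. Start from the consensus sequence and a set of N-run coordinates judged to be unspanned, for example runs with no spanning read pairs. The sketch cuts the consensus at those runs and drops their N's, while spanned runs stay in place as scaffold gaps. It is written in Python with 1-based inclusive coordinates, and the function names are hypothetical:

```python
import re
from typing import List, Set, Tuple

def n_runs(seq: str) -> List[Tuple[int, int]]:
    """1-based inclusive coordinates of every run of N (or n) in seq."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[Nn]+", seq)]

def split_at_unspanned(seq: str, unspanned: Set[Tuple[int, int]]) -> List[str]:
    """Cut seq at each N run listed in `unspanned`, dropping those N's.

    N runs not listed (e.g. gaps spanned by read pairs) stay inside
    the resulting contigs.
    """
    contigs = []
    prev = 0  # 0-based index where the current contig starts
    for start, end in n_runs(seq):
        if (start, end) in unspanned:
            piece = seq[prev:start - 1]  # bases before this N run
            if piece:
                contigs.append(piece)
            prev = end  # 0-based index just past the N run
    tail = seq[prev:]
    if tail:
        contigs.append(tail)
    return contigs
```

For example, with `seq = "ACGTNNNGGCCNNTTAA"` and only the first run `(5, 7)` marked as unspanned, the consensus splits into `"ACGT"` and `"GGCCNNTTAA"`. The ends next to each cut are the candidate sites for outward-directed primers.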

Thanks.

Combine de novo and reference assembly

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News