SEQanswers

Old 12-30-2008, 05:06 PM   #1
azmicro
Junior Member
 
Location: Arizona

Join Date: Dec 2008
Posts: 2
How to align contigs?

I'm probably asking a basic question, but I've searched for hours and can't seem to find a straight answer.

We have recently sequenced the entire genome (~5 Mb) of a Salmonella strain using a brand new 454 sequencer. Ours was one of the first samples run on it. Since this is new, no one here really knows what to do with the data.

I ran the sff reads through gsAssembler (i.e. Newbler) and now have contigs. There are several strains of Salmonella that have been sequenced and fully annotated. Thus, I believe it would be easiest to compare the contigs to a reference strain to figure out what gaps need to be filled. I used GS Reference Mapper to do this, but the amount of data that comes out of Mapper is significantly less than what comes out of Assembler. Thus, I think Mapper might be chopping up the contigs to make them fit better.

Is there a program where I can take the contigs produced by Assembler (which are .ace files) and compare them to a reference sequence that I have in .fasta format? I have access to Consed, but can't seem to add a .fasta file into Consed to use as a reference.

Thanks for the help!
Old 01-02-2009, 09:08 AM   #2
tweist
Junior Member
 
Location: boston, usa

Join Date: Aug 2008
Posts: 3

Have you tried Mauve Genome Aligner? It's available at http://gel.ahabs.wisc.edu/mauve/.
Old 01-02-2009, 02:49 PM   #3
hlu
Member
 
Location: Branford, Connecticut

Join Date: Jan 2009
Posts: 32

BLAT is pretty good for comparing two sets of bacterial contigs.

BLAT is faster than BLAST, and by default it generates a tab-delimited table (PSL format) that opens directly in Excel. This makes it very easy to view the hits or to parse them for follow-up review.

BLAT is free for academic use and can be downloaded from the web.
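
For anyone who would rather parse the table than open it in Excel, here is a minimal sketch in Python. It assumes the standard 21-column PSL layout and picks, for each contig, the hit with the most matching bases. The function names and the toy data (contig00001, NC_ref and so on) are made up for illustration; they are not part of BLAT.

```python
import io

# Column names of the standard PSL table.
PSL_COLUMNS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]

# Toy PSL text (hypothetical names and numbers, header abbreviated).
SAMPLE_PSL = (
    "psLayout version 3\n"
    "\n"
    "match\tmis-match\t(header abbreviated in this toy example)\n"
    "----------------------------------------\n"
    "900\t10\t0\t0\t0\t0\t0\t0\t+\tcontig00001\t1000\t0\t910\tNC_ref\t5000000\t100\t1010\t1\t910,\t0,\t100,\n"
    "300\t5\t0\t0\t0\t0\t0\t0\t-\tcontig00001\t1000\t0\t305\tNC_ref\t5000000\t7000\t7305\t1\t305,\t695,\t7000,\n"
    "480\t20\t0\t0\t0\t0\t0\t0\t+\tcontig00002\t500\t0\t500\tNC_ref\t5000000\t2000\t2500\t1\t500,\t0,\t2000,\n"
)


def best_hits(psl_handle):
    """Return {query name: PSL row (dict of strings)} keeping the row with most matches."""
    best = {}
    for line in psl_handle:
        fields = line.rstrip("\n").split("\t")
        # Header and blank lines are skipped: data rows have 21 fields
        # and begin with an integer match count.
        if len(fields) != len(PSL_COLUMNS) or not fields[0].isdigit():
            continue
        row = dict(zip(PSL_COLUMNS, fields))
        name = row["qName"]
        if name not in best or int(row["matches"]) > int(best[name]["matches"]):
            best[name] = row
    return best


def fraction_aligned(row):
    """Fraction of the query (contig) length spanned by the alignment."""
    return (int(row["qEnd"]) - int(row["qStart"])) / int(row["qSize"])


if __name__ == "__main__":
    for name, row in sorted(best_hits(io.StringIO(SAMPLE_PSL)).items()):
        print(name, row["tName"], row["strand"], row["tStart"], row["tEnd"],
              round(fraction_aligned(row), 2))
```

With a real file you would pass an open file handle instead of the io.StringIO wrapper. Picking the best hit by raw match count is a simplification; a proper score would also penalize mismatches and gaps.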
Old 01-06-2009, 09:41 AM   #4
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

I am working on something very similar. How large are your contigs, and have you made any headway that you can share?
Old 01-08-2009, 12:33 PM   #5
biofqzhao
Member
 
Location: Penn State

Join Date: Jan 2009
Posts: 14

Actually, you can use the FASTA file of your contigs instead of the .ace file. There are several programs that can map your contigs to the reference genome. OSLay is a nice one (http://www-ab.informatik.uni-tuebing...y/welcome.html). PGA4genomics can also be used to assemble your contigs following one or more reference genomes (http://nar.oxfordjournals.org/cgi/content/full/gkn168v1).
You can also use MUMmer to lay out the contigs.
Old 01-09-2009, 12:05 PM   #6
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88

Quote:
Originally Posted by azmicro View Post
Thus, I believe it would be easiest to compare the contigs to a reference strain to figure out what gaps need to be filled. I used Gs Reference Mapper to do this, but the data that comes out of Mapper is significantly less than what comes out of Assembler. Thus, I think Mapper might be chopping up the contigs to make them fit better.
Just to clarify: are you sure you compared the contigs, and not the original reads, to the reference strains?

But more to the point - if your strain is divergent enough from your reference strains, then it doesn't seem surprising to me that you'd get less coverage by mapping from one strain to another, than by assembling your new strain de novo ... i.e. your mapping is failing wherever there's enough divergence, whereas if you have good reads, your assembly will cover divergent regions as well as homologous regions.
Old 01-09-2009, 02:25 PM   #7
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

OSLay is brilliant for the purpose I wanted. Thanks much!
Old 01-09-2009, 04:45 PM   #8
azmicro
Junior Member
 
Location: Arizona

Join Date: Dec 2008
Posts: 2

I figured out what was wrong! I used Mauve to compare the 454ContigsAll.fna file that came out of Assembler to a reference .fasta genome I downloaded from GenBank. Mauve provides a really nice visualization of where the contigs match up. Through Mauve I found contigs that did not match the reference sequence. When I BLASTed these contigs, I discovered they matched up to a Salmonella plasmid. For the sequencing I just did a genomic prep and didn't even think to separate out the plasmid DNA. Thus, Assembler's output included contigs that matched up to a plasmid whereas Mapper only included contigs that matched the reference sequence. Hence, the discrepancy between the amount of data output. This definitely makes my life easier!

And in response to jnfass: Mapper compares the reads to a reference sequence and assembles contigs based on that mapping. Mapper then gives you much longer and thus far fewer contigs than Assembler.
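
In case it helps anyone repeat the "which contigs don't match the reference" step without eyeballing it, here is a rough Python sketch. It assumes you already have the set of contig names that aligned to the reference (from whatever aligner you used) and simply lists the rest with their lengths, so they can be BLASTed separately. The function names and the toy contigs are hypothetical, not anything produced by Mauve or Newbler.

```python
import io


def read_fasta(handle):
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def unmatched_contigs(fasta_handle, matched_names):
    """Return [(name, length)] for contigs whose name is not in matched_names."""
    return [
        (name, len(seq))
        for name, seq in read_fasta(fasta_handle)
        if name not in matched_names
    ]


# Toy input (hypothetical contigs; real ones would come from the assembler's FASTA output).
EXAMPLE_FASTA = (
    ">contig00001 length=8\n"
    "ACGTACGT\n"
    ">contig00002 length=4\n"
    "AC\n"
    "GT\n"
    ">contig00003\n"
    "AAA\n"
)

if __name__ == "__main__":
    # Pretend only contig00001 aligned to the chromosome reference.
    for name, length in unmatched_contigs(io.StringIO(EXAMPLE_FASTA), {"contig00001"}):
        print(name, length)
```

The leftover contigs are candidates for plasmid sequence, contamination, or strain-specific regions; the script only lists them and does not say which.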
Old 01-09-2009, 05:19 PM   #9
jnfass
Member
 
Location: Davis, CA

Join Date: Aug 2008
Posts: 88

Glad you found your solution, azmicro ..
but I'd have to quibble that the number and length of contigs you'll get, and whether you get better (de novo) assemblies or (mapped) assemblies, will definitely depend on how divergent your reference and sequenced species are ... yours must be pretty close (being different strains, but not different species? maybe?)
Old 09-22-2010, 02:51 PM   #10
ssully
Member
 
Location: NYC

Join Date: Aug 2010
Posts: 48

I have a multi-chromosome reference sequence, and I want to map my 454-generated contigs (not reads) from a closely related species against it. The contigs are in one large multi-record FASTA file; the chromosomes are in one large GenBank (.gbk) file, i.e., a single file with 15 sets of features plus sequence, ordered 1 through 15. I've tried Mauve Contig Mover, and while it did what looks like a great mapping job and nicely displays the contig and chromosome boundary information (and annotations of the reference sequence) in the final alignment graphic, none of the output files I see allow me to easily map contigs on a per-chromosome basis (e.g., "this set of contigs maps to chromosome 12 in this order and orientation...."). The .tab file in the output gives ordered contig coordinates on a single giant pseudochromosome, which is all but useless to me without an indication of how these relate to the chromosome boundaries of the reference sequence. The output also includes a contig directory... which is empty....?

Ultimately what I'm aiming for are synteny maps of each chromosome in my reference genome. I realize Mauve was developed mainly on prokaryotic (single chromosome) genomes, but am I missing something here? Is there an easy way to do what I want with Mauve, that I'm not seeing, short of running each chromosome separately as a reference sequence? If not, should I be trying a different contig mapper?
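
One possible workaround, sketched in Python under assumptions I have not verified against Mauve's output: if the pseudochromosome is just the reference chromosomes concatenated in file order with no spacer between them, and the coordinates are 1-based, then the chromosome lengths alone are enough to convert a pseudochromosome position back to a chromosome and a local position. The names below (build_offsets, locate, assign_contig) and the toy lengths are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(chrom_lengths):
    """chrom_lengths: [(name, length), ...] in concatenation order.

    Returns (names, starts, lengths), where starts[i] is the number of
    bases that precede chromosome i on the pseudochromosome.
    """
    names = [name for name, _ in chrom_lengths]
    lengths = [length for _, length in chrom_lengths]
    starts = [0] + list(accumulate(lengths))[:-1]
    return names, starts, lengths


def locate(pos, names, starts, lengths):
    """Map a 1-based pseudochromosome position to (chromosome, 1-based local position)."""
    total = starts[-1] + lengths[-1]
    if not 1 <= pos <= total:
        raise ValueError(f"position {pos} is outside 1..{total}")
    i = bisect_right(starts, pos - 1) - 1
    return names[i], pos - starts[i]


def assign_contig(start, end, names, starts, lengths):
    """Return local coordinates of both contig ends, plus a flag telling
    whether the contig spans a chromosome boundary."""
    left = locate(start, names, starts, lengths)
    right = locate(end, names, starts, lengths)
    return left, right, left[0] != right[0]


if __name__ == "__main__":
    # Toy lengths; real values would come from the records in the .gbk file.
    names, starts, lengths = build_offsets([("chr1", 100), ("chr2", 50), ("chr3", 200)])
    print(assign_contig(101, 150, names, starts, lengths))
    print(assign_contig(90, 120, names, starts, lengths))
```

Contigs flagged as spanning a boundary would need a manual look, since they may be placement artifacts of the concatenation rather than real joins.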
Old 10-06-2010, 07:14 AM   #11
flxlex
Moderator
 
Location: Oslo, Norway

Join Date: Nov 2008
Posts: 415

Since this is such an old thread (that I happened to be subscribed to), may I suggest starting a new one with your question...