  • How to align contigs?

    I'm probably asking a basic question, but I've searched for hours and can't seem to find a straight answer.

    We have recently sequenced the entire genome (~5 Mb) of a Salmonella strain using a brand new 454 sequencer. Ours was one of the first sequences run. Since this is new, no one here really knows what to do with the data.

    I ran the sff reads through gsAssembler (i.e. Newbler) and now have contigs. There are several strains of Salmonella that have been sequenced and fully annotated. Thus, I believe it would be easiest to compare the contigs to a reference strain to figure out what gaps need to be filled. I used GS Reference Mapper to do this, but the data that comes out of Mapper is significantly less than what comes out of Assembler. Thus, I think Mapper might be chopping up the contigs to make them fit better.

    Is there a program where I can use the contigs produced from Assembler (which are .ace files) and compare them to a reference sequence that I have in .fasta format? I have access to Consed, but can't seem to add a .fasta file into Consed to use as a reference.

    Thanks for the help!

  • #2
    Have you tried Mauve Genome Aligner? It's available at http://gel.ahabs.wisc.edu/mauve/.



    • #3
      BLAT is pretty good for comparing two sets of bacterial contigs.

      BLAT is faster than BLAST, and by default it generates an Excel-compatible, tab-delimited table. This is very easy to view in Excel, or to parse for follow-up analysis.

      BLAT is free for academic use and can be downloaded from the web.
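
For the parsing route, here is a minimal Python sketch, assuming the table is BLAT's default PSL output with its standard 21 tab-separated columns. The function names and the toy `SAMPLE` lines are made up for illustration; "best hit" here simply means the alignment with the most matching bases, which is one possible choice rather than the only sensible one.

```python
import io

# Column order of BLAT's default PSL output (21 tab-separated fields).
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount",
    "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
    "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd",
    "blockCount", "blockSizes", "qStarts", "tStarts",
]


def read_psl(handle):
    """Yield one dict per alignment line, skipping PSL header lines."""
    for line in handle:
        cols = line.rstrip("\n").split("\t")
        # Alignment lines have 21 columns and start with an integer match count.
        if len(cols) != 21 or not cols[0].isdigit():
            continue
        yield dict(zip(PSL_FIELDS, cols))


def best_hit_per_contig(handle):
    """Keep, for each query contig, the alignment with the most matching bases."""
    best = {}
    for hit in read_psl(handle):
        name = hit["qName"]
        if name not in best or int(hit["matches"]) > int(best[name]["matches"]):
            best[name] = hit
    return best


# Toy data (hypothetical contig and reference names), not real BLAT output.
SAMPLE = (
    "900\t10\t0\t0\t0\t0\t0\t0\t+\tcontig00001\t1000\t0\t910\t"
    "ref\t5000000\t100\t1010\t1\t910,\t0,\t100,\n"
    "400\t5\t0\t0\t0\t0\t0\t0\t-\tcontig00001\t1000\t0\t405\t"
    "ref\t5000000\t2000\t2405\t1\t405,\t0,\t2000,\n"
)

if __name__ == "__main__":
    for name, hit in best_hit_per_contig(io.StringIO(SAMPLE)).items():
        print(name, hit["tName"], hit["tStart"], hit["tEnd"], hit["strand"])
```

To run it on a real file, replace `io.StringIO(SAMPLE)` with `open("your_output.psl")` (the file name is a placeholder).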



      • #4
        Originally posted by azmicro
        I'm probably asking a basic question, but I've searched for hours and can't seem to find a straight answer.

        We have recently sequenced the entire genome (~5 Mb) of a Salmonella strain using a brand new 454 sequencer. Ours was one of the first sequences run. Since this is new, no one here really knows what to do with the data.

        I ran the sff reads through gsAssembler (i.e. Newbler) and now have contigs. There are several strains of Salmonella that have been sequenced and fully annotated. Thus, I believe it would be easiest to compare the contigs to a reference strain to figure out what gaps need to be filled. I used GS Reference Mapper to do this, but the data that comes out of Mapper is significantly less than what comes out of Assembler. Thus, I think Mapper might be chopping up the contigs to make them fit better.

        Is there a program where I can use the contigs produced from Assembler (which are .ace files) and compare them to a reference sequence that I have in .fasta format? I have access to Consed, but can't seem to add a .fasta file into Consed to use as a reference.

        Thanks for the help!
        I am working on something very similar. How large are your contigs? And have you made any headway that you can share?
        --
        bioinfosm



        • #5
          Actually, you can use the FASTA file of your contigs instead of the .ace file. Several programs can map your contigs to the reference genome. OSLay is a nice one (http://www-ab.informatik.uni-tuebing...y/welcome.html). PGA4genomics can also be used to assemble your contigs guided by one or more reference genomes (http://nar.oxfordjournals.org/cgi/content/full/gkn168v1).
          You can also use MUMmer to lay out the contigs.
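
If you go the FASTA route, it can help to check how large the contigs are before feeding them to any of these tools. The following is a rough Python sketch, not part of any of the programs above; `fasta_lengths` is a hypothetical helper name and the two records in the demo are invented.

```python
import io


def fasta_lengths(handle):
    """Return {record_name: sequence_length} for a FASTA file handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Invented records, only to show the call; use open("contigs.fna") for real data.
    demo = io.StringIO(">contig00001 length=8\nACGT\nACGT\n>contig00002\nGGC\n")
    for name, length in sorted(fasta_lengths(demo).items(), key=lambda kv: -kv[1]):
        print(name, length)
```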



          • #6
            Originally posted by azmicro
            Thus, I believe it would be easiest to compare the contigs to a reference strain to figure out what gaps need to be filled. I used GS Reference Mapper to do this, but the data that comes out of Mapper is significantly less than what comes out of Assembler. Thus, I think Mapper might be chopping up the contigs to make them fit better.
            Just to clarify: are you sure you compared the contigs, and not the original reads, to the reference strains?

            But more to the point - if your strain is divergent enough from your reference strains, then it doesn't seem surprising to me that you'd get less coverage by mapping from one strain to another, than by assembling your new strain de novo ... i.e. your mapping is failing wherever there's enough divergence, whereas if you have good reads, your assembly will cover divergent regions as well as homologous regions.



            • #7
              OSLay is brilliant for the purpose I wanted .. thanks much
              --
              bioinfosm



              • #8
                I figured out what was wrong! I used Mauve to compare the 454ContigsAll.fna file that came out of Assembler to a reference .fasta genome I downloaded from GenBank. Mauve provides a really nice visualization of where the contigs match up. Through Mauve I found contigs that did not match the reference sequence. When I BLASTed these contigs, I discovered they matched up to a Salmonella plasmid. For the sequencing I just did a genomic prep and didn't even think to separate out the plasmid DNA. Thus, Assembler's output included contigs that matched up to a plasmid whereas Mapper only included contigs that matched the reference sequence. Hence, the discrepancy between the amount of data output. This definitely makes my life easier!
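
The same check (which assembled contigs have no match in the reference?) can also be done without a viewer. Below is a small Python sketch under stated assumptions: you already have the contig lengths (for example from the contigs FASTA file) and the names of the contigs that aligned to the reference (from whichever aligner you used). The function name, the `min_length` cutoff, and the toy values are all hypothetical.

```python
def unmatched_contigs(contig_lengths, matched_names, min_length=0):
    """Return (name, length) pairs for contigs with no reference hit, longest first."""
    matched = set(matched_names)
    missing = [
        (name, length)
        for name, length in contig_lengths.items()
        if name not in matched and length >= min_length
    ]
    return sorted(missing, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented example values, not data from this thread.
    lengths = {"contig00001": 52000, "contig00002": 9400, "contig00003": 310}
    aligned = ["contig00001"]
    # Contigs listed here are candidates to BLAST separately (e.g. plasmid sequence).
    for name, length in unmatched_contigs(lengths, aligned, min_length=500):
        print(name, length)
```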

                And in response to jnfass: Mapper compares the reads to a reference sequence and assembles contigs based on that comparison. Mapper then gives you much longer and thus far fewer contigs than Assembler.



                • #9
                  Glad you found your solution, azmicro ..
                  but I'd have to quibble that the number and length of contigs you'll get, and whether you get better (de novo) assemblies or (mapped) assemblies, will definitely depend on how divergent your reference and sequenced species are ... yours must be pretty close (being different strains, but not different species? maybe?)



                  • #10
                    I have a multi-chromosome reference sequence, and I want to map my 454-generated contigs (not reads) from a closely related species against it. The contigs are in one large multi-record FASTA file; the chromosomes are in one large GenBank (.gbk) file, i.e., a single file with 15 sets of features plus sequence, ordered 1 through 15. I've tried Mauve Contig Mover, and while it did what looks like a great mapping job and nicely displays the contig and chromosome boundary information (and annotations of the reference sequence) in the final alignment graphic, none of the output files I see allow me to easily map contigs on a per-chromosome basis (e.g., "this set of contigs maps to chromosome 12 in this order and orientation...."). The .tab file in the output gives ordered contig coordinates on a single giant pseudochromosome, which is all but useless to me without an indication of how these relate to the chromosome boundaries of the reference sequence. The output also includes a contig directory... which is empty....?

                    Ultimately what I'm aiming for are synteny maps of each chromosome in my reference genome. I realize Mauve was developed mainly on prokaryotic (single chromosome) genomes, but am I missing something here? Is there an easy way to do what I want with Mauve, that I'm not seeing, short of running each chromosome separately as a reference sequence? If not, should I be trying a different contig mapper?
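
To make the desired per-chromosome listing concrete, here is a rough Python sketch of that grouping step. It does not read any Mauve output; it assumes you can obtain, from some aligner run against the individual chromosomes, one best hit per contig as a (contig, chromosome, reference start, strand) tuple. The function name, tuple layout, and toy hits are hypothetical.

```python
from collections import defaultdict


def layout_by_chromosome(hits):
    """Group contigs by reference chromosome, ordered by reference start.

    `hits` is an iterable of (contig, chromosome, ref_start, strand) tuples,
    with one best hit per contig.
    """
    layout = defaultdict(list)
    for contig, chrom, ref_start, strand in hits:
        layout[chrom].append((ref_start, contig, strand))
    # Sort each chromosome's contigs by start coordinate, keeping orientation.
    return {
        chrom: [(contig, strand) for _, contig, strand in sorted(entries)]
        for chrom, entries in layout.items()
    }


if __name__ == "__main__":
    # Invented hits, only to show the shape of the result.
    toy_hits = [
        ("contig3", "chr12", 5000, "-"),
        ("contig1", "chr12", 100, "+"),
        ("contig2", "chr01", 40, "+"),
    ]
    for chrom, contigs in sorted(layout_by_chromosome(toy_hits).items()):
        print(chrom, contigs)
```

A contig that spans a rearrangement will have more than one meaningful hit, so reducing each contig to a single best hit is a simplification that a real synteny map would need to revisit.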



                    • #11
                      Since this is such an old thread (that I happened to be subscribed to), may I suggest starting a new one with your question...

