
  • Scaffolding suggestion?

    Hello:

    I'm assembling a genomic region of about 10 Mb, using data from various platforms. Here are the types of data I have:
    1. Some Sanger sequences of BAC ends and target genes
    2. Single end 454 reads
    3. Single end 50 bp Solexa reads
    4. Paired end 74 bp Solexa reads
    My current plan is to assemble these data separately into 4 pools of contigs, then merge the 4 pools of contigs and scaffold them (with the PE Solexa data). I see two strategies:
    A. Assemble the contigs first (with CAP3 or similar), then use the Solexa PE reads to help scaffold the final long contigs.
    B. Assemble the contigs together with all the Solexa PE reads in software like MIRA, so that the scaffolding is done automatically within MIRA.
    Does anyone have an idea which one is better? For strategy A to work, I assume I would need to map the Solexa PE reads to the contigs (with software like BWA) and use the mapping information for scaffolding. Does anyone know of scaffolding software that could deal with this?

    Thanks,
    Cheng-Ruei Lee
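
For strategy A, the core idea of using mapped pairs for scaffolding can be sketched roughly as follows. This is only an illustration, not how any particular scaffolder works: the tuple layout `(contig1, mapq1, contig2, mapq2)` is a hypothetical simplification of parsed SAM/BAM records, and the `min_mapq` and `min_links` thresholds are assumed values. A real scaffolder would also use mate orientation and insert size to order and orient the contigs.

```python
from collections import Counter


def count_contig_links(pairs, min_mapq=20):
    """Count read pairs whose two mates map to different contigs.

    `pairs` is an iterable of (contig1, mapq1, contig2, mapq2) tuples,
    a hypothetical simplified stand-in for parsed SAM/BAM records.
    """
    links = Counter()
    for contig1, mapq1, contig2, mapq2 in pairs:
        if contig1 == contig2:
            continue  # both mates inside one contig: no scaffolding signal
        if mapq1 < min_mapq or mapq2 < min_mapq:
            continue  # skip pairs with an unreliable mate placement
        # Store each contig pair in a fixed order so A-B and B-A are pooled.
        links[tuple(sorted((contig1, contig2)))] += 1
    return links


def supported_joins(links, min_links=3):
    """Return contig pairs supported by at least `min_links` read pairs."""
    return sorted(pair for pair, n in links.items() if n >= min_links)


example_pairs = (
    [("contigA", 60, "contigB", 60)] * 3
    + [("contigA", 60, "contigA", 60),   # same contig, ignored
       ("contigB", 5, "contigC", 60),    # low mapping quality, ignored
       ("contigC", 60, "contigB", 60)]   # only one link, below threshold
)
example_links = count_contig_links(example_pairs)
example_joins = supported_joins(example_links)
```

Requiring several independent pairs per join is a common guard against chimeric pairs and repeats, but the right threshold depends on coverage.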

  • #2
    If you have a close enough reference genome, you could run different assemblies and 'merge' them using MAIA: http://bioinformatics.oxfordjournals...6/18/i433.full. I haven't used it myself, but it looks very promising! Not what you asked for, but just another idea...



    • #3
      165 scaffold

      Dear All,
      I have sequenced a bacterial genome using Solexa
      and am currently working on the assembly.
      I assembled it using SOAPdenovo and got 164 scaffolds.
      I am now unsure what to do with the scaffolds: should I annotate the data I have, or try to improve the scaffolds using another assembler?
      Please help.



      • #4
        You could also try running the Celera assembler, which has a built-in scaffolder and supports all of the data types you mention. http://j.mp/h7uX9i

        It can have a pretty steep learning curve, but I have found it produces spectacular results. There is excellent help and how-to documentation, and the teams supporting it at the Venter Institute and University of Maryland CBCB are always willing to help out.
        Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio



        • #5
          Originally posted by huma Asif
          Dear All,
          I have sequenced a bacterial genome using Solexa
          and am currently working on the assembly.
          I assembled it using SOAPdenovo and got 164 scaffolds.
          I am now unsure what to do with the scaffolds: should I annotate the data I have, or try to improve the scaffolds using another assembler?
          Please help.
          Asif,

          This decision depends entirely on what the project requires. You could try another assembler, like Celera, and see whether it fills in gaps or produces a better assembly. If you want to annotate the genome, then scaffolding is not the only important metric. You should check what your average or N50 contig size is; if it is small, producing a good de novo annotation will be hard.
          Justin H. Johnson | Twitter: @BioInfo | LinkedIn: http://bit.ly/LIJHJ | EdgeBio
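
For reference, N50 is the length L such that contigs of length L or longer contain at least half of the total assembled bases. A minimal sketch of that calculation, assuming the contig lengths have already been extracted into a list (the function name `n50` is just illustrative):

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

In the example the total is 54 bases, and the three longest contigs (10 + 9 + 8 = 27) reach half of it, so the N50 is 8. The same function applies to scaffold lengths, but contig N50 and scaffold N50 should be reported separately.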



          • #6
            N50=70234

            Thank you for your reply.
            The N50 of my assembly is 70234. My project only requires me to assemble the data I got from Illumina and to identify plant pathogenic genes.
            If you think this N50 is not bad, please suggest an online bacterial genome annotation tool. I have tried Glimmer and got some ORFs in the output. I want to check what they are and whether they are complete.
            I have expertise in chloroplast genomics resequencing projects and have only recently started working on bacterial genomics and de novo assembly, so I am confused about how to generate a complete sequence from scaffolds. The bacterial genome that I assembled with SOAPdenovo shows 9942 gaps. As far as I understand, I need to fill these gaps to get a complete genome. At present I am not interested in completing the genome, so my plan is to make a rough map of this bacterium with gaps and see how many genes are covered and what they code for.
            Please help me with annotation tools to start with this plan.
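
A gap count like the one quoted above can be checked directly from the scaffold FASTA. The sketch below assumes gaps are written as runs of `N` in the scaffold sequences, which is a common convention but worth confirming for your assembler's output. The helper names `read_fasta` and `gap_summary` are illustrative, not part of any tool.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Join wrapped lines so a gap spanning a line break is counted once.
    return {n: "".join(parts) for n, parts in records.items()}


def gap_summary(seqs):
    """Return (number of N runs, total N bases) across all sequences."""
    gap_pattern = re.compile(r"[Nn]+")
    n_gaps = 0
    gap_bases = 0
    for seq in seqs.values():
        for match in gap_pattern.finditer(seq):
            n_gaps += 1
            gap_bases += match.end() - match.start()
    return n_gaps, gap_bases


example_fasta = ">scaffold1 example\nACGTNNNN\nNNACGT\n>scaffold2\nACNNAC\nGTnGT\n"
example_gaps = gap_summary(read_fasta(example_fasta))
```

Here `scaffold1` has one gap of 6 bases (split across two lines) and `scaffold2` has two gaps totalling 3 bases. Looking at the total gap bases alongside the gap count helps judge whether the gaps are many short ones or a few long ones.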



            • #7
              Hello Huma,
              Have you tried BLASTing the ORFs that you got to check what they could be?



              • #8
                Yes

                Yes, I have checked these ORFs.
                I have solved many problems with the help of this great forum.
                I assembled my genome again following the suggestions I got from this forum.
                I annotated the scaffolds, checked the evolutionary genes, and found that the species I am working on is Pseudomonas putida. As far as I know it is not a plant pathogen, but it has some virulence genes. These days I am looking for papers on Pseudomonas putida and its role in biofilm formation.
                I would be obliged for any information about this organism.
                Regards



                • #9
                  How to merge multiple scaffold files?

                  Hello All,
                  I have two scaffold sequence files obtained from assemblies of SOLiD MP and 454 PE reads. I would like to build super-scaffolds from these two files with the help of the 454 paired-end information (20 kb). Please suggest any pipelines or software for this purpose.



                  • #10
                    Hello waterboy,

                    Which assembly tool did you use to assemble the SOLiD MP data?

                    Thanks,



                    • #11
                      Dear all,
                      I have sequence data in 4 contigs. Could you please tell me which program to use to get one FASTA? I have no experience in this field, so please help me.
                      Thank you in advance.



                      • #12
                        Originally posted by iaia
                        Dear all,
                        I have sequence data in 4 contigs. Could you please tell me which program to use to get one FASTA? I have no experience in this field, so please help me.
                        There's no magical solution; it will depend on the data you have & the genome you are studying. The obvious question is how many contigs do you expect to have when you are complete and why? What is the nature of the contigs and how did you generate them?



                        • #13
                          Hi iaia,

                          Do you just want to merge them simply, or do you want proper scaffolding based on the sequence, i.e. which contig should come first and which later?
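
To make the first option concrete: a "simple merge" only joins the contigs in whatever order they are given, with a spacer of `N` between them. The sketch below assumes the contigs are already loaded as plain strings; `merge_contigs`, `to_fasta` and the default `gap_len=100` are illustrative choices. It does not determine the true order or orientation of the contigs, which is what proper scaffolding is for.

```python
def merge_contigs(contigs, gap_len=100):
    """Join contigs in the given order, separated by runs of N."""
    spacer = "N" * gap_len
    return spacer.join(contigs)


def to_fasta(name, seq, width=60):
    """Format one sequence as a FASTA record with wrapped lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


example_merged = merge_contigs(["ACGT", "GGCC"], gap_len=3)
example_record = to_fasta("merged", example_merged, width=4)
```

The N spacer marks the junctions as unknown sequence, so downstream tools do not treat the joined contigs as truly adjacent.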



                          • #14
                            Thank you for your reply,
                            yea, I wanted to scaffold them based on the sequence...
