  • boetsie
    Senior Member
    • Feb 2010
    • 245

    SSPACE: a new stand-alone scaffolding tool for small and large genomes

    Hi all,

    During my Master's thesis I developed a stand-alone scaffolding tool named SSPACE for scaffolding pre-assembled contigs using paired-read data. I developed this program because I couldn't find any program able to do this, except for Bambus. However, we had lots of issues with Bambus, including errors and complicated input datasets.

    Therefore, SSPACE was developed. The main features are:

    * Inputs are simple FASTA contig sequences as well as (multiple) FASTA/FASTQ paired-read data
    * High-quality scaffolds with a short runtime and limited memory requirements
    * A large reduction in the number of contigs (by joining them into scaffolds) and a high N50 value
    * Multiple library input of paired-end and/or mate pair datasets
    * Optional extension of contigs using unmapped sequence reads
    * Easy interpretation of the final scaffolds
    * Visualization of the final scaffolds using GraphViz

    SSPACE has been tested on the E. coli, Grosmannia clavigera and giant panda genomes, and it produced fewer scaffolds and a higher N50 value compared with the scaffolds produced by common de novo assemblers such as ABySS and SOAPdenovo.
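    Since the comparison is expressed in terms of N50, here is a minimal sketch of how that statistic is commonly computed from a FASTA file of contigs or scaffolds. It uses one common definition (the length L such that sequences of length >= L cover at least half of the total assembly length); the function names and the toy sequences are made up for illustration and are not taken from SSPACE.

```python
def read_fasta_lengths(text):
    """Return the length of each sequence in a FASTA-formatted string, in file order."""
    lengths = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0  # start counting a new record
        elif current is not None:
            current += len(line)  # sequence lines may be wrapped
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


fasta = """>contig1
ACGTACGTAC
>contig2
ACGTAC
>contig3
ACGTACGTACGTACGT
>contig4
ACGT
"""
lengths = read_fasta_lengths(fasta)
print(lengths, n50(lengths))  # [10, 6, 16, 4] 10
```

    Joining contigs into scaffolds makes the longest sequences longer, which is why fewer scaffolds usually go together with a higher N50.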

    SSPACE is freely available at


    The publication has been accepted in Bioinformatics and will be online soon. It gives more detailed information about the produced scaffolds and their quality, including runtime and memory usage.

    I hope it is useful; any comments or questions are of course welcome.

    Cheers,
    Boetsie
  • boetsie
    Senior Member
    • Feb 2010
    • 245

    #2
    Hi all,

    The SSPACE publication is now available at:



    Boetsie


    • ganga.jeena
      Member
      • Jun 2010
      • 15

      #3
      Congrats!

      It's great to hear of such an achievement.
      Is your paper freely available?
      Can you mail me a downloadable copy of the software?
      Regards,
      Ganga Jeena


      • dan
        wiki wiki
        • Jul 2008
        • 194

        #4
        Congrats!

        Before I get into the paper, can I ask if this tool supports 'hierarchical scaffolding' in the way that Bambus (supposedly) does? i.e. If I want to add in 'scaffolding' information based on gene synteny from some related organisms, can I add that in but with a lower priority than the true PE/MP data?

        Does it detect repeats from the graph structure like Bambus does now?

        I'm curious because Bambus promises a lot of nice functionality, which is why I keep hammering away at it. However, I'm starting to wonder if it's time to jump ship to a tool that is more robust (if perhaps less feature rich).


        Cheers,
        Dan.
        Homepage: Dan Bolser
        MetaBase the database of biological databases.


        • dan
          wiki wiki
          • Jul 2008
          • 194

          #5
          Nice paper! The question that arises is whether we can feed PE data directly to the algorithm, rather than having it shoehorned through Bowtie.

          For example, Bowtie may not be the best tool for aligning 454 reads to contigs, but I'd still like to use 454 PE data to scaffold my assembly. Is there some intermediate file or Bowtie-like PE format that we can feed to SSPACE?

          Unfortunately, parts of http://bioinformatics.oxfordjournals.org are down, so I can't see the supplementary figure; sorry if it already answers my question.


          • boetsie
            Senior Member
            • Feb 2010
            • 245

            #6
            Hi Dan,

            thanks for your reply!

            It does not fully support the same hierarchical scaffolding as Bambus. We use a simple approach:

            1) Produce scaffolds using the first library
            2) Use scaffolds from 1), and produce scaffolds using the second library
            3) and so on...

            We do not assign priorities to the libraries as Bambus does; instead, the user determines the order in which the libraries are used.
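            The library-by-library procedure above can be sketched as a simple loop. This is a toy model, not SSPACE's code: the helper names `scaffold_round` and `hierarchical_scaffolding` are hypothetical, each library is reduced to a list of contig pairs it links, and orientation, gap sizes and conflicting links are all ignored.

```python
def scaffold_round(scaffolds, links):
    """Merge scaffolds joined by one library's links (toy model: order only, no gaps/orientation)."""
    merged = [list(s) for s in scaffolds]
    where = {c: i for i, s in enumerate(merged) for c in s}  # contig -> scaffold index
    for a, b in links:
        i, j = where[a], where[b]
        if i == j:
            continue  # both contigs are already in the same scaffold
        merged[i].extend(merged[j])  # append scaffold j after scaffold i
        for c in merged[j]:
            where[c] = i
        merged[j] = []
    return [s for s in merged if s]  # drop scaffolds that were absorbed


def hierarchical_scaffolding(contigs, libraries):
    """Scaffold with each library in turn, feeding the previous round's scaffolds to the next."""
    scaffolds = [[c] for c in contigs]
    for links in libraries:  # the order is chosen by the user
        scaffolds = scaffold_round(scaffolds, links)
    return scaffolds


contigs = ["c1", "c2", "c3", "c4"]
libraries = [
    [("c1", "c2")],  # library 1, e.g. a short-insert paired-end library
    [("c2", "c3")],  # library 2 links scaffold [c1, c2] to c3
]
print(hierarchical_scaffolding(contigs, libraries))  # [['c1', 'c2', 'c3'], ['c4']]
```

            In the real tool the reads of each library are aligned against the scaffolds from the previous round; in this sketch the contig-to-scaffold lookup plays that role.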

            It is able to detect repeats by determining the number of incoming and outgoing 'links' between contigs. Detected repeats are reported in the output.
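            A minimal sketch of the link-counting idea, assuming each link is a directed (contig, next contig) pair and that a contig linked to more than one distinct partner on the same side is a repeat candidate. `flag_repeats` and `max_partners` are hypothetical names; the exact criteria SSPACE applies are not described in this thread.

```python
from collections import defaultdict


def flag_repeats(links, max_partners=1):
    """Flag contigs linked to more than `max_partners` distinct contigs on one side."""
    outgoing = defaultdict(set)  # contig -> contigs that follow it
    incoming = defaultdict(set)  # contig -> contigs that precede it
    for src, dst in links:
        outgoing[src].add(dst)
        incoming[dst].add(src)
    contigs = set(outgoing) | set(incoming)
    return sorted(
        c for c in contigs
        if len(outgoing[c]) > max_partners or len(incoming[c]) > max_partners
    )


# R is entered from both A and C and leaves towards both B and D: a typical repeat pattern.
links = [("A", "R"), ("R", "B"), ("C", "R"), ("R", "D"), ("B", "E")]
print(flag_repeats(links))  # ['R']
```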

            Bambus has indeed more functionality. However, we found that the input options were too complex for simple scaffolding purposes.

            Regarding your question about Bowtie:
            Unfortunately, only Bowtie is supported at the moment, as SSPACE was designed for Illumina input (or other short paired reads) and based on Bowtie output.

            My question: what program do people use for aligning 454 reads, and can it produce output similar to Bowtie's?

            Cheers,
            Boetsie


            • dan
              wiki wiki
              • Jul 2008
              • 194

              #7
              Thanks for the clear reply Boetsie, really great to hear that you do do repeat filtering based on graph structure, and allowing the user to pick the order of the libraries seems like a nice strategy.

              I've been using Newbler to align 454's PE data to contigs. Newbler automatically handles the specifics of the 454 style PE reads so, although it isn't the best aligner for 454, it is very easy to use the results, which are just tab delimited... You can read about the format of the Newbler PE data here!

              Newbler can be persuaded to output ace-like format too, but it doesn't do SAM/BAM IIRC.

              I was looking at the code, and it should be easy enough to feed in the data to SSPACE ;-)

              • sjackman
                Member
                • Mar 2009
                • 15

                #8
                Hi Boetsie,

                Does SSPACE use the SAM output format of Bowtie? If not, could it?

                Cheers,
                Shaun


                • boetsie
                  Senior Member
                  • Feb 2010
                  • 245

                  #9
                  Hi Shaun,

                  No, it does not; it uses the standard (default) output format of Bowtie. With modifications to the script, it should be possible to use the SAM format.
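                  For anyone adapting the script, here is a sketch of reading Bowtie's default (non-SAM) output. It assumes Bowtie 1's tab-separated columns (read name, strand, reference name, 0-based leftmost offset, read sequence, qualities, ...), mates named with /1 and /2 suffixes, and at most one reported alignment per read. The function names are hypothetical and this is not SSPACE's own parser.

```python
from typing import NamedTuple


class BowtieHit(NamedTuple):
    read: str    # read name, e.g. "pair1/1"
    strand: str  # "+" or "-"
    ref: str     # contig/scaffold name
    start: int   # 0-based leftmost offset on the forward strand
    length: int  # aligned read length


def parse_bowtie_line(line):
    """Parse one line of Bowtie's default tab-separated output (first five columns only)."""
    fields = line.rstrip("\n").split("\t")
    read, strand, ref, offset, seq = fields[:5]
    return BowtieHit(read, strand, ref, int(offset), len(seq))


def pair_hits(lines):
    """Pair hits whose names differ only by a /1 or /2 suffix; unpaired hits are dropped."""
    pending = {}
    pairs = []
    for line in lines:
        if not line.strip():
            continue
        hit = parse_bowtie_line(line)
        name = hit.read
        base = name[:-2] if name.endswith(("/1", "/2")) else name
        if base in pending:
            pairs.append((pending.pop(base), hit))
        else:
            pending[base] = hit
    return pairs


example = [
    "pair1/1\t+\tcontig1\t100\tACGTACGTAC\tIIIIIIIIII\t0\t\n",
    "pair1/2\t-\tcontig2\t50\tGGGGCCCCAA\tIIIIIIIIII\t0\t\n",
    "single/1\t+\tcontig3\t7\tACGT\tIIII\t0\t\n",
]
for first, second in pair_hits(example):
    print(first.ref, first.strand, "<->", second.ref, second.strand)
# prints: contig1 + <-> contig2 -
```

                  A SAM-based version would read the equivalent fields (QNAME, FLAG, RNAME, POS) instead; note that SAM positions are 1-based.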

                  Cheers,
                  Boetsie


                  • corthay
                    Member
                    • Oct 2008
                    • 25

                    #10
                    BAC / Fosmid end

                    Hi boetsie,

                    Can I use additional BAC/fosmid ends for scaffolding the pre-assembled contigs
                    or scaffolds with SSPACE? If it's possible, is there any parameter for this purpose?

                    Thanks,
                    Corthay


                    • boetsie
                      Senior Member
                      • Feb 2010
                      • 245

                      #11
                      Hi Corthay,

                      I'm not very familiar with BAC/fosmid ends, so there is no parameter for this purpose. However, if:
                      - these are paired sequences,
                      - the sequence lengths are below 1024 (the maximum read length Bowtie accepts), and
                      - the pairs have either the --> <-- orientation (typical paired-end) or the <-- --> orientation (typical mate pair),

                      I see no reason why you should not give it a try.
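                      The orientation condition can be checked per pair once both ends are aligned to the same contig or scaffold. A minimal sketch, assuming start positions are 0-based leftmost coordinates on the forward strand (as in Bowtie's default output); `pair_orientation` is a hypothetical helper, not an SSPACE function.

```python
def pair_orientation(start1, strand1, start2, strand2):
    """Classify a pair aligned to one sequence by the strands of its left and right ends."""
    (_, left), (_, right) = sorted([(start1, strand1), (start2, strand2)])
    if left == "+" and right == "-":
        return "paired-end"  # -->  <--
    if left == "-" and right == "+":
        return "mate-pair"   # <--  -->
    return "other"           # both ends on the same strand: neither expected layout


print(pair_orientation(100, "+", 400, "-"))  # paired-end
print(pair_orientation(400, "+", 100, "-"))  # mate-pair
print(pair_orientation(100, "+", 400, "+"))  # other
```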

                      Kind regards,
                      Boetsie


                      • dan
                        wiki wiki
                        • Jul 2008
                        • 194

                        #12
                        What would be great is a simple tab delimited format for providing paired sequence alignments, rather than going via Bowtie... I had a quick look at the code, but unfortunately I couldn't work out where to add such functionality easily. I'll have another look at some point if nobody else does.
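                        A sketch of what such a format could look like. It is purely hypothetical: one line per read pair with seven tab-separated columns (pair id, then contig, position and strand for each end). Nothing like this exists in SSPACE; the column layout and the function names are assumptions for illustration.

```python
import csv
import io
from typing import NamedTuple


class PairAlignment(NamedTuple):
    pair_id: str
    contig1: str
    pos1: int
    strand1: str
    contig2: str
    pos2: int
    strand2: str


def read_pair_alignments(handle):
    """Read the hypothetical 7-column format; skip blank lines and '#' comments."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        pair_id, c1, p1, s1, c2, p2, s2 = row
        records.append(PairAlignment(pair_id, c1, int(p1), s1, c2, int(p2), s2))
    return records


def contig_links(records):
    """Only pairs whose ends land on different contigs can join contigs."""
    return [(r.contig1, r.contig2) for r in records if r.contig1 != r.contig2]


data = io.StringIO(
    "#pair\tcontig1\tpos1\tstrand1\tcontig2\tpos2\tstrand2\n"
    "p1\tctgA\t950\t+\tctgB\t120\t-\n"
    "p2\tctgA\t100\t+\tctgA\t400\t-\n"
)
records = read_pair_alignments(data)
print(contig_links(records))  # [('ctgA', 'ctgB')]
```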

                        • corthay
                          Member
                          • Oct 2008
                          • 25

                          #13
                          Hi Boetsie,

                          Thanks for the response.

                          I've just specified "k=2", as the clone coverage of the BAC ends is almost 5x.
                          As a result, the scaffold N50 improved a bit and the number of scaffolds was reduced. Thanks for developing such a useful tool.

                          Corthay.



                          • boetsie
                            Senior Member
                            • Feb 2010
                            • 245

                            #14
                            Hi Dan,

                            I know what you mean, but then multiple library input can't be used, since we do hierarchical scaffolding (first generate scaffolds using one library, then align the next library to those scaffolds and produce new scaffolds, etc.). So for each library we align the reads to the new scaffolds. Therefore, precomputed paired sequence alignments cannot be provided, except when only one library is used. In addition, with such an input we would be very similar to Bambus. Our purpose is to have an easy-to-use scaffolder that takes a simple FASTA input rather than complex input formats.

                            Next week, I'll try to add support for another alignment tool (e.g. Newbler) to map long reads to the contigs/scaffolds.

                            Kind regards,
                            Boetsie


                            • boetsie
                              Senior Member
                              • Feb 2010
                              • 245

                              #15
                              Hi Corthay,

                              great that it worked and that it improved your assembly a bit!

                              Kind regards,
                              Boetsie
