  • #76
    Hi Nestor,

    Unfortunately, the .tab option is not yet available in SSPACE; it will be available in the next release. Because of holidays and setup work for the tool downloads, the release is somewhat delayed, but I think it should be out before the end of this month.

    There is not yet a script for converting the formats you mention to .tab format, but writing one should not be a problem.

    Regards,
    Boetsie

    Originally posted by user1313 View Post
    Hi boetsie,

    I tried SSPACE, and it is pretty good. However, I want to use sequencing technologies besides Illumina for scaffolding. Is the *.tab file meant for that? Also, I already have the assemblies in *.caf, *.maf and *.ace formats. Can I extract the mate information from these files (the former two) and use it in SSPACE? Do you have any scripts that can do that properly?

    Sincerely yours
    Nestor

    Comment


    • #77
      Hi DeNovoG,

      Good that it worked well!

      I'm not very familiar with the AGP format, but I think you can convert the .evidence file from SSPACE using your own script. I think all the information you need for the AGP file is already present in the evidence file.

      Maybe others have worked with this format?
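As a starting point for such a script, here is a minimal Python sketch that writes AGP 2.0 lines for one scaffold. It does not parse the SSPACE .evidence file itself, because its layout is not described in this thread; it assumes you have already extracted, for each scaffold, an ordered list of (contig name, contig length, orientation, gap size after the contig). The function name `scaffold_to_agp` and the gap settings ("scaffold", "yes", "paired-ends") are illustrative choices, so check them against the GenBank requirements before submitting.

```python
def scaffold_to_agp(scaffold_id, parts):
    """Return AGP 2.0 lines for one scaffold.

    parts: ordered list of (contig_id, contig_length, orientation, gap_after),
    where orientation is '+' or '-' and gap_after is the number of Ns that
    follow the contig (0 for the last contig or when there is no gap).
    """
    lines = []
    pos = 1       # 1-based start of the next part on the scaffold
    part_no = 1   # running part number within the scaffold
    for contig_id, length, orientation, gap_after in parts:
        end = pos + length - 1
        # Component line: 'W' marks a WGS contig, used over its full length.
        lines.append("\t".join(map(str, [
            scaffold_id, pos, end, part_no, "W",
            contig_id, 1, length, orientation])))
        pos = end + 1
        part_no += 1
        if gap_after and gap_after > 0:
            end = pos + gap_after - 1
            # Gap line: 'N' marks a gap of estimated size.
            lines.append("\t".join(map(str, [
                scaffold_id, pos, end, part_no, "N",
                gap_after, "scaffold", "yes", "paired-ends"])))
            pos = end + 1
            part_no += 1
    return lines


if __name__ == "__main__":
    example = [("contig1", 100, "+", 20), ("contig2", 50, "-", 0)]
    print("\n".join(scaffold_to_agp("scaffold1", example)))
```

Contigs that the scaffolder merged with an overlap (no Ns between them) are not handled here and would need their component coordinates adjusted.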

      Regards,
      Boetsie

      Originally posted by DeNovoG View Post
      Hello there,

      Great tool! my organellar assembly improved quite nicely.

      Now that I have the scaffolds, I wanted to ask here how I can upload them to GenBank and comply with their requirement of an AGP file for scaffolds, listing each contig and the number of Ns linking them.

      Is there a way, or do you have any suggestions, to produce these AGP files from the SSPACE output files? That would be great.

      Thank you very much for any info,

      Best Regards,

      G

      Comment


      • #78
        Ask for support for SOLiD mate pair data

        Hi Boetsie,
        It's a nice experience to use this tool. However could you add support for SOLiD mate pair data? I think the difference between processing base-space data and color-space data is mainly in the mapping stage.
        Best regards,
        Relipmoc

        Comment


        • #79
          Hi Boetsie,

          Really nice tool! Works really well!

          Comment


          • #80
            Hello Boetsie,
            I see most users are very happy with Sspace, congratulations!
            I was wondering whether Sspace could work with 300 bp 454 PE reads. The manual says nothing about it and, from what I have read, Bowtie is specialized for short-read mapping.
            On the MIRA mailing list it was mentioned that Sspace now accepts BWA alignments (which are compatible with 454 reads), but I did not find any mention of this in the manual of the latest release, so I am not sure.
            Could you confirm that it is possible to use BWA alignments as input for Sspace scaffolding?
            Before assembling the reads with MIRA, I used SSAHA2 to find adapters and linkers, but I did not clip them, I only marked them. Is it necessary to clip them out completely before mapping?
            Thank you in advance,

            Juan Montenegro

            Comment


            • #81
              Sorry, I was away for a week, so I could not answer sooner. Thanks all for the nice feedback, I appreciate that! A small update: we are planning to release the new version of SSPACE before the end of the month. However, the setup will probably differ from the previous version; more on this will come later.

              Hi Relipmoc, thank you for this suggestion; it would certainly be nice to include this. I will see what I can do. I'm not experienced with SOLiD data, however, so I will have to search for some examples. I have never even worked with it, since we don't have a SOLiD machine over here. Do you have any recommendations? I see Bowtie has a colorspace mapping option, so that should not be a problem. Or is it wiser to convert the SOLiD files to fastq format, as the script that comes with BWA (solid2fastq.pl) does?


              Hi Juan,
              At the moment it is only possible to map ungapped reads with Bowtie. In the next release I will include the possibility to allow for gaps (the maximum is two with Bowtie) and support for a tab-delimited format that contains paired-read positions on contigs. Users can therefore use their own read mapper, instead of only Bowtie. A script to convert SAM to the tab format will be included in the package.

              The next step for me would be to include a long-read mapper (probably BWA-SW); the only problem is that, unlike Bowtie, it does not run on Windows, so this option would probably not be available in the Windows version.

              I think it is always useful to clip the adapters out; they are useless and can only cause problems. I don't know how long-read mappers deal with them, but I think the mapping of the reads will improve. It may be worth asking more experienced users of long-read mappers.

              Regards,
              Boetsie

              Comment


              • #82
                Thank you, Boetsie, for your fast reply. Just a couple of questions left.
                As I understand it, regardless of the algorithm they use, long-read or short-read aligners must produce an output format that Sspace can read. According to the Sspace readme file, it uses only alignments to contig ends, for memory reasons, and the default aligner, Bowtie, cannot be used with long reads. However, it should be possible to filter and reformat the output of other aligners so that it looks like Bowtie output.
                If such a parser were available, do you think Sspace could handle these alignments and use them for scaffolding?

                Comment


                • #83
                  Hi Juanda,

                  Contig-end alignments are used to speed up the alignment process. If you have a contig of, say, 1 million base pairs, the aligner has to index it and then also search for alignments against that index. If I take just the ends of the contigs (likely shorter than 15,000 bp each, depending on the insert size), it is much, much faster.
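The idea of indexing only contig ends can be sketched in a few lines of Python. This is an illustration of the principle, not SSPACE's actual code; the function name `contig_ends`, the `_left`/`_right` suffixes and the `end_len` parameter are made up for the example.

```python
def contig_ends(name, seq, end_len):
    """Return (name, sequence) records covering only the ends of a contig.

    Contigs no longer than two end lengths are returned whole, because
    their two ends would overlap or cover the full sequence anyway.
    """
    if len(seq) <= 2 * end_len:
        return [(name, seq)]
    return [
        (f"{name}_left", seq[:end_len]),     # first end_len bases
        (f"{name}_right", seq[-end_len:]),   # last end_len bases
    ]


if __name__ == "__main__":
    # A 1 Mbp contig shrinks to two 15 kbp records for the aligner to index.
    records = contig_ends("contig1", "A" * 1_000_000, 15_000)
    print([(n, len(s)) for n, s in records])
```

Positions reported for a right-hand end are relative to that end, so they need an offset of `len(seq) - end_len` to be expressed in full-contig coordinates.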

                  In principle, the output of any aligner could be used; I just have to parse it to obtain the information I need. However, everyone has their favorite aligner, so it would be impossible to include all aligners in SSPACE.
                  Therefore, I'm trying to include a format that contains the information I need (basically the positions of the two paired reads on the contigs); this is the .tab format. Parsing scripts could be written to convert the output of other aligners (such as .sam format) to the .tab format. The .tab option will be present in the next release of SSPACE.

                  Regards,
                  Boetsie
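To make the idea of such a parse script concrete, here is a rough Python sketch that reduces SAM records to one tab-delimited line per read pair. The exact .tab column layout is not specified in this thread, so the layout used here (contig1, start1, end1, contig2, start2, end2, with start greater than end for reverse-strand reads) is an assumption for illustration only; check the README of the release before relying on it. The FLAG bits and CIGAR operators follow the SAM specification.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Number of reference bases covered by an alignment, from its CIGAR."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MDN=X")


def sam_to_tab(sam_lines):
    """Return tab-delimited rows, one per pair with both mates mapped."""
    first, second = {}, {}
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete SAM record
            continue
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        pos, cigar = int(fields[3]), fields[5]
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800).
        if flag & (0x4 | 0x100 | 0x800):
            continue
        start, end = pos, pos + reference_span(cigar) - 1
        if flag & 0x10:                   # reverse strand: assumed start > end
            start, end = end, start
        if flag & 0x40:                   # first read of the pair
            first.setdefault(qname, (rname, start, end))
        elif flag & 0x80:                 # second read of the pair
            second.setdefault(qname, (rname, start, end))
    rows = []
    for qname, (c1, s1, e1) in first.items():
        if qname in second:
            c2, s2, e2 = second[qname]
            rows.append(f"{c1}\t{s1}\t{e1}\t{c2}\t{s2}\t{e2}")
    return rows
```

Usage would be along the lines of `sam_to_tab(open("pairs.sam"))`, where `pairs.sam` is a placeholder file name. The sketch keeps every record in memory, so very large SAM files would need a name-sorted, streaming variant.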


                  Comment


                  • #84
                    Hi all,

                    Over the past few months we have received a number of suggestions and wishes for improving our SSPACE software, as well as very nice compliments. Thank you all for that!

                    We have done our best to include as many suggestions as possible, and this has resulted in a new release we are very proud of, named SSPACE premium! In contrast to our older SSPACE (which will still be free of charge), for SSPACE premium we kindly ask you to donate a small contribution to compensate a bit for the large effort that has been put into the development of this program (see the bottom of my post).

                    The new features of SSPACE premium are:
                    • A pre-filtering step has been introduced to remove linkages with repetitive contigs. This reduces mistakes in placing repeats within scaffolds.
                    • The linkage ratio now also takes the contig length into account. This was done to normalize for linkage over-estimation. Larger contigs tend to have more links than smaller contigs, and this leads to a bias in the linkage ratio (this improvement is especially an advantage for mate pairs).
                    • The overall speed and memory usage have been significantly improved.
                    • The alignment with Bowtie can now also be run with multiple threads.
                    • The user can also choose to perform a gapped alignment with Bowtie. This is especially useful for longer sequence reads (such as Roche 454).
                    • We now also support additional orientations of the paired reads: -> <-, <- ->, <- <- and -> ->. Users do not have to convert the read direction of the input files.
                    • Added a customized input option that allows for a tab-delimited file format (containing the positions of the paired reads on the contigs).
                    • A script is available to convert .sam or .bam files to .tab files.
                    • A full list of changes and improvements with respect to SSPACE basic is included in the README file.

                    We would like to stress that it is NOT our intention to commercialize this software! It will still be released under the GNU open source license. However, we hope you understand that small contributions allow us to keep working on software that (hopefully) serves a large NGS community. Moreover, part of the donation is used to guarantee that users receive good support from our bioinformaticians. For each download we ask 250 euro. Nonetheless, you can get SSPACE premium for free if you combine it with a NextGen sequencing project.

                    SSPACE premium can be obtained at http://www.baseclear.com/landingpages/sspace-premium/

                    Attached is a benchmark of how SSPACE performs on a number of different de novo assemblies and different species. For the two bacterial species, the contigs were aligned against a reference genome and the quality of the scaffolds, in terms of correctly ordered consecutive contigs within each scaffold, was estimated. The analyses were run on a Linux machine with 32 GB of memory, with default SSPACE settings.

                    Kind regards,
                    Boetsie
                    Attached Files

                    Comment


                    • #85
                      Hello Boetsie

                      I used your SSPACE for scaffolding my contigs generated from ABySS on human sample. I experimented it in two ways:

                      1.> In the library files I provided zipped compressed fastq files
                      2.> In the library files I provided uncompressed fastq files

                      The results came out different. The N50 in the first (compressed) case is 1424 and in the second (uncompressed) case is 1995. Do you think there is an advantage to using uncompressed files over compressed files, given that a bigger N50 is usually desirable?

                      Aby

                      Comment


                      • #86
                        Hi Aby,

                        Well, as far as I know it is not possible to load zipped (compressed) files into SSPACE, but I must say I have never tried it. So I think it is indeed preferable to use uncompressed files instead of compressed files, simply because I don't know what SSPACE does with compressed input. I do see an advantage in terms of speed for the users, though, so I'll have a look at whether I can add an option to accept the fastq.gz format.
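Until compressed input is supported, one workaround is to decompress the reads before handing them to the scaffolder. The following Python sketch shows one way to open a FASTQ file transparently whether or not it is gzip-compressed, by checking the two-byte gzip signature. The helper name `open_fastq` is hypothetical, and the sketch covers gzip only, not .zip archives.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, whether plain or gzip-compressed."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":          # gzip files start with these two bytes
        return gzip.open(path, "rt")
    return open(path, "r")


def count_reads(path):
    """Count FASTQ records, assuming four lines per record."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4
```

Comparing `count_reads` on the compressed and uncompressed copies of the same library is a quick check that both hold the same number of reads.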

                        Regards,
                        Boetsie

                        Comment


                        • #87
                          SSPACE Not working for SOAPdenovo Contigs

                          Dear Boetsi

                          I tried SSPACE on a SOAPdenovo contig file with a size of 6.2 GB. SSPACE crashed with an error saying that the input exceeded 2^32-1 characters! Does SSPACE not work for huge contig files?


                          Aby
                          Last edited by narain; 08-26-2011, 12:46 AM. Reason: the second part is already answered.

                          Comment


                          • #88
                            SOAPdenovo scaffolder performing better than SSPACE scaffolder

                            Dear Boetsi

                            I had 110 GB of paired-end reads, 90 bases per end, which I assembled into contigs using ABySS and SOAPdenovo separately. ABySS gave a contig file of 4.4 GB with an N50 of 1424, and SOAPdenovo gave a contig file of 6.2 GB with an N50 of 681. Since the N50 of the ABySS assembly is bigger, its contigs might look better. I used the abyss-fac tool to get the N50 values.
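For readers comparing the numbers in this post, the N50 statistic can be computed as in the Python sketch below: sort the contig lengths from longest to shortest and report the length at which the running sum first reaches half of the total. This is the common textbook definition; tools may apply extra filters first (for example a minimum contig length), so their values can differ from this plain calculation.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # N50 is the length of the contig that takes the running
        # sum to at least half of the total assembly length.
        if 2 * running >= total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 depends on the total length of the set, comparing N50 values between assemblies of very different total size (such as 4.4 GB and 6.2 GB of contigs) should be done with some care.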

                            I ran SSPACE on the ABySS contigs to form scaffolds and got a scaffold file of 4.9 GB. It's strange that the scaffold file is bigger than the corresponding contig file. The N50 of this scaffold file is 1995, which is only slightly larger than that of its contigs. I ran SOAPdenovo scaff on the SOAPdenovo contigs and got a scaffold file of 2.6 GB with an N50 of 14746. This suggests that SOAPdenovo scaff is better than SSPACE, as the N50 value is much larger! I used the default values of k=5, n=15 and a=0.7 with SSPACE. Do you recommend changing these values? I am trying once again with a lower n, say 3.

                            However, to confirm this I should run the two scaffolders on the same contig files. The SOAPdenovo scaff scaffolder does not accept contig files other than those generated by SOAPdenovo itself, as it uses many intermediate files besides the contig file. So I am left with the option of trying SSPACE on the contig file generated by SOAPdenovo. I tried SSPACE on the SOAPdenovo contig file, which had a size of 6.2 GB. SSPACE crashed with an error saying that the input exceeded 2^32-1 characters! Does SSPACE not work for huge contig files such as those generated by SOAPdenovo for the human genome?


                            Aby

                            P.S.
                            Getting to the details of the read files: I have 6 lanes of data, each about 18 GB in fastq format, making roughly 110 GB in total. This is admittedly somewhat little data for a good de novo assembly, but it should not be too bad. Since the coverage is not great, although the read length of about 90 bases is good, I am using a lower k-mer value of 25 for both ABySS and SOAPdenovo. The n value, which sets the minimum number of overlapping reads needed to make a contig, is set to 10 (the default value of the programs).
                            Last edited by narain; 08-26-2011, 10:11 AM. Reason: To inform the complete picture.

                            Comment


                            • #89
                              Hi Aby,

                              I'm not the one to say which program is better. I would like to point out, though, that you have a read length of 90 bp. SSPACE maps the entire read to the contigs, so the mapping stage is sensitive to both the quality of the reads and the quality of the contigs. SOAP, on the other hand, as far as I am aware, uses k-mer sequences to map the paired reads to the contigs. Say you have set a k-mer of 25 bp; the chance of pairs mapping well is then much higher.

                              I suggest trimming the paired reads to remove low-quality bases, so that SSPACE can map the reads better. Or you can change the code in SSPACE's mapWithBowtie.pl to set the maximum number of gaps to, for example, 3 basepairs. If you need help with this, please ask me.
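The trimming suggestion can be illustrated with a small Python sketch that removes low-quality bases from the 3' end of a read. It is a simple illustration rather than a recommended trimming tool; the function name, the threshold of 20 and the Phred+33 quality offset are assumptions for the example (older Illumina pipelines wrote Phred+64, in which case `offset=64` is needed).

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Trim trailing bases whose Phred quality is below min_q.

    seq and qual are the sequence and quality strings of one FASTQ record.
    Returns the trimmed (seq, qual) pair; both may be empty.
    """
    keep = len(seq)
    # Walk back from the 3' end while the last kept base is low quality.
    while keep > 0 and ord(qual[keep - 1]) - offset < min_q:
        keep -= 1
    return seq[:keep], qual[:keep]


if __name__ == "__main__":
    # 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
    print(trim_3prime("ACGTAC", "IIII##"))
```

When trimming paired reads, both mates of a pair should be kept or dropped together so that the two files stay in the same order.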

                              Regards,
                              Boetsie

                              Comment


                              • #90
                                Hi boetsie

                                I'm eager to use your program after having to 'manually' extend contigs using combinations of patman/bowtie/velvet processes.

                                I initially had a bowtie-build error which was resolved by giving chmod a+x to all the files in the SSPACE subdirectories. I am using the latest version, v1.1.

                                Unfortunately, I'm getting another error when I use the -x 1 option.

                                ######################################
                                Finished Collecting Overlapping Reads - BUILDING CONSENSUS...
                                Undefined subroutine &main::Dumper called at /usr/local/bin/SSPACE-1.1_linux-x86_64/bin/ExtendOrFormatContigs.pl line 212, <IN> line 8.

                                LIBRARY pass7
                                ------------------------------------------------------------

                                =>Mon Aug 29 13:44:56 2011: Building Bowtie index for contigs (tmp.pass7_sspace/subset_contigs.fasta)
                                Warning: Empty input file
                                Reference file does not seem to be a FASTA file
                                Command: /usr/local/bin/SSPACE-1.1_linux-x86_64/bowtie/bowtie-build --quiet --noref tmp.pass7_sspace/subset_contigs.fasta bowtieoutput/pass7_sspace.pass7.bowtieIndex
                                #######################################

                                I can't find the 'tmp.pass7_sspace/subset_contigs.fasta' file anywhere, but perhaps this has something to do with the undefined subroutine &main::Dumper? Also, I do have many unmapped reads, so I would expect it to be able to extend the contigs.

                                When I use the -x 0 option however, I am able to finish with no problems. I don't think I have any problems with my inputs.

                                My invocation was:
                                perl /usr/local/bin/SSPACE-1.1_linux-x86_64/SSPACE_v1-1.pl -l library.txt -s contigs.fa -x 1 -m 50 -o 20 -p 1 -b pass7_sspace -v 1

                                Could you comment?
                                Thank you,
                                kennels
                                Last edited by Kennels; 08-28-2011, 11:32 PM. Reason: add more detail

                                Comment
