  • #16
    Originally posted by boetsie
    ... since we do an hierarchical clustering ... for each library we align the reads to the new scaffolds, therefore, no predefined paired sequence alignments could be provided ...
    What you need to do is track the positions of these features from the input contigs onto the output scaffolds to internally generate a new tab-delimited input file with the right coordinates... I tried doing this with BioPerl, but unfortunately got tied in knots with the cryptic class hierarchy.

    In theory it shouldn't be hard to say 'position x on contig y in the input is now position j on scaffold k in the output' and simply run it again for the new library. However, I guess there is quite a bit of complexity in such code.
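A minimal sketch of the coordinate translation Dan describes, assuming each output scaffold is recorded as a set of placements (contig name, 0-based start offset on the scaffold, contig length, orientation). `Placement` and `contig_to_scaffold` are hypothetical names, not part of SSPACE or BioPerl, and merged (overlapping) contigs would need extra bookkeeping not shown here.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical record: where one input contig sits on an output scaffold."""
    scaffold: str
    offset: int       # 0-based start of the contig on the scaffold
    contig_len: int
    forward: bool     # False if the contig was reverse-complemented


def contig_to_scaffold(pos: int, contig: str,
                       placements: Dict[str, Placement]) -> Tuple[str, int]:
    """Translate 0-based `pos` on `contig` into (scaffold, 0-based position)."""
    p = placements[contig]
    if not 0 <= pos < p.contig_len:
        raise ValueError(f"position {pos} lies outside contig {contig}")
    if p.forward:
        return p.scaffold, p.offset + pos
    # A reverse-complemented contig is laid down back to front.
    return p.scaffold, p.offset + (p.contig_len - 1 - pos)


if __name__ == "__main__":
    layout = {
        "contig1": Placement("scaffold1", 0, 1000, True),
        # contig2 follows a 50 bp run of N's and was flipped.
        "contig2": Placement("scaffold1", 1050, 800, False),
    }
    print(contig_to_scaffold(10, "contig1", layout))  # ('scaffold1', 10)
    print(contig_to_scaffold(0, "contig2", layout))   # ('scaffold1', 1849)
```

Running every feature of a tab-delimited input file through such a function would give the re-based file for the next library.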


    Anyway, just a suggestion for improvement of an already useful tool!

    Cheers,
    Dan.
    Homepage: Dan Bolser
    MetaBase the database of biological databases.



    • #17
      Hi Dan,

      First of all, thank you for the suggestions and the positive feedback!

      I see what you mean, and I think allowing other input formats is indeed a useful feature. As a start, it would be nice to allow .sam format inputs.

      As for remembering the positions: I'm already doing something similar to remember which contigs are on which scaffolds after each library. I think the same trick could be applied to the mapping.
      I'll see what I can do.

      Thanks,
      Boetsie



      • #18
        Hi boetsie again,

        I would like to ask whether only uniquely mapped reads are used for the scaffolding.

        If not, I am planning to mask repeat sequences before scaffolding.

        Thanks,
        Corthay



        • #19
          Hi again

          Indeed, I only use reads that map uniquely to a single position across all the contigs. I use Bowtie's -m 1 option (see http://bowtie-bio.sourceforge.net/ma...html#reporting). Otherwise, it is impossible to know which link should be made if a read maps to multiple contigs.
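To illustrate the rule, here is a small sketch of what keeping only uniquely mapped reads means, assuming alignment hits are already available as (read_id, contig, position) tuples. It mimics the effect of Bowtie's `-m 1` (a read with more than one reportable alignment is discarded entirely), but it is not Bowtie's implementation, and `unique_hits` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, int]  # (read_id, contig, 0-based position)


def unique_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, int]]:
    """Return {read_id: (contig, position)} for reads with exactly one hit."""
    by_read: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for read_id, contig, pos in hits:
        by_read[read_id].append((contig, pos))
    # A read hitting two contigs (e.g. a repeat) gives no usable link,
    # so it is dropped rather than assigned to either contig.
    return {r: locs[0] for r, locs in by_read.items() if len(locs) == 1}


if __name__ == "__main__":
    hits = [("r1", "contig1", 120), ("r2", "contig1", 40), ("r2", "contig7", 990)]
    print(unique_hits(hits))  # {'r1': ('contig1', 120)}
```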

          Is this what you mean?

          Kind regards,
          Boetsie



          • #20
            Hi boetsie,

            Thanks for your quick reply. I understand now how uniqueness is guaranteed.
            I have two more questions, please.

            First, I am wondering why the total number of bases in the scaffolds without N's increased even though I set the "-x" option to 0.

            Second, how do you calculate the distance between reads for a given contig pair?
            Do you estimate the gap size from the reads, or is the gap size just ignored?

            Sorry for asking so many questions.

            Thanks
            Corthay.




            • #21
              Hi Corthay,

              No problem, good that it is clear now.

              1)
              Hmm, that should never be the case. Are you looking at the summary file to conclude that the total number of bases in the scaffolds increased? That value (sum (bp)) is the total number of bases WITH N's. The number of bases without N's should be either the same as or less than the original total number of bases, since the script merges contigs when they share an overlap of at least -n bases (the overlapping bases are then counted only once).

              If you want, I can send you a script that calculates the number of N's in the scaffolds.
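For readers without that script, a minimal stand-in might look like this. It assumes a plain FASTA file (multi-line records allowed) and prints, per scaffold and in total, the length with N's, the number of N's, and the length without N's. The function names and the file name `count_ns.py` are hypothetical.

```python
import sys
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA lines (multi-line records OK)."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n_stats(seq: str) -> Dict[str, int]:
    """Total length, number of N's (either case) and number of non-N bases."""
    n = seq.upper().count("N")
    return {"total": len(seq), "n": n, "non_n": len(seq) - n}


if __name__ == "__main__" and len(sys.argv) > 1:
    totals = {"total": 0, "n": 0, "non_n": 0}
    with open(sys.argv[1]) as fh:
        for header, seq in read_fasta(fh):
            stats = n_stats(seq)
            for key in totals:
                totals[key] += stats[key]
            print(header, stats["total"], stats["n"], stats["non_n"], sep="\t")
    print("TOTAL", totals["total"], totals["n"], totals["non_n"], sep="\t")
```

Run as `python count_ns.py scaffolds.fasta`; the `non_n` total is the number to compare against the total length of the original contigs.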

              2)
              For estimating the gap, I use the gap size estimated from the reads.
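The exact formula is not given here, so the following is only a sketch of one common way to estimate a gap from read pairs, and not necessarily SSPACE's calculation. It assumes read 1 starts at `start_a` on contig A and read 2 ends at `end_b` on contig B, with both contigs already oriented as on the scaffold. Each pair's insert then spans (len_a - start_a) bases of A, the gap, and end_b bases of B, so gap = insert_size - (len_a - start_a) - end_b. The estimate is the mean over pairs. `estimate_gap` is a hypothetical name.

```python
from statistics import mean
from typing import Iterable, Tuple


def estimate_gap(insert_size: int, len_a: int,
                 pairs: Iterable[Tuple[int, int]]) -> float:
    """Estimate the gap between contig A and contig B from linking pairs.

    Each pair is (start_a, end_b): the 0-based start of read 1 on contig A
    and the 0-based exclusive end of read 2 on contig B.
    """
    gaps = [insert_size - (len_a - start_a) - end_b for start_a, end_b in pairs]
    if not gaps:
        raise ValueError("no read pairs link these contigs")
    # A negative estimate suggests the contigs overlap rather than leave a gap.
    return mean(gaps)


if __name__ == "__main__":
    print(estimate_gap(500, 1000, [(800, 150), (850, 200)]))  # 150
```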

              Kind regards, and no problem about the questions,
              Boetsie



              • #22
                Congratulations!

                I am using SSPACE and I find this tool very useful and user-friendly (unlike Bambus!).

                Thanks!



                • #23
                  Thank you for the compliment!



                  • #24
                    Hi, boetsie.
                    SSPACE is a very good tool for scaffolding. Thank you for your good work.

                    By the way, how is SSPACE pronounced? "espeis"?



                    • #25
                      Hi, I'm excited to get SSPACE up and running. Unfortunately, I'm getting a permission-denied error when the directories are created (line 141). SSPACE is installed on a server in a directory where I don't have write permissions, which I suspect is the problem. Is there a way to specify where the results folders end up? Or is my issue much simpler (and dumber)?



                      • #26
                        Hi themwg,

                        Good that it is working! Unfortunately, you can't specify where the folders end up: the folder structure is generated in your current working directory. Maybe you can turn the problem around: go to the directory where you would like the files and folders to end up and run the program from there. Then specify the full path to the contigs, and also use full paths in the library file for your paired sequences.
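A small helper for this workaround: it builds library lines whose read paths are absolute, so the library file can be written into, and SSPACE started from, any writable directory. The helper names are hypothetical. The column order (name, file 1, file 2, insert size, insert deviation, last column) follows the library line example given later in this thread. The meaning of the last column is not explained there, so it is passed through as-is.

```python
import os


def library_line(name: str, reads_1: str, reads_2: str, insert_size: int,
                 deviation: float, last_col: int = 0) -> str:
    """Build one whitespace-separated library line with absolute read paths."""
    if "/" in name or os.sep in name:
        raise ValueError("use a plain library name, not a path")
    paths = [os.path.abspath(p) for p in (reads_1, reads_2)]
    # Columns are separated by whitespace, so paths must not contain any.
    if any(any(c.isspace() for c in p) for p in paths):
        raise ValueError("read file paths must not contain whitespace")
    return " ".join([name, *paths, str(insert_size), str(deviation), str(last_col)])


def write_library_file(path: str, lines: list) -> None:
    """Write library lines, one per line, to `path`."""
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```

For example, `write_library_file("libraries.txt", [library_line("lib1", "reads_1.fastq", "reads_2.fastq", 500, 0.25)])`, run inside the desired output directory, produces a library file (hypothetically named `libraries.txt`) that no longer depends on where the reads live relative to that directory.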

                        If this doesn't work, I can make a customised script for you. You can mail me any time.

                        Boetsie



                        • #27
                          The next problem

                          Thanks, Boetsie, for the quick reply.
                          Sure enough, I get further along if I just run SSPACE.pl from my own directory by pointing to it. However, I hit a second problem during the "Reading, filtering and converting input sequences" step: it can't write to the single file. Here is the output:

                          =>Fri Feb 11 11:55:38 2011: Reading, filtering and converting input sequences of library '/home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I' initiated
                          Can't write to single file -- fatal

                          =>Fri Feb 11 11:55:38 2011: Storing contigs to format for scaffolding

                          LIBRARY /home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I
                          ------------------------------------------------------------

                          =>Fri Feb 11 11:55:44 2011: Building Bowtie index for contigs (tmp.standard_output/subset_contigs.fasta)

                          Bowtie-build error; -1 at /opt/SSPACE-1.1_linux-x86_64/bin/mapWithBowtie.pl line 37.
                          WARNING: No scaffolding, because no reads found on contigs

                          I imagine the Bowtie-build error is related to the first one. Any thoughts on why it can't write to the single file? (Is that where the two sequence files are merged?) Both files are Illumina reads in FASTQ format, and both are quite large (>10 GB). My machine has a meager 44 GB of RAM, if any of that is relevant here.

                          Thanks!



                          • #28
                            Hi again,

                            I think I know what the problem is. You have a library called "/home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I". This is a very strange name for a library. Name it something like "leo95130_I" or "lib1" (without the quotes). With your current library name, the script will try to create a file containing this library name in the folder 'reads', giving something like:

                            reads/home/carroll/Desktop/data_carroll/SSPACEtests/leo95130_I.filtered.reads

                            This will surely cause problems (as you noticed). The other error you get is probably caused by the same problem, namely your library name.

                            Your library line should look something like this:

                            library1 /path-to-file/filename_1.fastq /path-to-file/filename_2.fastq 500 0.25 0
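The failure mode described above can be caught before running. Here is a sketch of a pre-flight check, assuming the six whitespace-separated columns of the example line. The meaning of the last column is not explained in this thread, so it is kept as an opaque integer. `LibrarySpec` and `parse_library_line` are hypothetical names. The `reads/<name>.filtered.reads` pattern is taken from the explanation above.

```python
from typing import NamedTuple


class LibrarySpec(NamedTuple):
    name: str
    reads_1: str
    reads_2: str
    insert_size: int
    deviation: float
    last_col: int  # meaning not stated in the thread; kept as-is


def parse_library_line(line: str) -> LibrarySpec:
    """Parse one library line and reject names that look like paths."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    name = fields[0]
    # The name becomes part of a file name such as reads/<name>.filtered.reads,
    # so a slash would point into directories that do not exist.
    if "/" in name:
        raise ValueError(f"library name looks like a path: {name!r}")
    return LibrarySpec(name, fields[1], fields[2], int(fields[3]),
                       float(fields[4]), int(fields[5]))
```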

                            If you are unable to create the library file, you can mail me your current library file and I can help you.

                            Kind regards,
                            Boetsie



                            • #29
                              Hello, I am running into some problems while using SSPACE. I believe it has to do with tmp.alboxf_scaffolds_no_extension/subset_contigs.fasta not being built properly, so my question is: how is subset_contigs.fasta built?

                              Thanks!



                              • #30
                                Hi goldenflaw,

                                What kind of problems are you running into?

                                The file you mention is generated by taking a short subset of the contigs. How this is done is explained below (and in the SSPACE README).

                                Before mapping, contigs are shortened to reduce the search space for Bowtie: only the edges of the contigs are considered for mapping. How much of each edge is kept is determined by the maximal allowed distance, which is derived from the insert size and insert standard deviation given by the user in the library file. The maximal distance is insert_size + (insert_size * insert_stdev). For example, with an insert size of 500 and a deviation of 0.5, the maximal distance is 750. The first 750 and last 750 bases are then extracted from the contig sequence, in this case:

                                contig:   |------------------------------------------|
                                subset:   |--750 bp--|                    |--750 bp--|
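The edge extraction can be sketched in a few lines, assuming the rule stated above (maximal distance = insert_size + insert_size * insert_stdev). How SSPACE treats contigs shorter than twice that distance is not described here. This sketch simply keeps them whole, which is an assumption, as is rounding the distance to whole bases. The function names are hypothetical.

```python
from typing import List


def max_distance(insert_size: int, insert_stdev: float) -> int:
    """Maximal allowed distance, e.g. 500 + 500 * 0.5 = 750 (rounded to bases)."""
    return round(insert_size + insert_size * insert_stdev)


def contig_edges(seq: str, max_dist: int) -> List[str]:
    """Return the first and last `max_dist` bases of a contig."""
    if len(seq) <= 2 * max_dist:
        # Assumption: a short contig is kept whole, since its edges would overlap.
        return [seq]
    return [seq[:max_dist], seq[-max_dist:]]
```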

                                Kind regards,
                                Boetsie
