  • Hi Lisa,

    No problem, I'm glad I could help and that it's all working now! Good luck, and feel free to contact me if you have any questions.

    Regards,
    Boetsie

    Originally posted by Lisa0508
    Hi Boetzer,
    Thank you so much for your quick reply and patient explanation! It's working now. I only registered on this forum yesterday, and all settings were still at their defaults. I'm very sorry I had not checked the private message option. I can now receive private messages. Thank you again!
    Regards,
    Lisa


    • Hi all,

      we have released new versions of both SSPACE Basic and SSPACE Premium. SSPACE Basic is the previous version of SSPACE Premium. The new SSPACE Premium contains the following new features:
      • Included the read mapper BWA/BWA-SW.
      • Changed the multithreading of Bowtie/BWA: instead of running the aligner in multithreaded mode, SSPACE now runs multiple single-threaded instances of the aligner. This preserves the order of the reads for processing and read tracking, speeds up the process and reduces memory consumption.
      • Read files are split into files of 1 million read pairs each instead of a single file. This speeds up the alignment (see previous feature).
      • During extension, contigs are extended with subsequences (k-mers) of the unmapped reads instead of the full reads. This increases the coverage available for extension, since k-mers overlap the contigs exactly more often than full reads do.
      • A file with more detailed information about the extension process is now generated.
      • Added the option -S, which skips reading and processing of the paired-read input files.
      • It is now possible to use .gz input files (only if gunzip is installed).
      • Changed the folder structure.
      • Changed the format of the final scaffolds.
      • Added some statistics to the summary file: GC content, N25/N75, number of gaps and total size of gaps.
      • Added a tool for quality trimming of paired reads.
      • Added a tool for estimation of the insert size
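The new summary statistics follow common assembly-statistics definitions. As a rough illustration, here is a minimal Python sketch of how they can be computed from a list of scaffold sequences. It assumes that gaps are runs of N and that GC content is counted over A/C/G/T only; these definitions and the function names are assumptions for illustration, not taken from SSPACE's code.

```python
import re


def nx(lengths, x):
    """Nx: the largest length L such that sequences of length >= L
    together cover at least x% of the total length."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered * 100 >= total * x:
            return length
    return 0


def gc_content(seqs):
    """Percentage of G+C among called bases (A, C, G, T); N is ignored (assumption)."""
    joined = "".join(seqs).upper()
    called = sum(joined.count(base) for base in "ACGT")
    gc = joined.count("G") + joined.count("C")
    return 100.0 * gc / called if called else 0.0


def gap_stats(seqs):
    """Number of gaps (runs of N, assumption) and their total length in bp."""
    runs = [m.group() for s in seqs for m in re.finditer("N+", s.upper())]
    return len(runs), sum(len(run) for run in runs)


if __name__ == "__main__":
    scaffolds = ["ACGTNNNNACGT", "GGCC", "ATATNNAT"]  # toy data
    lengths = [len(s) for s in scaffolds]
    print("N25", nx(lengths, 25), "N50", nx(lengths, 50), "N75", nx(lengths, 75))
    print("GC %.2f%%" % gc_content(scaffolds))
    print("gaps (count, total bp):", gap_stats(scaffolds))
```

N25 and N75 are simply the same Nx computation with x = 25 and x = 75.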

      In addition, we have been working on a tool named GapFiller for closing gaps within scaffolds using paired-read data. GapFiller is currently submitted for publication, and the basic source code will be available upon acceptance. At that time we will make sure academic users can apply for a free license. Until the manuscript is accepted, a pre-release is available at a cost of 250 euro (applicable to both academic and commercial users).

      See our website for more information about SSPACE and GapFiller: http://www.baseclear.com/landingpage...ics-solutions/

      Kind regards,
      Boetsie


      • k-mers and the -m parameter

        Originally posted by boetsie
        • During extension, contigs are extended with subsequences (k-mers) of the unmapped reads, instead of the full read. This will increase the coverage for extension, since k-mers have a better overlap with the contigs than full reads.
        Dear Boetsie,
        does this mean that we should get better results with the -m parameter optimised for the k-mer size instead of the read length? How can we know which k-mer size is used, and how do we best adjust the -m value, for example for a 50 bp read?
        regards,
        Steve


        • Hi Steve,

          The k-mer size used is simply the -m value + 1. So -m actually means the overlap the k-mer should have with the contig, and the extra nucleotide is the 'overhang'. The difference between the two methods:

          Previous method:

          ctg: GTCGATAGATAGATCGTCGATAGTAGTCGA
          read:...GATTGATAGATCGTCGATAGTAGTCGAG


          The above read will not be used for extension, since it contains a mismatch and thus does not fully overlap with the contig. The new method cuts the read into k-mers.

          Say we use -m 20 (k-mer size 21); the k-mers of the read are:

          READ: GATAGATCGTCGATAGTAGTCGAGAT
          kmer: GATAGATCGTCGATAGTAGTC
          kmer: .ATAGATCGTCGATAGTAGTCG
          kmer: ..TAGATCGTCGATAGTAGTCGA
          kmer: ...AGATCGTCGATAGTAGTCGAG
          kmer: ....GATCGTCGATAGTAGTCGAGA
          kmer: .....ATCGTCGATAGTAGTCGAGAT
          etc...


          If we now extend the contig, the overlapping k-mer is:

          ctg: GTCGATAGATAGATCGTCGATAGTAGTCGA
          read:..........AGATCGTCGATAGTAGTCGAG


          This increases the coverage, since a sequencing error elsewhere in the read no longer prevents the read from contributing to the extension. The effect is largest for longer reads.
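To make the difference concrete, here is a toy Python sketch of both methods applied to the contig and the mismatching read from the first example, with -m 20 (k-mer size 21). It is a simplified illustration under stated assumptions, not SSPACE's extension code: the function names are hypothetical, the "previous method" is reduced to "all read bases except the last must match the contig end", and it does not model how SSPACE combines evidence from many reads before extending.

```python
from typing import List, Optional


def kmers(read: str, k: int) -> List[str]:
    """All overlapping substrings of length k, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def full_read_extension(contig: str, read: str) -> Optional[str]:
    """Simplified 'previous method' (assumption): every base of the read except
    the last must match the end of the contig; the last base is the overhang."""
    if len(read) < 2 or not contig.endswith(read[:-1]):
        return None
    return read[-1]


def kmer_extensions(contig: str, read: str, m: int) -> List[str]:
    """Simplified 'new method': cut the read into k-mers of size m + 1 and return
    the overhang base of every k-mer whose first m bases equal the last m
    bases of the contig."""
    k = m + 1
    anchor = contig[-m:]
    return [km[-1] for km in kmers(read, k) if km[:m] == anchor]


contig = "GTCGATAGATAGATCGTCGATAGTAGTCGA"
read = "GATTGATAGATCGTCGATAGTAGTCGAG"  # 4th base (T) mismatches the contig (A)
print(full_read_extension(contig, read))   # -> None: the mismatch rejects the whole read
print(kmer_extensions(contig, read, m=20))  # -> ['G']: one k-mer still supports extension
```

The same `kmers` helper, applied with k = 21 to the 26 bp read in the k-mer listing above, yields exactly the six k-mers shown there.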

          Regards,
          Boetsie


          • Hey all,

            I have quite a strange problem. My contig FASTA file looks like this:

            >22617
            GTCTACTTCAGACAAGGAAGACGGTCTACTTCAGATGAGGAAGATGGTCTGCTACAAAGGGAAGACGGTCTGCTTCAGGCCAGGAAGACGGTCTGCTACA
            >22619
            CGTCTTCCAATTTTGAATCAGACCGTCTTGATTTTGAATTGGACCGTCTCCCCTGGGCGCATCTGCTGGGCCGCTGGGGCTGGAACCGTGGCTCAAAATT
            >22621
            TTCCTCAGCAACAACATTGATGGTGTCTTTTGTGTACATGTATGAGTAGTCAGTCAAGTAAAGTATGCGCACCTGTCTTTTGGTAAGCCTACGCAGCCTG
            >22623
            AGGCACTCTGCCCGAGTGGTTAAGGGGTAAGTCTCGAATACATTATTCGACCGTCCATCATGACGGGTTAACTTATAGGCTCTGCCTGCGTCGGTTCAAA

            But the program tells me:

            ERROR: Invalid (-s) contig file /home/dpr..../de_novo_assembly_DNA/SOAPdenovo_39/PseudoAfi_K39.contig.fastasorted.fasta ...Exiting.

            So can you tell me why my file should be corrupt?

            Any help is much appreciated,

            best


            Phil


            • Reply to sphil:
              Hi Phil,

              The error has nothing to do with the file format. The line where this error occurs only checks whether the contig file exists. Somehow SSPACE cannot find your file. Can you check that the file really is at the specified location and that the user permissions are correct?
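A quick way to do that check outside SSPACE is a small Python sketch like the one below (the function name `check_input_file` is hypothetical). It tests the conditions suggested here: the path exists, is a regular file and is readable by the current user. Run it as the same user that runs SSPACE, with the exact path passed to -s.

```python
import os
import sys


def check_input_file(path: str) -> bool:
    """Return True if `path` exists, is a regular file and is readable
    by the current user; otherwise print the reason and return False."""
    if not os.path.exists(path):
        print(f"{path}: does not exist (check the path and directory names)", file=sys.stderr)
        return False
    if not os.path.isfile(path):
        print(f"{path}: exists but is not a regular file", file=sys.stderr)
        return False
    if not os.access(path, os.R_OK):
        print(f"{path}: exists but is not readable (check the permissions)", file=sys.stderr)
        return False
    return True


if __name__ == "__main__":
    # Usage: python check_files.py contigs.fasta reads_1.fastq reads_2.fastq
    for p in sys.argv[1:]:
        print(p, "OK" if check_input_file(p) else "PROBLEM")
```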

              Boetsie


              • Hey,

                Sorry for the late answer, but I was out of the office the last few days. I checked the location and it is the right one, so maybe I got something wrong in the library file.

                Here is the line containing my library:

                TrueSeqStd /home/dpr/P/PA/SGII_ATCACG_L003_R1.fastq /home/dpr/P/PA/SGII_ATCACG_L003_R2.fastq 50 0.5 FR



                Maybe there is an error in it?

                Best,


                Phil



                Got it, thanks for the help.
                Last edited by sphil; 01-13-2012, 12:55 AM. Reason: solved


                • Dear boetsie,

                  Is it possible to implement a feature in SSPACE that recognizes inward-facing reads in an Illumina MP library? This is a serious problem for some library preparations. This feature is present in the Ray assembler, for example.


                  Regards,
                  Nestor


                  • Hi Nestor,

                    This is already implemented in SSPACE. Basically, Ray does the same as SSPACE by using a range of allowed insert sizes, for example an insert size of 4000 with a deviation of 0.25 (the range is thus 3000-5000). This initially filters out 'paired-end' reads, since these have smaller insert sizes (< 500 bp). In addition, SSPACE requires the orientation of the paired reads for each library. If you specify the orientation <-- -->, --> <-- paired reads will not be taken into account for scaffolding.
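The two filters described here (insert-size range and orientation) can be sketched in a few lines of Python. This is an illustrative sketch, not SSPACE's code: the function names are hypothetical, the labels "FR" (--> <--) and "RF" (<-- -->) are assumed for this example, and the distance of each pair is assumed to be already computed from the mapping.

```python
def allowed_range(insert_size: float, deviation: float) -> tuple:
    """Allowed insert sizes: insert_size +/- insert_size * deviation."""
    delta = insert_size * deviation
    return insert_size - delta, insert_size + delta


def keep_pair(pair_orientation: str, distance: float,
              lib_orientation: str, insert_size: float, deviation: float) -> bool:
    """Keep a pair for scaffolding only if its orientation matches the
    library's and its implied distance lies inside the allowed range."""
    low, high = allowed_range(insert_size, deviation)
    return pair_orientation == lib_orientation and low <= distance <= high


# Mate-pair library: 4000 bp insert, deviation 0.25, orientation RF (<-- -->)
print(allowed_range(4000, 0.25))                # (3000.0, 5000.0)
print(keep_pair("RF", 4200, "RF", 4000, 0.25))  # True: right orientation, in range
print(keep_pair("FR", 4200, "RF", 4000, 0.25))  # False: paired-end orientation --> <--
print(keep_pair("RF", 300, "RF", 4000, 0.25))   # False: too short for this library
```

Note that the two checks are independent: a contaminating pair is only kept if it passes both, which is the situation discussed further down in this thread.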

                    Regards,
                    Boetsie


                    • Dear boetsie,

                      What about libraries in which the number of "smaller insert size" read pairs is significantly higher than the number of "long insert size" read pairs? Don't you think that using such libraries with SSPACE could lead to horrible results, such as, in some cases, wrongly re-orienting the contigs? Is SSPACE now capable of detecting such libraries by counting the PE/MP ratio of reads mapped within each contig?

                      Regards,
                      Nestor



                      • Reply to user1313:
                        That is indeed a problem; such pairs might influence the scaffolding process. But since the smaller read pairs are --> <-- oriented (and mate pairs <-- --> oriented), they are filtered out.
                        I do not see the benefit of including the PE/MP ratio of reads mapped within a contig, since those reads do not contribute to the scaffolding process. They can only influence it when the two reads of a pair are aligned on different contigs, but, as said, such pairs are filtered out because of their orientation.


                        • Dear boetsie,

                          Thank you for the answer. I still do not agree, however. Please correct me if I am wrong.

                          Suppose we have contig 1 and contig 2 with some PE reads (short arrow "->") and some MP reads (longer arrow "-->") like this:

                          Code:
                              contig 1             contig 2
                          5`------------3`     5`------------3`
                              <--    ->          <-    -->
                                      ->          <-
                              ---------- 4000bp ----------
                          Now, let's assume that we have twice as many PE reads as MP reads.
                          We told SSPACE that the library is MP with a 4000 bp insert size. Won't SSPACE reverse-complement the contigs in this manner, so that the more abundant "PE" reads fit the 4000 bp "<-- -->" pattern?

                          Code:
                            contig 1(RC)         contig 2 (RC)
                          5`------------3`     5`------------3`
                              <-    -->          <--    ->
                               <-                        ->
                              ---------- 4000bp ----------
                          I am not saying it will happen every time, but in some cases, where the lengths of the reverse-complemented contigs fit the distance given in the library file, it could be a disastrous problem. To tell you the truth, in my limited experience I have seen more problematic MP libraries than good ones.

                          Regards,
                          Nestor


                          Last edited by user1313; 02-29-2012, 07:24 AM.


                            Yes, you are right, sorry. But this will only happen if both contigs are short. Say the paired-end reads are mapped as follows:

                            Code:
                                contig 1 (1000bp)            contig 2 (8000 bp)
                            5`------------>3`     5`----------------------------->3`
                                        <-           <- 
                                       pos900    pos100
                            Since MP reads are <-- --> oriented, contig 2 should be reverse-complemented:

                            Code:
                                contig 1 (1000bp)            contig 2 (8000 bp)
                            5`------------3`     3`<-----------------------------5`
                                        <-                                  -> 
                                       pos900                             pos7900
                            The distance is now (1000-900) + 7900 = 8000. This differs by 4000 bp from the insert size of your library (8000 - 4000 = 4000).

                            I agree, though, that if contig 2 were 4000 bp shorter, the distance would be 4000 bp, close to the insert size of your library! This could be a problem, especially for contig orientation and insert size estimation (the true distance in the above example is not 4000 but ~200 bp: (1000-900) on contig 1 plus position 100 on contig 2).
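The arithmetic above can be reproduced with a short Python sketch. It follows the post's own simplification (contigs placed directly next to each other with no gap, contig 1 kept forward, positions as in the diagrams); the function name is hypothetical and this is not necessarily how SSPACE computes distances internally.

```python
def implied_distance(len1: int, pos1: int, len2: int, pos2: int,
                     reverse_contig2: bool) -> int:
    """Distance implied by a pair with one read at pos1 on contig 1 and the
    other at pos2 on contig 2, assuming the contigs are adjacent (no gap)
    and contig 1 stays in forward orientation."""
    tail1 = len1 - pos1                               # read 1 to the end of contig 1
    head2 = len2 - pos2 if reverse_contig2 else pos2  # start of contig 2 to read 2
    return tail1 + head2


low, high = 3000, 5000  # 4000 bp library with 0.25 deviation
for len2 in (8000, 4000):
    d = implied_distance(1000, 900, len2, 100, reverse_contig2=True)
    print(len2, d, low <= d <= high)
# contig 2 of 8000 bp -> 8000, outside the range; contig 2 of 4000 bp -> 4000, inside it
print(implied_distance(1000, 900, 8000, 100, reverse_contig2=False))  # 200
```

The second case is the problematic one: after reverse-complementing a 4000 bp contig 2, a short-insert pair produces a distance that passes the library's range check.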

                            Thanks for pointing me in this direction; I'll try to dive deeper into this.

                            Regards,
                            Boetsie


                            • Is it possible to run SSPACE on external read mappings, i.e. can I perform the read mappings on my own and then have SSPACE do the scaffolding based on them?


                              • Reply to gaffa:
                                Yes, this is possible. The file should be in a TAB-delimited format like:

                                <contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

                                E.g.
                                contig1 100 150 contig1 350 300
                                contig1 4000 4050 contig2 110 60

                                There is a script in the 'tools' directory of the package that converts SAM/BAM files to this TAB format.
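If you produce your own mappings, reading and writing this format is straightforward. Below is a minimal Python sketch, assuming exactly six TAB-separated fields per line as shown above. The class and function names are hypothetical, and reading start > end as a reverse-strand alignment is an inference from the example lines, not something stated in the post. For real data, the converter script in the 'tools' directory is the safer route.

```python
from dataclasses import dataclass


@dataclass
class PairMapping:
    """One line of the TAB-delimited pair-mapping file."""
    contig1: str
    start1: int
    end1: int
    contig2: str
    start2: int
    end2: int

    @classmethod
    def from_line(cls, line: str) -> "PairMapping":
        c1, s1, e1, c2, s2, e2 = line.rstrip("\n").split("\t")
        return cls(c1, int(s1), int(e1), c2, int(s2), int(e2))

    def to_line(self) -> str:
        fields = (self.contig1, self.start1, self.end1,
                  self.contig2, self.start2, self.end2)
        return "\t".join(str(f) for f in fields)


def strand(start: int, end: int) -> str:
    # Assumption: start > end marks a read aligned to the reverse strand,
    # as the example lines suggest (e.g. "contig2 110 60").
    return "+" if start <= end else "-"


row = PairMapping.from_line("contig1\t4000\t4050\tcontig2\t110\t60")
print(row.contig2, strand(row.start1, row.end1), strand(row.start2, row.end2))  # contig2 + -
```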

                                Regards,
                                Boetsie
