
  • Thanks Boetsie!

    Cheers,
    Ricardo

    Originally posted by boetsie
    Hi Ricardo,

    look at this post where colindaven suggests how to fix the problem;


    Simply chmod a+x all directories of SSPACE.

    Regards,
    Boetsie
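For anyone who prefers to script the permission fix quoted above, here is a minimal sketch of what `chmod a+x` on every directory amounts to. The function name `add_execute_to_dirs` and the example path are hypothetical; nothing here comes from the SSPACE distribution itself.

```python
import os
import stat

# Execute/search permission for user, group and others (the "a+x" bits).
X_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def add_execute_to_dirs(top):
    """Add a+x to `top` and to every directory below it.

    Returns the list of directories whose mode was changed.
    """
    changed = []

    def fix(path):
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if (mode & X_BITS) != X_BITS:
            os.chmod(path, mode | X_BITS)
            changed.append(path)

    # Fix the top directory first so that it can be traversed.
    fix(top)
    # os.walk is top-down by default, so each directory is fixed
    # before the walk descends into it.
    for root, dirs, _files in os.walk(top):
        for name in dirs:
            fix(os.path.join(root, name))
    return changed
```

It would be called as `add_execute_to_dirs("/path/to/SSPACE")`, where the path is a placeholder for your own installation directory.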

    Comment


    • Hi Boetsie

      I'm trying to scaffold a set of contigs from a bacterial genome assembly. Before scaffolding, there were no non-ACGTN bases in my assembly, but after scaffolding with SSPACE, there were. Can you please let me know what is causing this, and if there is an option to turn it off?

      I have already set the -x parameter to 0 to turn off extension.

      Thanks.

      Comment


      • Hi mht,

        SSPACE only adds 'n' or 'N' characters to the assembly, so it would be strange if other characters were present after scaffolding. Could you please show me an example of the non-ACGTN characters that are included?

        Regards,
        Boetsie
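One way to produce the kind of example requested here is to tally every character in the sequence lines of the FASTA file before and after scaffolding. The sketch below is only an illustration: the function names are hypothetical, and it treats upper-case A, C, G, T and N as the expected alphabet, so lower-case bases are reported separately.

```python
from collections import Counter


def base_census(fasta_lines):
    """Count every character found on the sequence lines of a FASTA file."""
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and '>' header lines.
        if not line or line.startswith(">"):
            continue
        counts.update(line)
    return counts


def unexpected_bases(counts, allowed="ACGTN"):
    """Return only the characters that are not in the allowed alphabet."""
    return {base: n for base, n in counts.items() if base not in allowed}


if __name__ == "__main__":
    # Tiny in-memory example; a real run would pass an open file handle.
    demo = [">scaffold1", "ACGTacgtNn", "ACGT"]
    print(unexpected_bases(base_census(demo)))
```

Because the comparison is case-sensitive, this kind of census makes it easy to see whether the "unexpected" characters are simply lower-case bases.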

        Originally posted by mht
        Hi Boetsie

        I'm trying to scaffold a set of contigs from a bacterial genome assembly. Before scaffolding, there were no non-ACGTN bases in my assembly, but after scaffolding with SSPACE, there were. Can you please let me know what is causing this, and if there is an option to turn it off?

        I have already set the -x parameter to 0 to turn off extension.

        Thanks.

        Comment


        • Oops Boetsie, my bad. They were lower-case acgtn characters. I used Velvet as my assembler, so the lower-case acgt characters came from there. What is the difference between 'n' and 'N' characters in SSPACE?

          Originally posted by boetsie
          Hi mht,

          SSPACE only adds 'n' or 'N' characters to the assembly, so it would be strange if other characters were present after scaffolding. Could you please show me an example of the non-ACGTN characters that are included?

          Regards,
          Boetsie

          Comment


          • It will generate an ‘n’ if a negative gap was found, meaning that there is a potential overlap between the contigs but SSPACE could not find a full overlap.

            It will generate lower-case ‘acgt’ characters if an overlap is actually found, e.g.:

            Ctg1: AGTAGATAGATGATCGCGCTGA
            Ctg2:.............ATCGCGCTGAAGTAGATAGATGAGATCGAC


            will become:
            AGTAGATAGATGatcgcgctgaAGTAGATAGATGAGATCGAC

            Regards,
            Boetsie
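To make the two cases above concrete, here is a small Python sketch that reproduces the output described in this post. It is not SSPACE's own code (SSPACE is a separate tool and its internals are not shown in this thread). The function name `join_contigs` and the `min_overlap` threshold are assumptions made for illustration, and the inputs are assumed to be upper-case.

```python
def join_contigs(left, right, min_overlap=5):
    """Join two adjacent contigs in the style described above.

    If a suffix of `left` equals a prefix of `right`, the shared bases
    are written once, in lower case. Otherwise a single 'n' marks the
    junction. `min_overlap` is a hypothetical threshold, not an SSPACE
    parameter.
    """
    min_overlap = max(1, min_overlap)
    longest = min(len(left), len(right))
    # Try the longest candidate overlap first, then shorter ones.
    for k in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left[:-k] + right[:k].lower() + right[k:]
    # No full overlap found: mark the junction with an 'n'.
    return left + "n" + right


if __name__ == "__main__":
    ctg1 = "AGTAGATAGATGATCGCGCTGA"
    ctg2 = "ATCGCGCTGAAGTAGATAGATGAGATCGAC"
    print(join_contigs(ctg1, ctg2))
    # AGTAGATAGATGatcgcgctgaAGTAGATAGATGAGATCGAC
```

With the two contigs from the example, the 10-base overlap `ATCGCGCTGA` is found and written in lower case, which matches the merged sequence shown above.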

            Comment


            • For the TAB-delimited format, like:

              <contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

              E.g.
              contig1 100 150 contig1 350 300
              contig1 4000 4050 contig2 110 60

              If startpos is greater than endpos, it means the read mapped to the - strand.


              I mapped my BAC-end reads onto the contigs with BLAT. How do I include the strand information in my TAB file?

              Comment


              • Hi,

                Using SSPACE, will it always be better to do contig extension prior to scaffolding? And do I do extension with both paired end and single end reads, or just paired end?

                Thanks.

                Comment


                • I'm not really sure what you mean. You could just add your region of alignment in the tab-file, e.g. if the BAC aligns from contig 1 at position 1000-3000 and at contig 2 at position 4000-2000 (so reverse), you can just add this info:

                  contig1 1000 3000 contig2 4000 2000

                  SSPACE can only handle links between two contigs, so if a BAC aligns to multiple contigs you have to split it so that each entry is a single contig-contig link, instead of contig-contig-contig.

                  Regards,
                  Boetsie
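As a rough illustration of the convention discussed here (start greater than end means the minus strand), the sketch below turns two alignment hits plus their strands into one TAB line. The tuple layout `(contig, start, end, strand)` and the function names are hypothetical; extracting these values from your aligner's output, and checking whether its coordinates are 0- or 1-based, is left to the reader.

```python
def oriented(start, end, strand):
    """Order the coordinates so that start > end encodes the minus strand."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-', got %r" % (strand,))
    low, high = min(start, end), max(start, end)
    return (low, high) if strand == "+" else (high, low)


def tab_line(hit1, hit2):
    """Build one TAB-delimited link line from two (contig, start, end, strand) hits."""
    fields = []
    for contig, start, end, strand in (hit1, hit2):
        first, second = oriented(start, end, strand)
        fields.extend([contig, str(first), str(second)])
    return "\t".join(fields)


if __name__ == "__main__":
    # One BAC end on the plus strand of contig1, the other on the minus strand of contig2.
    print(tab_line(("contig1", 1000, 3000, "+"), ("contig2", 2000, 4000, "-")))
```

For the example in this post, the second hit is written as `contig2 4000 2000` because it lies on the minus strand.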


                  Originally posted by biocomfun
                  For the TAB-delimited format, like:

                  <contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

                  E.g.
                  contig1 100 150 contig1 350 300
                  contig1 4000 4050 contig2 110 60

                  If startpos is greater than endpos, it means the read mapped to the - strand.


                  I mapped my BAC-end reads onto the contigs with BLAT. How do I include the strand information in my TAB file?

                  Comment


                  • Hi,

                    I can't really judge that, since it depends on what you consider 'better'. Anyway, if you have a nice draft assembly, I would not use the contig extension option, the main reason being that it is a time- and memory-consuming process. Our current strategy is to use SSPACE for generating the scaffolds, followed by our tool GapFiller to close the gaps (N's) produced by SSPACE. GapFiller uses local information from the paired-read data for the extension, instead of all the unaligned reads. This extension is much faster and more reliable.

                    Regards,
                    Boetsie

                    Originally posted by mht
                    Hi,

                    Using SSPACE, will it always be better to do contig extension prior to scaffolding? And do I do extension with both paired end and single end reads, or just paired end?

                    Thanks.

                    Comment


                    • Hi,
                      I have a question. I have some single-end 454 data. How would SSPACE run if I artificially turned it into paired-end data in which the sequence of the other mate is all "NNNNNNNNNN"?

                      Comment


                      • No, this won't work, since both reads of a pair need to be mapped onto the contigs. It would be better to make paired-end data by splitting the reads. For example, if your read is 200 bp long, you can make a read pair from the first 100 bp and the last 100 bp, and specify your insert size as 200 bp. I've never done this, but I think it could work.

                        Regards,
                        Boetsie
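The splitting idea above could be sketched as follows. This is an untested suggestion turned into code, not a validated protocol: the function names and the `revcomp_second` switch are assumptions, and whether the second half needs to be reverse-complemented depends on the read orientation your library settings expect, which this thread does not specify.

```python
# Translation table for complementing DNA, keeping case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_read(seq, mate_len=None, revcomp_second=False):
    """Split one long read into a pseudo read pair.

    Returns (first_mate, second_mate, insert_size), where insert_size
    is simply the length of the original read.
    """
    if mate_len is None:
        mate_len = len(seq) // 2
    if mate_len <= 0 or 2 * mate_len > len(seq):
        raise ValueError("mate_len must be positive and at most half the read length")
    first = seq[:mate_len]
    second = seq[-mate_len:]
    if revcomp_second:
        second = reverse_complement(second)
    return first, second, len(seq)


if __name__ == "__main__":
    read = "A" * 100 + "C" * 100  # stand-in for a 200 bp read
    first, second, insert_size = split_read(read, mate_len=100)
    print(len(first), len(second), insert_size)
```

Note that 454 reads vary in length, so the insert size returned here differs from read to read; how best to summarise that as a single library insert size is not covered in this thread.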

                        Originally posted by sheepyuan
                        Hi,
                        I have a question. I have some single-end 454 data. How would SSPACE run if I artificially turned it into paired-end data in which the sequence of the other mate is all "NNNNNNNNNN"?

                        Comment


                        • Originally posted by boetsie
                          No, this won't work, since both reads of a pair need to be mapped onto the contigs. It would be better to make paired-end data by splitting the reads. For example, if your read is 200 bp long, you can make a read pair from the first 100 bp and the last 100 bp, and specify your insert size as 200 bp. I've never done this, but I think it could work.

                          Regards,
                          Boetsie
                          Thank you very much, I'll try your method of splitting the read!

                          Comment


                          • SSPACE combining cDNA and PE/MP

                            Hi all,

                            I'm using SSPACE with a wealth of data, from small PE libraries up to 20kb and 40kb mate pair libraries. In addition, I have three lanes of 2x100nt RNAseq, and I'm curious whether they could be incorporated. My genome is highly repetitive (70%), so I'm hoping that the more gene space sequence, the better.

                            I've seen the nematode paper where RNApath was used to scaffold a genome with RNAseq reads, but has anyone successfully used cDNA + PE/MP WGS data in SSPACE? There are some obvious considerations with splicing, but perhaps the plus/minus insert size error can take this into account?

                            Thanks,
                            Alex
                            Last edited by aharkess; 01-15-2013, 10:50 AM.
                            ==========
                            Alex Harkess
                            Leebens-Mack Lab
                            Plant Biology Department
                            University of Georgia, Athens GA

                            Comment


                            • Hello, have you used SSPACE for scaffolding your genome with RNA-seq data? How did you determine the insert size? Thanks.
                              Originally posted by aharkess
                              Hi all,

                              I'm using SSPACE with a wealth of data, from small PE libraries up to 20kb and 40kb mate pair libraries. In addition, I have three lanes of 2x100nt RNAseq, and I'm curious whether they could be incorporated. My genome is highly repetitive (70%), so I'm hoping that the more gene space sequence, the better.

                              I've seen the nematode paper where RNApath was used to scaffold a genome with RNAseq reads, but has anyone successfully used cDNA + PE/MP WGS data in SSPACE? There are some obvious considerations with splicing, but perhaps the plus/minus insert size error can take this into account?

                              Thanks,
                              Alex

                              Comment


                              • Hi!

                                I am getting very good results with SSPACE, Boetsie, and I plan to follow it up with GapFiller.
                                I have a bunch of questions, but the most important one right now is about the foundlinks files.

                                I am sure I am missing the true naming convention of the foundlinks file (that is, do r1 and f1 refer to contig1 in the formattedcontigs file, and so on?). Can anyone shed some light on this, please?

                                If the question is not clear, read on (if it is, skip the rest).

                                I have done several SSPACE runs over Velvet generated contigs, arranged in different fasta inputs:
                                - 1: contigs 1,3,4,6,7
                                - 2: contigs 2,3,4,6
                                - 3: contigs 1,2,4,6

                                I use SSPACE with two read libraries, in two runs: the first with both libraries, the second with only the larger insert size library. Both runs are free of scaffolds correct ones, and then I inspect the links. However, in run1.big_insert_lib.foundlinks I have the same links as in run2.big_insert_lib.foundlinks, but I am not able to associate them with the same contigs using the formattedcontigs file for name translation. (the question above

                                Comment
