  • Hobbe
    Member
    • Apr 2010
    • 29

    I am having problems using SSPACE basic with my 454 paired-end data, and was hoping to get some help here. SSPACE runs fine using my Illumina PE data, but my 454 data has much longer insert sizes (3-5 kb), and I think they could really make a difference.

    My problem is that SSPACE reads all the 454 pairs in, removes quite a lot of them because they include Ns, and then maps 0 of them. The report is below. It was difficult to get the reads into a format that SSPACE accepts, and I guess that the problem lies in the FASTQ files. Some (very few) reads are too long (over 1024 bases), and bowtie complains about these. Would this crash the whole run? I know that bowtie is not the best choice for longer reads, but I thought it would still manage to map some reads. Is SSPACE premium the answer?

    Any/all help would be much appreciated,
    Henrik

    READING READS Lib454:
    ------------------------------------------------------------
    Total inserted pairs = 1217215
    Number of pairs containing N's = 1066178
    Remaining pairs = 151037
    ------------------------------------------------------------
    ...

    LIBRARY Lib454 STATS:
    ################################################################################

    MAPPING READS TO CONTIGS:
    ------------------------------------------------------------
    Number of single reads found on contigs = 0
    Number of pairs used for pairing contigs / total pairs = 0 / 0
    ------------------------------------------------------------

    READ PAIRS STATS:
    Assembled pairs: 0 (0 sequences)
    Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 3709 +/-927.25): 0
    Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 0
    Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 0
    ---
    Satisfied in distance/logic within a given contig pair (pre-scaffold): 0
    Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 0
    ---
    Total satisfied: 0 unsatisfied: 0


    Estimated insert size statistics (based on 0 pairs):
    Mean insert size = 0
    Median insert size = 0
    REPEATS:
    Number of repeated edges = 0
    ------------------------------------------------------------

    ################################################################################
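The report shows that 1,066,178 of 1,217,215 pairs (about 88%) were discarded for containing N's before mapping even started. To see where the problem lies before rerunning SSPACE, one can scan the mate files directly. The Python sketch below counts pairs in which either mate contains an N and pairs in which either mate is longer than 1,024 bases (the bowtie limit discussed in this thread). It assumes plain 4-line FASTQ records (no wrapped sequence lines) and two mate files listing pairs in the same order; the function and file names are illustrative only.

```python
from itertools import islice, zip_longest
from typing import Dict, Iterable, Iterator

# Read-length limit for bowtie as stated in this thread (assumed, not checked here).
BOWTIE_MAX_READ_LEN = 1024


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each record, assuming plain 4-line FASTQ."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record[1].strip()


def count_pairs(mate1: Iterable[str], mate2: Iterable[str]) -> Dict[str, int]:
    """Count pairs containing an N and pairs with a mate longer than the limit."""
    stats = {"pairs": 0, "pairs_with_N": 0, "pairs_too_long": 0}
    for s1, s2 in zip_longest(fastq_sequences(mate1), fastq_sequences(mate2)):
        if s1 is None or s2 is None:
            raise ValueError("mate files contain different numbers of reads")
        stats["pairs"] += 1
        if "N" in s1.upper() or "N" in s2.upper():
            stats["pairs_with_N"] += 1
        if max(len(s1), len(s2)) > BOWTIE_MAX_READ_LEN:
            stats["pairs_too_long"] += 1
    return stats


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python count_pairs.py reads_1.fastq reads_2.fastq
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        print(count_pairs(f1, f2))
```

If the N count matches SSPACE's report, the N's are probably being introduced during conversion to FASTQ rather than by SSPACE itself.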


    • gaffa
      Member
      • Oct 2010
      • 82

      Originally posted by boetsie
      yes, this is possible. The file should be in a TAB delimited format like:

      <contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

      E.g.
      contig1 100 150 contig1 350 300
      contig1 4000 4050 contig2 110 60

      There is a script in the 'tools' directory of the package to convert SAM/BAM to a tab format.

      Regards,
      Boetsie
      Ok, great! Thanks.

      On a slightly related note, how well do you think SSPACE would deal with scaffolding information from sources other than paired/mate reads, such as physical/genetic linkage data (supplied in the above file format)? Some scaffolders (notably Bambus) claim to be able to work with essentially any kind of link information between contigs; could the same be said of SSPACE?


      • boetsie
        Senior Member
        • Feb 2010
        • 245

        Originally posted by Hobbe
        I know that bowtie is not the best choice for longer reads, but I thought it would still manage to map some reads? Is SSPACE premium the answer?

        SSPACE basic does not handle 454 reads well, simply because the reads are too long for bowtie to align (bowtie accepts reads of up to 1024 bases). Also, bowtie can handle only up to two mismatches. In SSPACE premium I've added the BWA-SW aligner to deal with longer reads. Otherwise, you can align the reads yourself with BWA-SW and try to convert the resulting .sam file to a .tab file (see the post above).

        Boetsie
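A minimal Python sketch of that do-it-yourself route (SAM in, TAB out) follows. It is not the converter shipped in SSPACE's 'tools' directory. It assumes a text SAM file in which mates share a read name (optionally ending in /1 and /2), keeps only primary mapped alignments, and, following the example lines in the post quoted earlier ("contig1 350 300"), writes a reverse-strand hit with its start coordinate greater than its end. That orientation convention is an assumption worth checking against SSPACE's own documentation.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the contig

# SAM FLAG bits used below
UNMAPPED, REVERSE, SECONDARY, SUPPLEMENTARY = 0x4, 0x10, 0x100, 0x800


def ref_span(cigar: str) -> int:
    """Number of contig bases covered by an alignment."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def hit(fields: List[str]) -> Tuple[str, int, int]:
    """(contig, start, end); a reverse-strand hit is written with start > end."""
    flag, contig, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    start, end = pos, pos + ref_span(cigar) - 1
    if flag & REVERSE:
        start, end = end, start
    return contig, start, end


def sam_to_tab(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield one TAB line per read pair where both mates have a primary hit."""
    pending: Dict[str, Tuple[str, int, int]] = {}
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or empty line
        fields = line.rstrip("\r\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY) or fields[5] == "*":
            continue
        name = re.sub(r"/[12]$", "", fields[0])  # treat read/1 and read/2 as mates
        if name in pending:
            yield "\t".join(map(str, pending.pop(name) + hit(fields)))
        else:
            pending[name] = hit(fields)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python sam_to_tab.py aln.sam > lib454.tab
    with open(sys.argv[1]) as sam:
        for tab_line in sam_to_tab(sam):
            print(tab_line)
```

Fed a forward read at POS 100 and its reverse-strand mate at POS 300 (both 51M) on contig1, it emits contig1 100 150 contig1 350 300, the first example line from the quoted post.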


        • boetsie
          Senior Member
          • Feb 2010
          • 245

          Originally posted by gaffa
          Some scaffolders (notably Bambus) claim to be able to work with essentially any kind of link information between contigs - could the same be said of SSPACE?

          I've not tested it myself, but I think any linking information is suitable. I would suggest giving it a try.


          • is41985
            Junior Member
            • May 2012
            • 2

            Hi,

            to save me the hassle of going through the code, I have a short question regarding insert sizes.
            When scaffolding, does SSPACE use the user specified insert size (from the library.txt file), or the estimated insert size (that is reported in the summary file)?
            It is important, since in my case these two seem to differ, and I need the real (user-specified) value to be used.


            Thank you,
            Ivan.


            • boetsie
              Senior Member
              • Feb 2010
              • 245

              Originally posted by is41985
              When scaffolding, does SSPACE use the user specified insert size (from the library.txt file), or the estimated insert size (that is reported in the summary file)?

              Hi Ivan,

              Ha, don't do that; posting it here will save you some time. SSPACE uses the user-specified insert size. The insert size in the report is just some extra information.

              Regards,
              Boetsie
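Because SSPACE works with the user-specified value, the insert size and error in library.txt decide which pairs count as satisfied. Hobbe's report above shows "distance on target: 3709 +/-927.25", and 927.25 is exactly 25% of 3709, which is consistent with the allowed window being the insert size plus or minus a fractional error (0.25 here). The Python sketch below assumes that reading; the function names are hypothetical.

```python
from typing import Tuple


def distance_window(insert_size: float, error: float) -> Tuple[float, float]:
    """Allowed pair distance, assuming `error` is a fraction of the insert size."""
    margin = insert_size * error
    return insert_size - margin, insert_size + margin


def is_satisfied(distance: float, insert_size: float, error: float) -> bool:
    """True if an observed pair distance falls inside the allowed window."""
    low, high = distance_window(insert_size, error)
    return low <= distance <= high


if __name__ == "__main__":
    # 3709 +/- 927.25, as in the report earlier in this thread
    print(distance_window(3709, 0.25))  # prints (2781.75, 4636.25)
```

So if the true insert size differs from the value in library.txt, a larger error fraction widens the window, at the cost of accepting more spurious links.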


              • is41985
                Junior Member
                • May 2012
                • 2

                Originally posted by boetsie
                Ha, don't do that; posting it here will save you some time.

                Indeed it did
                Thank you for the quick reply!


                Best regards,
                Ivan.


                • Hiroki
                  Member
                  • May 2010
                  • 17

                  Hi boetsie,

                  I am trying to scaffold contigs assembled from 454 data using Illumina 75-bp mate-pair reads from libraries with different insert sizes.

                  There are 502k contigs, and scaffolding reduced that number by only 1,000 or so. This was much less than I expected.

                  My current parameters are as follows:
                  -x = 0
                  -z = 0
                  -k = 5
                  -a = 0.7
                  -n = 15
                  -T = 1
                  -p = 0

                  Which option should I change?

                  Of course I know there is a possibility that my mate pair libraries are crap. But I just want to try different settings.

                  Sorry for a newbie question.

                  Thank you so much
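For context on the two thresholds most likely to limit joins here: as SSPACE's usage text describes them (paraphrased), -k is the minimum number of links (read pairs) needed to join two contigs, and -a is the maximum ratio between the link counts of the two best candidate contig pairs. The Python sketch below is a simplified model of how those two settings interact at a single contig end; it is not SSPACE's implementation, and the contig names are hypothetical.

```python
from typing import Dict, Optional


def choose_partner(links: Dict[str, int], min_links: int = 5,
                   max_ratio: float = 0.7) -> Optional[str]:
    """Pick the contig to join to one contig end, or None if no join is made.

    Simplified illustration of the -k / -a thresholds, not SSPACE's code:
    `links` maps each candidate contig to the number of read pairs linking it.
    """
    if not links:
        return None
    ranked = sorted(links.items(), key=lambda kv: kv[1], reverse=True)
    best_name, best = ranked[0]
    if best <= 0 or best < min_links:  # too few links (-k)
        return None
    if len(ranked) > 1 and ranked[1][1] / best > max_ratio:  # ambiguous (-a)
        return None
    return best_name
```

For example, 10 links to one candidate and 8 to another give a ratio of 0.8, so no join is made at -a 0.7; raising -a to 0.9 would accept it, with a higher risk of misjoins. Likewise, a contig end supported by only 4 pairs is never joined at -k 5.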


                  • boetsie
                    Senior Member
                    • Feb 2010
                    • 245

                    It is hard to tell, but you probably have to change something in your library file. To give you further advice, I need both the output summary file and the input library file. Could you send them to me by e-mail ([email protected])?

                    Boetsie


                    • Hiroki
                      Member
                      • May 2010
                      • 17

                      Thanks for the response.
                      I will PM you soon.

                      Hiroki


                      • Hiroki
                        Member
                        • May 2010
                        • 17

                        Hi Boetsie,

                        I emailed you some files a couple days ago.
                        Did you receive them?
                        Let me know if anything is wrong.

                        Sorry if I'm rushing you...

                        Thanks.

                        Hiroki


                        • gaffa
                          Member
                          • Oct 2010
                          • 82

                          Is there an easy way to find out, for a given SSPACE scaffold, the names of the pre-SSPACE contigs/scaffolds included in it?


                          • boetsie
                            Senior Member
                            • Feb 2010
                            • 245

                            Originally posted by gaffa
                            Is there any easy way to find out for a given SSPACE scaffold what are the names of the pre-SSPACE contigs/scaffolds that are included in it?

                            Hi Gaffa,

                            have a look in the 'intermediate_results' folder; there is a file ending with *formattedcontigs.fasta. In that file, the original headers appear after 'seed'.

                            Boetsie
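Building on that, the Python sketch below maps SSPACE's internal contig IDs back to the original names. The header layout it expects (">contig1|size1043|seed:NODE_5_length_1000") is a hypothetical example, not confirmed in this thread; check the headers in your own *formattedcontigs.fasta and adjust the parsing if they differ.

```python
from typing import Dict, Iterable


def original_names(fasta_lines: Iterable[str]) -> Dict[str, str]:
    """Map SSPACE's contig ID (first '|' field of a header) to the original name.

    Assumes headers like '>contig1|size1043|seed:NODE_5_length_1000'
    (hypothetical); everything after 'seed' (minus one ':') is the original name.
    """
    mapping: Dict[str, str] = {}
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].rstrip("\r\n").split("|")
        for i in range(1, len(fields)):
            if fields[i].startswith("seed"):
                # Re-join in case the original name itself contained '|'.
                original = "|".join(fields[i:])[len("seed"):]
                if original.startswith(":"):
                    original = original[1:]
                mapping[fields[0]] = original
                break
    return mapping


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python original_names.py path/to/formattedcontigs.fasta
    with open(sys.argv[1]) as fasta:
        for contig_id, name in original_names(fasta).items():
            print(contig_id, name, sep="\t")
```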


                            • rpdias
                              Junior Member
                              • Aug 2012
                              • 2

                              Greetings,
                              I'm quite new to SSPACE. I'm using it in an Ubuntu 64-bit VM with Perl and Python.
                              I'm trying to use the example data to test SSPACE, but I always get the same error: it seems it can't build the bowtie index. The same happens with my original Illumina data.

                              Has anyone had the same problem, or can anyone help me with this?

                              Thanks in advance for any help!

                              Cheers,
                              Ricardo


                              SSPACE Log
                              perl /media/NGStorage/NGS/SW/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl -l libraries.txt -s contigs_abyss.fasta -k 5 -a 0.7 -x 0 -b ecoli_scaffolds_no_extension

                              Legacy library getopts.pl will be removed from the Perl core distribution in the next major release. Please install the separate libperl4-corelibs-perl package. It is being used at /media/NGStorage/NGS/SW/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl, line 87.
                              Your inserted inputs on [SSPACE_Basic_v2.0_linux] at Tue Aug 21 10:09:36 2012:
                              Required inputs:
                              -l = libraries.txt
                              -s = contigs_abyss.fasta
                              -b = ecoli_scaffolds_no_extension

                              Optional inputs:
                              -x = 0
                              -z = 0
                              -k = 5
                              -a = 0.7
                              -n = 15
                              -T = 1
                              -p = 0


                              =>Tue Aug 21 10:09:36 2012: Reading, filtering and converting input sequences of library file initiated
                              Reading read-pairs lib1.1 @ 10000000

                              ------------------------------------------------------------

                              =>Tue Aug 21 10:10:11 2012: Storing contigs to format for scaffolding

                              LIBRARY lib1
                              ------------------------------------------------------------

                              =>Tue Aug 21 10:10:12 2012: Reading contig file

                              =>Tue Aug 21 10:10:12 2012: Building Bowtie index for contigs

                              Bowtie-build error; -1 at /media/NGStorage/NGS/SW/SSPACE-BASIC-2.0_linux-x86_64/bin/PairingAndScaffolding.pl line 829.
                              **************************************************

                              Process failed on Tue Aug 21 10:10:12 2012


                              • boetsie
                                Senior Member
                                • Feb 2010
                                • 245

                                Hi Ricardo,

                                look at this post where colindaven suggests how to fix the problem:

                                Simply chmod a+x all directories of SSPACE.

                                Regards,
                                Boetsie
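For anyone who prefers to script that fix, a Python equivalent of a recursive chmod a+x is sketched below. The post only says to add the bit to all SSPACE directories; the sketch also adds it to files by default, on the assumption that the bundled bowtie binaries (such as bowtie-build) may be what lacks it. Passing include_files=False limits it to directories. The script name in the usage line is hypothetical.

```python
import os
import stat

# The "a+x" bits: execute/search permission for user, group and others.
EXEC_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def add_exec_recursive(root: str, include_files: bool = True) -> int:
    """Add a+x to `root` and every directory below it, and to files too when
    include_files is True (roughly `chmod -R a+x root`).

    Symbolic links are skipped. Returns the number of paths whose mode changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        paths = [dirpath]
        if include_files:
            paths += [os.path.join(dirpath, name) for name in filenames]
        for path in paths:
            if os.path.islink(path):
                continue
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if (mode & EXEC_ALL) != EXEC_ALL:
                os.chmod(path, mode | EXEC_ALL)
                changed += 1
    return changed


if __name__ == "__main__":
    import sys

    # Usage: python add_exec.py /path/to/SSPACE-BASIC-2.0_linux-x86_64
    print(add_exec_recursive(sys.argv[1]), "paths updated")
```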
