
  • I am having problems using SSPACE basic with my 454 paired-end data and was hoping to get some help here. SSPACE runs fine with my Illumina PE data, but my 454 data has much longer insert sizes (3-5 kb), and I think they could really make a difference.

    My problem is that SSPACE reads all the 454 pairs in, removes quite a lot of them because they include Ns, and then maps 0 of them. The report is below. It was difficult to get the reads into a format that SSPACE accepts, and I guess that the problem lies in the fastq files. Some (very few) reads are too long (over 1024 bases), and bowtie complains about these. Would this crash the whole run? I know that bowtie is not the best choice for longer reads, but I thought it would still manage to map some reads. Is SSPACE premium the answer?

    Any/all help would be much appreciated,
    Henrik

    READING READS Lib454:
    ------------------------------------------------------------
    Total inserted pairs = 1217215
    Number of pairs containing N's = 1066178
    Remaining pairs = 151037
    ------------------------------------------------------------
    ...

    LIBRARY Lib454 STATS:
    ################################################################################

    MAPPING READS TO CONTIGS:
    ------------------------------------------------------------
    Number of single reads found on contigs = 0
    Number of pairs used for pairing contigs / total pairs = 0 / 0
    ------------------------------------------------------------

    READ PAIRS STATS:
    Assembled pairs: 0 (0 sequences)
    Satisfied in distance/logic within contigs (i.e. -> <-, distance on target: 3709 +/-927.25): 0
    Unsatisfied in distance within contigs (i.e. distance out-of-bounds): 0
    Unsatisfied pairing logic within contigs (i.e. illogical pairing ->->, <-<- or <-->): 0
    ---
    Satisfied in distance/logic within a given contig pair (pre-scaffold): 0
    Unsatisfied in distance within a given contig pair (i.e. calculated distances out-of-bounds): 0
    ---
    Total satisfied: 0 unsatisfied: 0


    Estimated insert size statistics (based on 0 pairs):
    Mean insert size = 0
    Median insert size = 0
    REPEATS:
    Number of repeated edges = 0
    ------------------------------------------------------------

    ################################################################################
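One thing that can be ruled out cheaply is the handful of over-long reads. The sketch below is only an illustration, not part of SSPACE: it assumes two plain 4-line-per-record fastq files with mates in the same order, and it drops any pair in which either read exceeds 1024 bases or contains an N. The function names are made up for this example, and filtering alone may well not fix the zero-mapping problem.

```python
from typing import Iterator, TextIO, Tuple

MAX_READ_LEN = 1024  # read-length limit for bowtie quoted in this thread

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a fastq file, assuming exactly 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        sep = handle.readline()
        qual = handle.readline()
        yield (header.rstrip("\n"), seq.rstrip("\n"),
               sep.rstrip("\n"), qual.rstrip("\n"))


def keep_pair(r1: Record, r2: Record, max_len: int = MAX_READ_LEN) -> bool:
    """Keep a pair only if both mates are short enough and free of Ns."""
    return all(len(r[1]) <= max_len and "N" not in r[1].upper()
               for r in (r1, r2))


def filter_pairs(fq1: TextIO, fq2: TextIO, out1: TextIO, out2: TextIO,
                 max_len: int = MAX_READ_LEN) -> int:
    """Write the surviving pairs to out1/out2 and return how many were kept."""
    kept = 0
    for r1, r2 in zip(read_fastq(fq1), read_fastq(fq2)):
        if keep_pair(r1, r2, max_len):
            out1.write("\n".join(r1) + "\n")
            out2.write("\n".join(r2) + "\n")
            kept += 1
    return kept
```

Called with four open file handles, `filter_pairs` returns the number of pairs written, which can be compared with the "Remaining pairs" line in the SSPACE report.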


    • Originally posted by boetsie View Post
      yes, this is possible. The file should be in a TAB delimited format like:

      <contig1> <startpos_on_contig1> <endpos_on_contig1> <contig2> <startpos_on_contig2> <endpos_on_contig2>

      E.g.
      contig1 100 150 contig1 350 300
      contig1 4000 4050 contig2 110 60

      There is a script in the 'tools' directory of the package to convert SAM/BAM to a tab format.

      Regards,
      Boetsie
      Ok, great! Thanks.

      On a slightly related note, how well do you think SSPACE would deal with scaffolding information from sources other than paired/mate reads, such as physical/genetic linkage data (supplied in the above file format)? Some scaffolders (notably Bambus) claim to be able to work with essentially any kind of link information between contigs. Could the same be said of SSPACE?
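For anyone producing such a file from their own link data, here is a minimal sketch of reading and writing the six-column TAB format described above. It is an illustration only: the names `ReadHit`, `parse_tab_line` and `format_tab_line` are invented here, and treating start > end as a reverse-strand hit is an assumption read off the example lines (350 300, 110 60), not something confirmed in this thread.

```python
from typing import NamedTuple, Tuple


class ReadHit(NamedTuple):
    """One half of a pair: contig name plus start/end position on it."""
    contig: str
    start: int
    end: int

    @property
    def is_reverse(self) -> bool:
        # Assumption: start > end marks a hit on the reverse strand.
        return self.start > self.end


def parse_tab_line(line: str) -> Tuple[ReadHit, ReadHit]:
    """Split one TAB-delimited line into the two hits of a pair."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    c1, s1, e1, c2, s2, e2 = fields
    return ReadHit(c1, int(s1), int(e1)), ReadHit(c2, int(s2), int(e2))


def format_tab_line(first: ReadHit, second: ReadHit) -> str:
    """Join two hits back into one TAB-delimited line (no trailing newline)."""
    return "\t".join(str(value) for value in (*first, *second))
```

Lines that are space-delimited rather than TAB-delimited are rejected by `parse_tab_line`, which is a quick way to catch a file that was pasted from a forum post.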


      • Originally posted by Hobbe View Post
        I am having problems using SSPACE basic with my 454 paired-end data and was hoping to get some help here. SSPACE runs fine with my Illumina PE data, but my 454 data has much longer insert sizes (3-5 kb), and I think they could really make a difference.

        My problem is that SSPACE reads all the 454 pairs in, removes quite a lot of them because they include Ns, and then maps 0 of them. The report is below. It was difficult to get the reads into a format that SSPACE accepts, and I guess that the problem lies in the fastq files. Some (very few) reads are too long (over 1024 bases), and bowtie complains about these. Would this crash the whole run? I know that bowtie is not the best choice for longer reads, but I thought it would still manage to map some reads. Is SSPACE premium the answer?
        SSPACE basic does not handle 454 reads well, simply because the reads are too long for bowtie to align (it handles reads of up to 1024 bases). Also, bowtie can handle only up to two mismatches. In SSPACE premium I've added the BWA-SW aligner to deal with longer reads. Otherwise, you can align the reads yourself with BWA-SW and try to convert the resulting .SAM file to a .tab file (see post above).

        Boetsie


        • Originally posted by gaffa View Post
          Ok, great! Thanks.

          On a slightly related note, how well do you think SSPACE would deal with scaffolding information from sources other than paired/mate reads, such as physical/genetic linkage data (supplied in the above file format)? Some scaffolders (notably Bambus) claim to be able to work with essentially any kind of link information between contigs. Could the same be said of SSPACE?
          I've not tested it myself, but I think any linking information should be suitable. I would suggest giving it a try.


          • Hi,

            to save me the hassle of going through the code, I have a short question regarding insert sizes.
            When scaffolding, does SSPACE use the user specified insert size (from the library.txt file), or the estimated insert size (that is reported in the summary file)?
            It is important, since in my case these two seem to differ, and I need the real (user-specified) value to be used.


            Thank you,
            Ivan.


            • Originally posted by is41985 View Post
              Hi,

              to save me the hassle of going through the code, I have a short question regarding insert sizes.
              When scaffolding, does SSPACE use the user specified insert size (from the library.txt file), or the estimated insert size (that is reported in the summary file)?
              It is important, since in my case these two seem to differ, and I need the real (user-specified) value to be used.


              Thank you,
              Ivan.
              Hi Ivan,

              Ha, don't do that; posting it here will save you some time. SSPACE uses the user-specified insert size. The insert size in the report is just some extra information.

              Regards,
              Boetsie


              • Originally posted by boetsie View Post
                Hi Ivan,

                Ha, don't do that; posting it here will save you some time. SSPACE uses the user-specified insert size. The insert size in the report is just some extra information.

                Regards,
                Boetsie
                Indeed it did
                Thank you for the quick reply!


                Best regards,
                Ivan.


                • Hi boetsie,

                  I am trying to build scaffolds from contigs assembled from 454 data, using Illumina 75-bp mate-pair reads from libraries of different sizes.

                  There are 502k contigs, and scaffolding reduced that number by only 1000 or so. This was much less than I expected.

                  My current parameters are as follows:
                  -x = 0
                  -z = 0
                  -k = 5
                  -a = 0.7
                  -n = 15
                  -T = 1
                  -p = 0

                  Which option should I change??

                  Of course I know there is a possibility that my mate pair libraries are crap. But I just want to try different settings.

                  Sorry for a newbie question.

                  Thank you so much


                  • It is hard to tell, but you probably have to change something in your library file. To give you further advice, I need both the output summary file and the input library file. Could you maybe send them by e-mail ([email protected])?

                    Boetsie

                    Originally posted by Hiroki View Post
                    Hi boetsie,

                    I am trying to build scaffolds from contigs assembled from 454 data, using Illumina 75-bp mate-pair reads from libraries of different sizes.

                    There are 502k contigs, and scaffolding reduced that number by only 1000 or so. This was much less than I expected.

                    My current parameters are as follows:
                    -x = 0
                    -z = 0
                    -k = 5
                    -a = 0.7
                    -n = 15
                    -T = 1
                    -p = 0

                    Which option should I change??

                    Of course I know there is a possibility that my mate pair libraries are crap. But I just want to try different settings.

                    Sorry for a newbie question.

                    Thank you so much


                    • Thanks for the response.
                      I will PM you soon.

                      Hiroki


                      • Hi Boetsie,

                        I emailed you some files a couple of days ago.
                        Did you receive them?
                        Let me know if anything is wrong.

                        Sorry if I'm rushing you...

                        Thanks.

                        Hiroki


                        • Is there any easy way to find out for a given SSPACE scaffold what are the names of the pre-SSPACE contigs/scaffolds that are included in it?


                          • Originally posted by gaffa View Post
                            Is there any easy way to find out for a given SSPACE scaffold what are the names of the pre-SSPACE contigs/scaffolds that are included in it?
                            Hi Gaffa,

                            have a look in the 'intermediate_results' folder, there is a file ending with *formattedcontigs.fasta. Here, the original headers are named after 'seed'.

                            Boetsie


                            • Greetings,
                              I'm quite new to SSPACE. I'm using it in a 64-bit Ubuntu VM with perl and python.
                              I'm trying to use the example data to test SSPACE, but I always get the same error: it seems that it can't do the bowtie indexing. The same happens with original Illumina data.

                              Has anyone had the same problem, or can anyone help me with this?

                              Thanks in advance for any help!

                              Cheers,
                              Ricardo


                              SSPACE Log
                              perl /media/NGStorage/NGS/SW/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl -l libraries.txt -s contigs_abyss.fasta -k 5 -a 0.7 -x 0 -b ecoli_scaffolds_no_extension

                              Legacy library getopts.pl will be removed from the Perl core distribution in the next major release. Please install the separate libperl4-corelibs-perl package. It is being used at /media/NGStorage/NGS/SW/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl, line 87.
                              Your inserted inputs on [SSPACE_Basic_v2.0_linux] at Tue Aug 21 10:09:36 2012:
                              Required inputs:
                              -l = libraries.txt
                              -s = contigs_abyss.fasta
                              -b = ecoli_scaffolds_no_extension

                              Optional inputs:
                              -x = 0
                              -z = 0
                              -k = 5
                              -a = 0.7
                              -n = 15
                              -T = 1
                              -p = 0


                              =>Tue Aug 21 10:09:36 2012: Reading, filtering and converting input sequences of library file initiated
                              Reading read-pairs lib1.1 @ 10000000

                              ------------------------------------------------------------

                              =>Tue Aug 21 10:10:11 2012: Storing contigs to format for scaffolding

                              LIBRARY lib1
                              ------------------------------------------------------------

                              =>Tue Aug 21 10:10:12 2012: Reading contig file

                              =>Tue Aug 21 10:10:12 2012: Building Bowtie index for contigs

                              Bowtie-build error; -1 at /media/NGStorage/NGS/SW/SSPACE-BASIC-2.0_linux-x86_64/bin/PairingAndScaffolding.pl line 829.
                              **************************************************

                              Process failed on Tue Aug 21 10:10:12 2012


                              • Hi Ricardo,

                                look at this post where colindaven suggests how to fix the problem;

                                Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


                                Simply chmod a+x all directories of SSPACE.

                                Regards,
                                Boetsie
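For those who prefer to script the permission fix, the sketch below does roughly what a recursive `chmod -R a+x` on the SSPACE folder would do. It is an illustration under assumptions, not the fix from the linked post: the function name is invented, it sets the execute bit on files as well as directories, and it will raise an error on broken symlinks.

```python
import os
import stat

# Execute permission for user, group and others (the "a+x" of chmod).
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def _add_x(path: str) -> bool:
    """Add the execute bits to one path; return True if the mode changed."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & EXEC_BITS == EXEC_BITS:
        return False
    os.chmod(path, mode | EXEC_BITS)
    return True


def add_execute_bits(root: str) -> int:
    """Add a+x to root and everything below it; return the number of changes.

    The walk is top-down, so each directory is made searchable
    before os.walk tries to descend into it.
    """
    changed = int(_add_x(root))
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            changed += int(_add_x(os.path.join(dirpath, name)))
    return changed
```

Running `add_execute_bits` on the directory that contains `SSPACE_Basic_v2.0.pl` and then re-running SSPACE would show whether the "Bowtie-build error; -1" was a permission problem.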
