  • Reliable gap closure of scaffolds with GapFiller

    Hi all,

    after a successful release of SSPACE (http://seqanswers.com/forums/showthread.php?t=8350) we have generated a new tool, called GapFiller, for closing the remaining gaps produced after scaffolding.

    GapFiller seeks to find reads that potentially fall within gaps by aligning paired reads with Bowtie or BWA(-sw). Per gap, it extends both sides until they overlap by a user-defined length, and it accepts the closure only if the number of inserted nucleotides corresponds to the original number of gapped nucleotides in the scaffold (allowing a user-defined deviation).
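As a rough illustration of the two acceptance criteria described above (a minimum overlap between the extensions from both sides, and a filled length close to the estimated gap size), here is a minimal Python sketch. The function names, the parameter names `min_overlap` and `max_deviation`, and their default values are hypothetical choices for this example; this is not GapFiller's actual code.

```python
def longest_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def try_close_gap(left_ext, right_ext, gap_size, min_overlap=10, max_deviation=50):
    """Return the filled sequence if both criteria hold, otherwise None.

    left_ext  : bases grown into the gap from its left edge
    right_ext : bases grown into the gap from its right edge (same strand)
    gap_size  : number of N's originally present in the gap
    """
    k = longest_overlap(left_ext, right_ext)
    if k < min_overlap:
        return None  # the two extensions have not met convincingly yet
    filled = left_ext + right_ext[k:]
    if abs(len(filled) - gap_size) > max_deviation:
        return None  # filled length disagrees with the estimated gap size
    return filled
```

Rejecting a closure whose length is far from the estimate is what makes this kind of filter conservative: a long overlap alone could also arise from a repeat.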

    The main features:

    * Inputs are simple FASTA scaffold sequences as well as (multiple) FASTA/FASTQ paired-read data
    * Multiple library input of both paired-end and/or mate pair datasets
    * High-quality closing of gaps
    * High reduction of the number of gaps, and the number of gapped nucleotides
    * Detailed output of the gaps, e.g. number of reads used, number of nucleotides, remaining gapped nucleotides
    * Detailed output of the gapclosing process.
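To compare the number of gaps and gapped nucleotides before and after a run independently of the tool's own report, something like the following Python sketch could be used. It assumes that a gap is any run of `N`/`n` characters inside a FASTA record; the function names are made up for this example.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Count N runs (gaps) and the total number of N's (gapped nucleotides)."""
    gaps = gapped = 0
    for _, seq in read_fasta(lines):
        for run in re.finditer(r"[Nn]+", seq):
            gaps += 1
            gapped += len(run.group())
    return {"gaps": gaps, "gapped_nucleotides": gapped}
```

Usage: `gap_stats(open("scaffolds.fasta"))` on the input and on the output file, then compare the two dictionaries (the file name is a placeholder).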

    GapFiller has been tested and compared with various datasets (PE and MP), different gap-closure tools (IMAGE and SOAP's GapCloser) and different species: four prokaryotes (E. coli, S. coelicolor, S. aureus, R. sphaeroides) and two eukaryotes (S. cerevisiae, human chromosome 14).

    The results, using the quality metrics of GAGE (http://gage.cbcb.umd.edu/results/index.html), show that gap closure with GapFiller is more accurate than with IMAGE or SOAP's GapCloser.
    Although GapFiller closes a similar number of gaps/nucleotides as SOAP's GapCloser, its smaller error rate indicates that our tool is more appropriate for reliable gap filling.


    Further details are provided in our paper in Genome Biology (http://genomebiology.com/2012/13/6/R56/abstract). The program can be obtained from our website (http://www.baseclear.com/bioinformatics-tools/) and is free for academic users.

    We hope it will be useful; any comments or questions are welcome.

    Regards,
    Marten Boetzer a.k.a. Boetsie

  • #2
    Thank you for the kind reply. We hope our programs can be of use, and we are continuously trying to improve them as well as develop new ones.

    Comment


    • #3
      Hi Boetsie/Marten,

      I'm using SSPACE for contig extension and scaffolding, and it works pretty well. I am also interested in GapFiller. However, I'm a bit confused about the utility of contig extension and gap closure. For example, before running GapFiller, is it necessary to run SSPACE first to extend the contigs? Or should I use non-extended scaffolds directly for GapFiller? Will these two different SSPACE scaffold inputs (extended vs. non-extended) affect GapFiller result?

      Thank you and look forward to having your feedback.

      Comment


      • #4
        Hi dnajuice,

        thank you for your question, and for using our software. It is not necessary to extend the contigs before gap closure. GapFiller will simply take the scaffolds and try to fill them. The extension step of SSPACE only serves to extend the contigs further in order to improve scaffolding.

        The extension of SSPACE is based on unmapped single reads (reads that do not map to any of the contigs), while GapFiller makes use of paired reads, making the extension more reliable. In addition, GapFiller is able to fill repeated regions, while the extension of SSPACE cannot do this, since it uses reads only once. I don't think the extension step of SSPACE will affect the gap closing, as long as the extension is correct of course, so do not set the extension settings too low.

        Regards,
        Marten




        Comment


        • #5
          Hi Boetsie/Marten,

          I'm new to using GapFiller and have some questions about the input files. I am working with a bacterial genome, thus I only truly have one scaffold. My data consist of multiple contigs that when aligned to a reference genome produce scaffolds with gaps of varying length. I also have several contigs with no apparent synteny to my reference strain and I'm not sure how to treat them with GapFiller.

          1) Does GapFiller require the same no. of N's between contigs?
          2) How would GapFiller know to join contigs separated by N's if I'm working with a single super scaffold as with the bacterial genome?
          3) Is it best to make a fasta file containing all my contigs with each contig containing a preset no. of N's at the 3'end of the oriented contig (i.e. add 100 N's to the 3' end of each of my contigs), and if so, do I also need to add the same no. of preset N's to the 5'end?

          Thank you.

          Comment


          • #6
            Hi LadyGlory,

            sorry for my late reply, I was away for some weeks.

            I'm not really sure what you mean. Did you align your contigs to a close reference genome, so that you now have a single scaffold with N runs of varying length? If so, you can use this scaffold without a problem. Be careful, though: the gap size estimates may be incorrect if your sample has a large deletion/insertion compared with your reference. In addition, you will probably end up with large gaps corresponding to regions that are not present in your sample, as well as (as you already mentioned) contigs that could not be aligned to your reference genome.
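For readers wondering what such a reference-ordered scaffold looks like, here is a small Python sketch that joins already ordered and oriented contigs with one N run per estimated gap. The function name, the `min_gap` clamp and the example gap sizes are assumptions made for illustration; they are not requirements stated by GapFiller.

```python
def build_scaffold(contigs, gap_sizes, min_gap=1):
    """Join ordered, oriented contigs with one N run per estimated gap.

    contigs   : list of sequences, already in reference order and orientation
    gap_sizes : estimated gap length between consecutive contigs
    min_gap   : smallest N run to write, used when an estimate is zero or negative
    """
    if not contigs:
        return ""
    if len(gap_sizes) != len(contigs) - 1:
        raise ValueError("need exactly one gap size between each pair of contigs")
    parts = [contigs[0]]
    for gap, contig in zip(gap_sizes, contigs[1:]):
        parts.append("N" * max(gap, min_gap))
        parts.append(contig)
    return "".join(parts)
```

The N runs differ in length because each one encodes its own gap estimate. Contigs that could not be placed on the reference are left out of this scaffold.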

            I think the best option is to first use GapFiller on this scaffold and see how well the gap closure goes. Otherwise, I would suggest scaffolding based on paired-read information (e.g. with SSPACE, Bambus, SOPRA...), since these programs are not influenced by genomic rearrangements between your reference genome and your sample, such as large inversions, deletions, insertions and translocations.

            Regards,
            Boetsie


            Comment


            • #7
              Problems trying to run GapFiller

              Hi,

              I have been trying to get GapFiller to work but have run into real difficulties. I keep getting the following:


              zool1059@ubuntu:~/Documents/Kelly$ perl /home/zool1059/Software/GapFiller_v1-10_linux-x86_64/GapFiller.pl -l library2 -s 18C_3_scaf.fasta -m 30 -b Lib1
              Your inserted inputs on [GapFiller_v1-10] at Sat Sep 22 18:15:48 2012:
              -s 18C_3_scaf.fasta
              -l library2
              -b Lib1
              -o 2
              -m 30
              -r 0.7
              -n 10
              -T 1
              -g 1
              -d 50
              -t 10
              -i 10


              =>Sat Sep 22 18:15:48 2012: Reading and processing paired-read files

              ITERATION 1:

              =>Sat Sep 22 18:15:59 2012: Mapping reads to scaffolds, reading bowtie output and storing unmapped reads
              /home/zool1059/Software/GapFiller_v1-10_linux-x86_64/bowtie/bowtie-build: 1: /home/zool1059/Software/GapFiller_v1-10_linux-x86_64/bowtie/bowtie-build: ELF : not found
              [further ": not found" and "File name too long" messages containing raw binary output omitted]
              /home/zool1059/Software/GapFiller_v1-10_linux-x86_64/bowtie/bowtie-build: 14: /home/zool1059/Software/GapFiller_v1-10_linux-x86_64/bowtie/bowtie-build: Syntax error: ")" unexpected

              Bowtie-build error; 512 at /home/zool1059/Software/GapFiller_v1-10_linux-x86_64/GapFiller.pl line 242.

              I don't really understand what's going on. As far as I can tell bowtie is installed properly. I created the scaffold file by aligning my velvet contigs.fa against a reference using ABACAS. Any help would be gratefully accepted.

              Cheers,

              Andries
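Messages such as "ELF : not found" typically appear when a shell ends up interpreting a compiled binary as if it were a script, which can happen when the system refuses to execute the file directly (for example a 64-bit binary on a 32-bit system, or a damaged download). Whether that is the cause here is a guess. A minimal Python sketch for inspecting the bundled `bowtie-build`, with a hypothetical function name:

```python
import os
import struct

# Fifth byte of an ELF file (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
ELF_CLASS = {1: "32-bit", 2: "64-bit"}


def inspect_binary(path):
    """Report basic facts that decide whether a Linux binary can be executed."""
    with open(path, "rb") as fh:
        head = fh.read(5)
    is_elf = head[:4] == b"\x7fELF"
    return {
        "is_elf": is_elf,
        "elf_class": ELF_CLASS.get(head[4]) if is_elf and len(head) == 5 else None,
        "executable_bit": os.access(path, os.X_OK),
        # Pointer size of the running Python, a rough proxy for the userland bitness.
        "python_pointer_bits": struct.calcsize("P") * 8,
    }
```

If `elf_class` reports 64-bit while the system is 32-bit, using a system-wide Bowtie built for the right architecture would be the next thing to try.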

              Comment


              • #8
                I have a finished genome assembled de novo (actually I also used SSPACE to identify maximal connections between final scaffolds). When I used GapFiller, instead of reducing the number of Ns, it increased them. I compared the new N list with the original, and it corrected some gaps, but in some positions where the original had just 1 N, I now have 7 Ns. Can I turn off this behaviour? Why is it extending the gap by 6 Ns?
                Thanks

                Comment


                • #9
                  Hi luisgls,

                  Set the -t option to 0. By default, -t trims 10 bases off your 'contig' edges, since we have seen that these are usually of bad quality.
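To see why edge trimming can make a gap look larger when it is not subsequently refilled, consider this toy Python sketch. It simply masks `trim` bases on each side of every N run as N; how GapFiller represents trimmed bases internally is not stated in this thread, so treat the masking as an assumption made for illustration.

```python
import re


def mask_gap_edges(scaffold: str, trim: int = 10) -> str:
    """Replace up to `trim` bases on each side of every run of N's with N."""
    seq = list(scaffold)
    for run in re.finditer(r"N+", scaffold):
        left = max(0, run.start() - trim)
        right = min(len(seq), run.end() + trim)
        for i in range(left, right):
            seq[i] = "N"
    return "".join(seq)
```

With `trim=0` the scaffold is returned unchanged, which mirrors the effect of the suggestion above.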

                  Regards,
                  Boetsie


                  Comment


                  • #10
                    Hi Boetsie,

                    I have used SSPACE with good success, and I congratulate you on such good software. Now I am trying GapFiller. I am using a single library with about 6M reads on a machine with 64 GB RAM. I use fairly standard parameters, and have tried both -i 1 and -i 2, but the program stops after iteration 1 without reporting any error. It just stops after the "Mapping reads..." log line. I am using "bwa" in the library file, since my reads range from 36 to 122 bp (I have the option to use only >100 bp reads if necessary).

                    I am wondering why this happens. Any idea?
                    Going to try the bowtie option too...

                    thanks
                    CPC

                    Comment


                    • #11
                      Changing "bwa" to "bowtie" I got:

                      Bowtie-build error; -1 at ~/bin/gapfiller line 242.

                      Comment


                      • #12
                        Hmm, when pasting the bowtie line I realized that the symbolic link might be causing the problem, and it seems it was indeed the source of the problem in both cases.

                        Also, could you explain further when and why to use iterations? I have checked, and the second iteration is closing gaps, so it seems useful. Why weren't those gaps closed during iteration 1? Is it just that previously unmapped reads are now able to map completely to one edge?

                        Thank you again!
                        CPC

                        Comment


                        • #13
                          Sorry for the late reply! Good that it solved the problem.

                          Indeed, in each new iteration previously unmapped reads are used to close the gap further. This is especially useful if you used long-insert libraries for scaffolding.

                          Regards,
                          Boetsie

                          Comment


                          • #14
                            errors in running

                            Hi Marten,

                            Thanks for the gap-filling tool. However, I had a problem running it: it stopped at bowtie-build, which is the first step of bowtie. I checked the alignment output files, and it turned out that asem1.contig.gpfill.gapclosure.fa is empty. I tried to read through your Perl script but could not work out how asem1.contig.gpfill.gapclosure.fa is generated, so I couldn't figure out why it is empty. Could you please let me know the possible reason? Thank you very much!

                            -rw-r--r-- 1 users 40 2013-03-08 06:10 asem1.contig.gpfill.bowtieIndex.1.ebwt
                            -rw-r--r-- 1 users 4 2013-03-08 06:10 asem1.contig.gpfill.bowtieIndex.2.ebwt
                            -rw-r--r-- 1 users 0 2013-03-08 06:10 asem1.contig.gpfill.gapclosure.fa



                            perl ~/bin/GapFiller_v1-11_linux-x86_64/GapFiller.pl -l libraries -s asem1.contig -m 30 -o 3 -r 0.7 -n 10 -d 50 -t 0 -g 0 -T 1 -i 1 -b asem1.contig.gpfill

                            Your inserted inputs on [GapFiller_v1-11_Final] at Fri Mar 8 05:37:25 2013:
                            -s asem1.contig
                            -l libraries
                            -b asem1.contig.gpfill
                            -o 3
                            -m 30
                            -r 0.7
                            -n 10
                            -T 1
                            -g 0
                            -d 50
                            -t 0
                            -i 1


                            =>Fri Mar 8 05:37:25 2013: Reading and processing paired-read files

                            ITERATION 1:

                            =>Fri Mar 8 06:10:25 2013: Mapping reads to scaffolds, reading alignment output and storing reads
                            Warning: Empty input file
                            Reference file does not seem to be a FASTA file
                            Command: /home/bin/GapFiller_v1-11_linux-x86_64/bowtie/bowtie-build --quiet --noref asem1.contig.gpfill/alignoutput/asem1.contig.gpfill.gapclosure.fa asem1.contig.gpfill/alignoutput/asem1.contig.gpfill.bowtieIndex

                            Bowtie-build error; 256 at /home/bin/GapFiller_v1-11_linux-x86_64/GapFiller.pl line 242.
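Since bowtie-build is complaining about an empty reference, one thing worth ruling out before rerunning is a problem with the input passed to `-s`. The Python sketch below performs a few basic checks; the function name is hypothetical, and the idea that a gap-free input leads to an empty intermediate file is only a guess, not something confirmed in this thread.

```python
import os


def check_scaffold_fasta(path):
    """Return a list of problems found in a scaffold FASTA file (empty if none)."""
    if not os.path.isfile(path):
        return ["file not found"]
    if os.path.getsize(path) == 0:
        return ["file is empty"]
    problems = []
    records = 0
    has_gap = False
    first = True
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if first:
                first = False
                if not line.startswith(">"):
                    problems.append("first line is not a FASTA header")
            if line.startswith(">"):
                records += 1
            elif "N" in line.upper():
                has_gap = True
    if records == 0:
        problems.append("no FASTA records")
    if not has_gap:
        problems.append("no N characters, so there are no gaps to fill")
    return problems
```

Running it on the file given to `-s` (here `asem1.contig`) shows quickly whether the input is a gapped scaffold file or a plain contig file.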

                            Comment


                            • #15
                              Hi Marten,

                              I have been working with GapFiller and had great success with the first assembly that I tried it on, but I'm having a puzzling problem with my latest run: a large number of the contigs are being dramatically truncated. Before running GapFiller, the minimum contig size was 200 bp, but after running it there are over 1200 contigs shorter than 100 bp, with some as short as 2 bp. Do you (or anyone else) have any idea what might be going on here? I ran it with the default parameters.
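To pin down which scaffolds are affected, it may help to compute the contig lengths before and after the run. A minimal Python sketch, assuming contigs are the stretches between runs of N within each scaffold sequence (the function names are made up for this example):

```python
import re


def contig_lengths(scaffold: str):
    """Lengths of the non-N stretches in a single scaffold sequence."""
    return [len(c) for c in re.split(r"[Nn]+", scaffold) if c]


def count_short(scaffolds, threshold=100):
    """Number of contigs shorter than `threshold` across all scaffold sequences."""
    return sum(1 for s in scaffolds for n in contig_lengths(s) if n < threshold)
```

Comparing `count_short` on the input and output sequences, scaffold by scaffold, would show where the very short fragments appear.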

                              Comment
