SEQanswers

Old 07-09-2012, 02:31 AM   #1
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default Reliable gap closure of scaffolds with GapFiller

Hi all,

after a successful release of SSPACE (http://seqanswers.com/forums/showthread.php?t=8350) we have generated a new tool, called GapFiller, for closing the remaining gaps produced after scaffolding.

GapFiller seeks to find reads that potentially fall within gaps by aligning paired reads with Bowtie or BWA(-sw). Per gap, it extends both edges until a user-defined overlap is found and the number of filled nucleotides corresponds to the initial number of gapped nucleotides in the scaffold (allowing a user-defined deviation).

The main features:

* Inputs are simple FASTA scaffold sequences as well as (multiple) FASTA/FASTQ paired-read data
* Multiple library input of both paired-end and/or mate pair datasets
* High-quality closing of gaps
* High reduction of the number of gaps, and the number of gapped nucleotides
* Detailed output of the gaps, e.g. number of reads used, number of nucleotides, remaining gapped nucleotides
* Detailed output of the gapclosing process.
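
To make the two headline numbers in the list above concrete (number of gaps and number of gapped nucleotides), here is a minimal Python sketch. It is not part of GapFiller; the function name `gap_stats` and the toy sequences are made up for illustration, and it simply treats every run of N in a scaffold as one gap.

```python
import re

def gap_stats(scaffold) -> tuple:
    """Return (number of gaps, number of gapped nucleotides) for one scaffold.

    A gap is taken to be any uninterrupted run of N characters.
    """
    runs = re.findall(r"[Nn]+", scaffold)
    return len(runs), sum(len(run) for run in runs)

# Hypothetical scaffold before and after gap closure (toy data).
before = "ACGTACGT" + "N" * 10 + "TTGACCA" + "N" * 5 + "GGATC"
after = "ACGTACGT" + "AAGGCTTAGC" + "TTGACCA" + "N" * 3 + "GGATC"

print(gap_stats(before))  # (2, 15)
print(gap_stats(after))   # (1, 3)
```

Running this on the scaffolds before and after gap closure gives a quick, tool-independent check of how many gaps and gapped nucleotides remain.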

GapFiller has been tested on various datasets (PE and MP) and different species, and compared with different gap closure tools (IMAGE and SOAP's GapCloser). It was tested on four prokaryotes (E. coli, S. coelicolor, S. aureus, R. sphaeroides) and two eukaryotes (S. cerevisiae, human chromosome 14).

The results, using the quality metrics of GAGE (http://gage.cbcb.umd.edu/results/index.html), show that GapFiller closes gaps more accurately than IMAGE and SOAP's GapCloser.
Although GapFiller yields similar results to SOAP's GapCloser in terms of the number of gaps/nucleotides closed, its lower error rate indicates that our tool is more appropriate for reliable gap filling.


Further details are provided in our paper in Genome Biology (http://genomebiology.com/2012/13/6/R56/abstract). The program can be obtained from our website (http://www.baseclear.com/bioinformatics-tools/) and is free for academic users.

We hope it will be useful; any comments or questions are welcome.

Regards,
Marten Boetzer a.k.a. Boetsie
Old 07-24-2012, 03:59 AM   #2
rayanc
Junior Member
 
Location: FR

Join Date: Sep 2011
Posts: 6
Default

Methods such as GapFiller are immensely useful and this article looks good, keep up the good work Marten!
Old 08-03-2012, 12:17 AM   #3
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Thank you for the kind reply. We hope our programs are of use, and we are continuously trying to improve them as well as develop new ones.
Old 08-26-2012, 09:32 PM   #4
dnajuice
Junior Member
 
Location: USA

Join Date: Aug 2012
Posts: 8
Default

Hi Boetsie/Marten,

I'm using SSPACE for contig extension and scaffolding, and it works pretty well. I am also interested in GapFiller. However, I'm a bit confused about the utility of contig extension and gap closure. For example, before running GapFiller, is it necessary to run SSPACE first to extend the contigs? Or should I use non-extended scaffolds directly for GapFiller? Will these two different SSPACE scaffold inputs (extended vs. non-extended) affect the GapFiller result?

Thank you; I look forward to your feedback.
Old 08-27-2012, 01:47 PM   #5
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi dnajuice,

thank you for your question, and for using our software. It is not necessary to extend the contigs before gap closure. GapFiller will simply take the scaffolds and try to fill their gaps. The extension step of SSPACE only serves to further extend the contigs in order to improve scaffolding.

The extension of SSPACE is based on unmapped single reads (reads that do not map to any of the contigs), while GapFiller makes use of paired reads, making the extension more reliable. In addition, GapFiller is able to fill repeated regions, while the extension of SSPACE cannot do this, since it uses each read only once. I don't think the extension step of SSPACE will affect the gap closing, as long as the extension is correct of course, so do not set the extension settings too low.

Regards,
Marten



Old 09-10-2012, 09:45 PM   #6
LadyGlory
Junior Member
 
Location: Hawaii

Join Date: Sep 2012
Posts: 1
Default

Hi Boetsie/Marten,

I'm new to using GapFiller and have some questions about the input files. I am working with a bacterial genome, thus I only truly have one scaffold. My data consist of multiple contigs that when aligned to a reference genome produce scaffolds with gaps of varying length. I also have several contigs with no apparent synteny to my reference strain and I'm not sure how to treat them with GapFiller.

1) Does GapFiller require the same no. of N's between contigs?
2) How would GapFiller know to join contigs separated by N's if I'm working with a single super scaffold as with the bacterial genome?
3) Is it best to make a fasta file containing all my contigs with each contig containing a preset no. of N's at the 3'end of the oriented contig (i.e. add 100 N's to the 3' end of each of my contigs), and if so, do I also need to add the same no. of preset N's to the 5'end?

Thank you.
Old 09-23-2012, 11:12 AM   #7
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi LadyGlory,

sorry for my late reply, I was away for some weeks.

I'm not really sure what you mean. Did you align your contigs to a closely related reference genome, and do you now have only one scaffold with N runs of varying length? If so, you can use this scaffold without a problem. Be aware, though, that the gap size estimate may be incorrect if there is a large deletion/insertion in your sample compared with your reference. In addition, you will probably end up with large gaps corresponding to regions that are not present in your sample, as well as (as you already mentioned) contigs that could not be aligned to your reference genome.

I think the best option is to first use GapFiller on this scaffold and see how well the gap closure goes. Otherwise, I would suggest scaffolding based on paired-read information (e.g. with SSPACE, Bambus, SOPRA...), since these programs are not influenced by genomic rearrangements between your reference genome and your sample, such as large inversions, deletions, insertions and translocations.

Regards,
Boetsie

Old 09-26-2012, 09:04 AM   #8
luisgls
Junior Member
 
Location: barcelona

Join Date: Oct 2011
Posts: 2
Default

I have a finished genome assembled de novo (actually I also used SSPACE to identify maximal connections between final scaffolds). When I used GapFiller, instead of reducing the number of Ns, it increased them. I compared the new N list with the original, and it corrected some gaps, but in some places where the original had just 1 N, I now have 7 Ns. Can I turn off this behaviour? Why is it extending the gap by 6 Ns?
Thanks
Old 09-28-2012, 02:40 AM   #9
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Hi luisgls,

Set the -t option to 0. By default, -t trims 10 bases off your 'contig' edges, since we have seen that these are usually of poor quality.

Regards,
Boetsie
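
As an illustration of why edge trimming can leave more Ns than you started with, here is a toy Python model. It assumes that trimmed bases are simply treated as part of the gap until reads fill them again; that is a guess for illustration only, not GapFiller's actual code, and `trim_gap_edges` is a made-up name.

```python
import re

def trim_gap_edges(scaffold, trim):
    """Toy model: mask `trim` bases on each side of every N run as N.

    Assumption (not taken from GapFiller's source): trimmed edge bases
    become part of the gap and stay N unless they are filled again.
    """
    if trim <= 0:
        return scaffold
    chars = list(scaffold)
    for match in re.finditer(r"N+", scaffold):
        left = max(0, match.start() - trim)
        right = min(len(chars), match.end() + trim)
        for i in range(left, right):
            chars[i] = "N"
    return "".join(chars)

# A single-N gap flanked by 30 bases on each side (toy data).
scaffold = "A" * 30 + "N" + "C" * 30

print(trim_gap_edges(scaffold, 10).count("N"))  # 21
print(trim_gap_edges(scaffold, 0).count("N"))   # 1
```

Under this model, a 1 N gap first grows by the trimmed bases on both sides, so any part that is not filled back in shows up as extra Ns. With trimming set to 0 the gap keeps its original size.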

Old 02-27-2013, 08:29 AM   #10
CPCantalapiedra
Member
 
Location: Zaragoza (Spain)

Join Date: Sep 2011
Posts: 38
Default

Hi Boetsie,

I have used SSPACE with good success, and I congratulate you on such good software. Now I am trying GapFiller. I am using a single library with about 6M reads and a machine with 64 GB RAM. I use fairly standard parameters, and have tried both -i 1 and -i 2, but the program stops after iteration 1 without reporting any error. It just stops after the "Mapping reads..." log message. I am using "bwa" in the library file, since my read lengths range from 36 to 122 bp (I have the option to use only >100 bp reads if necessary).

I am wondering why this happens. Any idea?
Going to try the bowtie option too...

thanks
CPC
Old 02-27-2013, 08:37 AM   #11
CPCantalapiedra
Member
 
Location: Zaragoza (Spain)

Join Date: Sep 2011
Posts: 38
Default

Changing "bwa" to "bowtie" I got:

Bowtie-build error; -1 at ~/bin/gapfiller line 242.
Old 02-27-2013, 08:59 AM   #12
CPCantalapiedra
Member
 
Location: Zaragoza (Spain)

Join Date: Sep 2011
Posts: 38
Default

Umm, when pasting the bowtie line I realized that the symbolic link might be causing the problem, and it seems that it was indeed the source of the problem in both cases.

Also, could you explain further when and why to use iterations? I have checked, and the second iteration is closing gaps, so it seems useful. Why weren't they closed during iteration 1? Is it just that previously unmapped reads are discovered that can now map completely to one edge?

Thank you again!
CPC
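
For anyone hitting the same symptom, a rough way to check the symbolic-link theory is sketched below in Python. It assumes the script looks for its bundled aligner relative to the path it was launched from. The `bowtie/bowtie-build` location is inferred from a command line that appears in a log elsewhere in this thread, `~/bin/gapfiller` is the path from the error message above, and `check_launch_path` is a made-up helper, so treat all of this as a guess rather than documented behaviour.

```python
import os

def check_launch_path(script_path):
    """Compare the directory of the path as typed with that of its real target.

    Assumption: the bundled aligner lives in a bowtie/ subdirectory next to
    the real GapFiller script, so launching via a symlink elsewhere may break
    a lookup that is relative to the typed path.
    """
    typed_dir = os.path.dirname(os.path.abspath(script_path))
    real_dir = os.path.dirname(os.path.realpath(script_path))
    helper = os.path.join("bowtie", "bowtie-build")
    return {
        "is_symlink": os.path.islink(script_path),
        "helper_next_to_typed_path": os.path.exists(os.path.join(typed_dir, helper)),
        "helper_next_to_real_path": os.path.exists(os.path.join(real_dir, helper)),
    }

print(check_launch_path(os.path.expanduser("~/bin/gapfiller")))
```

If `is_symlink` is true and the helper is only found next to the real path, calling the script by its real location instead of through the link is the obvious thing to try.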
Old 03-05-2013, 07:38 AM   #13
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Sorry for the late reply! Good to hear that the problem is solved.

Indeed, each further iteration uses previously unmapped reads to close the gaps further. This is especially useful if you used long insert size libraries for scaffolding.

Regards,
Boetsie
Old 03-07-2013, 01:49 PM   #14
lizzyzhao
Junior Member
 
Location: California

Join Date: Jan 2013
Posts: 1
Default errors in running

Hi Marten,

Thanks for the tool for gap filling. However, I had a problem running it: it stopped at bowtie-build, which is the first step of bowtie. I checked the alignment output files, and it turned out that asem1.contig.gpfill.gapclosure.fa is empty. I tried to read through your Perl script but could not work out how asem1.contig.gpfill.gapclosure.fa is generated, so I couldn't figure out why it is empty. Could you please let me know the possible reason for this? Thank you very much!

-rw-r--r-- 1 users 40 2013-03-08 06:10 asem1.contig.gpfill.bowtieIndex.1.ebwt
-rw-r--r-- 1 users 4 2013-03-08 06:10 asem1.contig.gpfill.bowtieIndex.2.ebwt
-rw-r--r-- 1 users 0 2013-03-08 06:10 asem1.contig.gpfill.gapclosure.fa



perl ~/bin/GapFiller_v1-11_linux-x86_64/GapFiller.pl -l libraries -s asem1.contig -m 30 -o 3 -r 0.7 -n 10 -d 50 -t 0 -g 0 -T 1 -i 1 -b asem1.contig.gpfill

Your inserted inputs on [GapFiller_v1-11_Final] at Fri Mar 8 05:37:25 2013:
-s asem1.contig
-l libraries
-b asem1.contig.gpfill
-o 3
-m 30
-r 0.7
-n 10
-T 1
-g 0
-d 50
-t 0
-i 1


=>Fri Mar 8 05:37:25 2013: Reading and processing paired-read files

ITERATION 1:

=>Fri Mar 8 06:10:25 2013: Mapping reads to scaffolds, reading alignment output and storing reads
Warning: Empty input file
Reference file does not seem to be a FASTA file
Command: /home/bin/GapFiller_v1-11_linux-x86_64/bowtie/bowtie-build --quiet --noref asem1.contig.gpfill/alignoutput/asem1.contig.gpfill.gapclosure.fa asem1.contig.gpfill/alignoutput/asem1.contig.gpfill.bowtieIndex

Bowtie-build error; 256 at /home/bin/GapFiller_v1-11_linux-x86_64/GapFiller.pl line 242.
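
The two bowtie-build messages in the log above ("Empty input file", "Reference file does not seem to be a FASTA file") can be reproduced with a quick sanity check before any run. The Python sketch below is independent of GapFiller; `looks_like_fasta` is a made-up helper, and the two paths are the ones shown in the log. It does not explain why the intermediate file ended up empty, it only confirms what bowtie-build is complaining about.

```python
import os

def looks_like_fasta(path):
    """Return (ok, reason) for a quick sanity check of a FASTA file."""
    if not os.path.isfile(path):
        return False, "file not found"
    if os.path.getsize(path) == 0:
        return False, "file is empty"
    with open(path) as handle:
        for line in handle:
            if line.strip():
                # The first non-blank line of a FASTA file is a '>' header.
                if line.startswith(">"):
                    return True, "first record header found"
                return False, "first non-blank line does not start with '>'"
    return False, "file contains only blank lines"

# Paths as they appear in the log above.
for path in (
    "asem1.contig",
    "asem1.contig.gpfill/alignoutput/asem1.contig.gpfill.gapclosure.fa",
):
    print(path, looks_like_fasta(path))
```

Checking the input scaffold file first helps separate a problem with the input from a problem in the intermediate step.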
Old 03-27-2013, 03:01 AM   #15
heath.obrien
Junior Member
 
Location: Toronto, ON, Canada

Join Date: Apr 2011
Posts: 2
Default

Hi Martin,

I have been working with GapFiller and had great success with the first assembly that I tried it on, but I'm having a puzzling problem with my latest run: a large number of the contigs are being dramatically truncated. Before running GapFiller, the minimum contig size was 200 bp, but after running it there are over 1200 contigs shorter than 100 bp, with some as short as 2 bp. Do you (or anyone else) have any idea what might be going on here? I ran it with the default parameters.
Old 03-29-2013, 10:13 AM   #16
mcnelson.phd
Senior Member
 
Location: Connecticut

Join Date: Jul 2011
Posts: 162
Default

I've been trying to get GapFiller working on a number of Illumina assemblies that we have in my lab, but I'm not getting all of the output files that the manual says I should.

I ran the tutorial on the scaffolds in the example/ folder using the stipulated read sets, but I never get the following output files: XXX.filler.final.text, XXX.closed.evidence.txt, XXX.gapfilled.final.fa.

I do get the following files: XXX.summarfile.final.txt, XXX.gapclosure.fa and a bwa logfile in alignoutput/ and an empty XXX.closed.evidence.iteration1.txt file in intermediate_results/.

Has anyone else had this problem or have any suggestions for getting the correct output files?
Old 03-31-2013, 11:50 PM   #17
CPCantalapiedra
Member
 
Location: Zaragoza (Spain)

Join Date: Sep 2011
Posts: 38
Default

I sometimes had the same problem. I don't know exactly why, but rerunning the analysis often works in my case, and the complete set of output files is generated. You can try running with several iterations and check in the output which iteration it stops at. Maybe stderr could shed some light too?
Old 04-04-2013, 07:33 AM   #18
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Quote:
Originally Posted by mcnelson.phd View Post
I've been trying to get GapFiller working on a number of Illumina assemblies that we have in my lab, but I'm not getting all of the output files that the manual says I should.

I ran the tutorial on the scaffolds in the example/ folder using the stipulated read sets, but I never get the following output files: XXX.filler.final.text, XXX.closed.evidence.txt, XXX.gapfilled.final.fa.

I do get the following files: XXX.summarfile.final.txt, XXX.gapclosure.fa and a bwa logfile in alignoutput/ and an empty XXX.closed.evidence.iteration1.txt file in intermediate_results/.

Has anyone else had this problem or have any suggestions for getting the correct output files?

Hi, somehow GapFiller stopped at a certain point. Can you look at the command line and see if there is any error?

Regards,
Boetsie
Old 04-04-2013, 07:37 AM   #19
boetsie
Senior Member
 
Location: NL, Leiden

Join Date: Feb 2010
Posts: 245
Default

Quote:
Originally Posted by heath.obrien View Post
Hi Martin,

I have been working with GapFiller and had great success with the first assembly that I tried it on, but I'm having a puzzling problem with my latest run: a large number of the contigs are being dramatically truncated. Before running GapFiller, the minimum contig size was 200 bp, but after running it there are over 1200 contigs shorter than 100 bp, with some as short as 2 bp. Do you (or anyone else) have any idea what might be going on here? I ran it with the default parameters.

Hmmm, that is very strange. Do you know whether the scaffolds contain very short sequences between gaps, e.g.:

AAGCTGCTAGNNNGTATGNNNNNAGGGTAGATAG

I think the script does not handle these patterns very well at the moment. I'm working on this.

Regards,
Boetsie
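
To check a scaffold file for the pattern described above, a small Python sketch along these lines could be used. It is not part of GapFiller; the function name `short_segments_between_gaps` and the 10 bp threshold are arbitrary choices for illustration.

```python
import re

def short_segments_between_gaps(scaffold, min_len=10):
    """Return (start, sequence) for stretches of bases that have a gap (N run)
    on both sides and are shorter than `min_len`."""
    hits = []
    # Match non-N stretches directly preceded and directly followed by an N.
    for match in re.finditer(r"(?<=N)[^N]+(?=N)", scaffold.upper()):
        if len(match.group()) < min_len:
            hits.append((match.start(), match.group()))
    return hits

# The example sequence from the post above.
print(short_segments_between_gaps("AAGCTGCTAGNNNGTATGNNNNNAGGGTAGATAG"))
# [(13, 'GTATG')]
```

Scaffolds that report many such hits would be the ones to inspect first when contigs come out truncated.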
Old 04-04-2013, 09:36 PM   #20
megancamilla
Junior Member
 
Location: Australia

Join Date: Mar 2013
Posts: 1
Default GapFiller doesn't go through iterations

Quote:
Originally Posted by CPCantalapiedra View Post
Umm, when pasting the bowtie line I realized that the symbolic link might be causing the problem, and it seems that it was indeed the source of the problem in both cases.

Also, could you explain further when and why to use iterations? I have checked, and the second iteration is closing gaps, so it seems useful. Why weren't they closed during iteration 1? Is it just that previously unmapped reads are discovered that can now map completely to one edge?

Thank you again!
CPC

Hey CPCantalapiedra,

I'm having exactly the same problem. Can you elaborate on how you fixed your problem? Did you remove a symbolic link or put one in? Also in which directory is your bwa and bowtie installed? (i.e. /usr/local/bin?)

Thanks!

Megan