SEQanswers

Old 04-05-2010, 12:21 PM   #1
Broadie
Member
 
Location: Boston, MA

Join Date: Oct 2009
Posts: 15
Default Join Finder Beta Release

I have just released a beta version of Join Finder, a Perl script for consed users that helps find joins between gap edges. This is the first time I have released anything on SourceForge, so please forgive any oversights on my part. The script also requires that blastall and formatdb be installed on your system.

It can be downloaded here:
http://sourceforge.net/projects/joinfinder/


A) DEPENDENCIES

Join Finder depends on two NCBI tools, formatdb and
blastall. BLAST can be downloaded here:

http://www.ncbi.nlm.nih.gov/staff/ta...nix_setup.html

Once you have a working Blast installation, continue with the
installation instructions below.

Also note that this script is designed to work with consed ace files and
requires you to save a file from consed before running it. See section C,
BASIC HELP, for details.

B) INSTALLATION INSTRUCTIONS

1) Download join_finder.pl to a location on a Linux machine.
2) Open the program in a text editor such as xemacs.
3) Edit line 22 so that the text between quotation marks is the
explicit path to the local blastall installation. For example, if
blastall is installed in
/production/tools/blast/blast-2.2.14/bin/blastall, line 22 should
read:

my $blast_path="/production/tools/blast/blast-2.2.14/bin/blastall";

4) Edit line 23 the same way, so that the text between quotation marks is the
explicit path to formatdb. For example:

my $formatdb_path="/production/tools/blast/blast-2.2.14/bin/formatdb";

5) If you know how many cores the Linux machine running BLAST
has, change the "4" in line 26 to that number. If you do not know,
or if you run into problems, try setting this number to 1.
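
For steps 3 to 5, a small helper can look up the values to paste into lines 22, 23 and 26. This is only a sketch in Python, not part of Join Finder: it assumes blastall and formatdb are on your PATH (otherwise it reports nothing for them), and the key name "cores" is just a label used here, not a variable name taken from the script.

```python
import os
import shutil


def suggest_settings():
    """Collect candidate values for the three lines edited during installation."""
    return {
        # Full path to blastall if it is on PATH, else None (candidate for line 22).
        "blast_path": shutil.which("blastall"),
        # Full path to formatdb if it is on PATH, else None (candidate for line 23).
        "formatdb_path": shutil.which("formatdb"),
        # Core count reported by the OS; fall back to 1 if unknown (line 26).
        "cores": os.cpu_count() or 1,
    }


if __name__ == "__main__":
    for name, value in suggest_settings().items():
        shown = value if value is not None else "not found on PATH"
        print(f"{name}: {shown}")
```

If a tool is reported as not found, fill in its explicit path by hand as described in steps 3 and 4.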

C) BASIC HELP

Prior to using the program, you must save an info.txt file using
Consed for the ace file assembly you wish to analyze for joins. Do
this by opening the ace file in consed and selecting Info>Show Maps of
Contigs In Scaffolds>Save to File>OK.

DESCRIPTION
Join Finder helps locate joins by BLASTing contig end sequence probes and looking for link-supported joins.
Sequence probes are BLASTed against the 2 kb ends of contigs, strict matches are checked against link
information, and potential joins are written to jf.results.
-----------
OPTIONS
l: Number of low-quality bases (quality less than 25) allowed in probe sequences. Default is 0.
p: Probe size in bases. Default is 100.
o: Specify alternate output file name. Default is jf.results.
h: Print this help information.
-------
USAGE
join_finder.pl <ACE FILE> <ONO FILE> -l <INTEGER> -p <INTEGER> -o <OUTPUT FILE NAME>
Ex(Default Usage):
~amr/bin/tools/join_finder.pl 454Contigs.ace.1 info.txt
Ex(Advanced Usage):
join_finder.pl 454Contigs.ace.1 info.txt -l 5 -p 30 -o myjoins.txt
-----
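
If you want to launch Join Finder from another script, the usage line above can be wrapped as follows. This is a hedged Python sketch: the function names and the "script" parameter are made up for illustration, and only the -l, -p and -o flags listed in the help text are used.

```python
import subprocess


def build_command(ace_file, info_file, low_quality=None, probe_size=None,
                  output=None, script="join_finder.pl"):
    """Build the argument list for join_finder.pl from the documented options."""
    cmd = [script, ace_file, info_file]
    if low_quality is not None:
        cmd += ["-l", str(low_quality)]   # allowed low-quality bases per probe
    if probe_size is not None:
        cmd += ["-p", str(probe_size)]    # probe size in bases
    if output is not None:
        cmd += ["-o", output]             # alternate output file name
    return cmd


def run_join_finder(*args, **kwargs):
    """Run the script and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(*args, **kwargs), check=True)
```

For example, build_command("454Contigs.ace.1", "info.txt", 5, 30, "myjoins.txt") reproduces the advanced usage line shown in the help text.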

A few notes regarding the options above:
The option -l, for low-quality bases, instructs join finder to back away from the gap edge
when selecting a probe sequence until this threshold is satisfied. For
example, if -l is set to the default (0), this probe would be accepted:

TTCGGGTAACTTCCACTTCGTCATTCCCGCG

But the one below would be rejected, because lower case bases indicate
low quality, and there are 3 low quality bases. Hence join finder
would slide away from the gap and try again.

ttcGGGTAACTTCCACTTCGTCATTCCCGCG

If -l is set to 5, however, the probe would be accepted, since the
user elected to allow 5 low quality bases.
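
The sliding behaviour described above can be sketched roughly as follows. This is a Python illustration, not the actual Perl code in join_finder.pl: it assumes the probe window moves one base at a time, that lower case marks low quality (as in the examples above), and the function names and the gap_side parameter are invented for the sketch.

```python
def count_low_quality(probe):
    """Count low-quality bases, taken here to be the lower-case ones."""
    return sum(1 for base in probe if base.islower())


def select_probe(sequence, probe_size, max_low_quality=0, gap_side="left"):
    """Return (offset, probe) for the first acceptable window, or None.

    offset is how many bases the window was moved away from the gap edge.
    gap_side says which end of `sequence` borders the gap.
    """
    if probe_size <= 0 or probe_size > len(sequence):
        return None
    for offset in range(len(sequence) - probe_size + 1):
        if gap_side == "left":
            probe = sequence[offset:offset + probe_size]
        else:
            end = len(sequence) - offset
            probe = sequence[end - probe_size:end]
        if count_low_quality(probe) <= max_low_quality:
            return offset, probe
    return None
```

With the second example sequence, a 28-base probe and -l 0, the window has to move 3 bases away from the gap before it is accepted. With -l 5 it is accepted right at the gap edge.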

The option -p instructs join finder on the size of probes to use when
looking for joins. The default is 100, but you can find more joins
with a smaller number. However, you will also find more false
positives, in which a join is proposed that is really just a
tandem repeat split by a gap. The script is a useful tool for finding
joins quickly, but you must still exercise your own judgement.

D) JOIN FINDER OUTPUT

Join finder outputs several files, but the most important is the file
"jf.results". Output will look like this:

Contig.RightEdge-Contig.LeftEdge    Probe Sequence                  %ID     RightStart  RightEnd  E-Val  BitScore
contig00013.right-contig00014.left  TCGGGAGCAACAGAAACCGCTCCGCCGTCA  100.00  235         264       8e-12  60.0
contig00015.right-contig00016.left  ATGTTGCCGTATTTGAGGATGATGTCCTGT  100.00  58          87        8e-12  60.0
contig00020.right-contig00021.left  TCAGCTATTTAGAATAAaTTTtGAAaCTct  100.00  209         238       8e-12  60.0

The first column indicates that the right edge of contig00013 has a
potential join with the left edge of contig00014. The Probe Sequence
column gives the sequence that matches on both sides of the gap,
and can be used in a string search in consed. %ID, E-Val, and BitScore
are taken from the blast file "jf_blast.out". RightStart and RightEnd
are the coordinates of the probe sequence on the right side of the
gap in which a join appears to exist.
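
If you want to post-process jf.results, the rows can be read along these lines. This is a Python sketch that assumes the columns are separated by whitespace as in the sample above. The Join type and parse_join_line are names made up for this example.

```python
from typing import NamedTuple, Optional


class Join(NamedTuple):
    right_contig: str   # contig whose right edge borders the gap
    left_contig: str    # contig whose left edge borders the gap
    probe: str
    percent_id: float
    right_start: int
    right_end: int
    e_value: float
    bit_score: float


def parse_join_line(line: str) -> Optional[Join]:
    """Parse one data row; return None for the header or malformed rows."""
    fields = line.split()
    # Data rows have 7 fields; the header has 8 because of "Probe Sequence".
    if len(fields) != 7 or "-" not in fields[0]:
        return None
    right_part, left_part = fields[0].split("-", 1)
    if not right_part.endswith(".right") or not left_part.endswith(".left"):
        return None
    try:
        return Join(
            right_contig=right_part[:-len(".right")],
            left_contig=left_part[:-len(".left")],
            probe=fields[1],
            percent_id=float(fields[2]),
            right_start=int(fields[3]),
            right_end=int(fields[4]),
            e_value=float(fields[5]),
            bit_score=float(fields[6]),
        )
    except ValueError:
        return None
```

The probe field of each parsed row is the string to paste into the consed string search window.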

E) Please send bug reports and feature requests to
amr@broadinstitute.org. While join finder is provided "as-is", I will
try to fix bugs or update features as time permits.
Old 04-06-2010, 04:24 AM   #2
sklages
Senior Member
 
Location: Berlin, DE

Join Date: May 2008
Posts: 628
Default

What is the advantage of using 'joinfinder' over using 'cross_match' in consed's AssemblyView? 'cross_match' is flexible, fast and output is integrated in consed.

But I probably missed something,
cheers,
Sven
Old 04-06-2010, 05:04 AM   #3
Broadie
Member
 
Location: Boston, MA

Join Date: Oct 2009
Posts: 15
Default

Hi Sven,

That's a good question. Personally, I haven't had much luck finding joins with cross_match. It may be some of the settings we are using, but I very often find joins between neighboring contigs that cross_match doesn't find. I also find the display to be cluttered and distracting when using cross_match.

Aside from that, I find making joins this way to be more expedient. With cross_match, you are continuously reloading the assembly view after each join. We have assemblies ranging from 20 gaps to 200 or more, so this gets tedious. With the joins identified by Join Finder, you simply put the probe sequence in the string search window, bring up the two contigs, compare, and then join them.

If you achieve success with cross_match, I would encourage you to stick with what works for you, but I find this method to be easier and faster.