SEQanswers

Old 11-16-2010, 05:39 PM   #1
lunacab
Junior Member
 
Location: stockholm

Join Date: Oct 2010
Posts: 2
How to compute all restriction enzyme sites in the human genome?

Dear colleagues,
I have a very simple question to ask but I am struggling with it...
I have a restriction enzyme with a 6-nucleotide recognition site, and I want to find ALL sites within the human genome (hg19, for instance) where the recognition sequence matches.
I tried using BLAST, but the query sequence seems to be too short, so it never returns any hits.
Any recommendations on how to compute that?
Thanks a lot in advance.
Old 11-17-2010, 01:19 PM   #2
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 116

This should be fun. A real classic bioinformatics task for beginners.

There are some good books out there for learning how to solve these problems.
Beginning Perl for Bioinformatics
Bioinformatics Programming in Python: A Practical Course for Beginners

For working environments you could try:
DNA Linux

This kind of task is also an excellent starting point for learning simple scripting tasks on your own. In other words, you could use this as an excuse to learn some Python, Perl, Regex, Awk, etc.

There are also packages/libraries of code that will have already solved many of these types of basic bioinformatics tasks. To name just a few of these: BioPerl, BioPython, EMBOSS, etc.
Old 11-17-2010, 02:00 PM   #3
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 116

In case you feel that my previous post was dodging your question ... attached is an example Perl script that you could use as a starting point. It uses regex to identify occurrences of one string (an RE sequence) within another string (a chromosome).

In this example, if you want to get all the EcoRI sites on chromosome 22, you would run this (from a Linux prompt):
./findRestrictionSites.pl --genome_version=hg19 --chromosome=22 --re_site=GAATTC

The output will be one site per line in the format: chr:start-end
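For readers who cannot open the attachment, here is a minimal Python sketch of the same idea (a regex search for a recognition site within a chromosome string). It is not the attached Perl script: the function name `find_sites`, the toy sequence, and the 1-based inclusive coordinates are assumptions made for illustration.

```python
import re

def find_sites(chrom_name, sequence, site):
    """Yield 'chr:start-end' strings for every occurrence of site.

    Coordinates are 1-based and inclusive (an assumption, not
    necessarily the convention of the attached script).
    """
    # A zero-width lookahead lets finditer report overlapping matches too.
    pattern = re.compile("(?=" + re.escape(site) + ")", re.IGNORECASE)
    for m in pattern.finditer(sequence):
        start = m.start() + 1
        end = m.start() + len(site)
        yield f"{chrom_name}:{start}-{end}"

# Toy sequence standing in for a real chromosome (hypothetical data).
toy = "ccGAATTCttgaattcGAATTC"
for line in find_sites("chr22", toy, "GAATTC"):
    print(line)
# prints:
# chr22:3-8
# chr22:11-16
# chr22:17-22
```

The search is case-insensitive because soft-masked genome FASTA files use lowercase for repeats. For a whole chromosome you would read the sequence from a FASTA file instead of using a literal string.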

There is also a list of online RE analysis tools here.
Attached Files
File Type: pl findRestrictionSites.pl (1.6 KB, 487 views)
Old 11-17-2010, 02:04 PM   #4
obig
Member
 
Location: Berkeley

Join Date: Nov 2010
Posts: 12

If you prefer to use R/Bioconductor, you might investigate the BSgenome and Biostrings packages. Here's a document walking you through the process:
http://www.bioconductor.org/packages...eSearching.pdf
Old 11-18-2010, 04:37 PM   #5
lunacab
Junior Member
 
Location: stockholm

Join Date: Oct 2010
Posts: 2

Thanks a lot! Very very useful!
Old 08-19-2011, 11:59 AM   #6
ParthavJailwala
Member
 
Location: Maryland, USA

Join Date: Oct 2009
Posts: 27

I have used Biostrings and BSgenome to find restriction sites in the mouse genome, and it works great. The only caveat is that you have to use 'matchPattern()' on a per-chromosome basis and then concatenate the output files if a single per-genome file is desired.
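The same "one chromosome at a time, then combine" pattern can be sketched outside R. The Python below is not Biostrings or 'matchPattern()'; it is a plain-regex stand-in that loops over the records of a multi-FASTA file and collects all hits into one list. The helper names `read_fasta` and `genome_sites` and the toy genome are hypothetical.

```python
import io
import re

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA file handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record name.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def genome_sites(handle, site):
    """Scan each chromosome in turn and return all chr:start-end lines."""
    pattern = re.compile("(?=" + re.escape(site) + ")", re.IGNORECASE)
    lines = []
    for name, seq in read_fasta(handle):
        for m in pattern.finditer(seq):
            # 1-based inclusive coordinates (an assumed convention).
            lines.append(f"{name}:{m.start() + 1}-{m.start() + len(site)}")
    return lines

# Toy two-chromosome genome (hypothetical data); the chr1 sequence is
# split across two lines, as in real FASTA files.
toy_genome = io.StringIO(">chr1\nAAGAATTC\nGG\n>chr2 toy\nGAATTCGAATTC\n")
all_sites = genome_sites(toy_genome, "GAATTC")
print("\n".join(all_sites))
```

With a real genome you would pass `open("genome.fa")` (a placeholder file name) instead of the `StringIO` object and write the lines to a single output file.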
Old 01-22-2016, 03:03 AM   #7
Vandelnokk
Junior Member
 
Location: Finland

Join Date: Oct 2012
Posts: 1
HiCUP

Hi,

Check out the Digester script in the HiCUP pipeline:
http://www.bioinformatics.babraham.a...tion/#Digester

Best
Old 02-23-2016, 06:20 AM   #8
craigdj
Junior Member
 
Location: Ohio

Join Date: Feb 2016
Posts: 1

Hi lunacab,

Would you be willing to share your data regarding the restriction site coordinates in the human genome? It would be incredibly helpful!
Old 02-23-2016, 06:38 AM   #9
dariober
Senior Member
 
Location: Cambridge, UK

Join Date: May 2010
Posts: 310

Quote:
Originally Posted by malachig View Post
It uses regex to identify occurrences of one string (an RE sequence) within another string (a chromosome).
Just as a comment, if I'm not mistaken, your script reverse-complements the regular expression, which is something that cannot be done. I'd rather reverse-complement the reference sequence, even if it is more "expensive".
Old 02-23-2016, 06:43 AM   #10
blancha
Senior Member
 
Location: Montreal

Join Date: May 2013
Posts: 367

EMBOSS is an old program, but it works remarkably well for this type of task.
Don't be fooled by the dated website.
It is a very efficient program.
http://emboss.sourceforge.net/apps/c.../restrict.html
Old 07-09-2017, 03:36 PM   #11
malachig
Senior Member
 
Location: WashU

Join Date: Aug 2010
Posts: 116

Quote:
Originally Posted by dariober View Post
Just as a comment, if I'm not mistaken, your script reverse-complements the regular expression, which is something that cannot be done. I'd rather reverse-complement the reference sequence, even if it is more "expensive".
I'm not sure I follow. It doesn't reverse-complement the "regular expression"; it reverse-complements the restriction enzyme sequence (a plain string) that is then used in the regular expression. We can either (A) search for our string of interest in the reference sequence and in its reverse complement, or (B) search for our string of interest and its reverse complement in the reference sequence.

These two approaches should be equivalent. The script uses option B.
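To make the equivalence concrete, here is a small Python sketch (not the attached script; `revcomp`, `positions`, `option_a`, and `option_b` are hypothetical helper names). It assumes the site is a plain A/C/G/T string with no IUPAC ambiguity codes or regex syntax, which is the case where reverse-complementing the search string is straightforward.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def revcomp(seq):
    """Reverse complement of a plain A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]

def positions(text, word):
    """0-based start offsets of every (possibly overlapping) occurrence."""
    found, i = [], text.find(word)
    while i != -1:
        found.append(i)
        i = text.find(word, i + 1)
    return found

def option_a(ref, site):
    """Search site in ref and in revcomp(ref); report forward-strand offsets."""
    n, k = len(ref), len(site)
    plus = set(positions(ref, site))
    # A hit at offset p on the reverse complement covers forward
    # offsets n - p - k .. n - p - 1.
    minus = {n - p - k for p in positions(revcomp(ref), site)}
    return sorted(plus | minus)

def option_b(ref, site):
    """Search site and revcomp(site) directly in ref."""
    return sorted(set(positions(ref, site)) | set(positions(ref, revcomp(site))))

# Toy reference containing a non-palindromic site on both strands.
ref = "TTCCTCAGCAAGCTGAGGTT"
site = "CCTCAGC"
print(option_a(ref, site))  # [2, 11]
print(option_b(ref, site))  # [2, 11]
```

Option B avoids building a second copy of the chromosome. For a palindromic site such as GAATTC, `revcomp(site) == site`, so a single forward search already finds every site.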