SEQanswers

Old 01-06-2011, 10:52 AM   #1
bcf
Junior Member
 
Location: land of milk and honey

Join Date: Mar 2009
Posts: 2
Edit distance sequence tags available (link within)

Hi everyone,

I've generated several sets of edit (Levenshtein) distance sequence tags, from 4 nt to 10 nt in length and from edit distance 3 to edit distance 9. Within each sheet of the spreadsheet (or each length/edit distance combination in CSV format), all tags are at least the given edit distance from one another.
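To make the "at least the given edit distance from one another" property concrete, here is a minimal sketch in plain Python of how a downloaded tag set could be checked. It is not taken from the edittag code; the function names `levenshtein` and `min_pairwise_distance` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between strings a and b, counting insertions,
    deletions, and substitutions (standard dynamic programming)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def min_pairwise_distance(tags):
    """Smallest edit distance over all pairs in a tag set (hypothetical helper)."""
    return min(levenshtein(x, y) for x, y in combinations(tags, 2))
```

A set passes for a given edit distance `d` if `min_pairwise_distance(tags) >= d`. The check is quadratic in the number of tags, which is fine for verifying a finished set.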

While not entirely exhaustive (for several reasons), these sequence tags approach the maximum number possible (after filtering) within each category of length.
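As an illustration of one generic reason a generated set can approach, without being guaranteed to reach, the maximum size: the sketch below uses a simple greedy pass, which is an assumption made for this example and not necessarily how make_levenshtein_tags.py works. The names `levenshtein` and `greedy_tags` are hypothetical. Each candidate is kept only if it is far enough from every tag already kept, so the result depends on the order in which candidates are visited.

```python
from itertools import product


def levenshtein(a, b):
    """Edit distance between strings a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def greedy_tags(length, min_distance, alphabet="ACGT"):
    """Greedy sketch: walk all candidates in lexicographic order and keep
    each one that is at least min_distance from every tag kept so far."""
    chosen = []
    for candidate in map("".join, product(alphabet, repeat=length)):
        if all(levenshtein(candidate, t) >= min_distance for t in chosen):
            chosen.append(candidate)
    return chosen


if __name__ == "__main__":
    tags = greedy_tags(4, 3)
    print(len(tags), tags)
```

The candidate space grows as 4^length and every candidate is compared against the kept set, so this brute-force approach is only practical for short tags.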

I'm in the process of prepping a manuscript describing the code for tag generation (as part of a larger package I'm working on, largely in a private branch of the repo), but it seemed these tags might be useful to others in the meantime.

The tags are available from:

https://github.com/BadDNA/edittag/downloads

There's a blog post with a bit of description here:

http://b.atcg.us/blog/2010/11/20/a-p...ence-tags.html

The code to generate the tags is here:

https://github.com/BadDNA/edittag/bl...shtein_tags.py

I generated the tags in the files provided using the following (to generate 10nt tags with a minimum edit distance of 3, for example):

Code:
python make_levenshtein_tags.py --tag-length=10 --edit-distance=3 \
--no-polybase --gc --comp --use-c --multiprocessing \
--min-and-greater | tee 10_nt.txt
Run time for this example was about 80 hours of wall-clock time on 6 of 8 cores (Mac Pro, 8 GB RAM), or approximately 480 CPU-hours. Run times for smaller tag sets (e.g. 8 nt) are about a minute.

If you use these tags, for the time being, please "cite" the repository link (https://github.com/baddna/edittag/). Once we get a manuscript out, I'll provide proper citation information in the README.

WORDS OF CAUTION: I'm undertaking a reorganization of the code in this repository (committing lots of unit tests, refactoring, documenting, etc.), so expect the code to change as I update it.

Please let me know if you have any questions.

best,
b

Last edited by bcf; 01-18-2011 at 09:58 PM. Reason: Updating URLs for repository move
Old 02-12-2011, 04:43 PM   #2
bcf
edittag packaged up and on PyPI

Hi everyone,

two quick things:

-b

Last edited by bcf; 02-14-2011 at 07:37 AM. Reason: preprint is now accessible