SEQanswers

05-27-2013, 02:22 AM   #1
ddunbar
Junior Member
 
Location: Edinburgh

Join Date: May 2013
Posts: 2
Identifying unique aptamer sequences

Hello all.
A biologist colleague has generated sequencing libraries from a SELEX-type enrichment of artificial aptamers that bound to bacterial cells. He will have Ion Torrent output (short reads, single end; millions of reads per sample) and would like help identifying sequences that bind each bacterial strain uniquely or preferentially. The aptamers are 80 nucleotides long and randomly generated. There are several rounds of enrichment, so some sequences will be represented multiple times. Biological replicates will help find true positives.

Ideally he would find sequences that are present exclusively in each bacterial strain's bound aptamer population. Initially we'll look at the full-length aptamers, but of course specific motifs shared by different aptamers may also be enriched.

Does anyone know of a Bioconductor (or other) package that already does this kind of short-read counting and comparison between samples?

This can be done in Perl, for example, using hashes to count each sequence (and potentially each k-mer in the reads), but I suspect there is a better way. We don't want to reinvent the wheel and would like to reuse anyone's good ideas and code.
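
The hash-based k-mer counting described here might look like the sketch below. It uses awk associative arrays instead of Perl hashes, which is a substitution made for illustration. The function name count_kmers and its arguments (k, then a file of reads with one sequence per line) are hypothetical.

```shell
# Hypothetical helper: count every k-mer in a file of reads (one read per line).
# Usage: count_kmers K FILE
# Prints "count<TAB>kmer", most frequent first.
count_kmers() {
    awk -v k="$1" '
        {
            # Slide a window of width k along the read and tally each substring.
            for (i = 1; i + k - 1 <= length($0); i++)
                n[substr($0, i, k)]++
        }
        END {
            for (kmer in n)
                print n[kmer] "\t" kmer
        }
    ' "$2" | sort -k1,1nr -k2,2
}
```

The array holds one entry per distinct k-mer, so memory use grows with k. Small values of k keep it manageable even with millions of reads.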

Any help or thoughts would be greatly appreciated.

Donald
05-30-2013, 01:17 AM   #2
dawe
Senior Member
 
Location: 45°30'25.22"N / 9°15'53.00"E

Join Date: Apr 2009
Posts: 258

You can handle aptamer sequences using only standard shell utilities: grep, awk, sort and uniq.
First, put all sequences (one per line) in a file, then

$ sort file | uniq -c | sort -k1,1n > counted_sequences
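
If the reads arrive as FASTQ (an assumption, since the thread does not name the file format), the one-sequence-per-line file can be produced on the fly. This is a minimal sketch: the function names fastq_to_seqs and count_seqs are hypothetical, and it assumes plain 4-line FASTQ records with unwrapped sequence lines.

```shell
# Assumes 4-line FASTQ records: the sequence is the 2nd line of each record.
fastq_to_seqs() {
    awk 'NR % 4 == 2' "$1"
}

# Count identical reads, least frequent first (same pipeline as above).
count_seqs() {
    fastq_to_seqs "$1" | sort | uniq -c | sort -k1,1n
}
```

$ count_seqs sample.fastq > counted_sequences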

You will end up with hundreds (or thousands) of recurring sequences out of millions, with enrichment counts following a power law.
Once all the files are counted, it's easy to check counts across SELEX cycles or samples with grep.
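
As a rough sketch of that comparison step: lookup_seq wraps the grep check, and unique_to uses awk (not grep) to list sequences present in one counted file and absent from another. Both function names and the minimum-count argument are hypothetical. Input files are assumed to be in the "count sequence" layout written by uniq -c.

```shell
# Show the count of one sequence in every counted file given.
# Usage: lookup_seq SEQUENCE FILE...
lookup_seq() {
    query=$1
    shift
    # -F: literal string, -w: whole-word match so shorter queries do not
    # match inside longer sequences.
    grep -Fw -- "$query" "$@"
}

# List sequences seen at least MIN times in TARGET and never in OTHER.
# Usage: unique_to MIN TARGET OTHER
# Prints "count<TAB>sequence".
unique_to() {
    awk -v min="$1" '
        FILENAME == ARGV[1] { seen[$2] = 1; next }   # first file: OTHER sample
        $1 >= min && !($2 in seen) { print $1 "\t" $2 }
    ' "$3" "$2"
}
```

$ unique_to 10 strainA_counted strainB_counted

This only tests exact, full-length matches. A sequence that differs by one sequencing error counts as a different sequence.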

HTH
05-30-2013, 01:45 AM   #3
ddunbar
Junior Member
 
Location: Edinburgh

Join Date: May 2013
Posts: 2

Many thanks for that dawe. Works nicely.
Best wishes,
Donald
All times are GMT -8.