SEQanswers

Old 12-05-2011, 04:48 PM   #1
dphansti
Member
 
Location: Bay Area

Join Date: May 2011
Posts: 28
Find all occurrences of a sequence in a fasta file

I have a fasta file with 16S sequences from many organisms. I want to find all occurrences of a certain ~20 bp sequence in this fasta file. I could do a simple text search but I would prefer to allow some flexibility in the matches.

For each match I would like the following information
1) fasta entry name
2) position in the sequence
3) CIGAR string or some other representation of the alignment
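
To make the three fields above concrete, here is a minimal Python sketch of a brute-force scan. It is only an illustration, not any particular tool's method: it assumes substitutions only (no indels), searches the forward strand only, and reports 1-based positions with SAM-style `=`/`X` CIGAR operations. The function names and the `max_mismatches` parameter are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (entry_name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Entry name = first whitespace-delimited token of the header.
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def to_cigar(ops: List[str]) -> str:
    """Run-length encode per-base operations, e.g. ['=', '=', 'X'] -> '2=1X'."""
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


def find_matches(records, query: str, max_mismatches: int = 2):
    """Return (entry_name, 1-based position, CIGAR) for every window of
    len(query) that differs from the query at no more than max_mismatches
    positions. Assumption: substitutions only, forward strand only."""
    query = query.upper()
    k = len(query)
    hits = []
    for name, seq in records:
        for start in range(len(seq) - k + 1):
            window = seq[start:start + k]
            ops = ["=" if q == s else "X" for q, s in zip(query, window)]
            if ops.count("X") <= max_mismatches:
                hits.append((name, start + 1, to_cigar(ops)))
    return hits
```

With a real file, the records could come from `read_fasta(open(path))`, where `path` is whatever FASTA file is being searched. The scan takes time proportional to sequence length times query length for each entry, so it suits small collections better than large databases.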

A SAM file would be fine. I tried using bowtie2 with "-a" but it never seemed to finish. Through trial and error I found that setting "-k" to 150 worked fine but setting "-k" to 200 did not, indicating to me that there is probably some upper limit to the number of matches per query that it can report.

I am certain that what I want to do is commonly done by many people here on the site. What is the easiest/best way to go about it?

Thanks so much.
__________________
Doug
www.sharedproteomics.com
Old 12-06-2011, 12:49 AM   #2
gringer
David Eccles (gringer)
 
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 836

Have you tried BLAST or BLAT? Those tools are designed for looking for a low number of sequences in a large database of many different sequences.
Old 12-06-2011, 02:57 AM   #3
aloliveira
Member
 
Location: Brazil

Join Date: Aug 2010
Posts: 47

You can use the BLAST program and specify the parameter -W 20, which sets the word size to 20 so that BLAST reports only hits containing an exact match of at least 20 bp. One important thing: deactivate the low-complexity filter (-F F).
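
As a rough sketch of how those two options fit into a full call, the Python helper below assembles a legacy `blastall` command line. The `-W` and `-F` values come from the suggestion above; the `-p blastn` program choice, the `-m 8` tabular output, and the file names are assumptions added for illustration, and the database is assumed to have been built beforehand with `formatdb`.

```python
from typing import List


def legacy_blastn_command(query_fasta: str, database: str,
                          word_size: int = 20,
                          low_complexity_filter: bool = False) -> List[str]:
    """Build an argument list for the legacy NCBI blastall program.

    The helper name and its parameters are hypothetical; only the
    command-line flags themselves belong to blastall.
    """
    return [
        "blastall",
        "-p", "blastn",            # nucleotide-vs-nucleotide search (assumed)
        "-i", query_fasta,         # FASTA file holding the ~20 bp query
        "-d", database,            # database built from the 16S FASTA file
        "-W", str(word_size),      # word size, as suggested in the post
        "-F", "T" if low_complexity_filter else "F",  # -F F disables the filter
        "-m", "8",                 # tabular output (assumed, for easy parsing)
    ]


if __name__ == "__main__":
    # Hypothetical file names; pass the list to subprocess.run(cmd, check=True)
    # on a machine where blastall is installed.
    cmd = legacy_blastn_command("query.fa", "16S_db")
    print(" ".join(cmd))
```

The tabular output includes the subject name and the start and end coordinates of each hit, which covers the entry name and position asked for in the first post.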

Thanks
André
Old 12-06-2011, 07:11 AM   #4
vamosia
Member
 
Location: New York

Join Date: Mar 2009
Posts: 15

I would suggest primer_match. It is rather simple to use and rather flexible in its output.

The primer in this specific case would be your 20 bp sequence. The program, in turn, lets you tailor the output to display position, entry name, or counts.

http://edwardslab.bmcb.georgetown.ed...mer_match.html

(I am by no means associated with the Edwards lab, just a frequent user.)