12-05-2011, 03:48 PM   #1
Location: Bay Area

Join Date: May 2011
Posts: 28
Find all occurrences of a sequence in a fasta file

I have a fasta file with 16S sequences from many organisms. I want to find all occurrences of a certain ~20 bp sequence in this fasta file. I could do a simple text search but I would prefer to allow some flexibility in the matches.

For each match I would like the following information
1) fasta entry name
2) position in the sequence
3) CIGAR string or some other representation of the alignment

A SAM file would be fine. I tried using bowtie2 with "-a" but it never seemed to finish. Through trial and error I found that setting "-k" to 150 worked fine but setting "-k" to 200 did not, indicating to me that there is probably some upper limit to the number of matches per query that it can report.
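
For a query this short, the three fields listed above could also be produced by a brute-force scan. The following is only a minimal sketch under stated assumptions, not what bowtie2 or any other aligner does: it searches the forward strand only, allows substitutions but no insertions or deletions, and reports 1-based positions with an extended-CIGAR-style string (`=` for match, `X` for mismatch). The function names and the `max_mismatches` parameter are hypothetical.

```python
from itertools import groupby


def read_fasta(lines):
    """Yield (entry name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Entry name = first word of the header line (assumption).
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def cigar_like(query, window):
    """Run-length encode per-base matches (=) and mismatches (X)."""
    ops = ["=" if q == w else "X" for q, w in zip(query, window)]
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


def find_matches(lines, query, max_mismatches=2):
    """Return (entry name, 1-based position, CIGAR-like string) per hit."""
    query = query.upper()
    hits = []
    for name, seq in read_fasta(lines):
        # Slide the query along the sequence and count substitutions.
        for start in range(len(seq) - len(query) + 1):
            window = seq[start:start + len(query)]
            mismatches = sum(q != w for q, w in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((name, start + 1, cigar_like(query, window)))
    return hits


# Toy input; with a real file, pass an open file handle instead of a list.
example = [">org1 16S", "TTACGTACGTAA", ">org2", "ACGTTCGT"]
for hit in find_matches(example, "ACGTACGT", max_mismatches=1):
    print(hit)
```

The scan costs roughly (sequence length × query length) per entry, which is usually acceptable for a 16S collection and a single 20 bp query; searching the reverse strand would need a second pass with the reverse-complemented query.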

I am certain that what I want to do is commonly done by many people here on the site. What is the easiest/best way to go about it?

Thanks so much.
dphansti
12-05-2011, 11:49 PM   #2
David Eccles (gringer)
Location: Wellington, New Zealand

Join Date: May 2011
Posts: 838

Have you tried BLAST or BLAT? Those tools are designed for looking for a low number of sequences in a large database of many different sequences.
12-06-2011, 01:57 AM   #3
Location: Brazil

Join Date: Aug 2010
Posts: 47

You can use BLAST and specify the parameter -W 20 (that way BLAST will report only hits with at least 20 bp of similarity). Ah, and one important thing: deactivate the low-complexity filter (-F F).
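
As a rough illustration of that suggestion, the sketch below assembles a legacy `blastall` command line with the two options mentioned (`-W 20` and `-F F`). It assumes the legacy NCBI `blastall` program and a nucleotide database already built from the 16S fasta file; the file and database names are hypothetical, and `-m 8` (tabular output) is an addition not mentioned in the post, chosen because each output row then carries the entry name and the match start and end positions.

```python
def blastall_command(query_fasta, database, word_size=20):
    """Build a legacy blastall argument list for a short nucleotide query."""
    return [
        "blastall",
        "-p", "blastn",         # nucleotide-vs-nucleotide search
        "-i", query_fasta,      # file containing the ~20 bp query
        "-d", database,         # database built from the 16S fasta file
        "-W", str(word_size),   # word size, as suggested in the post
        "-F", "F",              # turn the low-complexity filter off
        "-m", "8",              # tabular output (assumption, not from the post)
    ]


# Hypothetical file names, for illustration only.
cmd = blastall_command("query.fa", "16S_db")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run` once BLAST is installed. Note that a word size equal to the query length can only seed on an exact full-length match, so a smaller value may be needed if mismatches should be tolerated.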

aloliveira
12-06-2011, 06:11 AM   #4
Location: New York

Join Date: Mar 2009
Posts: 15

I would suggest primer_match. It is rather simple to use and rather flexible in its output.

The primer in this specific case would be your 20 bp sequence. The program in turn lets you tailor the output to display position, entry name, or counts.

(I am by no means associated with the Edwards lab, just a frequent user.)
vamosia
