SEQanswers





Old 11-17-2011, 01:07 PM   #1
hohllp
Junior Member
 
Location: Montreal

Join Date: Jan 2010
Posts: 4
Align short query against 300M Illumina reads

Hi,

I have 8 short (~60nt) query sequences which I want to align to roughly 300M Illumina reads (100bp). Criteria are:

- Allow a few mismatches (< 8)
- Gaps are not allowed.
- Alignment should span the entire query.
- Report all matches, NOT just the best one.

Do any programs exist that can accomplish this task? I was thinking of BWA or other next-gen read mappers, but these usually take a whole genome as the database and the reads as the query. My case is the reverse (the reads are the database and the short sequences are the query), so it may not work with these programs.

Any ideas?

Thanks in advance for the help.

Last edited by hohllp; 11-17-2011 at 01:49 PM.
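
The four criteria above (a few mismatches, no gaps, full-query span, all matches reported) amount to a Hamming-distance scan of the query along each read. The following is only a minimal Python sketch of that rule, not any particular tool's implementation; the function name `hamming_hits` and the default of 7 mismatches (taken from "< 8") are assumptions made for illustration, and only the forward strand is checked.

```python
def hamming_hits(query, read, max_mismatches=7):
    """Return (offset, mismatches) for every ungapped placement of the
    whole query inside the read with at most max_mismatches substitutions.

    Hypothetical helper for illustration; forward strand only.
    """
    q = query.upper()
    r = read.upper()
    hits = []
    # A read shorter than the query gives an empty range, so no hits.
    for offset in range(len(r) - len(q) + 1):
        mismatches = 0
        for a, b in zip(q, r[offset:offset + len(q)]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many substitutions at this offset
        else:
            # Loop finished without break: the placement is acceptable.
            hits.append((offset, mismatches))
    return hits


if __name__ == "__main__":
    # Every acceptable offset is reported, not just the best one.
    print(hamming_hits("ACGTACGT", "TTACGTACGTTT", max_mismatches=1))
```

With ~60 nt queries and 100 bp reads there are only about 41 offsets per read, so the cost is dominated by the number of reads rather than by the scan itself.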
Old 11-17-2011, 08:00 PM   #2
Dario1984
Senior Member
 
Location: Sydney, Australia

Join Date: Jun 2011
Posts: 166
Using Bioconductor

You can simply do it in R.

1. Install the Biostrings package

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")

2. Load the Biostrings package

library(Biostrings)

3. Use the matchPattern function. To get a detailed help page for it, type:

?matchPattern

You'll see it has all of the options that you need for your example.

matchPattern(pattern, subject, max.mismatch=0, min.mismatch=0, with.indels=FALSE, fixed=TRUE, algorithm="auto")
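
In the signature above, max.mismatch bounds the number of substitutions and with.indels=FALSE keeps the alignment ungapped. As a rough illustration of the same counting rule applied read by read to a large file, here is a Python sketch that streams a FASTQ file and checks each query on both strands. This is not the Biostrings implementation; the names `read_fastq`, `scan_reads` and `reverse_complement` are hypothetical, and the parser assumes the simple four-lines-per-record FASTQ layout.

```python
import io

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def read_fastq(handle):
    """Yield (name, sequence) pairs; assumes 4 lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().strip().upper()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (unused here)
        yield header.strip()[1:], seq


def scan_reads(queries, handle, max_mismatch=7):
    """Yield (query, read, strand, offset, mismatches) for every
    ungapped full-length placement within the mismatch limit."""
    patterns = []
    for name, seq in queries.items():
        patterns.append((name, "+", seq.upper()))
        patterns.append((name, "-", reverse_complement(seq)))
    for read_name, read in read_fastq(handle):
        for qname, strand, pat in patterns:
            for offset in range(len(read) - len(pat) + 1):
                mm = count_mismatches(pat, read[offset:offset + len(pat)])
                if mm <= max_mismatch:
                    yield qname, read_name, strand, offset, mm


if __name__ == "__main__":
    demo = io.StringIO("@r1\nTTACGGTT\n+\nIIIIIIII\n@r2\nAACCGTAA\n+\nIIIIIIII\n")
    for hit in scan_reads({"q": "ACGG"}, demo, max_mismatch=0):
        print(hit)
```

Because the reads are consumed one at a time, memory use stays flat regardless of how many reads the file holds.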
Old 11-18-2011, 06:46 AM   #3
hohllp
Junior Member
 
Location: Montreal

Join Date: Jan 2010
Posts: 4

Thanks. I'll give it a go.
Old 11-18-2011, 11:46 AM   #4
GenoMax
Senior Member
 
Location: East Coast USA

Join Date: Feb 2008
Posts: 6,998

Blat should work for this too: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3
Old 11-18-2011, 11:49 AM   #5
hohllp
Junior Member
 
Location: Montreal

Join Date: Jan 2010
Posts: 4

Quote:
Originally Posted by GenoMax View Post
Blat should work for this too: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3
I am currently trying BLAT. I cannot build a database that large, so I am splitting the 300M reads into subsets.
Old 11-21-2011, 01:40 PM   #6
hohllp
Junior Member
 
Location: Montreal

Join Date: Jan 2010
Posts: 4

FYI, BLAT worked well. I created a series of databases, each containing 7.5 million reads.
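
At 7.5 million reads per database, 300M reads come to 40 databases. A minimal Python sketch of that kind of split is shown here, assuming the reads are already available as (name, sequence) pairs and are written out as FASTA; the function name `split_fasta`, the file prefix and the numbering scheme are hypothetical and are not the commands actually used in this thread.

```python
def split_fasta(records, chunk_size, prefix):
    """Write (name, sequence) records into consecutive FASTA files of at
    most chunk_size records each and return the list of file paths.

    Records are streamed, so only one record is held in memory at a time.
    """
    paths = []
    out = None
    for index, (name, seq) in enumerate(records):
        if index % chunk_size == 0:
            # Start a new chunk file every chunk_size records.
            if out is not None:
                out.close()
            path = f"{prefix}_{index // chunk_size:03d}.fa"
            out = open(path, "w")
            paths.append(path)
        out.write(f">{name}\n{seq}\n")
    if out is not None:
        out.close()
    return paths


if __name__ == "__main__":
    import os
    import tempfile

    reads = ((f"read{i}", "ACGT") for i in range(5))
    with tempfile.TemporaryDirectory() as tmp:
        print(split_fasta(reads, 2, os.path.join(tmp, "chunk")))
```

Each resulting file can then be searched separately with the same 8 queries, and the per-chunk results concatenated afterwards.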
All times are GMT -8.