SEQanswers > Bioinformatics

07-03-2010, 12:32 PM   #1
R diggity
Location: Tennessee

Join Date: Jun 2010
Posts: 12
Aligning numerous reads to several small references

Hi all,

I'm trying to align a large set of Illumina reads (over 18 million) to a reference. My reference consists of multiple candidate sequences varying in size and location across the genome. I used Maq to map my paired-end reads to just one of these individual sequences but was only able to map around 10% of the reference. I just began this project and am wondering:

-How should I approach the preparation of my reference file(s)?
-Should I narrow my reads?
-What is the scope of a typical project in Maq in terms of read number and reference size?

Any advice or helpful tutorials/references would be greatly welcome!
07-09-2010, 06:33 AM   #2
Location: Iowa City, IA

Join Date: Jul 2010
Posts: 95

Originally posted by R diggity:
My reference consists of multiple candidate sequences varying in size and location across the genome.
Careful: if you align only to your regions of interest, you will often end up with false mappings. Generally the best approach is to map to the entire genome and then filter the results to your regions of interest.
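A minimal sketch of the "map to the whole genome, then filter" step, assuming the alignments are available as SAM text and the regions of interest are (sequence, start, end) intervals in 0-based, half-open BED style. The region names and example records are hypothetical, and a read is kept based only on its leftmost aligned position, which is a simplification.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical region format: (sequence name, start, end), 0-based half-open (BED-style).
Region = Tuple[str, int, int]


def index_regions(regions: Iterable[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group region intervals by reference sequence name."""
    by_seq: Dict[str, List[Tuple[int, int]]] = {}
    for name, start, end in regions:
        by_seq.setdefault(name, []).append((start, end))
    return by_seq


def filter_sam_to_regions(sam_lines: Iterable[str], regions: Iterable[Region]) -> Iterator[str]:
    """Yield mapped SAM records whose leftmost aligned position (POS) falls inside a region."""
    by_seq = index_regions(regions)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x4:
            continue  # FLAG bit 0x4: read is unmapped
        rname, pos = fields[2], int(fields[3]) - 1  # SAM POS is 1-based; convert to 0-based
        for start, end in by_seq.get(rname, []):
            if start <= pos < end:
                yield line
                break


# Hypothetical example: one region on linkage group "LG1".
sam = [
    "@SQ\tSN:LG1\tLN:1000",
    "r1\t0\tLG1\t150\t60\t75M\t*\t0\t0\t*\t*",  # inside the region
    "r2\t0\tLG1\t900\t60\t75M\t*\t0\t0\t*\t*",  # mapped, but outside the region
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",          # unmapped
]
hits = list(filter_sam_to_regions(sam, [("LG1", 100, 300)]))
print(len(hits), "record(s) on target")
```

For real data you would pass an open file handle instead of the list, since the function only iterates over lines.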

A 10% mapping rate is not surprising for hybridization-based capture of a small region (I am assuming this is what you are doing). I did an Agilent capture with GA2 sequencing in human and got 16% mapping to the 0.3 Mb of target regions.
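One way to see why a low on-target percentage can still be a good capture is to compare it with what uniform, uncaptured coverage would give. This is a rough sketch that reads the 16% as the fraction of reads landing on target and assumes a human genome size of about 3.1 Gb; the fold-enrichment formula is a common back-of-envelope metric, not something from the post.

```python
def capture_enrichment(on_target_fraction: float, target_bp: float, genome_bp: float) -> float:
    """Fold enrichment: the observed on-target fraction divided by the fraction
    expected if reads fell uniformly across the genome (target_bp / genome_bp)."""
    return on_target_fraction / (target_bp / genome_bp)


# 16% on 0.3 Mb of targets (from the post); ~3.1 Gb genome size is an assumed approximation.
fold = capture_enrichment(0.16, 0.3e6, 3.1e9)
print(f"~{fold:.0f}x enrichment over uniform coverage")
```

Under these assumptions the targets make up well under 0.01% of the genome, so 16% on target still reflects roughly a thousand-fold or greater enrichment.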
07-10-2010, 11:40 AM   #3
R diggity
Location: Tennessee

Join Date: Jun 2010
Posts: 12

Thanks for the advice. I suppose I will have to construct my reference genome from quite a few separate linkage groups. Given that my reads are 75 bp long, will I have to manually edit the reference sequence so that it has gaps longer than 75 bp between chromosomes?

Edit: I found a FASTA file containing the entire genome with the linkage groups treated as separate sequences. Does Maq understand this?
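Short-read aligners generally treat each FASTA record as its own reference sequence, so linkage groups stored as separate records should not need gap padding. A quick sanity check is to list each record's name and length before building the index. This minimal sketch takes the record name as the first word of the ">" line; the names "LG1"/"LG2" and the file name in the comment are hypothetical.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name (first word of the '>' line) to its sequence length."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence may be wrapped over several lines
    return lengths


# Hypothetical two-record example; for a real file:
#   with open("genome.fa") as fh: print(fasta_lengths(fh))
example = [">LG1 linkage group 1", "ACGTACGT", "ACG", ">LG2", "TTTT"]
print(fasta_lengths(example))
```

If every linkage group shows up as its own entry with a plausible length, the file is already split the way the aligner expects.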

Edit2: I used easyrun to map paired ends to the genome, and only mapped 18.24%. I'm fairly certain I'm doing something incorrectly.

Last edited by R diggity; 07-10-2010 at 02:05 PM.
07-10-2010, 12:26 PM   #4
Location: Southwest Florida

Join Date: Sep 2009
Posts: 24
multiple reference sequences

I do not know if you have tried the CLC bio software at all, but it should be able to handle your data in a variety of ways. First, you can easily map your Illumina reads to multiple reference sequences. If these reference sequences are a subset of a larger genome, you can also use our targeted resequencing tool to get a report of how your reads map to the targeted versus non-targeted areas. The tools are fairly flexible, so there are many ways you can apply them to your data. The software is commercial, but you can use the two-week trial to see whether it solves any of your problems. The download is available from the CLC website. I hope you'll try it.

Note: I work for CLC.
All times are GMT -8.