SEQanswers

02-20-2009, 10:59 AM   #1
jms1223
Junior Member
 
Location: Chapel Hill, NC

Join Date: Feb 2009
Posts: 2
MAQ and short read length (DGE)

We are currently looking into the viability of Digital Gene Expression (DGE) or mRNA-seq as a possible replacement for expression microarrays in our breast cancer studies. DGE generates reads that are only 17 bases long, so allowing even 1 mismatch is questionable when aligning against the human genome. MAQ doesn't seem to let you set the -n flag to anything less than 1. Is this something that can be altered easily? I would love to align my short reads with MAQ but keep only those that align perfectly.
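One possible workaround that leaves MAQ untouched is to align with -n 1 and then drop every alignment that reports a mismatch. The sketch below assumes the alignments have been dumped as tab-separated text with `maq mapview`, and that the 10th field is the number of mismatches of the best hit; that column position is an assumption to check against the MAQ documentation for your version. The function name `perfect_hits` is made up for illustration.

```python
# Assumed 0-based index of the "number of mismatches of the best hit"
# field in `maq mapview` output (10th tab-separated column). Verify this
# against your MAQ version before relying on it.
MISMATCH_COL = 9


def perfect_hits(lines, mismatch_col=MISMATCH_COL):
    """Yield only the alignment lines whose best hit has zero mismatches."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines that are too short to hold the column.
        if len(fields) > mismatch_col and fields[mismatch_col] == "0":
            yield line
```

It could be driven from a file handle, e.g. `for line in perfect_hits(open("mapview.txt")): ...`, where `mapview.txt` is a hypothetical dump of the mapview output.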

Along those lines, if a read maps to more than 1 location, MAQ will randomly pick one of those locations for the placement of that read. Is there any way to customize this function so that it checks against a coordinate file or something like that so we can at least have MAQ select a location for that read that is only in the transcriptome to raise our chances of the placement being 'correct'?
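This does not change how MAQ chooses among multiple locations, but the coordinate-file idea can at least be applied after the fact: check whether the position MAQ reported falls inside a known transcribed interval and discard the read otherwise. A minimal sketch, assuming the coordinate file has already been parsed into (chromosome, start, end) tuples with half-open intervals; `build_index` and `in_transcriptome` are hypothetical helper names.

```python
from bisect import bisect_right


def build_index(intervals):
    """Merge overlapping (chrom, start, end) intervals per chromosome.

    Returns {chrom: (sorted_starts, matching_ends)} for fast lookup.
    Intervals are treated as half-open: start <= pos < end.
    """
    by_chrom = {}
    for chrom, start, end in intervals:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:
                # Overlapping or touching: extend the current interval.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_transcriptome(index, chrom, pos):
    """Return True if pos lies inside any indexed interval on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Find the last interval that starts at or before pos.
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]
```

Because MAQ has already picked one location at random, this only rejects reads placed outside the transcriptome; it cannot move a read to a better location.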

Thank you for your help
02-20-2009, 11:06 AM   #2
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

I have looked at DGE data, and even at 16/17 bp, more than 90% of reads map to the tag sequences (all possible 16mers with the enzyme specificity).
I am curious to see how MAQ can be modified as well. Quite a few other tools have a specific tag algorithm to take care of such aspects.
02-20-2009, 11:16 AM   #3
jms1223
Junior Member
 
Location: Chapel Hill, NC

Join Date: Feb 2009
Posts: 2

You really see >90% mapping to "canonical" regions?
I've been aligning with MAQ with -n set to 1, and >99% of reads map to the genome. I then extend all reads 4 bp off the 5' end and keep only reads that contain CATG there (we cut with NlaIII); we're only keeping 50% of our mapped reads at this step. After that we check the overlap with genic regions, and it is certainly not as high as you report. What do you do differently?
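For reference, the CATG check described above might look roughly like the following. This is a sketch under assumptions, not the exact script used: the reference is held as an in-memory string, `start` is the 0-based leftmost aligned position (MAQ reports 1-based positions, so subtract 1 first), and `has_upstream_site` is a hypothetical name. It relies on CATG being its own reverse complement, so for a minus-strand read the site is simply looked for immediately to the right of the alignment on the forward reference.

```python
SITE = "CATG"  # NlaIII recognition site; identical to its reverse complement


def has_upstream_site(ref, start, read_len, strand, site=SITE):
    """Check for the restriction site just 5' of an aligned read.

    ref      -- reference sequence as a string
    start    -- 0-based leftmost aligned position on the forward strand
    read_len -- length of the aligned read
    strand   -- '+' or '-'
    Only valid for palindromic sites such as CATG.
    """
    if strand == "+":
        lo, hi = start - len(site), start
    else:
        # The read's 5' end is the rightmost aligned base on the forward strand.
        lo, hi = start + read_len, start + read_len + len(site)
    if lo < 0 or hi > len(ref):
        return False  # the extension would run off the reference
    return ref[lo:hi].upper() == site
```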
02-20-2009, 12:57 PM   #4
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177

Technically you should not be trying to align your DGE reads to the genome. The tags may not exist as contiguous sequence in the genome; they may span splice sites or polyadenylation sites. To properly interpret DGE data you should first generate a complete set of predicted tags from the genome and transcriptome and then align your reads to that set. To do this you need a well-annotated genome. Please see the thread linked below for the software stack created by Ariel Paulson at the Stowers Institute, which builds these tag tables and includes scripts to interpret the Eland alignments.

http://seqanswers.com/forums/showthread.php?t=498
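To give a rough idea of what a predicted tag table is, here is a minimal sketch. It is not the Stowers pipeline. It assumes transcripts are already loaded as an ID-to-sequence dict, that the tag is the 17 bases immediately 3' of each CATG site, and that only the transcript strand matters; `predicted_tags` is a hypothetical name, and genome-derived tags would also need the reverse strand.

```python
def predicted_tags(transcripts, site="CATG", tag_len=17):
    """Map each predicted tag to the set of transcript IDs that can produce it.

    transcripts -- dict of transcript ID -> sequence string
    site        -- restriction site (assumed: NlaIII, CATG)
    tag_len     -- bases taken immediately downstream of the site (assumed: 17)
    """
    tags = {}
    for tid, seq in transcripts.items():
        seq = seq.upper()
        pos = seq.find(site)
        while pos != -1:
            begin = pos + len(site)
            tag = seq[begin:begin + tag_len]
            # Ignore sites too close to the 3' end to yield a full-length tag.
            if len(tag) == tag_len:
                tags.setdefault(tag, set()).add(tid)
            pos = seq.find(site, pos + 1)
    return tags
```

Tags shared by several transcripts end up with more than one ID in their set, which makes ambiguous tags easy to spot.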

I have used this pipeline for a couple of DGE projects. In one project with Arabidopsis I was able to map 97% of my filtered reads to predicted tags, allowing up to 2 mismatches in the alignment. Counting only perfect matches, the hit rate was ~90%. Not all of these mapped to annotated genes, though: roughly 63% mapped to genes, and the remainder mapped to intergenic or repetitive regions.
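The two hit rates above (perfect matches versus up to 2 mismatches) can be illustrated with a small brute-force sketch. This is not how Eland or the pipeline does it; it simply compares each read to every predicted tag of the same length by Hamming distance, which is only practical for small inputs. The names `best_mismatch` and `hit_rates` are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_mismatch(read, tags, max_mm=2):
    """Smallest mismatch count (<= max_mm) to any same-length tag, else None."""
    if read in tags:
        return 0
    best = None
    for tag in tags:
        if len(tag) != len(read):
            continue
        d = hamming(read, tag)
        if d <= max_mm and (best is None or d < best):
            best = d
    return best


def hit_rates(reads, tags, max_mm=2):
    """Return (fraction matching perfectly, fraction matching within max_mm)."""
    results = [best_mismatch(r, tags, max_mm) for r in reads]
    n = len(results)
    if n == 0:
        return 0.0, 0.0
    perfect = sum(r == 0 for r in results)
    mapped = sum(r is not None for r in results)
    return perfect / n, mapped / n
```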

Last edited by kmcarr; 02-23-2009 at 02:29 PM. Reason: correct spelling error
02-22-2009, 03:57 PM   #5
Torst
Senior Member
 
Location: The University of Melbourne, AUSTRALIA

Join Date: Apr 2008
Posts: 275

Quote:
Originally Posted by jms1223
Along those lines, if a read maps to more than 1 location, MAQ will randomly pick one of those locations for the placement of that read. Is there any way to customize this function so that it checks against a coordinate file or something like that so we can at least have MAQ select a location for that read that is only in the transcriptome to raise our chances of the placement being 'correct'?
If you only want to have reads mapped to your transcriptome, perhaps just make your reference sequences the transcripts themselves, rather than the genome sequence?

--Torst
02-23-2009, 07:07 AM   #6
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482

kmcarr and Torst answered that for me, jms1223.

Tags
alignment, dge, digital gene expression, maq, short read length
