
**bioinfosm** · 02-20-2009, 12:06 PM

I have looked at DGE data, and even with 16/17 bp reads, more than 90% map to the tag sequences (all possible 16-mers consistent with the enzyme specificity).
I am curious to see how MAQ can be modified as well. Quite a few other tools have specific tag algorithms to take care of such aspects.

**jms1223** · 02-20-2009, 12:16 PM

You really see >90% mapping to "canonical" regions?
I've been aligning with MAQ with -n set to 1, and map >99% to the genome. I then extend all reads 4bp off the 5' end and only keep reads that contain CATG (we cut with NlaIII) - we're only keeping 50% of our mapped reads at this step. Then after that we check to see the overlap with genic regions, and it is certainly not as high as you report. What do you do differently?
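For readers following along, the extend-and-filter step described above could be sketched roughly as follows. This is only an illustration, not jms1223's actual script: the function names, the 0-based coordinate convention, and the reading of "contain CATG" as "the 4 bases immediately upstream of the read's 5' end are CATG" are all assumptions.

```python
def revcomp(seq):
    """Reverse-complement a DNA string (A/C/G/T/N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[b] for b in reversed(seq.upper()))


def has_upstream_site(chrom_seq, start, read_len, strand, site="CATG"):
    """Return True if the bases just 5' of the aligned read equal `site`.

    Assumed conventions (hypothetical, adapt to your alignment format):
    `start` is the 0-based leftmost forward-strand coordinate of the
    alignment, and `strand` is "+" or "-".
    """
    n = len(site)
    if strand == "+":
        lo, hi = start - n, start
        if lo < 0:
            return False  # not enough sequence upstream
        return chrom_seq[lo:hi].upper() == site
    # For a minus-strand read the 5' end is the rightmost aligned base,
    # so "upstream" lies to the right on the forward strand.
    lo, hi = start + read_len, start + read_len + n
    if hi > len(chrom_seq):
        return False
    return revcomp(chrom_seq[lo:hi]) == site


# Toy example: CATG occupies positions 3..6, so a read starting at 7 passes.
genome = "AAACATGTTTTGGGGCCCCAAAATTT"
print(has_upstream_site(genome, 7, 16, "+"))
print(has_upstream_site(genome, 8, 16, "+"))
```

Because CATG is its own reverse complement, the strand handling only changes where the 4 bases are read from, not what they are compared against.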

**kmcarr** · 02-20-2009, 01:57 PM

Technically you should not be trying to align your DGE reads to the genome. The tags may not exist as contiguous sequence in the genome; they may span splice sites or polyadenylation sites. To properly interpret DGE data you should first generate a complete set of predicted tags from the genome and transcriptome and then attempt to align your reads to that. To do this you need a well annotated genome. Please see this thread linked below for the software stack created by Ariel Paulson at the Stowers Institute for creating these tag tables and then scripts to interpret the Eland alignments.

Gene Expression Tag Table Software Now Available - SEQanswers

http://seqanswers.com/forums/showthread.php?t=498


I have used this pipeline for a couple of DGE projects. In one project with Arabidopsis I was able to map 97% of my filtered reads to predicted tags, allowing up to 2 mismatches in the alignment. Counting only perfect matches, the hit rate was ~90%. Not all of these mapped to annotated genes, though: roughly 63% mapped to genes, and the remainder mapped to intergenic or repetitive regions.
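To make the "predicted tag table" idea concrete, here is a minimal sketch of how such a table might be built from transcript sequences. It is not the Stowers pipeline mentioned above. The function name, the 17 bp tag length, and the choice of the 3'-most CATG site as the "canonical" one are assumptions made for illustration.

```python
def predicted_tags(transcripts, site="CATG", tag_len=17, canonical_only=True):
    """Map each predicted tag to the (transcript, site position) pairs it comes from.

    `transcripts` is a dict of name -> spliced transcript sequence.
    The tag is taken as the `tag_len` bases immediately 3' of a site.
    With canonical_only=True, only the 3'-most site that still has a
    full-length tag downstream is used (an assumed simplification).
    """
    tags = {}
    for name, seq in transcripts.items():
        seq = seq.upper()
        # Collect every occurrence of the recognition site, left to right.
        positions = []
        i = seq.find(site)
        while i != -1:
            positions.append(i)
            i = seq.find(site, i + 1)
        # Drop sites too close to the 3' end to yield a full tag.
        positions = [p for p in positions if p + len(site) + tag_len <= len(seq)]
        if canonical_only:
            positions = positions[-1:]
        for p in positions:
            start = p + len(site)
            tags.setdefault(seq[start:start + tag_len], []).append((name, p))
    return tags


# Toy transcript with two CATG sites; only the 3'-most one is canonical.
toy = {"t1": "GGCATG" + "A" * 17 + "CCCATG" + "ACGTACGTACGTACGTA" + "TT"}
table = predicted_tags(toy)
print(table)
```

Reads can then be looked up in the table directly (exact match) or aligned against the tags with a mismatch allowance. Because the tags are cut from spliced transcripts, tags that span exon junctions are included even though they do not exist as contiguous genomic sequence.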

**Torst** · 02-22-2009, 04:57 PM

Originally posted by jms1223:

> Along those lines, if a read maps to more than 1 location, MAQ will randomly pick one of those locations for the placement of that read. Is there any way to customize this function so that it checks against a coordinate file or something like that so we can at least have MAQ select a location for that read that is only in the transcriptome to raise our chances of the placement being 'correct'?

If you only want to have reads mapped to your transcriptome, perhaps just make your reference sequences the transcripts themselves, rather than the genome sequence?
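As a rough sketch of what building such a transcript reference could look like: the helper names, the 0-based half-open exon coordinates, and the forward-strand-only handling below are assumptions for illustration, not part of MAQ or any particular annotation toolkit.

```python
def splice_transcript(genome_seq, exons):
    """Join exon intervals into one transcript sequence.

    `exons` is a list of (start, end) pairs, 0-based and half-open,
    on the forward strand (minus-strand genes are not handled here).
    """
    return "".join(genome_seq[s:e] for s, e in sorted(exons))


def to_fasta(records, width=60):
    """Format a dict of name -> sequence as FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy gene: two 10 bp exons separated by a 10 bp intron.
genome = "AAAACATGCC" + "GTTTTTTTAG" + "GGTTAACCGG"
transcript = splice_transcript(genome, [(0, 10), (20, 30)])
print(to_fasta({"toy_tx1": transcript}, width=10))

# The junction-spanning sequence exists in the transcript but not the genome.
junction = "GCCGGTT"
print(junction in transcript, junction in genome)
```

The resulting FASTA would still need to be converted to MAQ's binary reference format before mapping. With transcripts as the reference, multi-mapping reads can only be placed within the transcriptome, and junction-spanning tags become alignable.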

--Torst

**bioinfosm** · 02-23-2009, 08:07 AM

kmcarr and Torst answered that for me, jms1223.

MAQ and short read length (DGE)
