Seqanswers Leaderboard Ad

**Greg** · 06-25-2010, 09:49 AM

Multimapping reads are definitely an issue. Tophat/bowtie will filter these for you if you set -g to 1. I think there are some papers that use some kind of likelyhood approach to assign ambiguous reads to one location. different people seem to do different things depending on their data and question.

**thinkRNA** · 06-25-2010, 10:41 AM

Originally posted by gen2prot View Post

Hello All,

Lets consider Gene A and Gene B. In the original sam file lets suppose gene A has 25 reads (all ambiguous) while gene B has the same 25 ambiguous reads plus an additional 75 unique reads. If I use my method, I drop the read count of gene A from 25 to zero, while the read-count of gene B drops from 100 to 75. In other words, gene A will have "no expression" however in reality it does. I'd like to know what you guys think, since it does not sound ok to me. But I don't know what to do.

Is there a method to incorporate ambiguous reads? Is there a score associated with each read in the sam file or elsewhere, that outlines the strength of the gene match? Was my idea of using the gene sequences as subject for alignment rather than the chromosome sequences flawed? The total number of reads is about 1000 Mb and I get about 100 Mb of unaligned reads when I run against the gene sequences. Can this might improve if I run the sequences against the chromosomes?

Please let me know.

Thanks,
Abhijit

Tophat is designed to align reads against chromosome/genomic sequence, so I really don't know how useful it will be to use tophat against mRNA sequences.

Finally, if gene A has 25 multireads and gene B has 75 uniq reads + 25 multireads, that is telling you can be more confident of the expression of gene B but not of gene A. However if you have sequenced deep enough (~40-60 million reads), and gene A has portions of it that are unique, you will notice uniq reads show up and you should be fine with throwing away multi reads. An important question is what proportion of genes have the entire sequence paralogous and these will have absolutely no reads mapping to them.

Throwing away multireads introduces non-uniformity in the coverage of the data on a gene. There are papers that have dealt with multireads by using this logic. If a gene X has certain coverage and multireads align gene X as well as gene Y which has no coverage, you can choose gene X over gene Y. Read this paper:http://www.ncbi.nlm.nih.gov/pmc/arti...7/?tool=pubmed

**mrawlins** · 06-25-2010, 10:45 AM

**Is there a method to incorporate ambiguous reads?**
There are a few. I think Cufflinks (part of the bowtie/tophat pipeline) will handle them. An important question to ask here is if the ambiguous reads actually tell you anything. How do you know which gene to assign the ambiguous reads to? Do the genes serve a different biological purpose? If so, why are they so similar? Are you sure that Gene A from your example *shouldn't* have 0 reads? Maybe it really does have zero expression. Maybe it doesn't. How do you know? Does it matter? Assigning the ambiguous reads is fraught with issues. There are enough interesting results from the un-ambiguous reads that it's probably not worth worrying about the ambiguous ones except under some very specific experimental conditions.
[The questions are rhetorical, and may not have good answers. I suspect that, because new papers are coming out on how to deal with multimapped reads, that there are no great answers yet, and/or that the answers vary by experiment.]

** Is there a score associated with each read in the sam file or elsewhere, that outlines the strength of the gene match? **
There is a count of mismatched basepairs, and a length of the read, which can be used to calculate a score. There is also a Phred-style quality, and you could add up the phred score of the matched base-pairs to generate a quality score. Those are the main methods used by mapping software to determine which reads map to which genes. This may be of use, but most likely was already taken into consideration by Bowtie, and there was no clear winner: hence the ambiguity of the reads.

**Was my idea of using the gene sequences as subject for alignment rather than the chromosome sequences flawed?**
You have assumed that running against the whole chromosome will give no additional useful information than running against the gene annotations. That may be justified and reasonable. That may be foolish. We run against the chromosomes in part because we don't trust the gene annotations for our organisms to be 100% right. If you have 100% trust in the drosophila annotations your method is probably reasonable. I personally would want some data to justify that decision, so I would recommend mapping at least one sample against the whole chromosome and comparing the results.

** The total number of reads is about 1000 Mb and I get about 100 Mb of unaligned reads when I run against the gene sequences. Can this might improve if I run the sequences against the chromosomes?**
Maybe. You should try and find out. It shouldn't take more than a day or two of computation to get those results, and comparing the outputs in IGV or another viewer should give you a good indication of how much maps to the intergenic sequences.

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 31 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Question on counting number of reads per gene

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News