**N311V** · 07-29-2014, 05:11 PM

I've not worked with RNAseq data yet, so I'm no expert, but from a statistical standpoint I agree with your reasoning.

I particularly dislike option 3 if you're interested in differential expression. Let's say you have 10 million reads, and assume they can all be uniquely mapped: this gives you essentially 10 million data points. If each read is instead counted multiple times because of non-unique mapping, you could end up with hundreds of millions of data points, far more observations than you actually sequenced. I think this approach would be asking for trouble. I agree that option 1 could introduce bias. Since you are trying all three anyway, see how strongly correlated options 1 and 2 are for the miRNAs; if they're not strongly correlated, I'd suspect the bias you predict is the reason.
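To make the inflation concrete, here is a minimal Python sketch on made-up toy data. The read names, locus names, and the two counting functions are all hypothetical and not taken from any aligner; it only illustrates how counting every alignment produces more "observations" than there are reads.

```python
from collections import Counter

# Hypothetical toy data: each read ID maps to the loci it aligns to equally well.
alignments = {
    "read1": ["miR-A"],
    "read2": ["miR-A"],
    "read3": ["repeat-1", "repeat-2", "repeat-3"],
    "read4": ["repeat-1", "repeat-2", "repeat-3"],
}


def count_unique(alignments):
    """Count only reads that align to exactly one locus."""
    counts = Counter()
    for loci in alignments.values():
        if len(loci) == 1:
            counts[loci[0]] += 1
    return counts


def count_all_hits(alignments):
    """Count every alignment, so a multi-mapped read is counted once per locus."""
    counts = Counter()
    for loci in alignments.values():
        counts.update(loci)
    return counts


unique_counts = count_unique(alignments)
all_hit_counts = count_all_hits(alignments)

# 4 reads were sequenced, but the all-hits table sums to 8 counts.
print(len(alignments), sum(unique_counts.values()), sum(all_hit_counts.values()))
```

In this toy case the four reads yield 2 counts under unique mapping and 8 counts when every hit is counted, which is the kind of inflation described above.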

**Mike2188** · 07-30-2014, 02:30 PM

Generally, when you are aligning it is best to align to the genome, for several reasons:
a) it avoids biasing the alignment towards the transcriptome or whichever sequences you are aligning to
b) it allows for discovery of novel miRNAs, other short RNAs, etc.

What is your goal ultimately? If you are just looking for differential expression of miRNAs, perhaps you could get away with the first option, but I wouldn't recommend it.

I wouldn't recommend the third option at all. Tophat will attempt to align and choose the best match for each read. If a read matches two locations in the genome equally well, then one will be pseudorandomly selected (it isn't completely random, as the same input will always yield the same output) and the read will be aligned there. So, if you have sufficient coverage, the expression will still be recorded for repetitive elements. If you had two identical miRNA sequences in the genome and one or both were being expressed, you would expect them both to be detected.
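As an illustration of "pseudorandom but reproducible", here is a small Python sketch. It is not Tophat's actual implementation; the function `pick_locus`, the hashing scheme, the locus names, and the read sequence are all assumptions made up for this example. The choice is derived from a hash of the read itself, so the same input always gives the same output, while many reads with the same sequence still spread across the identical loci.

```python
import hashlib
from collections import Counter


def pick_locus(read_name, read_seq, candidate_loci, seed=0):
    """Pick one of several equally good loci, deterministically for a given read."""
    ordered = sorted(candidate_loci)
    key = f"{seed}:{read_name}:{read_seq}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    # Turn the first 8 bytes of the hash into an index into the candidate list.
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


# Hypothetical: two identical loci and 1000 reads carrying the same sequence.
loci = ["locus_1", "locus_2"]
reads = [(f"read{i}", "ACGTTGCAAGCTTAGGCATCGA") for i in range(1000)]

first_run = [pick_locus(name, seq, loci) for name, seq in reads]
second_run = [pick_locus(name, seq, loci) for name, seq in reads]
assignment_counts = Counter(first_run)

print(first_run == second_run)    # same input, same output
print(sorted(assignment_counts))  # which loci received at least one read
```

With enough reads, both identical loci end up with counts, but you still cannot tell from the counts alone which locus the reads really came from.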

Now, miRNAs often differ by only a few base pairs, from what I recall. If you had two similar miRNAs and only one of them was expressed, here is what you would expect from option 2 and option 3:

Option 2: The read would be compared against both miRNA sequences. However, as only one would be a perfect match and the other would have 1 or more mismatches, the expression of only the perfectly matched miRNA would be detected.

Option 3: As multiple alignments are allowed, both sequences would fall within the alignment criteria (assuming you allow at least one mismatch), and the expression of both would be detected.

Also, if you were using option 3, you could probably only allow 0 mismatches, which may cause you to lose reads that you would be able to detect in option 2 when allowing 1 or 2.
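The difference between the two options can be sketched in a few lines of Python. This is a toy model using Hamming distance on equal-length sequences, not what an aligner really does; the names `miR-X` and `miR-Y`, their sequences, and the functions `best_hits` and `all_hits` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 22-nt miRNAs that differ at a single position.
references = {
    "miR-X": "TGAGGTAGTAGGTTGTATAGTT",
    "miR-Y": "TGAGGTAGTAGGTTGTGTAGTT",
}


def best_hits(read, references, max_mismatches):
    """Option 2 style: keep only the reference(s) with the fewest mismatches."""
    distances = {name: hamming(read, seq) for name, seq in references.items()}
    allowed = {name: d for name, d in distances.items() if d <= max_mismatches}
    if not allowed:
        return []
    best = min(allowed.values())
    return sorted(name for name, d in allowed.items() if d == best)


def all_hits(read, references, max_mismatches):
    """Option 3 style: keep every reference within the mismatch threshold."""
    return sorted(
        name for name, seq in references.items()
        if hamming(read, seq) <= max_mismatches
    )


read = references["miR-X"]                  # a perfect read from miR-X
read_with_error = read[:-1] + "A"           # same read with one sequencing error

print(best_hits(read, references, 1))             # ['miR-X']
print(all_hits(read, references, 1))              # ['miR-X', 'miR-Y']
print(all_hits(read_with_error, references, 0))   # []
print(best_hits(read_with_error, references, 2))  # ['miR-X']
```

The last two lines show the trade-off: with 0 mismatches the read carrying a sequencing error is lost entirely, while best-match reporting with 1 or 2 mismatches still assigns it to the closer miRNA.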

**lre1234** · 07-31-2014, 04:47 AM

Thanks for the replies.

> What is your goal ultimately? If you are just looking for differential expression of miRNAs, perhaps you could get away with the first option, but I wouldn't recommend it.

Our goals are to find differentially expressed miRNAs as well as any other short RNA species that are expressed. We would also like to do some novel miRNA searching. Based on this, aligning only to the miRNAs is a bad choice. I have seen many papers use this method, but I am still strongly against it.

> I wouldn't recommend the third option at all. Tophat will attempt to align and choose the best match for each read. If a read matches two locations in the genome equally well, then one will be pseudorandomly selected (it isn't completely random, as the same input will always yield the same output) and the read will be aligned there. So, if you have sufficient coverage, the expression will still be recorded for repetitive elements. If you had two identical miRNA sequences in the genome and one or both were being expressed, you would expect them both to be detected.

I definitely agree with your points on the non-unique mapping, which is why I am trying to stay away from it.

Although you mention this:

> Also, if you were using option 3, you could probably only allow 0 mismatches, which may cause you to lose reads that you would be able to detect in option 2 when allowing 1 or 2.

This was an option that I was considering: essentially allowing a read to map to multiple places, but in all instances allowing 0 mismatches. For the cases in which there are 2 miRNAs with exactly the same sequence (miR-103a is an example, I believe), we would get a read mapping to both copies, but we would not know from which locus it is being expressed. This approach might also help with repetitive elements in the genome when each copy has the same sequence.

Our samples are currently having their libraries made and should be on the machine next week. We'll see what happens. I'll try to post back with an update after trying a few different approaches.

**fanli** · 07-31-2014, 09:51 AM

There are a number of approaches for resolving multiply-mapped reads:

http://bioinformatics.oxfordjournals.org/content/30/5/644.long

http://bioinformatics.oxfordjournals.org/content/25/19/2613.short

You could also try using option 3 to discover new loci and then collapse redundant loci (such as the miR-103a in your example), then combine these with existing miR/smRNA databases and align using option 1.
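The "collapse redundant loci" step could look something like the Python sketch below. The locus names and sequences are invented for illustration, and `collapse_identical` is a hypothetical helper, not part of any existing tool; it simply groups loci that share exactly the same sequence under one combined name so that reads are counted once per distinct sequence.

```python
from collections import defaultdict

# Hypothetical loci discovered by genome alignment; two of them are identical copies.
loci = {
    "locus_a": "AGCAGCATTGTACAGGGCTATGA",
    "locus_b": "AGCAGCATTGTACAGGGCTATGA",
    "locus_c": "TGAGGTAGTAGGTTGTATAGTT",
}


def collapse_identical(loci):
    """Merge loci with identical sequences into one entry named after all copies."""
    groups = defaultdict(list)
    for name, seq in loci.items():
        groups[seq].append(name)
    return {"|".join(sorted(names)): seq for seq, names in groups.items()}


collapsed = collapse_identical(loci)
for name, seq in collapsed.items():
    print(name, seq)
```

The collapsed set can then be merged with the existing miR/smRNA database entries to build the reference for the option 1 style alignment.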

Note that Bowtie is preferred over Bowtie2 for short reads, and that the default Tophat segment length is 25nt. You may want to tweak these things for smRNA alignment.
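As a rough sketch of where those settings live, the Python snippet below only assembles a TopHat command line and prints it; nothing is run. The index name, the FASTQ file name, and all numeric values are placeholders chosen for illustration, not recommendations, so check them against the TopHat manual and your own read lengths before using anything like this.

```python
import shlex


def build_tophat_command(index_base, reads_fastq, segment_length,
                         max_multihits, read_mismatches):
    """Assemble a TopHat argument list as strings; nothing is executed here."""
    return [
        "tophat",
        "--bowtie1",                               # use Bowtie 1 instead of Bowtie 2
        "--segment-length", str(segment_length),   # TopHat's default is 25
        "--max-multihits", str(max_multihits),     # alignments reported per read
        "--read-mismatches", str(read_mismatches),
        index_base,
        reads_fastq,
    ]


# Placeholder file names and values, assumed purely for this example.
command = build_tophat_command(
    index_base="genome_index",
    reads_fastq="small_rna_reads.fastq",
    segment_length=20,
    max_multihits=1,
    read_mismatches=1,
)
print(shlex.join(command))
```

Keeping the command in a small function makes it easy to rerun the same alignment with different multihit and mismatch settings when comparing the options discussed in this thread.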

mapping to the genome uniquely or non-uniquely?
