
**OneManArmy** · 07-27-2009, 08:43 PM

We've been using AB's rna2map for some miRNA analysis.

In our experience with barcoded samples, ~50-60% of usable reads are mappable.

**OneManArmy** · 07-27-2009, 09:06 PM

On that note, I find that the mapping highly depends on the number of mismatches you set. When increasing the number of mismatches on such short reads, we tend to get multiple mappings for some of the reads. Anyone have any idea on what to set the mismatches to?

We get that 50-60% of usable reads using up to 3 mismatches for seeding and 6 mismatches after extension; for the genomic step, 2 mismatches for seeding and 5 mismatches after extension.

**Sheila** · 07-28-2009, 02:42 AM

Hi,
Thanks for your reply.
Do you mask the last positions of the reads and then use up to 6mm, or do you use all positions (no masking)?
In addition, did you get a homogeneous distribution of barcodes across your samples? For us the number of tags per barcode ranged from 14 to 29M (usable tags, not mappable tags). Not sure if that's normal.

Regards.

S.

**fishtank** · 07-28-2009, 10:29 AM

Originally posted by OneManArmy

On that note, I find that the mapping highly depends on the number of mismatches you set. When increasing the number of mismatches on such short reads, we tend to get multiple mappings for some of the reads. Anyone have any idea on what to set the mismatches to?

We get that 50-60% of usable reads using up to 3 mismatches for seeding and 6 mismatches after extension; for the genomic step, 2 mismatches for seeding and 5 mismatches after extension.

I am trying out rna2map. I am wondering what fraction of total reads map to known miRNA. How does setting 0 mismatches affect that?
I am assuming "multiple mappings for some reads" means the same read maps to 2 different miRNA. In such cases, does the read get assigned to both miRNA? How do you get the statistics for such cases?
Is 6 mismatches considered high for short reads? Since ABI uses color-space coding, am I right to say that 1 mismatch is generally a sequencing error that can be corrected, while 2 mismatches indicate a base substitution and so could be a SNP? Do the mismatches have to be adjacent in such cases?
So how many base-space mismatches do 3 color-space mismatches correspond to?
If I set 1 mismatch, is this almost equivalent to 0 mismatches in base space?
Thanks for your help.

**westerman** · 07-28-2009, 11:54 AM

Originally posted by fishtank

I am trying out rna2map. I am wondering what fraction of total reads map to known miRNA. How does setting 0 mismatches affect that?

Any reads with sequencing errors would get discarded. With luck that won't be a significant number. Also you will discard any reads with SNPs. Unless your reference is really evolutionarily close to your experimental sample you could discard some miRNA. This may bias your results.

I am assuming "multiple mappings for some reads" means the same read maps to 2 different miRNA. In such cases, does the read get assigned to both miRNA? How do you get the statistics for such cases?

I am not familiar with the miRNA pipeline but in the "normal" SNP-calling pipeline only unique reads are considered in the end results. If this holds true in miRNA/transcriptome counting then reads with multiple hits will eventually be discarded.

Is the 6 mismatch considered high for short reads?

For 50-mers the answer is 'no, 6 mismatch is ok although 5mm would be better'. In part it depends on the distance of your reference from the experiment. For the shorter miRNA reads there could be problems. For SNP calling the general recommendation is 2mm for 25mers, 3mm for 35mers. For transcriptome calling the program has different mismatch parameters for the 5' part of the 50mer and for the 3' part.

Since ABI uses color-space coding, am I right to say that 1 mismatch is generally a sequencing error that can be corrected, while 2 mismatches indicate a base substitution and so could be a SNP? Do the mismatches have to be adjacent in such cases?

Yes, an isolated 1mm is a sequencing error. Always. Adjacent 2mm is most likely a SNP although it can, obviously, also be 2 sequencing errors in a row, an indel, etc. The two color mismatches produced by a SNP have to be adjacent.

So how many base-space mismatches do 3 color-space mismatches correspond to?
If I set 1 mismatch, is this almost equivalent to 0 mismatches in base space?
Thanks for your help.

Yes, 1mm in CS is 0mm in BS. 3mm in CS can be 1 SNP plus 1 error. Or 3 errors. Or something else. Base space usually has quality values and thus I am not sure that the questions can be definitively answered.

**fishtank** · 07-28-2009, 12:18 PM

Originally posted by westerman

Any reads with sequencing errors would get discarded. With luck that won't be a significant number. Also you will discard any reads with SNPs. Unless your reference is really evolutionarily close to your experimental sample you could discard some miRNA. This may bias your results.

I am puzzled: if 1 mismatch is always a sequencing error that can be corrected, why would any reads with sequencing errors get discarded?
Does the ABI rna2map pipeline discard reads because of sequencing errors before the alignment?

**westerman** · 07-28-2009, 12:30 PM

Originally posted by fishtank

I am puzzled: if 1 mismatch is always a sequencing error that can be corrected, why would any reads with sequencing errors get discarded?
Does the ABI rna2map pipeline discard reads because of sequencing errors before the alignment?

That is a good point. With 1 or more mismatches allowed, the errors should get corrected and the reads retained.

Of course if you set mismatches to 0 (which is the part of the post I was responding to) then the reads will be discarded. But with a setting of 1 or greater the reads should not be discarded ... just corrected.

**OneManArmy** · 07-28-2009, 01:47 PM

Originally posted by westerman

I am not familiar with the miRNA pipeline but in the "normal" SNP-calling pipeline only unique reads are considered in the end results. If this holds true in miRNA/transcriptome counting then reads with multiple hits will eventually be discarded.

The rna2map pipeline counts multiple mappings in the end result - they are not discarded. Thus setting the mismatch threshold too high can yield ambiguous results.

**OneManArmy** · 07-28-2009, 01:52 PM

Originally posted by Sheila

Do you mask the last positions of the reads and then use up to 6mm, or do you use all positions (no masking)?

Yes, that is how the pipeline does it. It starts by mapping 18-mers and then extends the mapping, allowing up to 6mm in total.
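
As a rough illustration of the seed-then-extend idea described above, here is a minimal Python sketch. It is not rna2map's actual algorithm: the function name, the defaults (18-character seed, 3 seed mismatches, 6 total mismatches, taken from the numbers mentioned in this thread) and the rule for stopping the extension are assumptions made for the example. It works on any string alphabet, so the inputs can be color strings or bases.

```python
def seed_and_extend(read, ref, start, seed_len=18, seed_mm=3, max_mm=6):
    """Return the aligned length of `read` placed at ref[start:], or 0 if the seed fails.

    The first `seed_len` positions must match with at most `seed_mm` mismatches.
    The alignment is then extended one position at a time and stops before the
    mismatch that would push the total above `max_mm`.
    """
    window = ref[start:start + len(read)]
    n = min(len(read), len(window))
    if n < seed_len:
        return 0                      # not enough sequence to place a seed

    mismatches = sum(1 for i in range(seed_len) if read[i] != window[i])
    if mismatches > seed_mm:
        return 0                      # seed rejected

    length = seed_len
    for i in range(seed_len, n):
        if read[i] != window[i]:
            if mismatches + 1 > max_mm:
                break                 # budget exhausted: stop extending here
            mismatches += 1
        length += 1
    return length

# Hypothetical color-space reference and a perfectly matching 25-color read.
ref = "0123" * 10
print(seed_and_extend(ref[:25], ref, 0))
```

With settings like these, raising `max_mm` lets more of each short read align, but it also lets the same read align at more places, which is the ambiguity discussed earlier in the thread.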

Originally posted by Sheila

In addition, did you get a homogeneous distribution of barcodes across your samples? For us the number of tags per barcode ranged from 14 to 29M (usable tags, not mappable tags). Not sure if that's normal.

No, the barcode distribution will obviously vary depending on how accurately you combined the barcoded samples in the wet lab, lab conditions, etc. However, I am not too familiar with this part.
With 14-29M tags, how many barcodes were you running? It really depends on how many beads you loaded on the slide. From ABI's docs it seems that 300M is the number of beads you're supposed to load. If you are using the full 10 barcodes in SREK, that seems about right.

**Sheila** · 07-29-2009, 12:12 AM

Originally posted by OneManArmy

The rna2map pipeline counts multiple mappings in the end result - they are not discarded. Thus setting the mismatch threshold too high can yield ambiguous results.

In the configuration file you can choose between "all" or "unique".
all = all mapping positions
unique = unique mapping positions

S.
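
To show how the "all" versus "unique" choice changes the final counts, here is a small Python sketch. Only the two mode names come from the post above; the function, the input layout (read ID mapped to a list of miRNA names) and the miRNA names are hypothetical, and this is not how rna2map itself stores or reports hits.

```python
from collections import Counter

def count_hits(read_hits, mode="all"):
    """Count reads per miRNA.

    read_hits: dict mapping a read ID to the list of miRNA names it maps to.
    mode="all"    -> a multi-mapping read is counted once for every miRNA it hits.
    mode="unique" -> only reads that hit exactly one miRNA are counted.
    """
    if mode not in ("all", "unique"):
        raise ValueError("mode must be 'all' or 'unique'")
    counts = Counter()
    for hits in read_hits.values():
        targets = set(hits)
        if mode == "unique" and len(targets) != 1:
            continue                  # ambiguous or unmapped read: skipped
        counts.update(targets)
    return counts

# Hypothetical example: read r2 maps to two closely related miRNAs.
hits = {
    "r1": ["miR-A"],
    "r2": ["miR-A", "miR-B"],
    "r3": ["miR-B"],
    "r4": [],
}
print(count_hits(hits, "all"))
print(count_hits(hits, "unique"))
```

Comparing the two outputs for a real run gives a quick sense of how much of a miRNA's count comes from ambiguous reads at a given mismatch setting.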

**Sheila** · 07-29-2009, 12:20 AM

Originally posted by westerman

For 50-mers the answer is 'no, 6 mismatch is ok although 5mm would be better'. In part it depends on the distance of your reference from the experiment. For the shorter miRNA reads there could be problems. For SNP calling the general recommendation is 2mm for 25mers, 3mm for 35mers. For transcriptome calling the program has different mismatch parameters for the 5' part of the 50mer and for the 3' part.

I'd rather use 35nt for miRNAs since their size varies between 19 and 25nt (human).
6 mismatches seems quite a lot, but bear in mind that the last bases of the miRNA, the ones close to the adaptor, have a high error rate. I wouldn't use 0 mismatches.

S.

**fishtank** · 07-29-2009, 11:14 AM

Originally posted by westerman

For 50-mers the answer is 'no, 6 mismatch is ok although 5mm would be better'. In part it depends on the distance of your reference from the experiment. For the shorter miRNA reads there could be problems. For SNP calling the general recommendation is 2mm for 25mers, 3mm for 35mers. For transcriptome calling the program has different mismatch parameters for the 5' part of the 50mer and for the 3' part.

Can you clarify what you mean by the distance of your reference from the experiment?
Has anyone tried comparing different mismatch settings for rna2map? I still can't get a sense of what is optimal for the miRNA transcriptome. Does it make sense to set the seed mm to 0 and the extension mm to 3?

Originally posted by Sheila

I'd rather use 35nt for miRNAs since their size varies between 19 and 25nt (human).
6 mismatches seems quite a lot, but bear in mind that the last bases of the miRNA, the ones close to the adaptor, have a high error rate. I wouldn't use 0 mismatches.

S.

What are the reasons for not using 0 or 1 mismatches, besides the low counts and the resulting loss of low-abundance miRNA? If you are comparing differential expression, these miRNA wouldn't be statistically significant most of the time. On the other hand, if you boost the counts for such low-abundance miRNA by allowing more mismatches, the accuracy of the method becomes questionable.

Also how is "usable reads" defined? Where do I get statistics for those?

From my runs, it says 64540015 total beads but the uniquely placed beads are 0.31% for 0 mismatches. Is that low? It reports up to 2.04% for up to 6 mismatches.
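
For reference, the percentages above can be turned into absolute read counts with a couple of lines of Python. The bead total and the two percentages are the ones quoted in this post; the function name is just for illustration.

```python
def placed_beads(total_beads, percent):
    """Convert a 'uniquely placed' percentage into an approximate bead count."""
    return round(total_beads * percent / 100)

total = 64_540_015  # total beads reported for the run

for label, pct in (("0 mismatches", 0.31), ("up to 6 mismatches", 2.04)):
    print(f"{label}: about {placed_beads(total, pct):,} uniquely placed beads")
```

That works out to roughly 200 thousand uniquely placed beads at 0 mismatches and about 1.3 million at up to 6 mismatches, a bit more than a sixfold difference.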

**westerman** · 07-29-2009, 12:00 PM

Originally posted by fishtank

Can you clarify what you mean on the distance of your reference from the experiments?

Evolutionary distance (or divergence) in millions of years.

Many people, especially in the SOLiD camp, do experiments of DNA versus known and well-annotated genomes. E.g., human DNA vs. the known human genome reference. When you do this type of experiment you can get away with low mismatch settings because you expect your sequenced DNA to be very close to the reference. 2 mismatches is great for SNP discovery since any given read is unlikely to have more than 1 SNP in it. Anything else can be discarded as error.

On the other hand some of us have to deal with DNA from species only partially related to our known (and often incomplete) reference sequence. We then use larger mismatch parameters and are thankful for what information we do get back.

When I talk to ABI they are always thinking in "perfect human reference" terms. Thus I try to be careful to couch my answers in terms of evolutionary distance. I.e., what works for me in the rough-and-ready world of plant genomics may not be strictly applicable to you if you are working in human genomics.

From my runs, it says 64540015 total beads but the uniquely placed beads are 0.31% for 0 mismatches. Is that low? It reports up to 2.04% for up to 6 mismatches.

That is low but it depends on your reference, your DNA and your organism, none of which I think you have stated. But given this thread I presume that your reference is genomic and your DNA is microRNA. In that case you have to ask yourself, "how much of the genome do I expect to be miRNA versus other RNA, genes, and structural?" If the answer is that you expect only 0.3% of your genome to be miRNA then your mapping is fine.

microRNA are so newly discovered -- i.e., since I've been out of school -- that I am not sure how much of a genome should be miRNA. I could tell you roughly how much of a genome should be gene, and thus how much coverage an mRNA experiment should have, but not for miRNAs.

**fishtank** · 07-29-2009, 01:46 PM

Originally posted by Sheila

I'd rather use 35nt for miRNAs since their size varies between 19 and 25nt (human).
6 mismatches seems quite a lot, but bear in mind that the last bases of the miRNA, the ones close to the adaptor, have a high error rate. I wouldn't use 0 mismatches.
S.

I am wondering how you came to the conclusion that the last bases of the miRNA, the ones close to the adaptor, have a high error rate. Could this be due to miRNA editing?


miRNA analysis
