**chris** · 11-07-2008, 07:29 AM

Hi Leo,

A couple of questions:
What proportion of your reads have these 3' 'untemplated' bits?

If it's not too many, I wouldn't worry about it. For small RNA you tend to get high levels of coverage so throwing out a few is fine.

What are the quality scores like for your reads, esp. at the 3' end?
If the quality is poor, you could try clipping the reads to remove the low-quality regions. We've had to do that in the past: for some datasets the quality was so poor that any read longer than 25 bp was virtually guaranteed not to match the genome.

Regards,
Chris

**satp** · 11-07-2008, 11:47 PM

Hi chris,

Thanks for your answer.

Taking the sequences that correspond to hsa-let-7 as an example, the number of sequences with 3' untemplated nucleotides is one fifth of the number without them. In addition, most of the sequences with 3' untemplated nucleotides have 1 to 3 of them. I don't know whether it is appropriate to discard these sequences, since I want to compare microRNA expression levels between two samples and, as we know, some isomiRs do carry an untemplated nucleotide in vivo.

Most of the quality scores of the 3' untemplated nucleotides are OK; see the following sequences:

@I82_3_FC30HF2AAXX:6:1:11:1235
TGAGGTAGTAGGTTGTATAGTTAATCGTATGCCGT
+I82_3_FC30HF2AAXX:6:1:11:1235
hhghhhhhhhhhhhhhhhhhhhhhhhhhhh[hhhh
@I82_3_FC30HF2AAXX:6:1:15:1585
TGAGGTAGTAGGTTGTATAGTTATCGTATGCCGTC
+I82_3_FC30HF2AAXX:6:1:15:1585
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
@I82_3_FC30HF2AAXX:6:1:1334:984
TGAGGTAGTAGGTTGTATAGTTAAAATCGTATTCC
+I82_3_FC30HF2AAXX:6:1:1334:984
hhhhhhhchhhXhhhhhhhdhhhhhhhO_hhhDTS
@I82_3_FC30HF2AAXX:6:1:420:1736
TGAGGTAGTAGGTTGTATAGTTCATCGTATGCCGT
+I82_3_FC30HF2AAXX:6:1:420:1736
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh`Z_h
@I82_3_FC30HF2AAXX:6:1:511:1438
TGAGGTAGTAGGTTGTATAGTTGTCGTATGCCGTC
+I82_3_FC30HF2AAXX:6:1:511:1438
hhhhhhhhhhhhhhhhhhhhhhhhhShhhYhhVhF

The 3' adaptor sequence is TCGTATGCCGTCTTCTGCTTG

Regards,
Leo
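
For illustration, here is a minimal Python sketch of how the adaptor could be clipped from the reads above and the leftover 3' bases reported. It is a sketch under stated assumptions, not anyone's actual pipeline: the function names, the `min_overlap` and `max_mismatches` settings, and the use of the 22-nt prefix shared by the example reads as the reference are all assumptions made here. One mismatch is tolerated in the adaptor so that the third read, which has a low-quality base inside its adaptor portion, is still clipped.

```python
ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"  # 3' adaptor quoted in the post
MATURE = "TGAGGTAGTAGGTTGTATAGTT"  # assumption: 22-nt prefix shared by the example reads

READS = [
    "TGAGGTAGTAGGTTGTATAGTTAATCGTATGCCGT",
    "TGAGGTAGTAGGTTGTATAGTTATCGTATGCCGTC",
    "TGAGGTAGTAGGTTGTATAGTTAAAATCGTATTCC",
    "TGAGGTAGTAGGTTGTATAGTTCATCGTATGCCGT",
    "TGAGGTAGTAGGTTGTATAGTTGTCGTATGCCGTC",
]


def clip_adaptor(read, adaptor=ADAPTOR, min_overlap=8, max_mismatches=1):
    """Remove the 3' adaptor, using the leftmost acceptable match.

    The adaptor may be truncated by the end of the read, so each candidate
    position compares whatever part of the read remains (at least
    min_overlap bases) against the start of the adaptor.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:start + len(adaptor)]
        mismatches = sum(a != b for a, b in zip(tail, adaptor))
        if mismatches <= max_mismatches:
            return read[:start]
    return read  # no adaptor found; leave the read unchanged


def three_prime_extension(insert, mature=MATURE):
    """Return the bases that follow the mature sequence, or None if the
    insert does not start with the mature sequence."""
    if insert.startswith(mature):
        return insert[len(mature):]
    return None


for read in READS:
    insert = clip_adaptor(read)
    print(insert, three_prime_extension(insert))
```

Whether the reported extra bases are really untemplated still has to be decided by comparing them with the genomic sequence downstream of the mature miRNA, which this sketch does not do.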

**myrna** · 11-09-2008, 01:51 PM

untemplated 3' additions

The 3' variability of miRNAs is a headache for both mapping and quantitation. We have recently adopted Novoalign for mapping miRNA-seq reads, since it allows multiple mismatches while still finding the optimal alignment. The alignment process takes many, many CPU-hours, so I recommend collapsing your reads first (which means you can't use your quality values). Once you have all your alignments, you can sum up the tags that align to the same place in the genome (including any with mismatches at the 3' end or elsewhere). This is probably more appropriate than throwing away tags with mismatches, since some miRNAs might be more prone to these extensions than others.

Ryan
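
The collapse-then-sum idea can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the `alignments` mapping is a hypothetical stand-in for whatever you parse out of your aligner's output (no aligner is actually called).

```python
from collections import Counter, defaultdict


def collapse_reads(sequences):
    """Map each distinct read sequence to the number of times it was seen.

    Only the unique sequences then need to be aligned; per-read quality
    values are lost at this step.
    """
    return Counter(sequences)


def sum_by_locus(collapsed, alignments):
    """Add up collapsed counts for all sequences aligned to the same place.

    `alignments` is a hypothetical dict from read sequence to a
    (reference, strand, start) tuple; unaligned sequences are simply absent.
    """
    totals = defaultdict(int)
    for seq, count in collapsed.items():
        locus = alignments.get(seq)
        if locus is not None:
            totals[locus] += count
    return dict(totals)


reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTT",
    "TGAGGTAGTAGGTTGTATAGTTA",
    "TGAGGTAGTAGGTTGTATAGTTAA",
    "ACGTACGTACGTACGTACGT",  # made-up read with no alignment
]
collapsed = collapse_reads(reads)

# Placeholder coordinates, for illustration only.
locus = ("refX", "+", 100)
alignments = {seq: locus for seq in collapsed if seq.startswith("TGAGG")}

print(sum_by_locus(collapsed, alignments))
```

Here the three 3' variants end up in one total for the locus, while the unaligned read is ignored.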

**satp** · 11-09-2008, 06:38 PM

Hi Ryan,

I haven't used Novoalign. Is it faster than megablast?

Do you mean that all sequences with 3' untemplated nucleotides can be used in the subsequent analysis? And how are these 3' untemplated nucleotides generated?

I have read your paper on hESC microRNA analysis, published in GR. I noticed that you analysed single-nucleotide 3' extensions in that paper. Could you please tell me how to analyse them when many microRNA sequences have more than one untemplated nucleotide at the 3' end?

Thanks for any help.

Leo

**myrna** · 11-09-2008, 09:36 PM

3' extensions

Hi Leo.
From what is known about miRNA target selection, the 3' extensions should not affect the interaction between a miRNA and its target. If you take this perspective, then the sum of all tags for a given miRNA (including any 3' variants) should tell you how much of the mature miRNA was in the cell. I used megablast for the hESC miRNA paper because there was no better option at the time. SOAP was the first aligner that really addressed the issue of variable length alignment (for next-gen sequence data). Novoalign is much faster than SOAP and allows more flexibility, so that is what we are using now.

Ryan

**chris** · 11-10-2008, 02:57 AM

We've tended to use vmatch for doing complete variable length matches in the past, but now bowtie seems to be ticking all the boxes for small RNA sequences. Never used any of the BLAST tools as they didn't seem to fit our needs and you need to mess around with gap penalties etc.

In terms of matching to known miRNAs, I've used vmatch to match the reads to the mature sequences by ignoring any 3' or 5' extensions. This gives me the complete set of matching reads.
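
As a rough illustration of matching reads to mature sequences while ignoring 5' or 3' extensions, here is a Python sketch that uses plain exact substring matching. It is not vmatch and does not reproduce its behaviour; the function name and the second reference entry are made up for the example, and mismatches inside the mature sequence are not tolerated.

```python
def count_by_mirna(collapsed, mature_seqs):
    """Sum read counts per miRNA, ignoring any 5' or 3' extension.

    A read is assigned to the first miRNA whose full mature sequence occurs
    anywhere inside it; reads that contain no mature sequence are skipped.
    """
    totals = {name: 0 for name in mature_seqs}
    for seq, count in collapsed.items():
        for name, mature in mature_seqs.items():
            if mature in seq:
                totals[name] += count
                break
    return totals


mature_seqs = {
    "let-7a": "TGAGGTAGTAGGTTGTATAGTT",
    "example-miR": "ACCGGTTAACCGGTTAACCGGT",  # hypothetical sequence
}

# Collapsed reads: sequence -> count (made-up counts).
collapsed = {
    "TGAGGTAGTAGGTTGTATAGTT": 10,    # exact mature sequence
    "TGAGGTAGTAGGTTGTATAGTTAA": 3,   # 3' extension
    "ATGAGGTAGTAGGTTGTATAGTT": 2,    # 5' extension
    "TGAGGTAGTAGGTTGTATAGT": 5,      # truncated, so not counted here
}

print(count_by_mirna(collapsed, mature_seqs))
```

Truncated reads fall out with this approach, so a real analysis would need a separate rule for them.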

**myrna** · 11-10-2008, 09:16 AM

Bowtie for miRNA alignment

How does Bowtie handle read trimming for miRNA data? Does it recognize the adaptor in advance and only align the pre-adaptor portion of the read? Or does it do a local alignment of the full read against the reference?

**chris** · 11-11-2008, 12:29 AM

No. The adaptors have to be removed prior to matching against the reference. To ensure that the majority of the adaptors are removed, we also clip reads using quality score thresholds, i.e. moving from 5' to 3', if the mean quality drops below, say, 20, the read is clipped at that position.
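
One way to read the quality clipping described above is as a sliding-window mean, sketched below in Python. The window size, the function name, and the quality offset of 64 (suggested by the `h` characters in the reads posted earlier in the thread) are assumptions; the thread does not say how the mean is computed or which quality scale the data use.

```python
def clip_by_quality(seq, qual, threshold=20, window=4, offset=64):
    """Clip a read at the first window whose mean quality is below threshold.

    Scans 5' to 3' and returns the sequence and quality string truncated at
    the start of the first failing window. Reads shorter than the window are
    returned unchanged.
    """
    scores = [ord(c) - offset for c in qual]
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < threshold:
            return seq[:start], qual[:start]
    return seq, qual


# Synthetic example: ten good bases ('h' = 40) followed by five poor ones ('B' = 2).
seq = "ACGTACGTACGTACG"
qual = "h" * 10 + "B" * 5
print(clip_by_quality(seq, qual))
```

Because the cut is made at the start of the failing window, a good base or two just before the poor region can be lost; a smaller window makes the cut tighter but noisier.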

A question about the small RNA sequencing data