**Melissa** · 06-11-2010, 01:35 AM

miRNAs range from 16 to 29 nt. Your 29 nt peak seems too high to ignore. I would advise checking for contaminating sequences. Making a nucleotide distribution plot of these 29 nt reads might give you some ideas.

Try mapping again with less stringent parameters. There will always be some sequences that just won't align and remain a question.
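
A minimal sketch of the nucleotide distribution check suggested above, assuming the reads are already loaded as plain strings. The function name `position_composition` is made up for illustration; the numbers it returns can be fed into any plotting tool.

```python
from collections import Counter

def position_composition(reads, length=29):
    """Per-position base fractions for reads that are exactly `length` nt long."""
    selected = [r.upper() for r in reads if len(r) == length]
    if not selected:
        return []
    profile = []
    for pos in range(length):
        # Count which base each selected read has at this position.
        counts = Counter(r[pos] for r in selected)
        total = sum(counts.values())
        profile.append({base: counts.get(base, 0) / total for base in "ACGTN"})
    return profile
```

A position dominated by a single base across most of the 29 nt reads would suggest one contaminating sequence rather than a mix of genuine small RNAs.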

**quicksand21** · 06-15-2010, 06:20 PM

When the technicians carried out the small RNA protocol, I believe they ran a gel and cut out two bands corresponding to the smallest sequence lengths, thus pre-selecting for the small RNAs, correct? Could the second band they cut out be related to this big peak at 29 nt? Also, I did what Melissa suggested, and the nucleotides seem to be well distributed. I'll add an attachment to my original post.

Also, what do you think about removing any sequences that only have a copy number of 1 or 2? These obviously are not well-represented in the sample, and could be a result of some sort of machine error?

**epigen** · 06-16-2010, 03:02 AM

Just the other day we discussed the following paper about miRNA NGS in our group:
Deep sequencing reveals differential expression of microRNAs in favorable versus unfavorable neuroblastoma.
Schulte JH, Marschall T, Martin M, Rosenstiel P, Mestdagh P, Schlierf S, Thor T, Vandesompele J, Eggert A, Schreiber S, Rahmann S, Schramm A.
Nucleic Acids Res. 2010 May 13.
You might find it helpful since they find a large fraction of 35 bp reads, probably constituting tRNAs and pre-miRNAs. Less than 50% of all reads align uniquely to the human genome or known miRNAs, but repeats, mRNAs or contamination are rare. They didn't elaborate on the issue, but we think the low mapping rate results from the short length of miRNAs and RNA editing, leading to a high mismatch rate. I'm not familiar with Bowtie so I can't judge how you defined the mismatch rate.
As to discarding sequences with low copy number, the authors required the miRNAs to be present at least 5 times to be included in the following analyses.
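
A rough sketch of such a copy-number cutoff, assuming the adapter-trimmed reads are available as a list of strings. The threshold of 5 is the one epigen quotes from the paper; applying it per distinct read sequence (rather than per annotated miRNA) is an assumption made here, and `filter_by_copy_number` is a hypothetical name.

```python
from collections import Counter

def filter_by_copy_number(reads, min_count=5):
    """Count identical sequences and keep those seen at least `min_count` times."""
    counts = Counter(reads)
    # The counts are kept alongside the sequences: they are the expression signal.
    return {seq: n for seq, n in counts.items() if n >= min_count}
```

Note that this keeps the read count for every surviving sequence instead of collapsing it to a single copy.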

**Nomijill** · 06-16-2010, 10:16 AM

new miRNA software

Hi,

I don't know if this would be helpful to you, but CLC just released a new version of our Genomics Workbench and the most significant added functionality is a suite of tools to support miRNA analysis. You would be welcome to use the free trial to see if it clarifies some of the questions that you have with your results.

Here is a link to download the new release.

http://www.clcbio.com/index.php?id=83

Also, let me know if you want to just take a look at the tutorial, and I will send it to you.

Best of luck with your analysis!
Naomi

**quicksand21** · 06-16-2010, 01:42 PM

@ epigen: Thank you very much for this paper recommendation. I will read through it. It seems from your post that this may answer some of my issues.

@ Naomi: Thanks for this suggestion. I think I may take you up on it and try the trial version.

**Simon Anders** · 06-17-2010, 08:29 AM

Originally posted by quicksand21

3. I aligned my unique reads to the human genome using BowTie. I am getting approximately 45-60% of the unique reads aligning to the genome, while the rest of the reads do not align. I have used the command:

./bowtie -n 1 -l 17 -k 200 --best --chunkmbs 128

If your fragments are shorter than your reads, the reads will contain parts of the adapter on the 3' side of the fragments, which will confuse the aligner. You need to trim the reads by matching their ends against the (reverse-complemented) sequence of the ends of your 5' adapter and trimming off any matches.

Our HTSeq framework contains functionality for this. I can give you a more detailed explanation if needed.

Simon
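
As a rough illustration of the trimming described above, here is a minimal sketch that removes an adapter from the 3' end of a read. It is not the HTSeq implementation: it only handles exact matches (no sequencing errors), and the function name `trim_3prime_adapter`, the `min_overlap` parameter, and its default of 3 are assumptions chosen for the example. The adapter string must be supplied in the orientation in which it appears in the read.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Cut off an adapter that starts somewhere inside the read.

    Handles two cases: the whole adapter is present in the read, or the read
    ends partway through the adapter (read suffix == adapter prefix).
    """
    # Case 1: the complete adapter occurs in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: only the beginning of the adapter fits before the read ends.
    # Try the longest possible overlap first.
    longest = min(len(read), len(adapter) - 1)
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adapter found: return the read unchanged.
    return read
```

Real data contains sequencing errors, so a dedicated trimming tool that tolerates mismatches is the better choice in practice; the sketch is only meant to show what "matching the ends" means.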

**mmartin** · 09-14-2010, 10:05 AM

Hi, I'm one of the authors of the paper mentioned by epigen. I just made our software for adapter removal (that we used in the paper) available. See the project homepage at https://code.google.com/p/cutadapt/ .

**Simon Anders** · 09-15-2010, 06:33 AM

I know this is an old thread and the original poster is probably long gone, but here is one relevant piece of information for people still reading. Quicksand21 wondered why he was left with only 300,000 of his initial 22 million reads after preprocessing. His preprocessing involved "removing redundancy", which, I suppose, means removing all but one of the reads with the same sequence. Now, as he sequenced microRNA, every miRNA that got properly sequenced will then appear only once, because each miRNA species can give rise to only one read sequence (namely, if all works correctly: the miRNA sequence, followed by the 3' adapter). It is debatable whether duplicate reads should be removed in mRNA-Seq (I'd say: no), but in miRNA-Seq it removes all signal.

Simon
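
To make the point concrete, a small sketch under the assumption that the reads are already adapter-trimmed strings. The two sequences and the function names are invented for the example.

```python
from collections import Counter

def remove_redundancy(reads):
    """Keep one copy of each distinct sequence (the step warned against above)."""
    return list(dict.fromkeys(reads))

def count_per_sequence(reads):
    """Keep every distinct sequence together with its read count."""
    return Counter(reads)

# Hypothetical trimmed reads: one abundant and one rare sequence.
reads = ["TTTGGCACTAGCACATTTTTGC"] * 1000 + ["ACCGGTTAGGCTTAACCGGTAT"] * 10
print(len(remove_redundancy(reads)))
print(count_per_sequence(reads).most_common())
```

After `remove_redundancy` the abundant and the rare sequence look identical (one read each), whereas the counts preserve the 100-fold difference.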

**foxyg** · 09-15-2010, 05:07 PM

On a side question, will exon sequencing capture any miRNA?

**dnusol** · 09-28-2010, 04:14 AM

Dear Simon,

I just saw your comment. I am having the same issue as the OP. From a file with 7M reads I ended up with 370K reads after filtering (adapter trimming, singleton elimination and read-quality filtering). I don't understand what you mean when you say that each miRNA species should give only one sequenced read. Actually, among these 370K sequences representing miRNAs there are 90K unique sequences, so I am assuming that this is my family of miRNAs, and not the 370K list.
Regardless, I am also puzzled by such a reduction in data, since my QA filter and the singleton filter only removed 20% of the reads.

Dave

**Simon Anders** · 09-28-2010, 04:29 AM

Hi Dave

A miRNA transcript is typically 22 bp long. You add adapters to both ends and sequence from the P5 adapter onwards to the end, into the P7 adapter. After trimming that off, you are left with a read of 22 bp, and if you sequence many transcript molecules of the same miRNA, you will see many reads with this 22 bp sequence.

In ChIP-Seq, most people delete redundant reads, i.e., if they see several reads with exactly the same sequence, they remove all but one. The original post sounds suspiciously like this was done there, too. All I wanted to point out is that doing so for miRNA-Seq data would be a very bad idea, because you fully expect to see the same read sequence many times, and this is no artifact but your signal.

I guess you meant the opposite with "singleton filter", namely to remove those reads that occur only once.

Maybe you have been too aggressive with filtering. How much does each step of your filtering pipeline remove?

Simon
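
One way to answer the "how much does each step remove" question is to apply the filters one at a time and log the losses. A minimal sketch, assuming each filter can be written as a keep/discard function on a read string; the step names and thresholds in the usage note are placeholders, not anyone's actual pipeline.

```python
def filtering_report(reads, steps):
    """Apply (name, keep_function) steps in order and record what each removes.

    Returns a list of (name, reads_in, reads_out, fraction_removed) tuples
    and the reads that survive all steps.
    """
    report = []
    current = list(reads)
    for name, keep in steps:
        kept = [r for r in current if keep(r)]
        removed = len(current) - len(kept)
        fraction = removed / len(current) if current else 0.0
        report.append((name, len(current), len(kept), fraction))
        current = kept
    return report, current
```

For example, `steps = [("min_length_17", lambda r: len(r) >= 17), ("no_N", lambda r: "N" not in r)]` would report the loss from a length cutoff and from an ambiguous-base filter separately.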

**dnusol** · 09-28-2010, 06:21 AM

Thanks for your comments Simon, yes, this time singleton removal is the other way round as you say, removing reads that occur only once.

For this sample, after adapter removal nearly 90% of the original reads were filtered out due to them being too short after adapter trimming (38 cycle run, minimum length cutoff 17nt), ending in about 370K reads. On another sample, starting with similar number of reads, only about 20% of the original reads were discarded.
I do quality and singleton filtering after adapter trimming, and the percentage of reads filtered out is roughly similar for both samples (15%).
So I see two possibilities, a technical problem in preparation of that sample or something biologically meaningful. I rather think it is something interesting, but what can it be?

**Simon Anders** · 09-28-2010, 06:29 AM

Why don't you try to align the reads with less than 17 nt as well? Most of them will be too short to give a unique alignment, but if you tell your aligner to discard these (instead of discarding them yourself before alignment), you can check whether the remaining ones actually map to miRNA loci, and this could help you figure out what happened in the library prep.

Also, look at a histogram of the sizes after adapter trimming. Are they just below 17 nt, or so short that they could be primer dimers?

S
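
A small sketch of the size histogram mentioned above, assuming the trimmed reads (including the short ones that would otherwise be discarded) are available as strings. The function names and the text-bar output are just one hypothetical way to look at it.

```python
from collections import Counter

def length_histogram(reads):
    """Map read length -> number of reads, sorted by length."""
    return dict(sorted(Counter(len(r) for r in reads).items()))

def print_histogram(hist, width=40):
    """Print one text bar per length, scaled to the most frequent length."""
    peak = max(hist.values(), default=0)
    for length, n in hist.items():
        bar = "#" * (round(width * n / peak) if peak else 0)
        print(f"{length:>3} nt {n:>8} {bar}")
```

A pile-up at 0 to a few nt would point towards adapter-only products, while a peak just below the 17 nt cutoff would suggest real but short inserts.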

**dnusol** · 09-29-2010, 07:25 AM

Hi again Simon,

They shouldn't be primer dimers, because those are not selected during sample prep. Looking at the Bioanalyzer electropherogram, primer dimers can be seen before the main peak. Nevertheless, I will have a look.
