SEQanswers

01-24-2017, 10:49 PM   #1
ErikFas
Member
 
Location: Sweden

Join Date: Jun 2014
Posts: 86
Alignment of small RNA data

I was recently at a meeting about RNA-seq in general, and the topic of small RNA-seq came up, something with which I'm quite unfamiliar. The discussions were interesting, but seeing as I didn't know much about sRNA-seq (and I was the "RNA-seq"-guy at the meeting), they didn't get very far. I've since tried to learn a bit about it, and I wanted to ask some questions to clear up things I'm not sure about...

1) A general pipeline for sRNA-seq. As far as I understand it, the sequencing adapters are proportionally a much larger part of the reads than for normal RNA-seq. This would make adapter trimming more or less mandatory for any sRNA-seq analysis. Is this correct?

2) Seeing as sRNA is a lot smaller, would that mean that there are more duplicated reads in an sRNA-seq dataset? If so, would you remove them?

3) As far as alignment goes, I can't really understand if one should use one of the sRNA-specific aligners I seem to find by googling, or to use one of the normal RNA-seq aligners (STAR, Tophat, etc.). I seem to find information saying that you can use either...

4) Can you align to the normal human reference genome (such as GRCh38), or do you need to add some sRNA-specific database? I found miRBase, for example, which (as far as I can tell) is a database for miRNA sequences. I assume one could align to that, if one is only interested in miRNA? Or should those sequences be added to e.g. GRCh38 and then aligned to the collated reference?

Since I'm interested in this purely from a learning and knowledge perspective, I won't actually work with any sRNA-seq dataset. I did download a run from the SRA and put it through my standard alignment pipeline just to see what happened, though. I got around 80% ambiguous alignments and about 10% duplicated reads using just a very simple STAR 2-pass alignment to GRCh38, without any sRNA-specific sequences added and with no adapter/quality trimming. Do these numbers make sense for the non-optimised (from an sRNA perspective) pipeline used? What would be required to get a better alignment?
01-25-2017, 01:04 AM   #2
nanos
Member
 
Location: Vienna

Join Date: May 2010
Posts: 11

Dear Erik, we analyze sRNA data very often, so I can give you some insight.
1) Adaptor trimming is really a must. With the minimum sequencing length being 50, the read is longer than the sRNA itself, so you always have adaptor remnants in the sequence.

2) Removing duplicated reads would be a problem. In most cases you sequence the full length of the sRNA, so, in contrast to RNA-seq, reads from the same sRNA do not start at randomly shifted positions; they are all identical (hope this is understandable). Removing duplicates will most likely leave you with a very low and very similar count for all the miRNAs, no matter how highly or differently they were expressed. If you need the duplication rate, you can use adaptors containing random nucleotides and then use these 8 Ns in combination with the sRNA sequence to assess it.

3) we use good old bowtie and it works perfectly fine for us (if there are different opinions on that one, any input is appreciated)


4) I guess the answer here largely depends on your question.
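
To make points 1 and 2 a bit more concrete, here is a minimal Python sketch of the two ideas. It is only an illustration, not what cutadapt or any UMI tool actually does: the function names are made up for this post, the adapter string is just an example (use the one from your library kit), matching is exact with no mismatch tolerance, and the UMI is assumed to have been extracted already, since where the 8 Ns sit depends on the kit.

```python
from collections import Counter


def trim_3p_adapter(read, adapter, min_overlap=5):
    """Remove a 3' adapter from a read (exact matching only).

    Handles a full adapter anywhere in the read, or an adapter prefix
    of at least `min_overlap` bases running off the 3' end.
    """
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Look for the longest partial adapter at the end of the read.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def count_with_umis(records):
    """Count reads and distinct UMIs per sRNA sequence.

    `records` is an iterable of (srna_sequence, umi) pairs.
    Returns {sequence: (n_reads, n_distinct_umis)}.
    """
    reads = Counter()
    umis = {}
    for seq, umi in records:
        reads[seq] += 1
        umis.setdefault(seq, set()).add(umi)
    return {seq: (reads[seq], len(umis[seq])) for seq in reads}


if __name__ == "__main__":
    adapter = "TGGAATTCTCGGGTGCCAAGG"   # example adapter, an assumption
    insert = "TAGCTTATCAGACTGATGTTGA"   # arbitrary 22-nt example insert
    read = insert + adapter + "AACTCCA"  # 50-nt read with adapter read-through
    print(trim_3p_adapter(read, adapter))

    # Identical sequences are expected; only identical sequence + UMI
    # pairs point to PCR duplicates.
    records = [(insert, "AAAAAAAA"), (insert, "AAAAAAAA"), (insert, "CCCCCCCC")]
    print(count_with_umis(records))
```

Comparing the read count with the distinct-UMI count per sequence gives the duplication rate without collapsing all identical sRNA reads into one.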

hope that helps as a start.
01-25-2017, 04:07 AM   #3
lre1234
Senior Member
 
Location: US

Join Date: Aug 2011
Posts: 105

Hi,
I do a lot of short RNA-seq and here are some thoughts (but there are other ways of doing things that work well):

1. Agreed that adapter trimming is a must, or most of your reads will not map. We use cutadapt, which works really well.
2. Duplicate read removal is not needed, nor should it be done. You'll lose lots of things.

3. bowtie works well. I have also used BWA, which seemed to work well too, but I usually default to bowtie. As far as I understand, STAR wouldn't work for short RNAs as it was designed for long RNA and specifically paired-end (but don't quote me here, I may be wrong). STAR is our go-to aligner for long RNA.

4. As for aligning: in my opinion, you should always align to the whole genome (GRCh37 or GRCh38, whichever you choose) and afterwards intersect with miRBase or some other database of interest. Keep in mind that the vast majority of miRNAs are 'unique' sequences in the genome and should align uniquely, but some miRNAs have duplicate sequences in the genome (e.g. miR-92a-3p, or miR-1302, whose sequence occurs in 11 places in the genome). Mapping to the whole genome also lets you do additional things like novel miR discovery. Some people align to miRBase sequences instead of the whole genome, but I personally think that is a bad idea and will give a false sense of what you are looking at. Essentially, you would be 'forcing' many reads to align to those regions when in fact they would align better to other places in the genome, especially when you allow a mismatch.
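
The "align to the genome, then intersect" step in point 4 could look roughly like the Python sketch below. It is an illustration under assumptions, not a recommended tool: the function names are made up, the annotation is assumed to be GFF3 with a `Name` attribute, all coordinates and miRNA names in the example are invented, and splitting a multi-mapping read evenly across its alignments is just one possible way to treat miRNAs that occur at several loci.

```python
from collections import defaultdict


def parse_gff3(lines, feature="miRNA"):
    """Return (chrom, start, end, strand, name) tuples for one feature type."""
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, ftype, start, end, strand, attrs = (
            fields[0], fields[2], fields[3], fields[4], fields[6], fields[8])
        if ftype != feature:
            continue
        attr = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        name = attr.get("Name", attr.get("ID", "unknown"))
        records.append((chrom, int(start), int(end), strand, name))
    return records


def count_mirna_hits(alignments, annotations):
    """Fractionally count reads per annotated miRNA.

    `alignments` holds (read_id, chrom, start, end, strand) tuples with
    1-based inclusive coordinates; a read with n alignments contributes
    1/n at each locus, so unannotated loci absorb their share.
    """
    annotations = list(annotations)
    by_read = defaultdict(list)
    for read_id, chrom, start, end, strand in alignments:
        by_read[read_id].append((chrom, start, end, strand))

    counts = defaultdict(float)
    for hits in by_read.values():
        weight = 1.0 / len(hits)
        for chrom, start, end, strand in hits:
            for a_chrom, a_start, a_end, a_strand, name in annotations:
                same_place = chrom == a_chrom and strand == a_strand
                if same_place and start <= a_end and end >= a_start:
                    counts[name] += weight
    return dict(counts)


if __name__ == "__main__":
    # Invented annotation lines and alignments, for illustration only.
    gff = [
        "##gff-version 3",
        "chr1\t.\tmiRNA\t100\t121\t.\t+\t.\tID=a1;Name=miR-X-copy1",
        "chr2\t.\tmiRNA\t500\t521\t.\t-\t.\tID=a2;Name=miR-X-copy2",
        "chr3\t.\tmiRNA\t910\t931\t.\t+\t.\tID=a3;Name=miR-Y",
    ]
    alignments = [
        ("r1", "chr1", 100, 121, "+"),  # r1 maps equally well to two copies
        ("r1", "chr2", 500, 521, "-"),
        ("r2", "chr3", 910, 931, "+"),  # r2 maps uniquely
    ]
    print(count_mirna_hits(alignments, parse_gff3(gff)))
```

The nested loop is fine for a toy example; for real data an interval tool or index would be the sensible choice. The point is only that reads which align better elsewhere in the genome never get forced onto a miRNA.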

Have fun with it. miRNAs do lots of interesting things and have many useful roles!
11-21-2017, 02:44 AM   #4
manwar
Junior Member
 
Location: UK

Join Date: Nov 2017
Posts: 1
Which GTFs to use for annotation of sRNA?

Hello everyone,

Following on from ErikFas's query about using the normal human reference genome for sRNA-seq analysis, I wanted to ask whether a regular GTF/GFF from Ensembl or UCSC can be used to annotate sRNA, or whether there are sRNA-specific GTFs?

Thanks a lot!