SEQanswers

Old 01-15-2009, 05:16 AM   #1
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default miRNA aligning/counting

Hello all. Recently our core lab ran the Illumina small RNA protocol on a sample (human RNA) to collect miRNA sequence. The primary purpose is to analyze differential expression of the known miRNA species in the various samples run. It is now up to me to do the bioinformatics part. I know that I can get a set of known miRNA sequences from miRBase. I also know that I will have to either mask or trim the adapter portion of each read prior to aligning. Can anyone who has done anything similar offer some advice? What tools do you think are best for the job? Should I be trying to align to just the miRNA sequences or to the whole genome?

Thanks in advance.
Old 01-15-2009, 07:05 PM   #2
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

Hi kmcarr,

I have a similar project on the go with A. thaliana, where I initially aligned all my reads to the whole genome and then intersected those results with the known locations of the miRBase mature and precursor sequences. I found it easier this way because at a later stage I can look for potentially novel miRNAs.

I used novoalign (www.novocraft.com) to simultaneously align and strip off the 3' adaptor sequence. Parameters are

novoalign -d genome -f <reads in fastq|prb format> -s<adaptor sequence> > output

SOAP2 and MAQ may also be used for this purpose, but I found that novoalign offered favourable performance and sensitivity. Bowtie may also do a good job, but I haven't tried it for this work.

Once I had the alignments, I sorted them by genome sequence and ascending position. I then cross-referenced these positions against the locations of the precursor microRNAs with a Perl script. At this stage I had counts for each miRBase miRNA from my short reads, which I could convert to reads-per-million counts.
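For readers who want to see the cross-referencing and normalisation step spelled out, here is a minimal Python sketch. It is not zee's Perl script. The function names, the tuple layouts and the coordinates in the usage lines are all made up for illustration, strand is ignored, and the nested loop is only sensible for small annotation sets.

```python
from collections import defaultdict


def count_reads_per_mirna(alignments, annotations):
    """Count reads whose alignment overlaps an annotated miRNA locus.

    alignments:  iterable of (chrom, start, end, n_reads), 1-based inclusive
    annotations: iterable of (name, chrom, start, end), 1-based inclusive
    (both layouts are assumptions made for this sketch)
    """
    by_chrom = defaultdict(list)
    for name, chrom, start, end in annotations:
        by_chrom[chrom].append((start, end, name))

    counts = defaultdict(int)
    for chrom, a_start, a_end, n_reads in alignments:
        for m_start, m_end, name in by_chrom.get(chrom, []):
            # Two closed intervals overlap when each starts before the other ends.
            if a_start <= m_end and m_start <= a_end:
                counts[name] += n_reads
    return dict(counts)


def reads_per_million(counts, total_mapped_reads):
    """Scale raw counts to reads per million mapped reads."""
    return {name: c * 1_000_000 / total_mapped_reads for name, c in counts.items()}


# Hypothetical loci and alignments, purely for illustration.
annotations = [("mir-A", "chr1", 100, 180), ("mir-B", "chr2", 500, 580)]
alignments = [("chr1", 110, 131, 40), ("chr1", 300, 321, 5), ("chr2", 570, 591, 10)]
counts = count_reads_per_mirna(alignments, annotations)
rpm = reads_per_million(counts, total_mapped_reads=55)
```

The `n_reads` field lets the same function work on collapsed reads, where one alignment stands for many identical reads.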

Contact me privately if you would like more info.

It would be nice if other people doing similar work could share their protocols for this type of bioinformatics analysis. We could all learn something new.

Last edited by zee; 01-15-2009 at 07:07 PM.
Old 01-22-2009, 07:20 AM   #3
myrna
Member
 
Location: Vancouver, Canada

Join Date: Feb 2008
Posts: 44
Default miRNA alignment

I use a very similar approach, but I first collapse identical reads before aligning (to avoid aligning the same let-7 and other abundant miRNA reads hundreds of thousands of times). You can then use the number of times each sequence occurred in the original file to generate counts. The only drawback is that you lose the per-read sequence quality information (if you need it).
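A minimal sketch of what collapsing can look like in Python, assuming the reads have already been adapter-trimmed and fit in memory. The `>seqN_xCOUNT` header convention is an assumption made here so that the copy number survives alignment; it is not a standard that any particular aligner requires.

```python
import io
from collections import Counter


def collapse_reads(sequences):
    """Return a Counter mapping each distinct read sequence to its copy number."""
    return Counter(sequences)


def write_collapsed_fasta(counts, handle):
    """Write one FASTA record per unique sequence, most abundant first.

    The header stores a running id and the copy number, e.g. >seq1_x3.
    """
    for i, (seq, n) in enumerate(counts.most_common(), start=1):
        handle.write(f">seq{i}_x{n}\n{seq}\n")


# Toy input: three reads, two of them identical.
reads = ["TGAGG", "ACGT", "TGAGG"]
out = io.StringIO()
write_collapsed_fasta(collapse_reads(reads), out)
collapsed_fasta_text = out.getvalue()
```

After alignment, the copy number can be parsed back out of the read name and used as the weight when counting.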

Ryan
Old 01-29-2009, 05:18 AM   #4
chris
Member
 
Location: Dundee, Scotland

Join Date: Apr 2008
Posts: 52
Default

I agree. Collapsing the reads to unique sequences is a very useful step, as miRNA Solexa runs are heavily over-sampled; e.g. 3M reads can often represent only 200k unique reads.

I tend to remove adaptor tags and quality filter reads before matching to miRBase. This also reduces the search space significantly.
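As a rough illustration of that pre-processing, here is a Python sketch, not chris's actual code. The adapter string in the usage lines is a placeholder rather than the kit's real adapter, and the thresholds and the Phred+33 quality offset are assumptions (older Illumina pipelines used a different quality encoding, so check your files).

```python
def trim_3prime_adapter(seq, qual, adapter, min_overlap=6):
    """Cut the read at the first exact adapter match.

    If the full adapter is not present, also look for a partial adapter
    (at least min_overlap bases) hanging off the 3' end of the read.
    """
    pos = seq.find(adapter)
    if pos == -1:
        for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def passes_filter(seq, qual, min_len=16, min_mean_q=20, offset=33):
    """Keep reads that are long enough, contain no N and have a decent mean quality."""
    if len(seq) < min_len or "N" in seq:
        return False
    mean_q = sum(ord(c) - offset for c in qual) / len(qual)
    return mean_q >= min_mean_q


ADAPTER = "CTGTAGGCACC"  # placeholder adapter, for illustration only
insert = "TGAGGTAGTAGGTTGTATAGTT"
read = insert + ADAPTER[:8]  # read that runs 8 bases into the adapter
trimmed_seq, trimmed_qual = trim_3prime_adapter(read, "I" * len(read), ADAPTER)
keep = passes_filter(trimmed_seq, trimmed_qual)
```

Exact matching like this misses adapters containing sequencing errors; a mismatch-tolerant search is more forgiving.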
Old 01-29-2009, 05:24 AM   #5
zee
NGS specialist
 
Location: Malaysia

Join Date: Apr 2008
Posts: 249
Default

You guys are correct; I forgot to add that after my first analysis I started collapsing reads.
When I do my miRBase counts, I have an option to factor in the frequency of each tag.
I recently had a look at software for counting tags that overlap miRNAs. I found ERANGE and am still trying to make it work on my genome of interest.
Anybody care to share what they're using? I have a very crude pipeline in perl that will automate the counting and graph miRNA matches.
Old 01-30-2009, 12:39 AM   #6
chris
Member
 
Location: Dundee, Scotland

Join Date: Apr 2008
Posts: 52
Default

I have my own Perl scripts for handling the raw data and managing searches of the reads against miRBase, etc.

Then I load the data into MySQL for analysis. It allows the easy tracking of the 'abundance' of each read following collapsing of the data.
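To give a feel for the database approach, here is a small sketch that uses Python's built-in sqlite3 module in place of MySQL, so it runs without a server. The two-table schema and every table, column and miRNA name are invented for this example; chris's actual schema is not described in the thread.

```python
import sqlite3


def build_db(read_abundance, hits):
    """Load collapsed reads and their miRNA hits into an in-memory database.

    read_abundance: dict mapping unique read sequence -> copy number
    hits:           iterable of (read sequence, miRNA name)
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reads (seq TEXT PRIMARY KEY, abundance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE hits (seq TEXT NOT NULL, mirna TEXT NOT NULL)")
    conn.executemany("INSERT INTO reads VALUES (?, ?)", read_abundance.items())
    conn.executemany("INSERT INTO hits VALUES (?, ?)", hits)
    conn.commit()
    return conn


def abundance_per_mirna(conn):
    """Sum the copy numbers of all unique reads that hit each miRNA."""
    rows = conn.execute(
        "SELECT h.mirna, SUM(r.abundance) "
        "FROM hits h JOIN reads r ON r.seq = h.seq "
        "GROUP BY h.mirna ORDER BY h.mirna"
    ).fetchall()
    return dict(rows)


# Hypothetical collapsed reads and hits.
db = build_db(
    {"AAA": 5, "CCC": 2, "GGG": 7},
    [("AAA", "mir-1"), ("CCC", "mir-1"), ("GGG", "mir-2")],
)
totals = abundance_per_mirna(db)
```

Keeping the abundance on the read rather than on the hit means a read that matches several miRNAs stores its copy number once, and the join decides how it is counted.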
Old 04-16-2009, 12:14 PM   #7
demis001
Member
 
Location: USA

Join Date: Apr 2009
Posts: 10
Default

I also use my own script to process the results. I usually predict miRNAs first and then group them as known or novel in the last step. Aligning to miRBase is trivial once you have your candidate miRNAs.

DD
Old 07-14-2009, 02:30 PM   #8
andrea_maso
Junior Member
 
Location: Rome - ITALY

Join Date: Jul 2009
Posts: 3
Default

Hi all,
I have a question similar to the one posted by kmcarr. We need to align miRNA sequences obtained on the Solexa/Illumina platform, and we are not interested (for now) in discovering new miRNA species. Is there a precompiled or assembled short sequence comprising the sequences of all miRNA species (mature and hairpin) that one can use for alignment instead of the whole genome? I am thinking of something like
----seqMir-1....seqMir2....seqmir3.....-----
I think the alignment algorithm should run faster that way.

Have any of you thought of such a solution? Would it work? How can I assemble such a sequence automatically?

Thanks.
Andrea
Old 07-15-2009, 05:26 AM   #9
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

Andrea,

miRBase has what you are looking for:

http://microrna.sanger.ac.uk

Go to the Download tab and you will find fasta files with either the hairpin or mature sequences. They also provide GFF files with the genome coordinates of the miRNAs.
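One practical detail when using those FASTA files: the miRBase sequences are written in the RNA alphabet (U rather than T), and each header starts with a species-prefixed identifier such as hsa- for human. Below is a minimal Python sketch for pulling out one species and converting it to DNA letters. The record names and sequences in the usage lines are made up, and the function names are this example's own.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    The name is the first whitespace-delimited word of the header.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def load_species_as_dna(lines, prefix="hsa-"):
    """Keep records whose name starts with prefix and convert U to T."""
    return {
        name: seq.upper().replace("U", "T")
        for name, seq in read_fasta(lines)
        if name.startswith(prefix)
    }


# Made-up records in the same layout as a miRBase FASTA file.
example = [
    ">hsa-example-1 description text",
    "UGGGAUGAGG",
    "UAGUAGG",
    ">cel-example-2 description text",
    "UACACUG",
]
human = load_species_as_dna(example)
```

With a downloaded file, the same function can be fed an open file handle, since iterating over a file yields its lines.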

Happy mapping.
Old 07-15-2009, 08:06 AM   #10
andrea_maso
Junior Member
 
Location: Rome - ITALY

Join Date: Jul 2009
Posts: 3
Default

Dear kmcarr,
yes, I know that miRBase has the sequences and GFF coordinates, but they are in multi-FASTA format and not a single-sequence file (I am thinking of MapView, which requires a single FASTA sequence...).
I will try to use Bowtie and SAMtools to align and view the sequences, but I do not know which format they require.
Do you have an idea?
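If a single concatenated sequence really is needed for a viewer, one way to build it is sketched below in Python: join the records with runs of N so that a read cannot align across a boundary, and remember where each miRNA starts so hits can be translated back. The spacer length and function names are assumptions made for this example. As far as I know, Bowtie's index builder accepts a multi-FASTA file directly, so this step should not be needed for the alignment itself.

```python
def concatenate_with_spacers(records, spacer_len=50):
    """Join (name, seq) records into one sequence separated by runs of N.

    Returns the joined sequence and a dict mapping each name to its
    1-based inclusive (start, end) coordinates in that sequence.
    """
    pieces, offsets, pos = [], {}, 0
    spacer = "N" * spacer_len
    for i, (name, seq) in enumerate(records):
        if i:  # no spacer before the first record
            pieces.append(spacer)
            pos += spacer_len
        offsets[name] = (pos + 1, pos + len(seq))
        pieces.append(seq)
        pos += len(seq)
    return "".join(pieces), offsets


def locate(position, offsets):
    """Map a 1-based position in the joined sequence back to (name, local position).

    Returns None when the position falls inside a spacer.
    """
    for name, (start, end) in offsets.items():
        if start <= position <= end:
            return name, position - start + 1
    return None


# Tiny illustration with a short spacer.
joined, offsets = concatenate_with_spacers([("a", "ACGT"), ("b", "GG")], spacer_len=3)
hit = locate(8, offsets)
```

The spacer should be at least as long as the longest read so that no read can bridge two neighbouring miRNAs.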

Thanks and bye for now.
Andrea
Old 07-21-2009, 12:45 AM   #11
David_H
Junior Member
 
Location: Italy

Join Date: Jul 2009
Posts: 3
Default

I've also used SOAP to get rid of adapters and map reads, but right now I need something that does fuzzy identification and trimming of adapters on Windows (for teaching purposes). I've finally found a mapper that works on Windows (PASS), but it won't cut the adapters.

Any ideas gratefully received

David
Old 07-21-2009, 01:33 AM   #12
chris
Member
 
Location: Dundee, Scotland

Join Date: Apr 2008
Posts: 52
Default

Andrea,

If you're using miRBase to search for miRNAs, I'd recommend using only the hairpin.fasta file, as many search algorithms cope badly when the search sequence is shorter than the query, as is often the case when searching against the mature sequences. You then need to parse the miRNA.dat file to determine whether your hairpin matches align to known mature regions.

All this is simple to do with the data in a database.
Cheers,

Chris
Old 07-21-2009, 01:36 AM   #13
chris
Member
 
Location: Dundee, Scotland

Join Date: Apr 2008
Posts: 52
Default

David,

Have you tried Cygwin on Windows? The vast majority of code is available for Linux only, so it's probably best to try that avenue rather than look for Windows-only tools, as you may otherwise miss out on the best applications.
Cheers,

Chris
Old 07-21-2009, 05:21 AM   #14
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,178
Default

David,

The EMBOSS package contains a program called fuzznuc which does what you want, fuzzy identification of nucleotide sequences (http://embossgui.sourceforge.net/dem...l/fuzznuc.html).

EMBOSS is a huge package and is primarily supported on Unix and Unix-like environments, but there is a native Windows port (ftp://emboss.open-bio.org/pub/EMBOSS/windows/). I have never used the Windows port, but if it is anything like the Unix versions it will require some commitment to get it installed and running properly.
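For teaching purposes, a mismatch-tolerant adapter search is also short enough to write directly. The sketch below is not fuzznuc; it is a plain Hamming-distance scan in standard-library Python, which should run unchanged on Windows. The adapter string, mismatch rate and minimum overlap are illustrative assumptions, and insertions or deletions in the adapter are not handled.

```python
def find_adapter_fuzzy(seq, adapter, max_mismatch_rate=0.1, min_overlap=5):
    """Return the leftmost position where the adapter matches with few mismatches.

    Near the 3' end only the part of the adapter that fits in the read is
    compared, so a truncated adapter can still be found. Returns None if
    no position qualifies.
    """
    for pos in range(len(seq) - min_overlap + 1):
        window = seq[pos:pos + len(adapter)]
        mismatches = sum(a != b for a, b in zip(window, adapter))
        # The allowance scales with the compared length (rounded down).
        if mismatches <= int(max_mismatch_rate * len(window)):
            return pos
    return None


def trim_fuzzy(seq, adapter, **kwargs):
    """Remove the adapter and everything after it, if an adapter is found."""
    pos = find_adapter_fuzzy(seq, adapter, **kwargs)
    return seq if pos is None else seq[:pos]


ADAPTER = "CTGTAGGCACC"  # placeholder adapter, for illustration only
insert = "TGAGGTAGTAGGTTGTATAGTT"
read_with_error = insert + "CTGTAGGAACC"  # adapter with one sequencing error
trimmed = trim_fuzzy(read_with_error, ADAPTER)
```

Because the allowance is rounded down, short partial matches at the very end of a read must be exact, which limits false trimming of genuine insert sequence.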
Old 07-30-2009, 12:29 PM   #15
sgombar
Junior Member
 
Location: New York, NY

Join Date: Jul 2009
Posts: 3
Default

Hello,

What are you guys using as the actual statistical model once you know the abundance of each miRNA in each sample? Are you doing a pooled comparison as in SAGE, or are you taking a linear-model approach as in limma?

If the latter, which off-the-shelf programs are you using?
Old 10-24-2009, 02:59 PM   #16
joseph
Member
 
Location: ca

Join Date: Feb 2008
Posts: 39
Default

Quote:
Originally Posted by andrea_maso View Post
Dear kmcarr,
yes I know that mirbase has the sequences and GFF coordinates but they are multifasta sequences format and not a single sequence file (I am thinking to Mapview that requires a unique fasta sequence...).
I will try to use Bowtie and SAM tool to align and view the sequences and I do not know which format they require.
Do you have an idea?

Thanks and bye for now.
Andrea
Andrea,
have you tried Bowtie to map the reads to miRBase as you said? If you did, can you share your findings?
Old 11-09-2009, 01:26 PM   #17
bioinfosm
Senior Member
 
Location: USA

Join Date: Jan 2008
Posts: 482
Default

Can people share their experience using http://mirexpress.mbc.nctu.edu.tw/usage.php?
It seems many of you have your own pipelines built from scripts.
__________________
--
bioinfosm
Old 11-25-2009, 12:55 AM   #18
kolja
Junior Member
 
Location: Germany

Join Date: Nov 2009
Posts: 1
Default

Hi,

Is there a nice tool that allows me to collapse identical reads from Illumina Genome Analyzer or FASTA files? I have sequencing data from different small RNAs, with 10-20 million reads. What I want to do is find identical reads, cluster them together and count the number of reads in each cluster. Is there a tool that can do that?
Old 12-14-2009, 07:02 AM   #19
yjhua2110
Member
 
Location: china

Join Date: Nov 2009
Posts: 67
Default

Several tools for deep-sequencing-derived small RNAs are provided in deepBase (http://deepbase.sysu.edu.cn/). They were developed to map, store, retrieve, annotate, integrate and visualize deep-sequencing-derived small RNAs, and to facilitate transcriptomic research and the discovery of novel ncRNAs.
Old 08-16-2010, 06:10 AM   #20
dukevn
Member
 
Location: RI

Join Date: Apr 2009
Posts: 50
Default

Quote:
Originally Posted by myrna View Post
I use a very similar approach, but I first collapse identical reads before aligning (to avoid aligning the same let-7 and other abundant miRNA reads hundreds of thousands of times. You can then count the number of reads in the original file to generate counts. The only problem with this is that you lose the sequence quality information (if you have a need for that).

Ryan
Hi all,

I have a similar question and found this post through Google. Could anybody explain what identical reads are (after trimming, I suppose?) and how to collapse them?

Thanks,

D.