06-13-2012, 02:17 PM   #1
Location: Houston TX USA

Join Date: Jun 2012
Posts: 13
Help needed for smallRNA deepseq data processing & analysis

Dear Bioinformatics community,

I was recently put on a project to identify plant microRNAs in animal tissues. Overall, I would like to sift through mouse and human small RNA deepseq raw data and see whether any plant (dietary) microRNAs are present in animal tissues (for the scientific rationale, see: Cell Research (2012) 22:107-126. doi:10.1038/cr.2011.158; published online 20 September 2011).

I am a plant biology postdoc and have limited scripting skills. But I took a bioinformatics class recently and am pretty familiar with the Linux command line. I also learned some C programming a long time ago, so I can probably use publicly available Python or Perl scripts on Linux.

But I need some help with the overall workflow design and the selection of tools for each step of processing and analyzing the small RNA data.
Our mouse and human sRNA deepseq data was generated on an Illumina HiSeq 2000 using the TruSeq sRNA library kit (50 cycles, single-end sequencing). The data is currently in fastq format, and I don't think any QC has been done on it.

What I would like to achieve is as follows:
1) I want to do a QC and filter low quality sequence tags.

2) With the high quality tags, I want to:
A) trim 3' adaptors
B) clean up really small tags (e.g. less than 8 bp).
C) remove tags that resulted from adaptor dimers.

3) With the small RNA tags from step 2), I want to:
A) cluster the identical tags and make a count.
B) Cluster homologous tags? (not sure, should I do this?)
C) Do a length distribution analysis

4) With the unique non-redundant small RNA tags, I would like to map them to:
A) known animal tRNA, rRNA, snRNA, snoRNA databases
B) known animal microRNA databases

5) With the unique unmapped small RNA tags, I would like to map them to a plant microRNA database to see if any of them are plant miRNAs.
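
As a rough illustration of steps 2B and 3 to 5 above, here is a minimal Python sketch, assuming the adaptor-trimmed tags are already in memory as plain strings. The function names (collapse_tags, length_distribution, subtract_known) are hypothetical, the sequences are made up, and "mapping" is reduced to exact string matching, which a real aligner would replace. Note that reference miRNA sequences are often written with U, so they would need converting to T before comparing against reads.

```python
from collections import Counter


def collapse_tags(tags, min_len=8):
    """Count identical tags, dropping tags shorter than min_len (steps 2B, 3A)."""
    return Counter(t for t in tags if len(t) >= min_len)


def length_distribution(counts):
    """Sum read counts per tag length (step 3C)."""
    dist = Counter()
    for seq, n in counts.items():
        dist[len(seq)] += n
    return dict(sorted(dist.items()))


def subtract_known(counts, known_seqs):
    """Split tags into exact matches to a known set and the rest (steps 4, 5)."""
    known = set(known_seqs)
    matched = {s: n for s, n in counts.items() if s in known}
    unmatched = {s: n for s, n in counts.items() if s not in known}
    return matched, unmatched


if __name__ == "__main__":
    # Made-up toy tags, not real miRNA sequences.
    tags = ["ACGTACGTAC", "ACGTACGTAC", "ACG", "TTTTGGGGCCCC"]
    counts = collapse_tags(tags)
    print(length_distribution(counts))
    # Hypothetical "animal" reference set; whatever is left over would
    # then be compared against the plant miRNA set in the same way.
    animal_hits, leftover = subtract_known(counts, ["TTTTGGGGCCCC"])
    print(animal_hits, leftover)
```

The sequential subtraction mirrors the order in the list: tags matching animal references are set aside first, and only the remainder is checked against plant miRNAs.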

Can you suggest tools for each step? They need to be publicly available; we are kind of poor and don't have access to commercial tools.
I have read through many threads in this forum and have a general idea of tools such as miRDeep2 or miRkey for mapping, but I would like an opinion or guidance from those of you with more experience.

I have a basic laptop with Linux 10.0.4 installed. My impression is that the type of analysis I want to do does not need a server. Is that true?
Thanks a lot!
And Sorry for the long post.

yangjianhunt
06-13-2012, 03:29 PM   #2
Junior Member
Location: United Kingdom

Join Date: Oct 2011
Posts: 6

Dear Jian,

You could try the UEA sRNA toolkit:

You should be able to analyse most of your data with this.

rvaerle
06-14-2012, 07:58 AM   #3
Location: Houston TX USA

Join Date: Jun 2012
Posts: 13

Hi Ronny,

Thanks a lot! Wow, I didn't know about that tool.
The web-based version has a limit on the size of the fastq file (200M), but I see they have a downloadable version, which presumably does not have that limit.
I will definitely try it out.

yangjianhunt
06-29-2012, 10:26 AM   #4
Location: Houston TX USA

Join Date: Jun 2012
Posts: 13

I have more or less figured out how to do the tasks, thanks to all your help.
I'd like to share some of my experiences for people who might have similar problems.

I used the "cutadapt" tool to trim adaptors.
In my opinion, "cutadapt" is extremely flexible and easy to use, and is fast.
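
To make the idea of 3' adaptor trimming concrete, here is a minimal Python sketch. It is not cutadapt's algorithm (cutadapt tolerates mismatches; this only does exact matching), and the function name trim_3prime_adapter is hypothetical. The adaptor string is the sequence commonly listed for the TruSeq small RNA 3' adaptor; treat it as an assumption and confirm it against your kit documentation.

```python
# Assumed TruSeq small RNA 3' adaptor; verify against the kit documentation.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def trim_3prime_adapter(read, adapter=ADAPTER, min_overlap=3):
    """Remove a 3' adaptor from a read using exact matching only."""
    # Case 1: the full adaptor occurs somewhere in the read.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends with a partial adaptor (a prefix of it).
    # Try the longest possible overlap first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    # No adaptor found: return the read unchanged.
    return read


if __name__ == "__main__":
    insert = "ACGTTGCATGCAACGTTAGCAT"  # made-up 22 nt insert
    print(trim_3prime_adapter(insert + ADAPTER + "AAAA"))
    # An adaptor dimer has no insert, so it trims to an empty string
    # and is then removed by a minimum-length filter.
    print(repr(trim_3prime_adapter(ADAPTER + "ACGT")))
```

With this approach, adaptor dimers need no separate step: they trim down to nothing and drop out when short tags are filtered.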

I used the FASTX-Toolkit to convert the fastq to fasta and to collapse the identical reads. The read count for each unique sequence is appended to its sequence ID, and the sequence ID itself is the numerical rank of that sequence by read count.
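
For downstream analysis, the collapsed fasta can be read back into (rank, count, sequence) records. This is a minimal sketch, assuming headers of the form ">rank-count" as described above and one sequence line per record; the function name parse_collapsed_fasta and the example sequences are made up.

```python
def parse_collapsed_fasta(lines):
    """Yield (rank, count, sequence) from collapsed fasta lines.

    Assumes headers look like '>1-250' (rank, then read count) and that
    each sequence sits on a single line.
    """
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            rank, count = header.split("-")
            yield int(rank), int(count), line
            header = None


if __name__ == "__main__":
    example = [">1-250", "ACGTTGCATGCAACGTTAGCAT", ">2-3", "TTTTGGGGCCCCAAAA"]
    records = list(parse_collapsed_fasta(example))
    total_reads = sum(count for _, count, _ in records)
    print(len(records), "unique sequences,", total_reads, "reads")
```

For a real file, pass an open file handle instead of the list, since the function only needs an iterable of lines.
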
If you have many files to process, you can write a shell script to automate the pipeline, so that you don't have to type the command line for each tool and each file.
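
The same automation can be done from Python instead of a shell script. The sketch below is only an outline under several assumptions: the adaptor sequence, the output file naming, the minimum length of 8, and the -Q33 option (for Phred+33 quality encoding) all need checking against your own data and installed tool versions. The function names build_commands and run_pipeline are hypothetical. By default it only prints the commands rather than running them.

```python
import subprocess
from pathlib import Path

# Assumed TruSeq small RNA 3' adaptor; verify against the kit documentation.
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"


def build_commands(fastq, adapter=ADAPTER, min_len=8):
    """Return the three command lines for one fastq file."""
    fastq = Path(fastq)
    stem = fastq.stem  # "sample1.fastq" -> "sample1"
    trimmed = fastq.with_name(stem + ".trimmed.fastq")
    fasta = fastq.with_name(stem + ".trimmed.fasta")
    collapsed = fastq.with_name(stem + ".collapsed.fasta")
    return [
        # Trim the 3' adaptor and discard reads shorter than min_len.
        ["cutadapt", "-a", adapter, "-m", str(min_len),
         "-o", str(trimmed), str(fastq)],
        # Convert fastq to fasta (-Q33 assumes Phred+33 qualities).
        ["fastq_to_fasta", "-Q33", "-i", str(trimmed), "-o", str(fasta)],
        # Collapse identical reads into unique sequences with counts.
        ["fastx_collapser", "-i", str(fasta), "-o", str(collapsed)],
    ]


def run_pipeline(fastq_files, dry_run=True):
    """Print (dry_run=True) or execute the commands for every file."""
    for fastq in fastq_files:
        for cmd in build_commands(fastq):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    # Dry run over every .fastq file in the current directory.
    run_pipeline(sorted(Path(".").glob("*.fastq")))
```

Building the commands as lists and running them with check=True means a failed step stops the run instead of silently feeding an empty file to the next tool.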
yangjianhunt
08-21-2012, 06:59 AM   #5
Junior Member
Location: denmark

Join Date: Mar 2012
Posts: 3
shortRan

You could do all the steps you listed using shortRan. The article should be out any day in Bioinformatics, or you can write to me and I can send you the package.
vikas0633

All times are GMT -8.

Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2021, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO